Mastering Apache Cassandra 3.x: An expert guide to improving database scalability and availability without compromising performance, 3rd Edition 9781789131499, 1789131499

Build, manage, and configure a high-performing, reliable NoSQL database for your applications with Cassandra.

English Pages 348 [338]


Table of contents :
Cover
Title Page
Copyright and Credits
Packt Upsell
Foreword
Contributors
Table of Contents
Preface
Chapter 1: Quick Start
Introduction to Cassandra
High availability
Distributed
Partitioned row store
Installation
Configuration
cassandra.yaml
cassandra-rackdc.properties
Starting Cassandra
Cassandra Cluster Manager
A quick introduction to the data model
Using Cassandra with cqlsh
Shutting down Cassandra
Summary
Chapter 2: Cassandra Architecture
Why was Cassandra created?
RDBMS and problems at scale
Cassandra and the CAP theorem
Cassandra's ring architecture
Partitioners
ByteOrderedPartitioner
RandomPartitioner
Murmur3Partitioner
Single token range per node
Vnodes
Cassandra's write path
Cassandra's read path
On-disk storage
SSTables
How data was structured in prior versions
How data is structured in newer versions
Additional components of Cassandra
Gossiper
Snitch
Phi failure-detector
Tombstones
Hinted handoff
Compaction
Repair
Merkle tree calculation
Streaming data
Read repair
Security
Authentication
Authorization
Managing roles
Client-to-node SSL
Node-to-node SSL
Summary
Chapter 3: Effective CQL
An overview of Cassandra data modeling
Cassandra storage model for versions 3.0 and beyond
Data cells
cqlsh
Logging into cqlsh
Problems connecting to cqlsh
Local cluster without security enabled
Remote cluster with user security enabled
Remote cluster with auth and SSL enabled
Connecting with cqlsh over SSL
Converting the Java keyStore into a PKCS12 keyStore
Exporting the certificate from the PKCS12 keyStore
Modifying your cqlshrc file
Testing your connection via cqlsh
Getting started with CQL
Creating a keyspace
Single data center example
Multi-data center example
Creating a table
Simple table example
Clustering key example
Composite partition key example
Table options
Data types
Type conversion
The primary key
Designing a primary key
Selecting a good partition key
Selecting a good clustering key
Querying data
The IN operator
Writing data
Inserting data
Updating data
Deleting data
Lightweight transactions
Executing a BATCH statement
The expiring cell
Altering a keyspace
Dropping a keyspace
Altering a table
Truncating a table
Dropping a table
Truncate versus drop
Creating an index
Caution with implementing secondary indexes
Dropping an index
Creating a custom data type
Altering a custom type
Dropping a custom type
User management
Creating a user and role
Altering a user and role
Dropping a user and role
Granting permissions
Revoking permissions
Other CQL commands
COUNT
DISTINCT
LIMIT
STATIC
User-defined functions
cqlsh commands
CONSISTENCY
COPY
DESCRIBE
TRACING
Summary
Chapter 4: Configuring a Cluster
Evaluating instance requirements
RAM
CPU
Disk
Solid state drives
Cloud storage offerings
SAN and NAS
Network
Public cloud networks
Firewall considerations
Strategy for many small instances versus few large instances
Operating system optimizations
Disable swap
XFS
Limits
limits.conf
sysctl.conf
Time synchronization
Configuring the JVM
Garbage collection
CMS
G1GC
Garbage collection with Cassandra
Installation of JVM
JCE
Configuring Cassandra
cassandra.yaml
cassandra-env.sh
cassandra-rackdc.properties
dc
rack
dc_suffix
prefer_local
cassandra-topology.properties
jvm.options
logback.xml
Managing a deployment pipeline
Orchestration tools
Configuration management tools
Recommended approach
Local repository for downloadable files
Summary
Chapter 5: Performance Tuning
Cassandra-Stress
The Cassandra-Stress YAML file
name
size
population
cluster
Cassandra-Stress results
Write performance
Commitlog mount point
Scaling out
Scaling out a data center
Read performance
Compaction strategy selection
Optimizing read throughput for time-series models
Optimizing tables for read-heavy models
Cache settings
Appropriate uses for row-caching
Compression
Chunk size
The bloom filter configuration
Read performance issues
Other performance considerations
JVM configuration
Cassandra anti-patterns
Building a queue
Query flexibility
Querying an entire table
Incorrect use of BATCH
Network
Summary
Chapter 6: Managing a Cluster
Revisiting nodetool
A warning about using nodetool
Scaling up
Adding nodes to a cluster
Cleaning up the original nodes
Adding a new data center
Adjusting the cassandra-rackdc.properties file
A warning about SimpleStrategy
Streaming data
Scaling down
Removing nodes from a cluster
Removing a live node
Removing a dead node
Other removenode options
When removenode doesn't work (nodetool assassinate)
Assassinating a node on an older version
Removing a data center
Backing up and restoring data
Taking snapshots
Enabling incremental backups
Recovering from snapshots
Maintenance
Replacing a node
Repair
A warning about incremental repairs
Cassandra Reaper
Forcing read repairs at consistency – ALL
Clearing snapshots and incremental backups
Snapshots
Incremental backups
Compaction
Why you should never invoke compaction manually
Adjusting compaction throughput due to available resources
Summary
Chapter 7: Monitoring
JMX interface
MBean packages exposed by Cassandra
JConsole (GUI)
Connection and overview
Viewing metrics
Performing an operation
JMXTerm (CLI)
Connection and domains
Getting a metric
Performing an operation
The nodetool utility
Monitoring using nodetool
describecluster
gcstats
getcompactionthreshold
getcompactionthroughput
getconcurrentcompactors
getendpoints
getlogginglevels
getstreamthroughput
gettimeout
gossipinfo
info
netstats
proxyhistograms
status
tablestats
tpstats
verify
Administering using nodetool
cleanup
drain
flush
resetlocalschema
stopdaemon
truncatehints
upgradesstables
Metric stack
Telegraf
Installation
Configuration
JMXTrans
Installation
Configuration
InfluxDB
Installation
Configuration
InfluxDB CLI
Grafana
Installation
Configuration
Visualization
Alerting
Custom setup
Log stack
The system/debug/gc logs
Filebeat
Installation
Configuration
Elasticsearch
Installation
Configuration
Kibana
Installation
Configuration
Troubleshooting
High CPU usage
Different garbage-collection patterns
Hotspots
Disk performance
Node flakiness
All-in-one Docker
Creating a database and other monitoring components locally
Web links
Summary
Chapter 8: Application Development
Getting started
The path to failure
Is Cassandra the right database?
Good use cases for Apache Cassandra
Use and expectations around application data consistency
Choosing the right driver
Building a Java application
Driver dependency configuration with Apache Maven
Connection class
Other connection options
Retry policy
Default keyspace
Port
SSL
Connection pooling options
Starting simple – Hello World!
Using the object mapper
Building a data loader
Asynchronous operations
Data loader example
Summary
Chapter 9: Integration with Apache Spark
Spark
Architecture
Installation
Running custom Spark Docker locally
Configuration
The web UI
Master
Worker
Application
PySpark
Connection config
Accessing Cassandra data
SparkR
Connection config
Accessing Cassandra data
RStudio
Connection config
Accessing Cassandra data
Jupyter
Architecture
Installation
Configuration
Web UI
PySpark through Jupyter
Summary
Appendix: References
Chapter 1 – Quick Start
Chapter 2 – Cassandra Architecture
Chapter 3 – Effective CQL
Chapter 4 – Configuring a Cluster
Chapter 5 – Performance Tuning
Chapter 6 – Managing a Cluster
Chapter 7 – Monitoring
Chapter 8 – Application Development
Chapter 9 – Integration with Apache Spark
Other Books You May Enjoy
Index
