Mastering Apache Cassandra 3.x: An expert guide to improving database scalability and availability without compromising performance, 3rd Edition
9781789131499, 1789131499
Build, manage, and configure a high-performing, reliable NoSQL database for your applications with Cassandra.
Table of contents: Cover Title Page Copyright and Credits Packt Upsell Foreword Contributors Table of Contents Preface
Chapter 1: Quick Start Introduction to Cassandra High availability Distributed Partitioned row store Installation Configuration cassandra.yaml cassandra-rackdc.properties Starting Cassandra Cassandra Cluster Manager A quick introduction to the data model Using Cassandra with cqlsh Shutting down Cassandra Summary
Chapter 2: Cassandra Architecture Why was Cassandra created? RDBMS and problems at scale Cassandra and the CAP theorem Cassandra's ring architecture Partitioners ByteOrderedPartitioner RandomPartitioner Murmur3Partitioner Single token range per node Vnodes Cassandra's write path Cassandra's read path On-disk storage SSTables How data was structured in prior versions How data is structured in newer versions Additional components of Cassandra Gossiper Snitch Phi failure-detector Tombstones Hinted handoff Compaction Repair Merkle tree calculation Streaming data Read repair Security Authentication Authorization Managing roles Client-to-node SSL Node-to-node SSL Summary
Chapter 3: Effective CQL An overview of Cassandra data modeling Cassandra storage model for versions 3.0 and beyond Data cells cqlsh Logging into cqlsh Problems connecting to cqlsh Local cluster without security enabled Remote cluster with user security enabled Remote cluster with auth and SSL enabled Connecting with cqlsh over SSL Converting the Java keyStore into a PKCS12 keyStore Exporting the certificate from the PKCS12 keyStore Modifying your cqlshrc file Testing your connection via cqlsh Getting started with CQL Creating a keyspace Single data center example Multi-data center example Creating a table Simple table example Clustering key example Composite partition key example Table options Data types Type conversion The primary key Designing a primary key Selecting a good partition key Selecting a good clustering key Querying data The IN operator Writing data Inserting data Updating data Deleting data Lightweight transactions Executing a BATCH statement The expiring cell Altering a keyspace Dropping a keyspace Altering a table Truncating a table Dropping a table Truncate versus drop Creating an index Caution with implementing secondary indexes Dropping an index Creating a custom data type Altering a custom type Dropping a custom type User management Creating a user and role Altering a user and role Dropping a user and role Granting permissions Revoking permissions Other CQL commands COUNT DISTINCT LIMIT STATIC User-defined functions cqlsh commands CONSISTENCY COPY DESCRIBE TRACING Summary
Chapter 4: Configuring a Cluster Evaluating instance requirements RAM CPU Disk Solid state drives Cloud storage offerings SAN and NAS Network Public cloud networks Firewall considerations Strategy for many small instances versus few large instances Operating system optimizations Disable swap XFS Limits limits.conf sysctl.conf Time synchronization Configuring the JVM Garbage collection CMS G1GC Garbage collection with Cassandra Installation of JVM JCE Configuring Cassandra cassandra.yaml cassandra-env.sh cassandra-rackdc.properties dc rack dc_suffix prefer_local cassandra-topology.properties jvm.options logback.xml Managing a deployment pipeline Orchestration tools Configuration management tools Recommended approach Local repository for downloadable files Summary
Chapter 5: Performance Tuning Cassandra-Stress The Cassandra-Stress YAML file name size population cluster Cassandra-Stress results Write performance Commitlog mount point Scaling out Scaling out a data center Read performance Compaction strategy selection Optimizing read throughput for time-series models Optimizing tables for read-heavy models Cache settings Appropriate uses for row-caching Compression Chunk size The bloom filter configuration Read performance issues Other performance considerations JVM configuration Cassandra anti-patterns Building a queue Query flexibility Querying an entire table Incorrect use of BATCH Network Summary
Chapter 6: Managing a Cluster Revisiting nodetool A warning about using nodetool Scaling up Adding nodes to a cluster Cleaning up the original nodes Adding a new data center Adjusting the cassandra-rackdc.properties file A warning about SimpleStrategy Streaming data Scaling down Removing nodes from a cluster Removing a live node Removing a dead node Other removenode options When removenode doesn't work (nodetool assassinate) Assassinating a node on an older version Removing a data center Backing up and restoring data Taking snapshots Enabling incremental backups Recovering from snapshots Maintenance Replacing a node Repair A warning about incremental repairs Cassandra Reaper Forcing read repairs at consistency – ALL Clearing snapshots and incremental backups Snapshots Incremental backups Compaction Why you should never invoke compaction manually Adjusting compaction throughput due to available resources Summary
Chapter 7: Monitoring JMX interface MBean packages exposed by Cassandra JConsole (GUI) Connection and overview Viewing metrics Performing an operation JMXTerm (CLI) Connection and domains Getting a metric Performing an operation The nodetool utility Monitoring using nodetool describecluster gcstats getcompactionthreshold getcompactionthroughput getconcurrentcompactors getendpoints getlogginglevels getstreamthroughput gettimeout gossipinfo info netstats proxyhistograms status tablestats tpstats verify Administering using nodetool cleanup drain flush resetlocalschema stopdaemon truncatehints upgradeSSTable Metric stack Telegraf Installation Configuration JMXTrans Installation Configuration InfluxDB Installation Configuration InfluxDB CLI Grafana Installation Configuration Visualization Alerting Custom setup Log stack The system/debug/gc logs Filebeat Installation Configuration Elasticsearch Installation Configuration Kibana Installation Configuration Troubleshooting High CPU usage Different garbage-collection patterns Hotspots Disk performance Node flakiness All-in-one Docker Creating a database and other monitoring components locally Web links Summary
Chapter 8: Application Development Getting started The path to failure Is Cassandra the right database? Good use cases for Apache Cassandra Use and expectations around application data consistency Choosing the right driver Building a Java application Driver dependency configuration with Apache Maven Connection class Other connection options Retry policy Default keyspace Port SSL Connection pooling options Starting simple – Hello World! Using the object mapper Building a data loader Asynchronous operations Data loader example Summary
Chapter 9: Integration with Apache Spark Spark Architecture Installation Running custom Spark Docker locally Configuration The web UI Master Worker Application PySpark Connection config Accessing Cassandra data SparkR Connection config Accessing Cassandra data RStudio Connection config Accessing Cassandra data Jupyter Architecture Installation Configuration Web UI PySpark through Jupyter Summary
Appendix: References Chapter 1 – Quick Start Chapter 2 – Cassandra Architecture Chapter 3 – Effective CQL Chapter 4 – Configuring a Cluster Chapter 5 – Performance Tuning Chapter 6 – Managing a Cluster Chapter 7 – Monitoring Chapter 8 – Application Development Chapter 9 – Integration with Apache Spark
Other Books You May Enjoy Index