Mastering Apache Cassandra 3.x: An expert guide to improving database scalability and availability without compromising performance, 3rd Edition 9781789131499, 1789131499

Build, manage, and configure a high-performing, reliable NoSQL database for your applications with Cassandra.

English Pages 348 [338]

Table of contents :
Cover
Title Page
Copyright and Credits
Packt Upsell
Foreword
Contributors
Table of Contents
Preface
Chapter 1: Quick Start
Introduction to Cassandra
High availability
Distributed
Partitioned row store
Installation
Configuration
cassandra.yaml
cassandra-rackdc.properties
Starting Cassandra
Cassandra Cluster Manager
A quick introduction to the data model
Using Cassandra with cqlsh
Shutting down Cassandra
Summary
Chapter 2: Cassandra Architecture
Why was Cassandra created?
RDBMS and problems at scale
Cassandra and the CAP theorem
Cassandra's ring architecture
Partitioners
ByteOrderedPartitioner
RandomPartitioner
Murmur3Partitioner
Single token range per node
Vnodes
Cassandra's write path
Cassandra's read path
On-disk storage
SSTables
How data was structured in prior versions
How data is structured in newer versions
Additional components of Cassandra
Gossiper
Snitch
Phi failure-detector
Tombstones
Hinted handoff
Compaction
Repair
Merkle tree calculation
Streaming data
Read repair
Security
Authentication
Authorization
Managing roles
Client-to-node SSL
Node-to-node SSL
Summary
Chapter 3: Effective CQL
An overview of Cassandra data modeling
Cassandra storage model for versions 3.0 and beyond
Data cells
cqlsh
Logging into cqlsh
Problems connecting to cqlsh
Local cluster without security enabled
Remote cluster with user security enabled
Remote cluster with auth and SSL enabled
Connecting with cqlsh over SSL
Converting the Java keyStore into a PKCS12 keyStore
Exporting the certificate from the PKCS12 keyStore
Modifying your cqlshrc file
Testing your connection via cqlsh
Getting started with CQL
Creating a keyspace
Single data center example
Multi-data center example
Creating a table
Simple table example
Clustering key example
Composite partition key example
Table options
Data types
Type conversion
The primary key
Designing a primary key
Selecting a good partition key
Selecting a good clustering key
Querying data
The IN operator
Writing data
Inserting data
Updating data
Deleting data
Lightweight transactions
Executing a BATCH statement
The expiring cell
Altering a keyspace
Dropping a keyspace
Altering a table
Truncating a table
Dropping a table
Truncate versus drop
Creating an index
Caution with implementing secondary indexes
Dropping an index
Creating a custom data type
Altering a custom type
Dropping a custom type
User management
Creating a user and role
Altering a user and role
Dropping a user and role
Granting permissions
Revoking permissions
Other CQL commands
COUNT
DISTINCT
LIMIT
STATIC
User-defined functions
cqlsh commands
CONSISTENCY
COPY
DESCRIBE
TRACING
Summary
Chapter 4: Configuring a Cluster
Evaluating instance requirements
RAM
CPU
Disk
Solid state drives
Cloud storage offerings
SAN and NAS
Network
Public cloud networks
Firewall considerations
Strategy for many small instances versus few large instances
Operating system optimizations
Disable swap
XFS
Limits
limits.conf
sysctl.conf
Time synchronization
Configuring the JVM
Garbage collection
CMS
G1GC
Garbage collection with Cassandra
Installation of JVM
JCE
Configuring Cassandra
cassandra.yaml
cassandra-env.sh
cassandra-rackdc.properties
dc
rack
dc_suffix
prefer_local
cassandra-topology.properties
jvm.options
logback.xml
Managing a deployment pipeline
Orchestration tools
Configuration management tools
Recommended approach
Local repository for downloadable files
Summary
Chapter 5: Performance Tuning
Cassandra-Stress
The Cassandra-Stress YAML file
name
size
population
cluster
Cassandra-Stress results
Write performance
Commitlog mount point
Scaling out
Scaling out a data center
Read performance
Compaction strategy selection
Optimizing read throughput for time-series models
Optimizing tables for read-heavy models
Cache settings
Appropriate uses for row-caching
Compression
Chunk size
The bloom filter configuration
Read performance issues
Other performance considerations
JVM configuration
Cassandra anti-patterns
Building a queue
Query flexibility
Querying an entire table
Incorrect use of BATCH
Network
Summary
Chapter 6: Managing a Cluster
Revisiting nodetool
A warning about using nodetool
Scaling up
Adding nodes to a cluster
Cleaning up the original nodes
Adding a new data center
Adjusting the cassandra-rackdc.properties file
A warning about SimpleStrategy
Streaming data
Scaling down
Removing nodes from a cluster
Removing a live node
Removing a dead node
Other removenode options
When removenode doesn't work (nodetool assassinate)
Assassinating a node on an older version
Removing a data center
Backing up and restoring data
Taking snapshots
Enabling incremental backups
Recovering from snapshots
Maintenance
Replacing a node
Repair
A warning about incremental repairs
Cassandra Reaper
Forcing read repairs at consistency – ALL
Clearing snapshots and incremental backups
Snapshots
Incremental backups
Compaction
Why you should never invoke compaction manually
Adjusting compaction throughput due to available resources
Summary
Chapter 7: Monitoring
JMX interface
MBean packages exposed by Cassandra
JConsole (GUI)
Connection and overview
Viewing metrics
Performing an operation
JMXTerm (CLI)
Connection and domains
Getting a metric
Performing an operation
The nodetool utility
Monitoring using nodetool
describecluster
gcstats
getcompactionthreshold
getcompactionthroughput
getconcurrentcompactors
getendpoints
getlogginglevels
getstreamthroughput
gettimeout
gossipinfo
info
netstats
proxyhistograms
status
tablestats
tpstats
verify
Administering using nodetool
cleanup
drain
flush
resetlocalschema
stopdaemon
truncatehints
upgradesstables
Metric stack
Telegraf
Installation
Configuration
JMXTrans
Installation
Configuration
InfluxDB
Installation
Configuration
InfluxDB CLI
Grafana
Installation
Configuration
Visualization
Alerting
Custom setup
Log stack
The system/debug/gc logs
Filebeat
Installation
Configuration
Elasticsearch
Installation
Configuration
Kibana
Installation
Configuration
Troubleshooting
High CPU usage
Different garbage-collection patterns
Hotspots
Disk performance
Node flakiness
All-in-one Docker
Creating a database and other monitoring components locally
Web links
Summary
Chapter 8: Application Development
Getting started
The path to failure
Is Cassandra the right database?
Good use cases for Apache Cassandra
Use and expectations around application data consistency
Choosing the right driver
Building a Java application
Driver dependency configuration with Apache Maven
Connection class
Other connection options
Retry policy
Default keyspace
Port
SSL
Connection pooling options
Starting simple – Hello World!
Using the object mapper
Building a data loader
Asynchronous operations
Data loader example
Summary
Chapter 9: Integration with Apache Spark
Spark
Architecture
Installation
Running custom Spark Docker locally
Configuration
The web UI
Master
Worker
Application
PySpark
Connection config
Accessing Cassandra data
SparkR
Connection config
Accessing Cassandra data
RStudio
Connection config
Accessing Cassandra data
Jupyter
Architecture
Installation
Configuration
Web UI
PySpark through Jupyter
Summary
Appendix: References
Chapter 1 – Quick Start
Chapter 2 – Cassandra Architecture
Chapter 3 – Effective CQL
Chapter 4 – Configuring a Cluster
Chapter 5 – Performance Tuning
Chapter 6 – Managing a Cluster
Chapter 7 – Monitoring
Chapter 8 – Application Development
Chapter 9 – Integration with Apache Spark
Other Books You May Enjoy
Index

Author / Uploaded
Aaron Ploetz
Tejaswi Malepati
Nishant Neeraj
