Hands-On Big Data Analytics with PySpark: Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs
9781838644130, 9781786463708, 9781788835367, 183864413X
Use PySpark to easily crush messy data at-scale and discover proven techniques to create testable, immutable, and easily
Table of contents : Table of ContentsInstalling Pyspark and Setting up Your Development EnvironmentGetting Your Big Data into the Spark Environment Using RDDsBig Data Cleaning and Wrangling with Spark NotebooksAggregating and Summarizing Data into Useful ReportsPowerful Exploratory Data Analysis with MLlibPutting Structure on Your Big Data with SparkSQLTransformations and ActionsImmutable DesignAvoiding Shuffle and Reducing Operational ExpensesSaving Data in the Correct FormatWorking with the Spark Key/Value APITesting Apache Spark JobsLeveraging the Spark GraphX API