Fast Python: High performance techniques for large datasets
ISBN: 9781617297939
Master Python techniques and libraries to reduce run times, efficiently handle huge datasets, and optimize execution for complex machine learning applications.
Language: English
Pages: 389
Year: 2023
Table of contents:
inside front cover
Fast Python
Copyright
contents
front matter
preface
acknowledgments
about this book
Who should read this book?
How this book is organized: A road map
About the code
liveBook discussion forum
Hardware and software
about the author
about the cover illustration
Part 1. Foundational Approaches
1 An urgent need for efficiency in data processing
1.1 How bad is the data deluge?
1.2 Modern computing architectures and high-performance computing
1.2.1 Changes inside the computer
1.2.2 Changes in the network
1.2.3 The cloud
1.3 Working with Python’s limitations
1.3.1 The Global Interpreter Lock
1.4 A summary of the solutions
Summary
2 Extracting maximum performance from built-in features
2.1 Profiling applications with both IO and computing workloads
2.1.1 Downloading data and computing minimum temperatures
2.1.2 Python’s built-in profiling module
2.1.3 Using local caches to reduce network usage
2.2 Profiling code to detect performance bottlenecks
2.2.1 Visualizing profiling information
2.2.2 Line profiling
2.2.3 The takeaway: Profiling code
2.3 Optimizing basic data structures for speed: Lists, sets, and dictionaries
2.3.1 Performance of list searches
2.3.2 Searching using sets
2.3.3 List, set, and dictionary complexity in Python
2.4 Finding excessive memory allocation
2.4.1 Navigating the minefield of Python memory estimation
2.4.2 The memory footprint of some alternative representations
2.4.3 Using arrays as a compact representation alternative to lists
2.4.4 Systematizing what we have learned: Estimating memory usage of Python objects
2.4.5 The takeaway: Estimating memory usage of Python objects
2.5 Using laziness and generators for big-data pipelining
2.5.1 Using generators instead of standard functions
Summary
3 Concurrency, parallelism, and asynchronous processing
3.1 Writing the scaffold of an asynchronous server
3.1.1 Implementing the scaffold for communicating with clients
3.1.2 Programming with coroutines
3.1.3 Sending complex data from a simple synchronous client
3.1.4 Alternative approaches to interprocess communication
3.1.5 The takeaway: Asynchronous programming
3.2 Implementing a basic MapReduce engine
3.2.1 Understanding MapReduce frameworks
3.2.2 Developing a very simple test scenario
3.2.3 A first attempt at implementing a MapReduce framework
3.3 Implementing a concurrent version of a MapReduce engine
3.3.1 Using concurrent.futures to implement a threaded server
3.3.2 Asynchronous execution with futures
3.3.3 The GIL and multithreading
3.4 Using multiprocessing to implement MapReduce
3.4.1 A solution based on concurrent.futures
3.4.2 A solution based on the multiprocessing module
3.4.3 Monitoring the progress of the multiprocessing solution
3.4.4 Transferring data in chunks
3.5 Tying it all together: An asynchronous multithreaded and multiprocessing MapReduce server
3.5.1 Architecting a complete high-performance solution
3.5.2 Creating a robust version of the server
Summary
4 High-performance NumPy
4.1 Understanding NumPy from a performance perspective
4.1.1 Copies vs. views of existing arrays
4.1.2 Understanding NumPy’s view machinery
4.1.3 Making use of views for efficiency
4.2 Using array programming
4.2.1 The takeaway
4.2.2 Broadcasting in NumPy
4.2.3 Applying array programming
4.2.4 Developing a vectorized mentality
4.3 Tuning NumPy’s internal architecture for performance
4.3.1 An overview of NumPy dependencies
4.3.2 How to tune NumPy in your Python distribution
4.3.3 Threads in NumPy
Summary
Part 2. Hardware
5 Re-implementing critical code with Cython
5.1 Overview of techniques for efficient code re-implementation
5.2 A whirlwind tour of Cython
5.2.1 A naive implementation in Cython
5.2.2 Using Cython annotations to increase performance
5.2.3 Why annotations are fundamental to performance
5.2.4 Adding typing to function returns
5.3 Profiling Cython code
5.3.1 Using Python’s built-in profiling infrastructure
5.3.2 Using line_profiler
5.4 Optimizing array access with Cython memoryviews
5.4.1 The takeaway
5.4.2 Cleaning up all internal interactions with Python
5.5 Writing NumPy generalized universal functions in Cython
5.5.1 The takeaway
5.6 Advanced array access in Cython
5.6.1 Bypassing the GIL’s limitation on running multiple threads at a time
5.6.2 Basic performance analysis
5.6.3 A spacewar example using Quadlife
5.7 Parallelism with Cython
Summary
6 Memory hierarchy, storage, and networking
6.1 How modern hardware architectures affect Python performance
6.1.1 The counterintuitive effect of modern architectures on performance
6.1.2 How CPU caching affects algorithm efficiency
6.1.3 Modern persistent storage
6.2 Efficient data storage with Blosc
6.2.1 Compress data; save time
6.2.2 Read speeds (and memory buffers)
6.2.3 The effect of different compression algorithms on storage performance
6.2.4 Using insights about data representation to increase compression
6.3 Accelerating NumPy with NumExpr
6.3.1 Fast expression processing
6.3.2 How hardware architecture affects our results
6.3.3 When NumExpr is not appropriate
6.4 The performance implications of using the local network
6.4.1 The sources of inefficiency with REST calls
6.4.2 A naive client based on UDP and msgpack
6.4.3 A UDP-based server
6.4.4 Dealing with basic recovery on the client side
6.4.5 Other suggestions for optimizing network computing
Summary
Part 3. Applications and Libraries for Modern Data Processing
7 High-performance pandas and Apache Arrow
7.1 Optimizing memory and time when loading data
7.1.1 Compressed vs. uncompressed data
7.1.2 Type inference of columns
7.1.3 The effect of data type precision
7.1.4 Recoding and reducing data
7.2 Techniques to increase data analysis speed
7.2.1 Using indexing to accelerate access
7.2.2 Row iteration strategies
7.3 pandas on top of NumPy, Cython, and NumExpr
7.3.1 Explicit use of NumPy
7.3.2 pandas on top of NumExpr
7.3.3 Cython and pandas
7.4 Reading data into pandas with Arrow
7.4.1 The relationship between pandas and Apache Arrow
7.4.2 Reading a CSV file
7.4.3 Analyzing with Arrow
7.5 Using Arrow interop to delegate work to more efficient languages and systems
7.5.1 Implications of Arrow’s language interop architecture
7.5.2 Zero-copy operations on data with Arrow’s Plasma server
Summary
8 Storing big data
8.1 A unified interface for file access: fsspec
8.1.1 Using fsspec to search for files in a GitHub repo
8.1.2 Using fsspec to inspect zip files
8.1.3 Accessing files using fsspec
8.1.4 Using URL chaining to traverse different filesystems transparently
8.1.5 Replacing filesystem backends
8.1.6 Interfacing with PyArrow
8.2 Parquet: An efficient format to store columnar data
8.2.1 Inspecting Parquet metadata
8.2.2 Column encoding with Parquet
8.2.3 Partitioning with datasets
8.3 Dealing with larger-than-memory datasets the old-fashioned way
8.3.1 Memory mapping files with NumPy
8.3.2 Chunk reading and writing of data frames
8.4 Zarr for large-array persistence
8.4.1 Understanding Zarr’s internal structure
8.4.2 Storage of arrays in Zarr
8.4.3 Creating a new array
8.4.4 Parallel reading and writing of Zarr arrays
Summary
Part 4. Advanced Topics
9 Data analysis using GPU computing
9.1 Making sense of GPU computing power
9.1.1 Understanding the advantages of GPUs
9.1.2 The relationship between CPUs and GPUs
9.1.3 The internal architecture of GPUs
9.1.4 Software architecture considerations
9.2 Using Numba to generate GPU code
9.2.1 Installation of GPU software for Python
9.2.2 The basics of GPU programming with Numba
9.2.3 Revisiting the Mandelbrot example using GPUs
9.2.4 A NumPy version of the Mandelbrot code
9.3 Performance analysis of GPU code: The case of a CuPy application
9.3.1 GPU-based data analysis libraries
9.3.2 Using CuPy: A GPU-based version of NumPy
9.3.3 A basic interaction with CuPy
9.3.4 Writing a Mandelbrot generator using Numba
9.3.5 Writing a Mandelbrot generator using CUDA C
9.3.6 Profiling tools for GPU code
Summary
10 Analyzing big data with Dask
10.1 Understanding Dask’s execution model
10.1.1 A pandas baseline for comparison
10.1.2 Developing a Dask-based data frame solution
10.2 The computational cost of Dask operations
10.2.1 Partitioning data for processing
10.2.2 Persisting intermediate computations
10.2.3 Algorithm implementations over distributed data frames
10.2.4 Repartitioning the data
10.2.5 Persisting distributed data frames
10.3 Using Dask’s distributed scheduler
10.3.1 The dask.distributed architecture
10.3.2 Running code using dask.distributed
10.3.3 Dealing with datasets larger than memory
Summary
Appendix A. Setting up the environment
A.1 Setting up Anaconda Python
A.2 Installing your own Python distribution
A.3 Using Docker
A.4 Hardware considerations
Appendix B. Using Numba to generate efficient low-level code
B.1 Generating optimized code with Numba
B.2 Writing explicitly parallel functions in Numba
B.3 Writing NumPy-aware code in Numba
index