50 Algorithms Every Programmer Should Know: An unbeatable arsenal of algorithmic solutions for real-world problems [2 ed.] 1803247762, 9781803247762

Solve classic computer science problems from fundamental algorithms, such as sorting and searching, to modern algorithms

Language: English. Pages: 538. Year: 2023.

Table of contents:
Cover
Copyright
Contributors
Table of Contents
Preface
Section 1: Fundamentals and Core Algorithms
Chapter 1: Overview of Algorithms
What is an algorithm?
The phases of an algorithm
Development environment
Python packages
The SciPy ecosystem
Using Jupyter Notebook
Algorithm design techniques
The data dimension
The compute dimension
Performance analysis
Space complexity analysis
Time complexity analysis
Estimating the performance
The best case
The worst case
The average case
Big O notation
Constant time (O(1)) complexity
Linear time (O(n)) complexity
Quadratic time (O(n²)) complexity
Logarithmic time (O(log n)) complexity
Selecting an algorithm
Validating an algorithm
Exact, approximate, and randomized algorithms
Explainability
Summary
Chapter 2: Data Structures Used in Algorithms
Exploring Python built-in data types
Lists
Using lists
Modifying lists: append and pop operations
The range() function
The time complexity of lists
Tuples
The time complexity of tuples
Dictionaries and sets
Dictionaries
Sets
Time complexity analysis for sets
When to use a dictionary and when to use a set
Using Series and DataFrames
Series
DataFrame
Creating a subset of a DataFrame
Matrices
Matrix operations
Big O notation and matrices
Exploring abstract data types
Vector
Time complexity of vectors
Stacks
Time complexity of stack operations
Practical example
Queues
Time complexity analysis for queues
The basic idea behind the use of stacks and queues
Tree
Terminology
Types of trees
Practical examples
Summary
Chapter 3: Sorting and Searching Algorithms
Introducing sorting algorithms
Swapping variables in Python
Bubble sort
Understanding the logic behind bubble sort
Optimizing bubble sort
Performance analysis of the bubble sort algorithm
Insertion sort
Performance analysis of the insertion sort algorithm
Merge sort
Shell sort
Performance analysis of the Shell sort algorithm
Selection sort
Performance analysis of the selection sort algorithm
Choosing a sorting algorithm
Introduction to searching algorithms
Linear search
Performance analysis of the linear search algorithm
Binary search
Performance analysis of the binary search algorithm
Interpolation search
Performance analysis of the interpolation search algorithm
Practical applications
Summary
Chapter 4: Designing Algorithms
Introducing the basic concepts of designing an algorithm
Concern 1 – correctness: will the designed algorithm produce the result we expect?
Concern 2 – performance: is this the optimal way to get these results?
Characterizing the complexity of the problem
Exploring the relationship between P and NP
Introducing NP-complete and NP-hard
Concern 3 – scalability: how is the algorithm going to perform on larger datasets?
The elasticity of the cloud and algorithmic scalability
Understanding algorithmic strategies
Understanding the divide-and-conquer strategy
A practical example – divide-and-conquer applied to Apache Spark
Understanding the dynamic programming strategy
Components of dynamic programming
Conditions for using dynamic programming
Understanding greedy algorithms
Conditions for using greedy programming
A practical application – solving the TSP
Using a brute-force strategy
Using a greedy algorithm
Comparison of three strategies
Presenting the PageRank algorithm
Problem definition
Implementing the PageRank algorithm
Understanding linear programming
Formulating a linear programming problem
Defining the objective function
Specifying constraints
A practical application – capacity planning with linear programming
Summary
Chapter 5: Graph Algorithms
Understanding graphs: a brief introduction
Graphs: the backbone of modern data networks
Real-world applications
The basics of a graph: vertices (or nodes)
Graph theory and network analysis
Representations of graphs
Graph mechanics and types
Ego-centered networks
Basics of egonets
One-hop, two-hop, and beyond
Applications of egonets
Introducing network analysis theory
Understanding the shortest path
Creating a neighborhood
Triangles
Density
Understanding centrality measures
Degree
Betweenness
Fairness and closeness
Eigenvector centrality
Calculating centrality metrics using Python
1. Setting the foundation: libraries and data
2. Crafting the graph
3. Painting a picture: visualizing the graph
Social network analysis
Understanding graph traversals
BFS
Constructing the adjacency list
BFS algorithm implementation
Using BFS for specific searches
DFS
Case study: fraud detection using SNA
Introduction
What is fraud in this context?
Conducting simple fraud analytics
Presenting the watchtower fraud analytics methodology
Scoring negative outcomes
Degree of suspicion
Summary
Section 2: Machine Learning Algorithms
Chapter 6: Unsupervised Machine Learning Algorithms
Introducing unsupervised learning
Unsupervised learning in the data-mining lifecycle
Phase 1: Business understanding
Phase 2: Data understanding
Phase 3: Data preparation
Phase 4: Modeling
Phase 5: Evaluation
Phase 6: Deployment
Current research trends in unsupervised learning
Practical examples
Marketing segmentation using unsupervised learning
Understanding clustering algorithms
Quantifying similarities
Euclidean distance
Manhattan distance
Cosine distance
k-means clustering algorithm
The logic of k-means clustering
Initialization
The steps of the k-means algorithm
Stop condition
Coding the k-means algorithm
Limitation of k-means clustering
Hierarchical clustering
Steps of hierarchical clustering
Coding a hierarchical clustering algorithm
Understanding DBSCAN
Creating clusters using DBSCAN in Python
Evaluating the clusters
Application of clustering
Dimensionality reduction
Principal component analysis
Limitations of PCA
Association rules mining
Examples of use
Market basket analysis
Association rules mining
Types of rules
Trivial rules
Inexplicable rules
Actionable rules
Ranking rules
Support
Confidence
Lift
Algorithms for association analysis
Apriori algorithm
Limitations of the apriori algorithm
FP-growth algorithm
Populating the FP-tree
Mining frequent patterns
Code for using FP-growth
Summary
Chapter 7: Traditional Supervised Learning Algorithms
Understanding supervised machine learning
Formulating supervised machine learning problems
Understanding enabling conditions
Differentiating between classifiers and regressors
Understanding classification algorithms
Presenting the classifiers challenge
The problem statement
Feature engineering using a data processing pipeline
Scaling the features
Evaluating the classifiers
Confusion matrices
Understanding recall and precision
Understanding the recall and precision trade-off
Understanding overfitting
Specifying the phases of classifiers
Decision tree classification algorithm
Understanding the decision tree classification algorithm
The strengths and weaknesses of decision tree classifiers
Use cases
Understanding the ensemble methods
Implementing gradient boosting with the XGBoost algorithm
Differentiating the Random Forest algorithm from ensemble boosting
Using the Random Forest algorithm for the classifiers challenge
Logistic regression
Assumptions
Establishing the relationship
The loss and cost functions
When to use logistic regression
Using the logistic regression algorithm for the classifiers challenge
The SVM algorithm
Using the SVM algorithm for the classifiers challenge
Understanding the Naive Bayes algorithm
Bayes’ theorem
Calculating probabilities
Multiplication rules for AND events
The general multiplication rule
Addition rules for OR events
Using the Naive Bayes algorithm for the classifiers challenge
For classification algorithms, the winner is...
Understanding regression algorithms
Presenting the regressors challenge
The problem statement of the regressors challenge
Exploring the historical dataset
Feature engineering using a data processing pipeline
Linear regression
Simple linear regression
Evaluating the regressors
Multiple regression
Using the linear regression algorithm for the regressors challenge
When is linear regression used?
The weaknesses of linear regression
The regression tree algorithm
Using the regression tree algorithm for the regressors challenge
The gradient boost regression algorithm
Using the gradient boost regression algorithm for the regressors challenge
For regression algorithms, the winner is...
Practical example – how to predict the weather
Summary
Chapter 8: Neural Network Algorithms
The evolution of neural networks
Historical background
AI winter and the dawn of AI spring
Understanding neural networks
Understanding perceptrons
Understanding the intuition behind neural networks
Understanding layered deep learning architectures
Developing an intuition for hidden layers
How many hidden layers should be used?
Mathematical basis of neural network
Training a neural network
Understanding the anatomy of a neural network
Defining gradient descent
Activation functions
Step function
Sigmoid function
ReLU
Leaky ReLU
Hyperbolic tangent (tanh)
Softmax
Tools and frameworks
Keras
Backend engines of Keras
Low-level layers of the deep learning stack
Defining hyperparameters
Defining a Keras model
Choosing a sequential or functional model
Understanding TensorFlow
Presenting TensorFlow’s basic concepts
Understanding Tensor mathematics
Understanding the types of neural networks
Convolutional neural networks
Convolution
Pooling
Generative Adversarial Networks
Using transfer learning
Case study – using deep learning for fraud detection
Methodology
Summary
Chapter 9: Algorithms for Natural Language Processing
Introducing NLP
Understanding NLP terminology
Text preprocessing in NLP
Tokenization
Cleaning data
Cleaning data using Python
Understanding the Term Document Matrix
Using TF-IDF
Summary and discussion of results
Introduction to word embedding
Implementing word embedding with Word2Vec
Interpreting similarity scores
Advantages and disadvantages of Word2Vec
Case study: Restaurant review sentiment analysis
Importing required libraries and loading the dataset
Building a clean corpus: Preprocessing text data
Converting text data into numerical features
Analyzing the results
Applications of NLP
Summary
Chapter 10: Understanding Sequential Models
Understanding sequential data
Types of sequence models
One-to-many
Many-to-one
Many-to-many
Data representation for sequential models
Introducing RNNs
Understanding the architecture of RNNs
Understanding the memory cell and hidden state
Understanding the characteristics of the input variable
Training the RNN at the first timestep
The activation function in action
Training the RNN for a whole sequence
Calculating the output for each timestep
Backpropagation through time
Predicting with RNNs
Limitations of basic RNNs
Vanishing gradient problem
Inability to look ahead in the sequence
GRU
Introducing the update gate
Implementing the update gate
Updating the hidden cell
Running GRUs for multiple timesteps
Introducing LSTM
Introducing the forget gate
The candidate cell state
The update gate
Calculating memory state
The output gate
Putting everything together
Coding sequential models
Loading the dataset
Preparing the data
Creating the model
Training the model
Viewing some incorrect predictions
Summary
Chapter 11: Advanced Sequential Modeling Algorithms
The evolution of advanced sequential modeling techniques
Exploring autoencoders
Coding an autoencoder
Setting up the environment
Data preparation
Model architecture
Compilation
Training
Prediction
Visualization
Understanding the Seq2Seq model
Encoder
Thought vector
Decoder or writer
Special tokens in Seq2Seq
The information bottleneck dilemma
Understanding the attention mechanism
What is attention in neural networks?
Basic idea
Example
Three key aspects of attention mechanisms
A deeper dive into attention mechanisms
The challenges of attention mechanisms
Delving into self-attention
Attention weights
Encoder: bidirectional RNNs
Thought vector
Decoder: regular RNNs
Training versus inference
Transformers: the evolution in neural networks after self-attention
Why transformers shine
A Python code breakdown
Understanding the output
LLMs
Understanding attention in LLMs
Exploring the powerhouses of NLP: GPT and BERT
2018’s LLM pioneers: GPT and BERT
Using deep and wide models to create powerful LLMs
Summary
Section 3: Advanced Topics
Chapter 12: Recommendation Engines
Introducing recommendation systems
Types of recommendation engines
Content-based recommendation engines
Determining similarities in unstructured documents
Collaborative filtering recommendation engines
Issues related to collaborative filtering
Hybrid recommendation engines
Generating a similarity matrix of the items
Generating reference vectors of the users
Generating recommendations
Evolving the recommendation system
Understanding the limitations of recommendation systems
The cold start problem
Metadata requirements
The data sparsity problem
The double-edged sword of social influence in recommendation systems
Areas of practical applications
Netflix’s mastery of data-driven recommendations
The evolution of Amazon’s recommendation system
Practical example – creating a recommendation engine
1. Setting up the framework
2. Data loading: ingesting reviews and titles
3. Merging data: crafting a comprehensive view
4. Descriptive analysis: gleaning insights from ratings
5. Structuring for recommendations: crafting the matrix
6. Putting the engine to test: recommending movies
Finding movies correlating with Avatar (2009)
Understanding correlation
Evaluating the model
Retraining over time: incorporating user feedback
Summary
Chapter 13: Algorithmic Strategies for Data Handling
Introduction to data algorithms
Significance of CAP theorem in context of data algorithms
Storage in distributed environments
Connecting CAP theorem and data compression
Presenting the CAP theorem
CA systems
AP systems
CP systems
Decoding data compression algorithms
Lossless compression techniques
Huffman coding: Implementing variable-length coding
Understanding dictionary-based compression LZ77
Advanced lossless compression formats
Practical example: Data management in AWS: A focus on CAP theorem and compression algorithms
1. Applying the CAP theorem
2. Using compression algorithms
3. Quantifying the benefits
Summary
Chapter 14: Cryptography
Introduction to cryptography
Understanding the importance of the weakest link
The basic terminology
Understanding the security requirements
Step 1: Identifying the entities
Step 2: Establishing the security goals
Step 3: Understanding the sensitivity of the data
Understanding the basic design of ciphers
Presenting substitution ciphers
Cryptanalysis of substitution ciphers
Understanding transposition ciphers
Understanding the types of cryptographic techniques
Using the cryptographic hash function
Implementing cryptographic hash functions
An application of the cryptographic hash function
Choosing between MD5 and SHA
Using symmetric encryption
Coding symmetric encryption
The advantages of symmetric encryption
The problems with symmetric encryption
Asymmetric encryption
The SSL/TLS handshaking algorithm
Public key infrastructure
Blockchain and cryptography
Example: security concerns when deploying a machine learning model
MITM attacks
How to prevent MITM attacks
Avoiding masquerading
Data and model encryption
Summary
Chapter 15: Large-Scale Algorithms
Introduction to large-scale algorithms
Characterizing performant infrastructure for large-scale algorithms
Elasticity
Characterizing a well-designed, large-scale algorithm
Load balancing
ELB: Combining elasticity and load balancing
Strategizing multi-resource processing
Understanding theoretical limitations of parallel computing
Amdahl’s law
Deriving Amdahl’s law
CUDA: Unleashing the potential of GPU architectures in parallel computing
Parallel processing in LLMs: A case study in Amdahl’s law and diminishing returns
Rethinking data locality
Benefiting from cluster computing using Apache Spark
How Apache Spark empowers large-scale algorithm processing
Distributed computing
In-memory processing
Using large-scale algorithms in cloud computing
Example
Summary
Chapter 16: Practical Considerations
Challenges facing algorithmic solutions
Expecting the unexpected
Failure of Tay, the Twitter AI bot
The explainability of an algorithm
Machine learning algorithms and explainability
Presenting strategies for explainability
Understanding ethics and algorithms
Problems with learning algorithms
Understanding ethical considerations
Factors affecting algorithmic solutions
Considering inconclusive evidence
Traceability
Misguided evidence
Unfair outcomes
Reducing bias in models
When to use algorithms
Understanding black swan events and their implications on algorithms
Summary
Packt page
Other Books You May Enjoy
Index

  • Published: September 2023