English Pages [126] Year 2023
Computer Science, Technology and Applications
Soubhik Chakraborty, Prashant Pranav, Naghma Khatoon and Sandip Dutta
A Guide to Design and Analysis of Algorithms
Copyright © 2023 by Nova Science Publishers, Inc. DOI: https://doi.org/10.52305/HVZD7283 All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher. We have partnered with Copyright Clearance Center to make it easy for you to obtain permissions to reuse content from this publication. Simply navigate to this publication’s page on Nova’s website and locate the “Get Permission” button below the title description. This button is linked directly to the title’s permission page on copyright.com. Alternatively, you can visit copyright.com and search by title, ISBN, or ISSN. For further questions about using the service on copyright.com, please contact: Copyright Clearance Center Phone: +1-(978) 750-8400 Fax: +1-(978) 750-4470 E-mail: [email protected].
NOTICE TO THE READER The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or reliance upon, this material. Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works. Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the Publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication. This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS. Additional color graphics may be available in the e-book version of this book.
Published by Nova Science Publishers, Inc., New York
Contents
Preface
Chapter 1. Introduction to the Design of Algorithms
Chapter 2. Divide and Conquer
Chapter 3. Greedy Algorithms
Chapter 4. Dynamic Programming
Chapter 5. Backtracking
Chapter 6. Branch and Bound
Chapter 7. Introduction to the Analysis of Algorithms
Chapter 8. Randomized Algorithms
Chapter 9. Master Theorem
Chapter 10. A Note on Empirical Complexity Analysis
References
About the Authors
Index
Preface
Because there can be more than one algorithm for the same problem, designing and analysing algorithms is important in order to make them as efficient and robust as possible. This book serves as a guide to the design and analysis of computer algorithms. Chapter 1 gives an overview of different algorithm design techniques and their applications: the brute force approach, divide and conquer, greedy algorithms, dynamic programming, branch and bound, backtracking and randomized algorithms. Chapter 2 discusses the divide and conquer strategy, recurrence relations, and two algorithms that employ the strategy: binary search and merge sort. Chapter 3 gives an insight into greedy algorithms and some problems that can be solved using the greedy approach, such as the job sequencing problem with deadlines and the shortest path problem solved by the Dijkstra algorithm. Chapter 4 discusses the dynamic programming approach in depth, covering the stagecoach problem, the optimal binary search tree, the 0/1 knapsack problem and the subset sum problem. Chapter 5 deals with the backtracking approach and uses it to solve the N-Queens problem. Chapter 6 throws some light on the branch and bound technique. Solutions to three famous problems, namely the assignment problem, the 0/1 knapsack problem and the travelling salesman problem, are discussed using this technique. Chapter 7 introduces the second part of the book, the analysis of algorithms. Two different approaches to the analysis of algorithms, asymptotic analysis and empirical analysis, are discussed in that chapter. Chapter 8 discusses randomized algorithms with an empirical flavour: randomized quick sort and randomized binary search are discussed and analysed empirically. Chapter 9 deals with the Master Theorem and works through many problems that can be solved with it.
Chapter 10 gives a note on the empirical complexity analysis of algorithms. The empirical complexity of four prominent sorting algorithms (merge sort, quick sort, bubble sort and selection sort) is discussed in depth in this chapter. The authors thank Nova Science Publishers for accepting the challenge to publish this work in book form.
Soubhik Chakraborty Prashant Pranav Naghma Khatoon Sandip Dutta
Chapter 1
Introduction to the Design of Algorithms
An algorithm is a finite sequence of instructions for carrying out computation, data processing or automated reasoning in order to solve a problem. An algorithm design is an efficient method that can be expressed in a finite amount of space and time. Different approaches can be used to solve the same problem: some are efficient in terms of memory, whereas others are efficient with respect to time. It is important to note that memory usage and running time usually cannot both be minimized at once. If an algorithm must run in less time, it generally needs more memory, and if it must use less memory, it generally needs more time.
1.1. Brute Force Approach
A brute force approach solves a problem by examining all the available options in order to find a suitable solution. The brute force method explores every possibility until a satisfactory solution is found. For problem-solving, brute-force algorithms rely mostly on sheer computing power.
Examples of brute force algorithms are: sequential search, breadth-first search, depth-first search, bubble sort, selection sort, the convex-hull problem, the closest-pair problem, and many more. Algorithms like BFS and DFS are examples of exhaustive search. Exhaustive search is basically a brute-force approach to combinatorial problems. The following are some features of brute force algorithms:
o A brute force algorithm is a direct, intuitive and straightforward problem-solving technique that goes through all of the possible solutions to a given problem.
o The brute force approach is used to solve a number of problems that come up in daily life, such as finding the shortest route to a nearby park, or organizing the books in a rack to optimize the available rack space.
o Even though optimal algorithms may also exist, many tasks that we perform on a daily basis are of the brute force kind.
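As an illustration of the brute force idea, the closest-pair problem mentioned above can be solved by simply measuring every pair of points. This is a minimal sketch, not taken from the book; the function name `closest_pair_brute_force` and the sample points are hypothetical.

```python
from itertools import combinations
from math import dist


def closest_pair_brute_force(points):
    """Return (distance, (p, q)) for the closest pair, checking every pair."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    best = None
    # Exhaustive search: all n*(n-1)/2 pairs are examined.
    for p, q in combinations(points, 2):
        d = dist(p, q)
        if best is None or d < best[0]:
            best = (d, (p, q))
    return best


# Hypothetical sample data.
sample = [(0, 0), (5, 5), (1, 1), (9, 0)]
distance, pair = closest_pair_brute_force(sample)
```

The sketch does no pruning at all, which is what makes it brute force: its running time grows quadratically with the number of points.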
1.2. Divide and Conquer
Divide and conquer is one of the most important algorithm design techniques in computer science. In a divide and conquer algorithm, a complex problem is split into two or more similar sub-problems, these sub-problems are then split into smaller sub-problems, and so on, until the last sub-problems can be solved directly and simply. The solutions of the sub-problems are then combined to give the solution of the original problem. This technique is used to solve problems that need an exact solution in a shorter time.
Figure 1.1. Basic flow of Divide and Conquer approach.
To design an algorithm for a given problem, divide and conquer uses three steps:
a. Divide the problem into a number of smaller instances (sub-problems) of the same problem.
b. Conquer the sub-problems by solving them recursively. If the sub-problems are small enough, solve them directly as base cases.
c. Combine the solutions to the sub-problems into the solution for the given problem.
Several effective algorithms, such as the fast Fourier transform and sorting algorithms (merge sort, quick sort), are built on the divide and conquer approach. Figure 1.1 shows one step of the approach, assuming that each problem is divided into two sub-problems (although it can be divided into more than two).
The divide and conquer approach can be expressed as follows:
DC(m, n)
{
    if (small(m, n))
    {
        return solution(m, n);
    }
    else
    {
        p = divide(m, n);
        return combine(DC(m, p), DC(p + 1, n));
    }
}
There are several algorithms that apply the divide and conquer concept. Some applications of divide and conquer are:
o Searching for an element in a sorted array
o Sorting an unordered array (merge sort, quick sort)
o Finding the power of an element
o Strassen's matrix multiplication
o Maximum and minimum problem
o Tower of Hanoi
o Karatsuba algorithm
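The DC(m, n) template can be made concrete with the maximum and minimum problem from the list above. The following is a sketch under stated assumptions, not the book's code: the name `max_min`, the choice of "one or two elements" as the small case, and the sample data are all hypothetical.

```python
def max_min(a, m, n):
    """Return (maximum, minimum) of a[m..n] inclusive, assuming m <= n."""
    if n - m <= 1:                                   # small(m, n): one or two elements
        return (a[m], a[n]) if a[m] >= a[n] else (a[n], a[m])
    p = (m + n) // 2                                 # divide(m, n)
    left_max, left_min = max_min(a, m, p)            # DC(m, p)
    right_max, right_min = max_min(a, p + 1, n)      # DC(p + 1, n)
    # combine: compare the partial answers of the two halves
    return max(left_max, right_max), min(left_min, right_min)


# Hypothetical sample data.
data = [22, 13, -5, -8, 15, 60, 17, 31, 47]
largest, smallest = max_min(data, 0, len(data) - 1)
```

Each recursive call works on a strictly smaller index range, so the recursion always reaches the small case.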
1.3. Greedy Technique
The greedy approach is one of the simplest and most straightforward methods; it is a design technique rather than a single algorithm. A greedy technique uses a recursive or iterative approach to solve an optimization problem, selecting the best available choice at each step. The two main characteristics of the greedy technique are as follows:
o In order to construct the solution in an optimal way, it builds two sets: one containing all the selected items and the other containing the rejected items.
o It makes good local choices in the hope that the resulting solution will be optimal, or at least feasible.
The five components of the greedy technique are as follows:
a. Candidate Set: the set of candidates from which a solution is created.
b. Selection Function: selects the best candidate to be added to the solution.
c. Feasibility Function: determines whether a candidate can be used to contribute to a solution.
d. Objective Function: assigns a value to a solution or a partial solution.
e. Solution Function: indicates when a complete solution has been discovered.
Greedy techniques are simple to create and use, and in several cases they are efficient. However, in many instances there is no guarantee that a sequence of locally optimal choices will result in the globally optimal solution, so some problems cannot be solved successfully using this technique. Some of the algorithms that make use of the greedy technique are:
o Prim's algorithm
o Knapsack problem
o Dijkstra's algorithm
o Huffman tree building
o Kruskal's algorithm
o Travelling salesman problem, etc.
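The five components, and the caveat that greedy choices are not always globally optimal, can both be seen in a small coin-change sketch. This example is not from the book; the problem, the function name `greedy_coin_change` and the coin systems are assumptions chosen for illustration, and all coins are assumed to be positive integers.

```python
def greedy_coin_change(coins, amount):
    """Pay `amount` by always taking the largest coin that still fits.

    Returns the list of coins used, or None if the greedy choice gets stuck.
    """
    solution = []                                # selected candidates
    remaining = amount
    for coin in sorted(coins, reverse=True):     # selection function: largest coin first
        while coin <= remaining:                 # feasibility function: must not overshoot
            solution.append(coin)
            remaining -= coin
    if remaining != 0:                           # solution function: is the amount fully paid?
        return None
    return solution                              # objective function: len(solution), fewer is better


us_style = greedy_coin_change([1, 5, 10, 25], 63)
odd_system = greedy_coin_change([1, 3, 4], 6)
```

For the first coin system the greedy answer happens to be optimal. For the second it uses three coins (4, 1, 1) although two coins (3, 3) would do, which shows a locally optimal choice missing the global optimum.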
1.4. Dynamic Programming
As the name suggests, "dynamic" refers to taking decisions at numerous stages, and "programming" refers to planning, that is, setting out the best possible course of action. Dynamic programming is a multi-stage, recursive optimization process. A complex problem can be solved with dynamic programming by dividing it into a number of sub-problems, solving each of them only once, and storing their solutions in a table. Dynamic programming reduces the number of computations by moving systematically from one stage to the next, building the best solution as it goes. The approach is used when the sub-problems of a problem overlap. It is an essential tool for solving problems of stochastic and dynamic control in economic analysis, and it is one of the most popular techniques in bioinformatics, where it is used in tasks such as gene recognition and sequence comparison. Dynamic programming can be applied in both a bottom-up and a top-down manner.
Dynamic programming follows these steps:
Step 1: Identify the decision variables.
Step 2: Break the complex problem into a number of sub-problems. Determine the state variable at each stage, and then find the transformation function as a function of the stage and decision variables at the next stage.
Step 3: Create a general recursive relationship for computing the optimal policy, and decide whether to solve the problem by the forward or the backward method.
Step 4: Construct suitable tables to show the required values of the return function at each stage.
Step 5: Determine the overall optimal policy or decisions and its value at each stage. There may be more than one such optimal policy.
A dynamic programming algorithm uses a recursive equation to solve the problem. If the problem is solved from the first stage through the last, obtaining the sequence f1 → f2 → … → fn, the procedure is called the forward computational procedure. If the recursive equation is formulated so as to obtain the sequence fn → fn-1 → … → f1, it is called the backward computational procedure.
Some of the computational problems that can be solved by the dynamic programming approach are listed below:
o Knapsack problem
o Project scheduling
o Fibonacci number series
o All-pairs shortest path by Floyd-Warshall
o Tower of Hanoi
o Shortest path by Dijkstra
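The top-down and bottom-up styles can be compared on the Fibonacci series from the list above. The sketch below is illustrative only; the function names are hypothetical and it assumes the convention F(0) = 0, F(1) = 1.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib_top_down(n):
    """Top-down: recurse, but cache each sub-problem so it is solved only once."""
    if n < 2:
        return n
    return fib_top_down(n - 1) + fib_top_down(n - 2)


def fib_bottom_up(n):
    """Bottom-up: fill a table from the smallest sub-problem upwards."""
    table = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]
```

Both versions do a linear number of additions, whereas the plain recursion without a cache recomputes the same overlapping sub-problems exponentially often.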
1.5. Branch and Bound
Branch and bound is one of the techniques used for finding the optimal solution to discrete, combinatorial and general mathematical optimization problems. It is a widely used methodology for producing exact solutions to NP-hard optimization problems. Branch and bound is a state-space search technique in which all of a node's children are generated before any of them is expanded. It is similar to the backtracking approach but uses a BFS-like search.
Concept of Branch and Bound Algorithm
Step 1: Traverse the root node.
Step 2: Traverse the neighbor of the root node that is at the minimum distance from the root node.
Step 3: Traverse the neighbor of that neighbor that is at the shortest distance from the root node.
Step 4: Continue this process until the goal node is reached.
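The outline above does not show how bounds prune the search, so the following sketch may help. It is an assumption-laden illustration rather than the book's method: it applies best-first branch and bound to the 0/1 knapsack problem, uses the fractional (relaxed) knapsack value as the upper bound, and assumes all weights are positive. The function name and the sample data are hypothetical.

```python
import heapq


def knapsack_branch_and_bound(values, weights, capacity):
    """Best value of a 0/1 knapsack, found by best-first branch and bound."""
    # Consider items in decreasing value/weight order so the bound is easy to compute.
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    n = len(items)

    def bound(level, value, weight):
        """Upper bound: fill the remaining room greedily, allowing one fractional item."""
        room = capacity - weight
        for v, w in items[level:]:
            if w <= room:
                room -= w
                value += v
            else:
                return value + v * room / w
        return value

    best = 0
    heap = [(-bound(0, 0, 0), 0, 0, 0)]          # (-bound, level, value, weight)
    while heap:
        neg_bound, level, value, weight = heapq.heappop(heap)
        if -neg_bound <= best or level == n:     # prune: this node cannot beat the best so far
            continue
        v, w = items[level]
        if weight + w <= capacity:               # branch 1: take the item
            best = max(best, value + v)
            heapq.heappush(heap, (-bound(level + 1, value + v, weight + w),
                                  level + 1, value + v, weight + w))
        # branch 2: skip the item
        heapq.heappush(heap, (-bound(level + 1, value, weight), level + 1, value, weight))
    return best


best_value = knapsack_branch_and_bound([60, 100, 120], [10, 20, 30], 50)
```

The priority queue always expands the node with the most promising bound, and any node whose bound cannot exceed the best complete solution found so far is discarded without being expanded.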
1.6. Randomized Algorithms
A randomized algorithm is an algorithm whose logic makes use of randomness. This approach is typically used to reduce either the running time (time complexity) or the memory used (space complexity) of a standard algorithm. The algorithm works by generating a random number r within a specified range of numbers and making decisions based on the value of r. A deterministic algorithm, by contrast, is expected to solve a problem correctly and quickly (in polynomial time), and always behaves the same way on the same input. The flow chart of a deterministic algorithm is shown below in Figure 1.2.
Figure 1.2. Flow chart of deterministic algorithm.
Compare the above flow chart with the one for a randomized algorithm, shown in Figure 1.3.
Figure 1.3. Flow chart of randomized algorithm.
As shown in the above flow chart, in addition to the input, a randomized technique uses random numbers to make random choices during execution. The output, or the running time, can therefore differ between runs on the same input. This technique depends upon a random number generator.
This approach is used whenever we are working with memory or time constraints, and when an average case solution is acceptable. Randomized algorithms are used in the following fields:
o Graph algorithms – Finding shortest paths, minimum cuts and minimum spanning trees
o Parallel and distributed computing – Distributed consensus, deadlock avoidance
o Number theoretic algorithms – Primality testing
o Data structures – Computational geometry, sorting and searching
o Mathematical programming – Faster linear programming algorithms
o Algebraic identities – Matrix identities and verification of polynomial identities
o Enumeration and counting – Matrix permanent, counting combinatorial structures
o Load balancing
o Sorting
o Probabilistic existence proofs – Showing that a combinatorial object arises with non-zero probability among objects drawn from an appropriate probability space
Randomized algorithms are classified into two categories:
a. Las Vegas: Las Vegas algorithms always give the correct result. Their running time depends on random values, so their time complexity is assessed as an expected value. For example, randomized quicksort always sorts the input array, and its expected time complexity is O(n log n).
b. Monte Carlo: Monte Carlo algorithms produce the correct or optimal result only with a certain probability. Since the running times of these algorithms are deterministic, determining their worst-case time complexity is typically easier. For example, Karger's algorithm produces a minimum cut with probability greater than or equal to 1/n² (n being the number of vertices) and has worst-case time complexity O(E) (E being the number of edges). The Fermat method for primality testing is another example.
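The difference between the two categories can be sketched with the two examples named above. Both functions are minimal illustrations with hypothetical names, not the book's implementations; the quicksort builds new lists rather than sorting in place, and the number of trials in the Fermat test is an arbitrary assumption.

```python
import random


def randomized_quicksort(a):
    """Las Vegas: the result is always sorted; only the running time is random."""
    if len(a) <= 1:
        return list(a)
    pivot = random.choice(a)                     # the random choice affects speed, not correctness
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)


def fermat_is_probably_prime(n, trials=20):
    """Monte Carlo: False is always right; True may occasionally be wrong."""
    if n < 4:
        return n in (2, 3)
    for _ in range(trials):
        a = random.randint(2, n - 2)             # random base
        if pow(a, n - 1, n) != 1:                # Fermat's little theorem is violated
            return False                         # n is certainly composite
    return True                                  # n is prime, or the bases were all "liars"
```

The quicksort never returns a wrong answer, while the Fermat test runs a fixed number of trials and accepts a small chance of declaring a composite number prime.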
1.7. Backtracking Algorithm
A backtracking algorithm is a problem-solving algorithm that finds the desired output by refining a brute force approach. In the brute force method, all the possible solutions are tried and the best of them is then selected. As the name suggests, in backtracking, if the current solution is inadequate, we backtrack and try another one. The approach is therefore recursive. It is used when a problem has multiple candidate solutions. The pseudocode of the algorithm is shown below:
BT(n)
    if (n is not a solution) return false;
    if (n is a new solution) add n to the list of solutions;
    BT(expand n);
Backtracking solves the problem recursively. There are a number of candidate solutions (options), and the best among them is chosen. For instance, consider the state-space tree in Figure 1.4. We start from X, the starting point of the problem. We move to B1 through the intermediate point A1 and find that B1 is not a feasible solution. Therefore, as the name suggests, we backtrack from B1 through A1 to X and try another path, X → A2 → B2, and again find that B2 is not a feasible solution. So we go back from B2 through A2 to X, take the next path, and finally reach the feasible solution B3 along the path X → A3 → B3, as shown in Figure 1.4.
The steps of the backtracking algorithm can be summarized as follows:
1. Initially, select one possible path and try to move forward towards a feasible solution.
2. If we reach a point from which we cannot move towards the solution, we backtrack.
3. We then try to find a feasible solution by following the other possible paths.
Figure 1.4. Steps of backtracking approach.
Some of the applications of backtracking approach are:
o Finding the Hamiltonian paths present in a graph
o Maze solving problem
o N-queen problem
o Subset sum problem
o Word break problem
o Knight's tour problem
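The "move forward, and step back when stuck" pattern can be sketched on the subset sum problem from the list above. This is an illustrative sketch with a hypothetical function name and sample data, assuming non-negative integers; it returns the first subset found rather than all of them.

```python
def subset_sum(numbers, target):
    """Return one subset of `numbers` that sums to `target`, or None if there is none."""
    chosen = []

    def extend(start, remaining):
        if remaining == 0:                       # a feasible solution has been reached
            return True
        for i in range(start, len(numbers)):
            if numbers[i] <= remaining:          # only move forward along promising paths
                chosen.append(numbers[i])
                if extend(i + 1, remaining - numbers[i]):
                    return True
                chosen.pop()                     # dead end: backtrack and try another path
        return False

    return list(chosen) if extend(0, target) else None


found = subset_sum([3, 34, 4, 12, 5, 2], 9)
missing = subset_sum([3, 34, 4, 12, 5, 2], 30)
```

The `chosen.pop()` line is the backtracking step: it undoes the last choice so that the loop can try the next alternative from the same point.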
Chapter 2
Divide and Conquer
This is a strategy for solving a problem. If a problem (say P) of some size (say n) is large, it can be broken down into smaller sub-problems (P1, …, Pk). These sub-problems are solved individually to obtain their solutions, and the solutions of all the sub-problems are combined to get a solution (S) for the main problem, as shown in Figure 2.1 below.
Figure 2.1. A general overview of Divide and Conquer strategy.
The divide and conquer strategy says that if a problem is large, we divide it into sub-problems, solve those sub-problems and combine their solutions to get the solution of the main problem. If a sub-problem is again large, the same strategy is applied to it, and so on. While dividing a problem, one important thing to note is that the sub-problems must be of the same kind as the original problem. For example, if the problem is to sort a very large list of numbers, then under the divide and conquer strategy it is divided into sub-problems that are themselves sorting problems on smaller lists. The sub-problems must not be a different task. Thus, if the problem is sorting, its sub-problems are also sorting; in other words, a problem that is too big is solved recursively. The important points to remember while applying the divide and conquer strategy to any problem are given below:
a. When breaking up a problem, the sub-problems must be of the same kind as the problem; only then can divide and conquer be applied.
b. There must be a method for combining the solutions of the sub-problems to get the solution of the main problem.
The general method for the divide and conquer strategy is given below:
DAC(P)
{
    if (small(P))
    {
        S(P);
    }
    else
    {
        divide P into P1, P2, P3, …, Pk;
        apply DAC(P1), DAC(P2), DAC(P3), …, DAC(Pk);
        combine DAC(P1), DAC(P2), DAC(P3), …, DAC(Pk);
    }
}
2.1. Recurrence Relation
Consider the function given below:
#include <stdio.h>

void test(int n)
{
    if (n > 0)
    {
        printf("%d", n);
        test(n - 1);
    }
}
The tracing tree (recursion tree) for this recursive function when n = 3 is shown in Figure 2.2 below.
Figure 2.2. A general overview of solution to a recurrence relation.
For n = 3, the work done by the function consists of printing a value three times and calling itself four times. Thus, for a value n, the total number of calls is (n + 1) and printf is executed n times. Therefore, the time complexity of the above recursive function is O(n).
The recurrence relation for the above function can be derived as follows. Let us assume the time taken by this function is T(n):
void test(int n)              /* T(n)     */
{
    if (n > 0)
    {
        printf("%d", n);      /* 1        */
        test(n - 1);          /* T(n - 1) */
    }
}
The time taken to check the condition if (n > 0) is constant and can be taken as a constant c or 1.
Therefore, the total time taken for this function is:
\[ T(n) = T(n - 1) + 1 \]
The recurrence relation for this function is:
\[ T(n) = \begin{cases} 1, & n = 0 \\ T(n - 1) + 1, & n > 0 \end{cases} \]
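The recurrence can be solved by repeated substitution. The derivation below is an added worked step that uses only the recurrence stated above, with k denoting the number of substitutions made so far:

```latex
\begin{aligned}
T(n) &= T(n-1) + 1 \\
     &= T(n-2) + 2 \\
     &= T(n-3) + 3 \\
     &\;\;\vdots \\
     &= T(n-k) + k \\
     &= T(0) + n \qquad (\text{taking } k = n) \\
     &= n + 1 .
\end{aligned}
```

Hence T(n) = n + 1, which is O(n), in agreement with the count of (n + 1) calls obtained from the tracing tree.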
2.2. Binary Search
Binary search is a searching algorithm that finds the position of an element in a sorted array using the divide and conquer approach. In order to understand how binary search uses the divide and conquer strategy, we assume a given list of 15 elements, A[15], in sorted order.
Array A[15]
In order to perform binary search, we use two pointers, low and high (L and H), pointing to the first and the last element respectively. To search for an element (the key element), say 67, in the array, we repeatedly find the mid element and compare it with the key element. If the key element is greater than the element at Mid, the search continues in the right half of the sorted array and pointer L is changed to (Mid + 1); if the key element is smaller than the element at Mid, the search continues in the left half and pointer H is changed to (Mid − 1). This splitting of the array and adjusting of the two pointers continues until the mid element is the same as the key element, as illustrated below.
Key = 67, Mid = ⌊(L + H)/2⌋

Iteration   L    H    Mid                    Comparison
I           1    15   ⌊(1 + 15)/2⌋ = 8       Key > A[Mid] => Yes
II          9    15   ⌊(9 + 15)/2⌋ = 12      Key < A[Mid] => Yes
III         9    11   ⌊(9 + 11)/2⌋ = 10      Key > A[Mid] => Yes
IV          11   11   ⌊(11 + 11)/2⌋ = 11     Key = A[Mid]
In the fourth comparison we get the key element. Thus, the algorithm for binary search is as below:
/* Searches A[1..n], sorted in ascending order, for key.
   Returns the position of key, or 0 if key is not present. */
int BSearch(const int A[], int n, int key)
{
    int L = 1, H = n;
    while (L <= H)
    {
        int Mid = (L + H) / 2;
        if (key == A[Mid])
            return Mid;
        if (key < A[Mid])
            H = Mid - 1;
        else
            L = Mid + 1;
    }
    return 0;
}
The tree for this example of binary search is shown below in Figure 2.3.
Figure 2.3. Tree for Binary Search Example Problem.
The maximum number of comparisons needed to search for any element in the sorted array depends on the height of the tree formed. The time taken by binary search is therefore O(log n), where n is the total number of elements present in the sorted array A.
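The iterations of the example can be reproduced with a short sketch that records L, H and Mid at every step. The array values below are hypothetical, since the book's array is a figure that is not reproduced here; they were chosen only so that the key 67 sits at position 11. The function name is also an assumption.

```python
def binary_search_trace(a, key):
    """Search sorted list `a` for `key`, using 1-based positions as in the text.

    Returns (position, trace): position is 0 when the key is absent, and
    trace holds the (L, H, Mid) triple of every iteration.
    """
    low, high = 1, len(a)
    trace = []
    while low <= high:
        mid = (low + high) // 2
        trace.append((low, high, mid))
        if a[mid - 1] == key:                    # Python lists are 0-based
            return mid, trace
        if key < a[mid - 1]:
            high = mid - 1                       # continue in the left half
        else:
            low = mid + 1                        # continue in the right half
    return 0, trace


# Hypothetical sorted array of 15 elements with 67 at position 11.
A = [3, 6, 8, 12, 14, 17, 25, 29, 31, 36, 67, 72, 80, 90, 95]
position, steps = binary_search_trace(A, 67)
```

With this data the recorded triples follow the four iterations of the worked example, and the search space is halved at every step.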
2.3. Merge Sort
Merge sort also uses the divide and conquer strategy to sort elements, and it is a recursive algorithm. It divides a given list of elements into two sub-lists at the mid position of the array, sorts each sub-list recursively, and merges the sorted sub-lists into one sorted list.
MergeSort(L, H)
{
    if (L < H)
    {
        Mid = (L + H) / 2;
        MergeSort(L, Mid);
        MergeSort(Mid + 1, H);
        Merge(L, Mid, H);
    }
}
To understand the logic of merge sort, let us consider an example of an array of eight elements that needs to be sorted using the divide and conquer strategy of merge sort.
Array A[8]
The tree formed by using the algorithm of merge sort is as given below in Figure 2.4.
Figure 2.4. Tree for merge sort example problem.
When we reach a list with a single element, that list is already sorted. The single-element lists are then merged pairwise on the way back up the recursion. Thus, a given array is recursively divided into sub-arrays (sub-lists) until each sub-list has one element, and the merging is then done in the backward direction. The time complexity of merge sort is Θ(n log n).
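The MergeSort(L, H) outline leaves the Merge step unspecified, so the following sketch fills it in under one common interpretation: the two sorted halves are copied out and merged back into the same array. The names and the eight sample values are hypothetical, since the book's array is a figure that is not reproduced here.

```python
def merge(a, low, mid, high):
    """Merge the sorted runs a[low..mid] and a[mid+1..high] in place."""
    left = a[low:mid + 1]
    right = a[mid + 1:high + 1]
    i = j = 0
    k = low
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:                  # <= keeps equal elements in their original order
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1
        k += 1
    a[k:high + 1] = left[i:] + right[j:]         # copy whichever run still has elements


def merge_sort(a, low, high):
    """Sort a[low..high] in place, following MergeSort(L, H) from the text."""
    if low < high:
        mid = (low + high) // 2
        merge_sort(a, low, mid)
        merge_sort(a, mid + 1, high)
        merge(a, low, mid, high)


# Hypothetical array of eight elements.
data = [9, 3, 7, 5, 6, 4, 8, 2]
merge_sort(data, 0, len(data) - 1)
```

Each level of the recursion does a linear amount of merging work and there are about log n levels, which is where the n log n behaviour comes from.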
Chapter 3
Greedy Algorithms
The greedy method is also an approach for solving a problem. It is used for solving optimization problems, which require the result to be either a maximum or a minimum. It makes the locally optimal choice at each stage with the intent of finding the global optimum. For example, if we have n options for reaching a particular destination, then at each stage we select the best available option and move to the next stage, as shown in the graph given below in Figure 3.1.
Figure 3.1. A general overview of greedy approach.
Out of all the solutions, those that satisfy the given constraints or conditions are called feasible solutions. For example, if the problem is to travel from S to D, there are many options for reaching the destination D. However, if a constraint is imposed, for instance that the distance should be covered within 45 minutes, then the solutions in which the distance is covered within 45 minutes are called feasible solutions.
Out of the set of feasible solutions, one is the optimal solution. For example, if the journey from S to D should incur minimum cost, and out of the three feasible solutions S3, S4 and S5 it is S4 that incurs the minimum cost, then S4 is the optimal solution of the above problem. Hence, for any problem there can be more than one solution and more than one feasible solution, but only one optimal solution. A problem that requires either a minimum or a maximum result is called an optimization problem.
3.1. Job Sequencing Problem with Deadline
This problem is solved using the greedy method. A number of jobs, n, are given, each with an associated deadline and profit. The problem is to find a set of jobs that gives maximum profit while satisfying the associated deadlines. Suppose there is a given set of n jobs with deadlines and profits as below.
1. n jobs: J1, J2, J3, …, Jn
2. associated deadlines: d1, d2, d3, …, dn
3. and profits: p1, p2, p3, …, pn
If a job completes before its deadline, its profit is earned. The objective of the job sequencing problem is to earn maximum profit when only one job can be scheduled or processed at any given time. The pseudocode for the job sequencing algorithm with deadlines is given below (here n is the total number of jobs, dmax is the maximum deadline, and the jobs are assumed to be sorted in decreasing order of profit):
for i = 1 to n do
    set k = min(dmax, deadline(i))
    while k >= 1 do
        if timeslot[k] is empty then
            timeslot[k] = job[i]
            break
        end if
        set k = k - 1
    end while
end for
Example: suppose there are 5 jobs with the profits and deadlines given below (n = 5).
Jobs:       J1   J2   J3   J4   J5
Profit:     20   15   10    5    1
Deadline:    2    2    1    3    3
Solution of the above example using the greedy method: suppose each job requires one unit of time to complete. Since the maximum deadline is 3, there are three time slots in which the above five jobs can be processed to achieve maximum profit. So, this problem is a maximization problem. The resulting schedule is:
Time slot:  [0,1]   [1,2]   [2,3]
Job:        J2      J1      J4
Table 3.1. Job sequencing problem
Job considered   Slot assigned          Solution       Profit
-                -                      ϕ              0
J1               [1,2]                  J1             20
J2               [0,1] [1,2]            J1, J2         20 + 15
J3 ×             [0,1] [1,2]            J1, J2         20 + 15
J4               [0,1] [1,2] [2,3]      J1, J2, J4     20 + 15 + 5
J5 ×             [0,1] [1,2] [2,3]      J1, J2, J4     20 + 15 + 5 = 40
(× indicates that the job could not be scheduled.)
Hence, the solution to the problem is [J1, J2, J4] or [J2, J1, J4] with maximum profit of 40.
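The greedy procedure above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name job_sequencing and the tuple layout (name, profit, deadline) are assumptions made for the example, and each job is assumed to take one unit of time, as in the worked example.

```python
def job_sequencing(jobs):
    """jobs: list of (name, profit, deadline); each job takes one time unit."""
    # Greedy order: consider the most profitable job first.
    ordered = sorted(jobs, key=lambda job: job[1], reverse=True)
    dmax = max(deadline for _, _, deadline in jobs)
    # timeslot[k] stands for the interval [k-1, k]; index 0 is unused.
    timeslot = [None] * (dmax + 1)
    for name, profit, deadline in ordered:
        k = min(dmax, deadline)
        while k >= 1:
            if timeslot[k] is None:      # latest free slot before the deadline
                timeslot[k] = (name, profit)
                break
            k -= 1                       # otherwise try an earlier slot
    scheduled = [slot for slot in timeslot[1:] if slot is not None]
    names = [name for name, _ in scheduled]
    total = sum(profit for _, profit in scheduled)
    return names, total


jobs = [("J1", 20, 2), ("J2", 15, 2), ("J3", 10, 1), ("J4", 5, 3), ("J5", 1, 3)]
schedule, total_profit = job_sequencing(jobs)
print(schedule, total_profit)
```

For the five jobs of the example, the schedule is returned in time-slot order, so it corresponds to the sequence [J2, J1, J4] with profit 40.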
3.2. Dijkstra Algorithm
This algorithm solves the single-source shortest path problem using the greedy method. If a weighted graph is given, we need to find the shortest path from a given starting vertex to all other vertices; for instance, if the starting vertex is vertex 1, we must find the shortest path from vertex 1 to every other vertex. Because we are looking for the shortest path, this is a minimization problem and thus an optimization problem. The greedy method solves such a problem in stages, taking one stage and one input at a time to build up the optimal solution. In the greedy method there are predefined procedures which we follow to get the optimal solution, and the Dijkstra algorithm is one such procedure; its result is the shortest path. The Dijkstra algorithm always selects the unvisited vertex with the shortest known distance and then updates the shortest distances of the other vertices wherever that is possible. This updating is called relaxation. The rule for relaxation is as follows:
Relaxation: if (d(u) + c(u, v) < d(v)) then d(v) = d(u) + c(u, v)
It means that if the sum of the distance of u and the cost of the edge from u to v is less than the current distance of v, then the distance of v is updated to that sum. The Dijkstra algorithm works on directed as well as undirected graphs, but only if the edge weights are non-negative; it is not guaranteed to give correct results when negative weights are present. We can understand this with the help of the following examples.
Example 1:
Figure 3.2. Dijkstra’s problem.
To solve the above problem, we form a table as given below. The columns of this table correspond to all the vertices in the graph, from A to I, and each row records the distances after one more vertex has been selected. Initially, taking the source vertex as A, the distance from A to A is 0 and the distance from A to all other vertices is taken as ∞. In each row we pick the smallest distance, which in the first row is 0, for vertex A. Vertices B and C are directly connected to A, so the distances to B and C are calculated from A with the help of the relaxation rule discussed above. For B, d(B) = 0 + 5 = 5, which is less than ∞. As per the relaxation rule, if the new distance is less than the previous one, it replaces the previous one, so the distance of B becomes 5. Similarly, for vertex C the distance becomes 2. Since no other vertices are directly reachable from A, their distance remains ∞ in the next row. Again, we find the minimum distance out of 5, 2 and ∞, which is 2 for vertex C, so vertex C is selected next. From C we can reach vertices B and G. The new cost to reach B is 2+2=4, which is less than the previous cost 5, so as per relaxation 5 is replaced by 4. The cost to reach vertex G is 2+9=11, which is less than ∞, so ∞ is replaced by 11. No other vertices are accessible from C, so their costs remain the same. Again, we select the minimum distance out of 4, ∞ and 11, which is 4 for vertex B, so vertex B is selected next. This process continues till we reach the last vertex, I, and we get the table below. A vertex once selected is not considered in the following iterations.
Table 3.1. Dijkstra's problem (example 1)
Source vertex -> A.
From the above table, both the shortest distances and the shortest paths can be read off. The shortest distances are:
A to A is 0
A to B is 4
A to C is 2
A to D is 7
A to E is 9
A to F is 16
A to G is 11
A to H is 13
A to I is 20
The shortest path is found by working backwards through the table. For example, to find the shortest path from the starting vertex A to I, we start from the value 20 in the column of I and move up. In the row above, the value changes from 20 to ∞, so we take the smallest value in that row, which is 16 for vertex F. Above 16 the value changes from 16 to 17, so in that row we again take the smallest value, which is 13 for vertex H. Above 13 there is a change from 13 to 16, so in that row the smallest value is 11 for vertex G. Above 11 the value remains the same until, near the top of the table, it changes from 11 to ∞. So, whenever there is a change in value, we find the smallest value in that row and its vertex. Here the smallest value is 2 for vertex C, hence C is considered, and finally vertex A is reached. In this way the shortest path from vertex A to I becomes A→C→G→H→F→I.
Example 2: Let us consider the graph below to find the shortest paths using the Dijkstra algorithm.
Figure 3.3. Dijkstra’s problem.
To find the shortest paths and shortest distances, the following table has been created taking all the vertices.
Table 3.2. Dijkstra's problem (example 2)
Source vertex -> A.
The shortest distances can be read from the table:
A to A is 0
A to B is 8
A to C is 5
A to D is 9
A to E is 7
And the shortest paths from A are:
A to B is A→C→B (5+3=8)
A to C is A→C (5)
A to D is A→C→B→D (5+3+1=9)
A to E is A→C→E (5+2=7)
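The selection and relaxation steps described above can be sketched in Python with a priority queue. This is an illustrative sketch under stated assumptions: the edges A→C (5), C→B (3), B→D (1) and C→E (2) are taken from the paths listed for Example 2, while the direct edge A→B with weight 10 is hypothetical (the figure is not reproduced here), and the graph is treated as directed for simplicity.

```python
import heapq


def dijkstra(graph, source):
    """graph: dict mapping vertex -> {neighbour: non-negative edge cost}."""
    dist = {v: float("inf") for v in graph}
    prev = {v: None for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    done = set()
    while heap:
        d_u, u = heapq.heappop(heap)       # vertex with the smallest distance
        if u in done:
            continue                       # stale queue entry, already selected
        done.add(u)
        for v, cost in graph[u].items():
            if d_u + cost < dist[v]:       # relaxation rule
                dist[v] = d_u + cost
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, prev


def path_to(prev, target):
    """Follow predecessor links back to the source."""
    path = []
    while target is not None:
        path.append(target)
        target = prev[target]
    return path[::-1]


graph = {
    "A": {"B": 10, "C": 5},   # weight of A->B is an assumed value
    "B": {"D": 1},
    "C": {"B": 3, "E": 2},
    "D": {},
    "E": {},
}
dist, prev = dijkstra(graph, "A")
print(dist, path_to(prev, "D"))
```

With these edges the distances agree with the values listed above (B = 8, C = 5, D = 9, E = 7), and the path to D is A→C→B→D.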
Chapter 4
Dynamic Programming
Dynamic programming is used to solve optimization problems and relies on the principle of optimality. It divides the problem into several stages and solves the problem at each stage, i.e., dynamic programming is used when the problem can be solved in stages. Here, we usually use a backward recursive approach, where we make decisions starting from the end stage rather than from the beginning stage. It is an algorithm design method that can be used when the solution to a problem can be viewed as the result of a sequence of decisions. In this method we consider all the possible solutions and select the best one, that is, the optimal solution. Unlike the greedy method, it does not commit to a single choice by a fixed rule at each stage; it evaluates the alternatives before deciding. So, dynamic programming applies to optimization problems in which we make a set of choices in order to arrive at an optimal solution. It also makes an algorithm more efficient by storing some of the intermediate results. We illustrate the concept of dynamic programming, the idea of the recursive approach and the terminology using an example called the Stagecoach Problem. The Stagecoach Problem is essentially the problem of finding the shortest path from a given source to a destination, i.e., the shortest path problem.
Figure 4.1. The Road System and distance for The Stagecoach Problem.
4.1. Stagecoach Problem
This is a problem that can easily be decomposed into several stages. We use it to introduce the concept behind dynamic programming and apply the backward recursive approach. In Figure 4.1 there are several cities, numbered from 1 to 10. The cities are interconnected and the distances between them are given. To reach the destination city, city 10, the person must arrive from city 8 or city 9. We therefore first find the best, i.e., least cost, way to reach city 10 from city 8 or 9. The node or place where the person currently is, is called the state of the system. Given the state of the system, we try to find the best decision that the person must make at that stage to go from one stage to the next, as given below.
n = 1 (One more stage to go):
s    x1    f1(s)    x1*
8    10    8        10
9    10    9        10
In the above table, s represents the state variable: the person can be either at state 8 or at state 9. The variable x1 represents the decision variable. The variable f1(s) is the distance function, i.e., the distance the person covers to go from s to x1. Finally, the variable x1* represents the best decision. For example, if the person is at city 8, there is only one choice to reach the destination city 10, which is therefore the best decision, and if the person is at city 9 there is again only one choice to reach 10. Thus,
$f_1(s, x_1) = d_{s,x_1}$
$f_1^*(s) = \min_{x_1} f_1(s, x_1)$
n = 2 (Two more stages to go):
s    f2(s, x2) for x2 = 8    f2(s, x2) for x2 = 9    f2*(s)    x2*
5    6+8=14                  8+9=17                  14        8
6    9+8=17                  7+9=16                  16        9
7    5+8=13                  7+9=16                  13        8
$f_2(s, x_2) = d_{s,x_2} + f_1^*(x_2)$
$f_2^*(s) = \min_{x_2} f_2(s, x_2)$
x2* is the corresponding value of x2.
n = 3 (Three more stages to go):
s    x3 = 5      x3 = 6       x3 = 7      f3*(s)    x3*
2    4+14=18     7+16=23      8+13=21     18        5
3    8+14=22     10+16=26     5+13=18     18        7
4    4+14=18     5+16=21      7+13=20     18        5
$f_3(s, x_3) = d_{s,x_3} + f_2^*(x_3)$
$f_3^*(s) = \min_{x_3} f_3(s, x_3)$
x3* is the corresponding value of x3.
n = 4 (Four more stages to go):
s    x4 = 2      x4 = 3      x4 = 4      f4*(s)    x4*
1    5+18=23     5+18=23     6+18=24     23        2, 3
$f_4(s, x_4) = d_{s,x_4} + f_3^*(x_4)$
$f_4^*(s) = \min_{x_4} f_4(s, x_4)$
x4* is the corresponding value of x4.
The shortest distance is 23, attained by the paths 1→2→5→8→10 and 1→3→7→8→10. Hence, there are two paths whose distance equals the shortest distance of 23.
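The backward recursion can be checked with a short Python sketch. The distances in the dictionary d are read off the stage tables above; the names f_star and best_paths are introduced here only for illustration, and memoised recursion is used as one possible way of evaluating the recurrence $f_n^*(s) = \min_{x} [d_{s,x} + f_{n-1}^*(x)]$.

```python
from functools import lru_cache

# d[s][x]: distance from city s to city x, as used in the stage tables.
d = {
    1: {2: 5, 3: 5, 4: 6},
    2: {5: 4, 6: 7, 7: 8},
    3: {5: 8, 6: 10, 7: 5},
    4: {5: 4, 6: 5, 7: 7},
    5: {8: 6, 9: 8},
    6: {8: 9, 9: 7},
    7: {8: 5, 9: 7},
    8: {10: 8},
    9: {10: 9},
}
DEST = 10


@lru_cache(maxsize=None)
def f_star(s):
    """Least distance from city s to the destination."""
    if s == DEST:
        return 0
    return min(d[s][x] + f_star(x) for x in d[s])


def best_paths(s):
    """All paths from s to the destination that attain f_star(s)."""
    if s == DEST:
        return [[DEST]]
    paths = []
    for x in sorted(d[s]):
        if d[s][x] + f_star(x) == f_star(s):      # x is an optimal decision
            paths.extend([s] + rest for rest in best_paths(x))
    return paths


print(f_star(1), best_paths(1))
```

Running it reproduces the stage values (for example f*(5) = 14 and f*(2) = 18) and both optimal paths of length 23.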
4.2. Optimal Binary Search Tree (Optimal BST)
A binary search tree is one of the most important data structures in computer science. It is a binary tree with a special arrangement of the keys inside its nodes. Such a tree has the following properties:
a. The left sub-tree of a node contains keys less than or equal to the node's key.
b. The right sub-tree of a node contains keys greater than the node's key.
For example, given the keys = {25, 23, 14, 36, 29, 19, 52}, if we place the keys in individual nodes from left to right, we get the binary tree shown in Figure 4.2. In the second figure (Figure 4.3) the keys are arranged in a particular pattern: everything in the left sub-tree of the root has a value smaller than the root, all the values in the right sub-tree are larger than the root, and the same holds for every sub-tree. This is a binary search tree.
Figure 4.2. Binary tree.
Figure 4.3. Binary search tree.
The cost of searching for a key depends on the number of comparisons required to find that key in the tree. For the keys {10, 20, 30}, the different binary trees possible are given in Figure 4.4 (for n keys, the number of possible binary trees is $\frac{1}{n+1}\binom{2n}{n}$, which gives 5 for n = 3):
Figure 4.4. Different binary trees possible for 3 keys.
Tree    C for key 10    C for key 20    C for key 30    Avg.
1       1               2               3               6/3 = 2
2       1               3               2               6/3 = 2
3       2               1               2               5/3 ≈ 1.67
4       3               2               1               6/3 = 2
5       2               3               1               6/3 = 2
* C = number of comparisons.
Figure 4.5. Possible binary trees using three keys and their average cost of searching.
In Figure 4.5 we can see that, for every successful search, the number of comparisons required equals the level at which the key is present. The average searching cost of the third tree is about 1.67, which is the minimum among the five trees. This is because the third tree is balanced, so it has the smallest height and the fewest comparisons on average.
An optimal binary search tree, also called a weight-balanced binary tree, is a binary search tree that gives the smallest possible search time for a given sequence of accesses. For example, suppose the frequencies of searching the keys are as given below:
Key    Frequency of searching
10     3
20     2
30     6
Figure 4.6. Cost of searching for different binary trees with three keys and given frequencies.
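The total search cost of each candidate tree can be recomputed with a few lines of Python. This is only a sketch: the number of comparisons for each key in each tree is copied from the comparison table of Figure 4.5, and the variable names are illustrative.

```python
# Search frequencies of the three keys.
freq = {10: 3, 20: 2, 30: 6}

# Number of comparisons (level) of each key in trees 1 to 5 of Figure 4.5.
trees = [
    {10: 1, 20: 2, 30: 3},
    {10: 1, 20: 3, 30: 2},
    {10: 2, 20: 1, 30: 2},
    {10: 3, 20: 2, 30: 1},
    {10: 2, 20: 3, 30: 1},
]

# Total cost of a tree = sum over keys of frequency x comparisons.
costs = [sum(freq[key] * levels[key] for key in freq) for levels in trees]
best_tree = min(range(len(trees)), key=costs.__getitem__) + 1  # 1-based index
print(costs, best_tree)
```

The height-balanced third tree costs 20, while the fifth tree, which has the most frequently searched key 30 at the root, costs 18 and is the cheapest of the five.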
In Figure 4.6 the total cost of searching is calculated with the given frequencies. The last tree has the minimum cost of searching, which implies that it is the optimal binary search tree. Although the third tree is the height-balanced tree, its searching cost is 20. Based on the frequencies, the fifth tree gives the minimum cost of 18 even though it is not height balanced. This is the optimal binary search tree, and it has the highest-frequency key as the root node. The time taken to search a balanced binary search tree is O(log n). So, the problem is: if the keys are given along with their frequencies, how do we find out which tree organization is optimal for those frequencies? This problem of finding the optimal binary search tree is solved using dynamic programming, as shown in the example below.
Example: A set of four keys and their frequencies are given as below:
Key number    Key element    Frequency of searching
1             10             4
2             20             2
3             30             6
4             40             3
Figure 4.7. Table for Constructing an Optimal Binary Search Tree Using Dynamic Programming for the Given Problem.
At first, we must find the cells for which (j-i) = 0, i.e., the diagonal cells of the table given in Figure 4.7, and find their respective costs:
j-i = 0:
0-0=0    C[0, 0]
1-1=0    C[1, 1]
2-2=0    C[2, 2]
3-3=0    C[3, 3]
4-4=0    C[4, 4]
For (j-i) = 0, no key is considered when finding the cost. So, all values on the diagonal (j-i) = 0 are zero, because there are no keys in those ranges. Now, fill the next upper diagonal, for which (j-i) = 1:
j-i = 1:
1-0=1    C[0, 1]
2-1=1    C[1, 2]
3-2=1    C[2, 3]
4-3=1    C[3, 4]
Cell        Key number    Value    Frequency (Cost)
C[0, 1]     1             10       4
C[1, 2]     2             20       2
C[2, 3]     3             30       6
C[3, 4]     4             40       3
Similarly, we proceed for j-i = 2:
j-i = 2:
2-0=2    C[0, 2]
3-1=2    C[1, 3]
4-2=2    C[2, 4]
Since the (j-i) value here is 2, we must consider two keys to find the optimal cost. For example, to find the cost C[0, 2], ignore the first index 0; the second index 2 means that we start from key 1 and go to key 2. That is, two keys are considered: key 1 with value 10 and cost/frequency = 4, and key 2 with value 20 and cost/frequency = 2. We need to find the optimal-cost binary search tree that can be built using these two keys only.
The possible binary search trees with their costs are shown in Figure 4.8.
Figure 4.8. Possible binary search trees with their cost for two keys {10, 20}.
Thus, the minimum cost is 8, obtained with key number 1 as the root. So, the first key becomes the root with cost 8. This value is put in cell (0, 2) of the table given in Figure 4.7 as 8 with superscript 1, meaning that the minimal cost is 8 for root node 1. Similarly, find the cost C[1, 3] (two keys: key 2 with value 20 and cost/frequency = 2, and key 3 with value 30 and cost/frequency = 6) as follows.
The possible BST with their costs is shown in Figure 4.9.
Figure 4.9. Possible binary search trees with their cost for two keys {20, 30}.
Thus, minimum cost is 10 for key number 3. So, third key becomes root with cost 10. Similarly, find the cost C [2, 4] (two keys, key 3 with value 30 and cost/frequency=6 and key 4 with value 40 and cost/frequency=3) that is,
The possible BST with their costs is shown in Figure 4.10.
Figure 4.10. Possible binary search trees with their cost for two keys {30, 40}.
Thus, the minimum cost is 12 for key number 3. So, the third key becomes the root with cost 12. In this way the diagonal is filled up by taking two keys at a time. Similarly, we fill up the next diagonal above it by considering three keys at a time. We proceed for j-i = 3:
j-i = 3:
3-0=3    C[0, 3]
4-1=3    C[1, 4]
Since the (j-i) value here is 3, we must consider three keys to find the optimal cost. Find the cost C[0, 3] (three keys: key 1 with value 10 and cost/frequency = 4, key 2 with value 20 and cost/frequency = 2, and key 3 with value 30 and cost/frequency = 6) as follows.
The possible BST with their costs is shown in Figure 4.11.
Figure 4.11. Possible binary search trees with their cost for three keys {10, 20, 30}.
Thus, minimum cost is 20 for key number 3. So, third key becomes root with cost 20. Similarly, find the cost C [1, 4] (three keys, key 2 with value 20 and cost/frequency=2, key 3 with value 30 and cost/frequency=6 and key 4 with value 40 and cost/frequency= 3). By using three keys, the possible binary search trees with their costs are shown in Figure 4.12.
Figure 4.12. Possible Binary Search Trees with Their Cost for Three Keys {20, 30, 40}.
Thus, the minimum cost is 16 for key number 3. So, the third key becomes the root with cost 16. Finally, to fill the last diagonal, which has only one cell, we have to consider all four keys. We proceed for j-i = 4:
j-i = 4:
4-0=4    C[0, 4]
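We can find the root node directly by using a formula, so that there is no need to draw all possible binary search trees as we have done in the above cases. The formula is given below:
$optcost(i, j) = \sum_{k=i}^{j} freq[k] + \min_{r=i}^{j} \left[ optcost(i, r-1) + optcost(r+1, j) \right]$
By directly applying the above formula (written in the notation of the table, where cell C[i, j] covers keys i+1 to j) we get,
C[0, 4] = (4 + 2 + 6 + 3) + min{C[0, 0] + C[1, 4], C[0, 1] + C[2, 4], C[0, 2] + C[3, 4], C[0, 3] + C[4, 4]}
        = 15 + min{0 + 16, 4 + 12, 8 + 3, 20 + 0}
        = 15 + 11 = 26, with the minimum attained for root key 3.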
Hence, minimum cost is 26 for key number 3. So, third key becomes root with cost 26.
Figure 4.13. Optimal binary search tree with minimal cost of searching.
Now, we can generate the optimal binary search tree from the data filled in the table given in Figure 4.7. The root recorded in cell (0, 4), calculated above in the last stage, becomes the root node; that is, the third key, with cost 26, becomes the root node, as shown in Figure 4.13. The algorithm for the optimal binary search tree is given below:
Algorithm: Optimal Binary Search Tree (Optimal BST)
// Find an optimal binary search tree using dynamic programming
// Input: An array P[1..n] of search probabilities for a sorted list of n keys
// Output: Average number of comparisons in successful searches in the optimal BST
//         and table R of sub-trees' roots in the optimal BST
    for i ← 1 to n do
        C[i, i-1] ← 0
        C[i, i] ← P[i]
        R[i, i] ← i
    C[n+1, n] ← 0
    for d ← 1 to n-1 do
        for i ← 1 to n-d do
            j ← i + d
            minval ← ∞
            for k ← i to j do
                if C[i, k-1] + C[k+1, j] < minval
                    minval ← C[i, k-1] + C[k+1, j]; kmin ← k
            R[i, j] ← kmin
            sum ← P[i]
            for s ← i+1 to j do
                sum ← sum + P[s]
            C[i, j] ← minval + sum
    return C[1, n], R
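A direct Python transcription of this algorithm is sketched below, applied to the four-key example with raw search frequencies in place of probabilities. The function name optimal_bst and the use of 1-based indices inside padded lists are choices made for this illustration; C[i][j] here covers keys i to j inclusive, as in the pseudo code.

```python
def optimal_bst(freq):
    """freq[i-1] is the search frequency of key i (keys sorted, 1-based)."""
    n = len(freq)
    # Padded tables so that C[i][i-1] and C[n+1][n] exist and equal 0.
    C = [[0] * (n + 2) for _ in range(n + 2)]
    R = [[0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        C[i][i] = freq[i - 1]
        R[i][i] = i
    for d in range(1, n):                  # d = j - i, one diagonal at a time
        for i in range(1, n - d + 1):
            j = i + d
            minval, kmin = float("inf"), i
            for k in range(i, j + 1):      # try every key k as the root
                cost = C[i][k - 1] + C[k + 1][j]
                if cost < minval:
                    minval, kmin = cost, k
            R[i][j] = kmin
            C[i][j] = minval + sum(freq[i - 1:j])
    return C[1][n], R


cost, roots = optimal_bst([4, 2, 6, 3])
print(cost, roots[1][4])
```

For the frequencies 4, 2, 6 and 3 this gives a minimum cost of 26 with key number 3 at the root, in agreement with the table.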
4.3. Subset Sum Problem
In this problem, a set of integer elements is given. We need to find a subset of the given set whose sum is equal to a given sum value. For example:
Input: a set of numbers and a sum value.
The set: {10, 7, 5, 18, 12, 20, 15}
The sum value: 35
Output: the subsets of the given set whose elements add up to the given sum value, that is:
{10, 7, 18}
{10, 5, 20}
{5, 18, 12}
{20, 15}
Here, a set is given, and we have to find a subset of that set whose sum is a specific given number. Formally, let A be a set of n non-negative integers. Find a subset X of A such that the sum of all elements of X equals S, where S is the required sum given as input. For example, if A = {1, 2, 5, 9, 4} and Sum (S) = 18, we have to find out whether there is any subset of A whose sum is 18.
There are two methods to solve the subset sum problem. The first is the brute force method. In this method we enumerate the subsets of the given set and check whether any of them has a sum equal to the given number. It seems easy, but it becomes far too expensive as the number of elements in the set increases. In the example above, where set A has 5 elements, the brute force method has to examine 2^5 = 32 subsets, and this number doubles with every additional element. So, this is not a good approach, and we use the second approach, dynamic programming, to solve this problem.
Example: A = {2, 3, 5, 7, 10}, Sum (S) = 14
For this, draw a table as given below. In this table the rows correspond to the elements given in the set or array, and the columns correspond to the sums from 0 to 14, i.e., the solutions of the sub-problems. Now, we fill each cell for an element and a sum. If some combination of the elements considered so far yields the sum value, we put True (T), otherwise False (F). In the column with sum 0, all entries are True. Next, for sum = 1 with only one element in the array, i.e., 2, it is not possible to get sum 1 from a single element with weight 2, so we fill the cell with False. Next, for sum = 2 with the one element 2 in the array, it is possible to get the sum 2, so we fill that cell with True. In this way we complete Table 4.1 as below.
Table 4.1. Binary matrix/table for subset sum problem
Finally, since the last cell contains True, there is a subset of the original set which sums to 14. Now, what is that subset? We can recover it from the table drawn above by tracing back from the last cell. The True in the last cell comes from the True in the cell just above it, which corresponds to the set element 7, so we first include the value 7 in the subset, i.e., {7}. The True value in row 7 and column 14 is obtained by moving 7 cells to the left in the row above it, which corresponds to the value 5 in the given set. So, the next number in the required subset is 5, i.e., {7, 5}. Similarly, the True value in the cell with element value 5 and sum value 7 comes from the True value in the cell with element value 3 and sum value 2, which in turn comes from the True in the cell above it, corresponding to the set value 2. So, lastly, we include 2 in the required subset, which finally becomes {7, 5, 2}. Thus, the required subset of the original set whose sum is equal to 14 is {7, 5, 2}.
Time complexity: with the brute force method, the time complexity of the subset sum problem is O(2^n). With dynamic programming, the time complexity is θ(n × sum), where n is the number of elements given in the original set.
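The table-filling and trace-back steps can be sketched in Python as follows. The function name subset_sum and the row/column layout (one row per element plus an initial empty row, one column per sum) are assumptions of this sketch rather than the book's exact table.

```python
def subset_sum(items, target):
    """Return one subset of items summing to target, or None if none exists."""
    n = len(items)
    # T[i][s] is True if some subset of the first i items sums to s.
    T = [[False] * (target + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        T[i][0] = True                      # sum 0 is always reachable
    for i in range(1, n + 1):
        for s in range(1, target + 1):
            T[i][s] = T[i - 1][s]           # do not use item i
            if not T[i][s] and s >= items[i - 1]:
                T[i][s] = T[i - 1][s - items[i - 1]]   # use item i
    if not T[n][target]:
        return None
    # Trace back from the last cell to recover the chosen elements.
    subset, s = [], target
    for i in range(n, 0, -1):
        if not T[i - 1][s]:                 # True did not come from the row above
            subset.append(items[i - 1])
            s -= items[i - 1]
    return subset


print(subset_sum([2, 3, 5, 7, 10], 14))
```

For A = {2, 3, 5, 7, 10} and S = 14 the trace-back selects 7, then 5, then 2, matching the subset {7, 5, 2} found above. The two nested loops make the θ(n × sum) running time visible.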
4.4. 0/1 Knapsack Problem
In this problem, some items are given together with their weights and profits, and a knapsack of specified capacity is given to store the items. Items are selected and put inside the knapsack in such a way as to achieve the maximum profit. There are two types of knapsack problem: the 0/1 knapsack problem and the fractional knapsack problem. In the 0/1 knapsack problem an item must be picked as a whole, i.e., either we pick an item completely or we leave it. In the fractional knapsack problem, on the other hand, we can divide an item: some part of an item can be picked and the rest left behind, i.e., it is not mandatory to pick the whole item. The 0/1 knapsack problem is solved with the dynamic programming method and the fractional knapsack problem is solved with the greedy method. In this chapter we concentrate on how to solve the 0/1 knapsack problem using dynamic programming.
Example: suppose we have four objects whose weights and corresponding profits are given as:
Weights = {3, 4, 6, 5}
Profits = {2, 3, 1, 4}
and there is a bag of capacity w = 8. The objective is to fill this bag with the given objects. We cannot put all the objects in the bag because of its limited capacity, so we have to carry some of them in such a way that the total profit is maximum. We solve this problem using the dynamic programming approach, where we divide the complicated problem into sub-problems, solve the sub-problems and use their results to find the solution of the complete problem; i.e., we do not solve the complete problem at once but divide it into a sequence of stages, as in the above examples. In order to solve the above 0/1 knapsack problem, we draw a matrix/table as shown below in Table 4.2. In this table the items are taken in increasing order of weight, i.e., weights {3, 4, 5, 6} with profits {2, 3, 4, 1}; row i corresponds to the first i items and column w to a knapsack of capacity w. Each cell is filled using the formula
$M[i, w] = \max\{M[i-1, w],\; M[i-1, w - w_i] + p_i\}$
where w_i and p_i are the weight and profit of item i, and the second term is considered only when w_i ≤ w.
For example, to find the value of the cell M[4, 7], the above formula is applied as follows:
M[4, 7] = max{M[3, 7], M[3, 7 − 6] + 1} = max{5, 1} = 5
Table 4.2. Binary matrix/table for 0/1 knapsack problem
Thus, the maximum profit is the value in the last cell, i.e., M[4, 8] = 6. Now our aim is to find the actual items that have been picked, with total weight less than or equal to the maximum knapsack capacity, i.e., 8. For this we trace back from the solution obtained in the last stage, as we did in the subset sum problem. In the last cell the value 6 comes from the cell above it, so the item of the last row, with weight 6, is not selected. The 6 in the third row does not come from the cell above it, so the third item, with weight 5, is selected. Now we go up one row and move 5 steps (i.e., the weight of the selected item) to the left, reaching cell (2, 3) with value 2. Since this value comes from the cell above it, we do not include the item of that row. The value 2 in the row above belongs to item number 1, with weight 3, and does not come from the row above it, so this item is included in the knapsack. Thus, the weights of the items selected are {5, 3}, giving the maximum profit 6.
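A minimal Python sketch of this table and its trace-back is given below. It assumes the standard recurrence M[i, w] = max(M[i-1, w], M[i-1, w - w_i] + p_i) and lists the items in increasing order of weight, which is the order the worked cells above appear to use; the function name knapsack is illustrative.

```python
def knapsack(weights, profits, capacity):
    """0/1 knapsack: return (best profit, weights of chosen items, table M)."""
    n = len(weights)
    # M[i][w]: best profit using the first i items with capacity w.
    M = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w_i, p_i = weights[i - 1], profits[i - 1]
        for w in range(capacity + 1):
            M[i][w] = M[i - 1][w]                       # leave item i
            if w >= w_i:
                M[i][w] = max(M[i][w], M[i - 1][w - w_i] + p_i)  # take item i
    # Trace back: a value that differs from the row above means item i was taken.
    chosen, w = [], capacity
    for i in range(n, 0, -1):
        if M[i][w] != M[i - 1][w]:
            chosen.append(weights[i - 1])
            w -= weights[i - 1]
    return M[n][capacity], chosen, M


# Items sorted by weight: (3, 2), (4, 3), (5, 4), (6, 1).
best, chosen, M = knapsack([3, 4, 5, 6], [2, 3, 4, 1], 8)
print(best, chosen, M[4][7])
```

With capacity 8 the sketch returns profit 6 for the items of weight 5 and 3, and its cell M[4][7] equals 5, as in the worked calculation.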
Chapter 5
Backtracking
Backtracking is a problem-solving approach based on the brute force approach. The brute force approach says that, for any given problem, we try all possible solutions and select the desired one. Backtracking is used when there are multiple solutions and we want to pick the ones that are acceptable. The problems that are solved using backtracking impose some constraints. Each candidate solution is checked against those constraints, and only the solutions that satisfy them are accepted. To understand the concept of backtracking, consider three students, two boys and a girl (B1, B2 and B3). The problem is to find in how many ways these students can be arranged to sit on three chairs. The number of possible ways in which three students can occupy three chairs is 3! = 6. The state space tree for the arrangements of the three students is drawn in Figure 5.1.
Figure 5.1. A general overview of backtracking approach.
So, the six possible ways in which two boys and a girl can sit are shown by the above state space tree. Backtracking is used to solve problems in which a sequence of objects is chosen from a specified set so that the sequence satisfies some criteria. It is a systematic method of trying out various sequences of decisions until we find a solution that satisfies the given constraints or criteria. If the first sequence does not give a solution that satisfies the constraints, we go back and check another sequence, and so on. In the above problem, when we impose the constraint that the girl should not sit between the two boys, the state space tree for the possible solutions is as shown in Figure 5.2.
Figure 5.2. A state space tree.
So, by imposing the constraint (bounding function) we get three possible solutions.
5.1. N – Queens Problem
One of the important problems solved using backtracking is the N-Queens problem. In this problem an NxN board and N queens are given. We have to find positions for these queens on the board so that they do not attack each other. To understand this, let us consider a 4x4 board on which 4 queens are to be positioned, using backtracking, in such a way that they do not attack each other. We say that two queens attack each other when they are on the same rank (row), the same file (column) or the same diagonal. Suppose a queen is placed at square (1, 2). All the squares in the same row are under attack by the queen, i.e., those squares that have the same row value as the queen: (1,0), (1,1) and (1,3). Similarly, all the squares which have the same column value as the queen are under attack, i.e., (0,2), (2,2) and (3,2). Also, there are two kinds of diagonals: one going from top left to bottom right and the other going from bottom left to top right. The first diagonal, going from top left to bottom right, consists of two squares, (0,1) and (2,3). To check whether a square lies on this diagonal, the formula is row - column: any square whose row - column value is -1, the same as for the queen, is on the same diagonal as the queen and so is under attack. For the other diagonal we use the formula row + column: any square for which the sum of row and column is the same as that of the queen's square is on the same diagonal and is under attack. This is summarized in the following table.
Attack line    Squares under attack
Row            (1, 0) (1, 1) (1, 3)
Column         (0, 2) (2, 2) (3, 2)
Diagonal1      (0, 1) (2, 3)
Diagonal2      (3, 0) (2, 1) (0, 3)
We now solve the 4-Queens problem, consisting of a 4x4 board and 4 queens, using backtracking. For a 4x4 board the recursion is 4 levels deep. On the 1st level of recursion, we place the first queen on the 1st row. On the 2nd level of recursion, the 2nd queen is placed on the 2nd row of the board in such a way that it does not attack the first queen placed above it, and so on. The important point to remember is that whenever we cannot find a square for a queen in a particular row, we return false to the calling function. The calling function then tries another square for its own queen in its row, and if that does not work out either, it returns false to its own calling function for re-positioning of the queen above, and so on. This is the idea of backtracking for solving the N-Queens problem. The final position of the 4 queens is (0,1), (1,3), (2,0) and (3,2).
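The row, column and diagonal tests can be expressed in a few lines of Python. The sketch below lists the squares attacked by a queen on an n x n board, using the row - column and row + column formulas; the function name attacked_squares is introduced here only for illustration.

```python
def attacked_squares(queen_row, queen_col, n):
    """Group the squares attacked by a queen at (queen_row, queen_col)."""
    attacked = {"row": [], "column": [], "diagonal1": [], "diagonal2": []}
    for r in range(n):
        for c in range(n):
            if (r, c) == (queen_row, queen_col):
                continue                                  # the queen's own square
            if r == queen_row:
                attacked["row"].append((r, c))
            elif c == queen_col:
                attacked["column"].append((r, c))
            elif r - c == queen_row - queen_col:          # top-left to bottom-right
                attacked["diagonal1"].append((r, c))
            elif r + c == queen_row + queen_col:          # bottom-left to top-right
                attacked["diagonal2"].append((r, c))
    return attacked


squares = attacked_squares(1, 2, 4)
print(squares)
```

For the queen at (1, 2) on a 4x4 board this yields the same four groups of squares as the table above.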
In the above example we take 4 queens, so there are four levels of recursion. At the 0th level, the first queen is placed in the 0th row. The second queen is placed in the 1st row at the 1st level so that she does not attack the first queen placed above her. Similarly, the third queen is placed at the 2nd level in the 2nd row in such a way that she does not attack the queens at the 1st and 0th levels, and so on. Here, the first queen goes to the first column of the first row, that is, position (0,0). Then, using recursion, the second queen is placed in the second row such that she does not attack the first queen. The second queen cannot be placed at the first or second square of the second row, i.e., neither at (1,0) nor at (1,1), since both are under attack. So, the safe square for the second queen is (1,2). The third queen is placed using recursion in the third row. In this row, the first position is under attack by the first queen because they are in the same column. The second position of the row is under attack by the second queen, because row + column for this position (2+1=3) is the same as row + column of the queen above it (1+2=3). The position (2,2) is under attack because of the same column, and the position (2,3) is again under attack (diagonal attack). So, the third queen cannot be positioned at any square of row 2, and false is returned to the previous calling function. The second queen is shifted to the next available square, i.e., (1,3). Again, recursion is used to find the position for the third queen in row 2, and it is (2,1). Now a position for the fourth queen is searched for in row 3, and there is none, as all positions of the last row are under either column attack or diagonal attack. So, false is returned recursively to the calling function, which ultimately returns false to its own calling function. The position of the first queen is then changed to the next available square, i.e., (0,1), and the search is repeated. The final position of all four queens is then recorded as (0,1), (1,3), (2,0) and (3,2).
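The recursive placement described above can be sketched in Python as follows. This is one possible implementation under simple assumptions: queens are placed row by row, columns are tried from left to right, and the names solve_n_queens, is_safe and place are illustrative.

```python
def solve_n_queens(n):
    """Return the first solution found as a list of (row, column) squares."""
    cols = []                      # cols[r] is the column of the queen in row r

    def is_safe(row, col):
        for r, c in enumerate(cols):
            same_column = c == col
            same_diagonal = (r - c == row - col) or (r + c == row + col)
            if same_column or same_diagonal:
                return False
        return True

    def place(row):
        if row == n:               # all queens placed
            return True
        for col in range(n):
            if is_safe(row, col):
                cols.append(col)
                if place(row + 1):
                    return True
                cols.pop()         # backtrack: undo and try the next square
        return False               # no square works in this row

    return list(enumerate(cols)) if place(0) else None


print(solve_n_queens(4))
```

For n = 4 the search backtracks exactly as traced above and ends with the queens at (0,1), (1,3), (2,0) and (3,2); for n = 3 it returns None because no arrangement exists.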
Chapter 6
Branch and Bound
In case of optimization problems, a feasible solution is a solution that satisfies all of the problem’s constraints in its specific search space. When a feasible solution of a problem gives the best possible value of the objective function, it is termed an optimal solution. For example, a feasible solution can be any Hamiltonian circuit in the Travelling Salesman Problem, while an optimal solution is the shortest Hamiltonian circuit. The Branch and Bound algorithm design technique extends the basic features of the backtracking algorithm design approach (see Chapter 5) with two additional features. Firstly, it provides, for every node of the state-space tree, a bound on the best value of the objective function that can be obtained from that node. The bound should be a lower bound for a minimization problem and an upper bound for a maximization problem. Secondly, a branch and bound algorithm keeps the value of the best solution it has encountered so far. Using these two pieces of information, one can compare the bound value of a node with the value of the best solution so far. If, on comparison, the bound value of a node is not better than the best available value, the node cannot give any better result than the one already available and hence can be pruned or terminated. That is, for a minimization problem, if the bound value is not smaller than the best solution seen so far, and for a maximization problem, if the bound value is not greater than the best value seen so far, then the node in consideration can be pruned. Here we discuss three important optimization problems and how the branch and bound design technique can solve these problems in an efficient way.
6.1. Assignment Problem
These problems belong to the class of transportation problems, wherein the main aim is to assign the available resources to an equal number of activities in such a way that the total cost of transportation is minimized, or the total profit is
maximized. Assigning resources to activities is an important task in optimization engineering, as the efficiency of the available resources, such as humans or machines, in performing a particular task differs, and so the total cost, profit or loss differs from one assignment to another. Applications of these types of problems include the assignment of workers to machines or of salesmen to different sales areas.
Problem Definition: Suppose that for performing n jobs, n persons are available, and each available person can perform one job at a time but with different efficiency. Let $c_{ij}$ be the cost of assigning the $i$th person to the $j$th job. The problem is to find an assignment (which job should be assigned to which person, on a one-to-one basis) so that the total cost of performing all jobs is minimum. The problem can be stated mathematically as an n × n cost matrix whose entry in row i and column j is $c_{ij}$.
The mathematical formulation of an assignment problem is: minimize the total cost
\[ Z = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij}\, x_{ij} \]
where
\[ x_{ij} = \begin{cases} 1, & \text{if the } i\text{th person is assigned to the } j\text{th job} \\ 0, & \text{if the } i\text{th person is not assigned to the } j\text{th job} \end{cases} \]
subject to the constraints
\[ \sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, 2, 3, \ldots, n \]
i.e., the $i$th person does only one job, and
\[ \sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, 2, 3, \ldots, n \]
i.e., only one person should be assigned to the $j$th job.
The assignment problem can be solved using many methods, such as the Hungarian Method, the Backtracking algorithm design approach and the Branch and Bound algorithm design approach. Below, we discuss the Branch and Bound technique to solve an Assignment Problem.
6.1.1. Branch and Bound Technique to Solve Assignment Problem
The selection criterion for the next node in BFS and DFS is “blind”: it gives no preference to a node that has a very good chance of leading the search to an answer node quickly. By employing an “intelligent” ranking function, also called an approximate cost function, the search for an optimal solution can be expedited, because the search avoids sub-trees that do not contain an optimal solution. The process is very much like BFS, but instead of taking live nodes in FIFO order, the live node with the least cost is expanded next. With this process, we may not reach an optimal solution, but there is a good chance of reaching an answer node quickly. Following are the approaches to calculate the cost function:
1. To every single worker, we assign the job with minimum cost from the list of unassigned jobs.
2. To every single job, we assign the worker with least cost for that particular job from the list of unassigned workers.
As an example, we calculate the cost obtained with the first approach when job 2 is assigned to worker A.
          A    B    C    D
Job 1     9    6    5    7
Job 2     2    4    8    6
Job 3     7    3    1    9
Job 4     8    7    8    4
As job 2 has been assigned to worker A, the cost incurred is 2, and job 2 and worker A become unavailable. (The original figure marks the assigned cell in green and the unavailable row and column in red; here the assigned cell is shown in brackets.)
          A    B    C    D
Job 1     9    6    5    7
Job 2    [2]   4    8    6
Job 3     7    3    1    9
Job 4     8    7    8    4
Now, we assign job 3 to worker B, as it has the least cost for worker B among the unassigned jobs. The cost becomes 2 + 3 = 5, and job 3 and worker B also become unavailable (assigned cells are shown in brackets).
          A    B    C    D
Job 1     9    6    5    7
Job 2    [2]   4    8    6
Job 3     7   [3]   1    9
Job 4     8    7    8    4
Job 1 is then assigned to worker C, as it has the least cost for worker C among the unassigned jobs, and finally job 4 is assigned to worker D, as it is the only job left. The total cost now becomes 2 + 3 + 5 + 4 = 14 (assigned cells are shown in brackets).
          A    B    C    D
Job 1     9    6   [5]   7
Job 2    [2]   4    8    6
Job 3     7   [3]   1    9
Job 4     8    7    8   [4]
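The cost calculation traced in the tables above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the names `COST`, `greedy_cost` and `optimal_cost` are introduced here and are not the authors'; workers A to D are numbered 0 to 3 and jobs 1 to 4 are numbered 0 to 3. The exhaustive search is included only to compare the greedy figure with the true minimum for this small matrix.

```python
from itertools import permutations

# COST[job][worker]; rows are Job 1..Job 4, columns are workers A..D.
COST = [
    [9, 6, 5, 7],
    [2, 4, 8, 6],
    [7, 3, 1, 9],
    [8, 7, 8, 4],
]


def greedy_cost(cost, fixed):
    """Cost estimate for a node: `fixed` maps worker -> job already assigned;
    every remaining worker, in order, takes its cheapest unassigned job."""
    n = len(cost)
    used_jobs = set(fixed.values())
    total = sum(cost[job][worker] for worker, job in fixed.items())
    for worker in range(n):
        if worker in fixed:
            continue
        free_jobs = [j for j in range(n) if j not in used_jobs]
        job = min(free_jobs, key=lambda j: cost[j][worker])
        used_jobs.add(job)
        total += cost[job][worker]
    return total


def optimal_cost(cost):
    """Exhaustive check over all n! assignments (only sensible for tiny n)."""
    n = len(cost)
    return min(sum(cost[p[w]][w] for w in range(n)) for p in permutations(range(n)))


print(greedy_cost(COST, {0: 1}))   # job 2 (index 1) fixed for worker A (index 0)
print(optimal_cost(COST))
```

With job 2 fixed for worker A the greedy estimate is 14, as in the tables, whereas the exhaustive check gives 13 for this matrix (A takes job 2, B job 1, C job 3, D job 4). This illustrates the remark above that the least-cost rule reaches an answer node quickly but does not by itself guarantee an optimal assignment.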
6.2. 0/1 Knapsack Problem
The 0/1 Knapsack Problem is also one of the most discussed optimization problems, wherein one is required to maximize the total value placed in a Knapsack or bag of limited capacity by choosing among n items of different weights and values. Two examples of the 0/1 Knapsack Problem are the allocation of an advertising budget to the promotions of individual products and the allocation of an individual’s effort to the preparation of final exams in different subjects.
Problem Definition: Suppose we are given the following parameters:
$w_k$ = the weight of the $k$th type of item, for $k = 1, 2, 3, \ldots, n$
$v_k$ = the value associated with the $k$th type of item, for $k = 1, 2, 3, \ldots, n$
$c$ = the capacity of the Knapsack
Then, mathematically, the 0/1 Knapsack Problem can be represented as: maximize the total value
\[ Z = \sum_{k=1}^{n} v_k\, x_k \]
subject to the constraint
\[ \sum_{k=1}^{n} w_k\, x_k \le c \]
where $x_1, x_2, \ldots, x_n$ are binary variables: $x_k = 1$ if the item of type $k$ is loaded into the Knapsack or bag, and $x_k = 0$ otherwise. Like the assignment problem, the 0/1 Knapsack Problem can be solved using many different approaches, such as the Greedy Approach, the Dynamic Programming approach, Brute force, the Backtracking approach and the Branch and Bound algorithm design approach. Here, we discuss the Branch and Bound technique to solve the 0/1 Knapsack Problem.
Branch and bound is a paradigm for designing algorithms that is frequently applied to problems of combinatorial optimization. In worst-case scenarios, these problems may necessitate investigating every potential combination, because they are typically exponential in terms of time complexity; in practice, branch and bound often provides a relatively quick solution. To solve the knapsack problem through branch and bound, the number of items, the profit and the value (weight) of each item, and the capacity of the bag will be given. The following steps are to be followed to solve the problem:
1. Construct a state space tree.
2. For each node in the state space tree, calculate the cost function and the upper bound.
3. Kill a node if the value of its cost function exceeds the best upper bound found so far.
4. If not, choose the node with the lowest cost.
5. Repeat steps 3 and 4 till every node has been examined.
6. The answer node is the node with the lowest cost. To locate the solution tuple, follow the route from leaf to root in the reverse direction.
Example
Let’s look at the 0/1 Knapsack problem in the example below to better understand Branch and Bound. Some items are given, each with an associated profit and value (weight), and the capacity of the bag is assumed to be 10.
Item    Profit    Value (weight)
1       7         1
2       8         3
3       8         4
4       12        6
Figure 6.1. Solution to 0/1 Knapsack problem using Branch and Bound Technique.
The objective of the problem is to fill the bag with those items that maximize the total profit, while the total value (weight) of the objects included in the bag stays less than or equal to 10. Although this is a maximization problem, the least-cost branch and bound procedure used here resolves minimization problems. Therefore, the maximization problem is first converted into a minimization problem by negating the profits, and the result is converted back into a maximization result at the end. Also, the cost of a node is computed with a fraction of the first item that does not fit, and the upper bound is computed without the fraction. A pictorial representation is shown above in Figure 6.1. We will work with a fixed-size solution tuple, where 1 = included and 0 = not included.

Node A: include all items
Profit            Value (weight)
7                 1
8                 3
8                 4
2 × 12/6 = 4      2
Cost = -(7 + 8 + 8 + 4) = -27
Upper bound = -(7 + 8 + 8) = -23

Node B: 1st item is included
Profit            Value (weight)
7                 1
8                 3
8                 4
2 × 12/6 = 4      2
Cost = -(7 + 8 + 8 + 4) = -27
Upper bound = -(7 + 8 + 8) = -23

Node C: 1st item is not included
Profit            Value (weight)
8                 3
8                 4
3 × 12/6 = 6      3
Cost = -(8 + 8 + 6) = -22
Upper bound = -(8 + 8) = -16

Now node B will be explored further into node D and node E, as its cost is less than that of node C.

Node D: 2nd item is included
Profit            Value (weight)
7                 1
8                 3
8                 4
2 × 12/6 = 4      2
Cost = -(7 + 8 + 8 + 4) = -27
Upper bound = -(7 + 8 + 8) = -23

Node E: 2nd item is not included
Profit            Value (weight)
7                 1
8                 4
5 × 12/6 = 10     5
Cost = -(7 + 8 + 10) = -25
Upper bound = -(7 + 8) = -15

Now node D will be explored further into node F and node G, as its cost is less than that of node E.

Node F: 3rd item is included
Profit            Value (weight)
7                 1
8                 3
8                 4
2 × 12/6 = 4      2
Cost = -(7 + 8 + 8 + 4) = -27
Upper bound = -(7 + 8 + 8) = -23

Node G: 3rd item is not included
Profit            Value (weight)
7                 1
8                 3
12                6
Cost = -(7 + 8 + 12) = -27
Upper bound = -(7 + 8 + 12) = -27

Now the upper bound is -27, so node C and node E will be killed, because their costs (-22 and -25) are greater than -27. Node F leads to an infeasible solution if the 4th item is included, as the total value (weight) would then be 14, which is more than the capacity of 10. So, node G will be explored further into node H and node I.

Node H: 4th item is included
Profit            Value (weight)
7                 1
8                 3
12                6
Cost = -(7 + 8 + 12) = -27
Upper bound = -(7 + 8 + 12) = -27

Node I: 4th item is not included
Profit            Value (weight)
7                 1
8                 3
Cost = -15
Upper bound = -15

The cost of node H is less, so it is the final node. So, the solution tuple = {1, 1, 0, 1}
Final node path = A-B-D-G-H
Profit = 7 + 8 + 12 = 27
Value (weight) = 1 + 3 + 6 = 10.
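The steps of the example can be sketched as a least-cost branch and bound in Python. This is a minimal sketch rather than a definitive implementation: the function names `bounds` and `knapsack_branch_and_bound` are hypothetical, and the sketch assumes, as holds for the example, that the items are listed in non-increasing order of profit per unit of weight, which is what makes the fractional cost a valid bound.

```python
import heapq


def bounds(profits, weights, capacity, level, profit, weight):
    """Cost (with a fraction of the first item that does not fit) and upper
    bound (without it) for a node that has already decided items [0, level).
    Both are negated so that the search is a minimization."""
    room = capacity - weight
    upper = profit
    cost = float(profit)
    for k in range(level, len(profits)):
        if weights[k] <= room:
            room -= weights[k]
            upper += profits[k]
            cost += profits[k]
        else:
            cost += profits[k] * room / weights[k]
            break
    return -cost, -upper


def knapsack_branch_and_bound(profits, weights, capacity):
    n = len(profits)
    cost, best_upper = bounds(profits, weights, capacity, 0, 0, 0)
    counter = 0                         # tie-breaker so the heap never compares lists
    heap = [(cost, counter, 0, 0, 0, [])]
    while heap:
        cost, _, level, profit, weight, taken = heapq.heappop(heap)
        if cost > best_upper:           # killed: cannot beat the best bound so far
            continue
        if level == n:                  # least-cost leaf is the answer node
            return profit, taken
        for include in (1, 0):
            new_weight = weight + include * weights[level]
            if new_weight > capacity:   # infeasible child
                continue
            new_profit = profit + include * profits[level]
            c, u = bounds(profits, weights, capacity, level + 1, new_profit, new_weight)
            best_upper = min(best_upper, u)
            if c <= best_upper:
                counter += 1
                heapq.heappush(heap, (c, counter, level + 1, new_profit, new_weight, taken + [include]))
    return 0, []


print(knapsack_branch_and_bound([7, 8, 8, 12], [1, 3, 4, 6], 10))
```

For the data of the example the sketch returns a profit of 27 with the solution tuple [1, 1, 0, 1]. It kills node C slightly earlier than the hand trace does, because it compares each child with the best upper bound as soon as the child is generated; the final answer is the same.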
6.3. Travelling Salesman Problem
The Travelling Salesman Problem, or TSP, is a well-known problem in Computer Science and has attracted researchers and academicians alike. In TSP, one is given a set of cities and the distance between every pair of cities. The problem is to find the shortest route for a salesman who visits every city exactly once and returns to the starting city. More precisely, the problem is to find the Minimum Weight Hamiltonian Cycle. TSP is an NP-hard problem, and many different ways have been proposed to find an optimum solution to it. We discuss below how the branch and bound technique can be used to find a solution to the TSP.
Figure 6.2. An Instance of TSP.
For example, a weighted graph of 5 vertices is given, and we have to find the shortest tour that goes through all the vertices exactly once. For that purpose, the adjacency (cost) matrix is also given here. Now, let us understand how this problem is solved using branch and bound.
\[ A = \begin{bmatrix}
\infty & 30 & 45 & 15 & 16 \\
22 & \infty & 24 & 6 & 3 \\
4 & 8 & \infty & 3 & 6 \\
28 & 9 & 27 & \infty & 4 \\
24 & 6 & 10 & 24 & \infty
\end{bmatrix} \]
First, we reduce the matrix: we select the minimum value of each row and subtract it from all the values of that row, and then do the same for each column. After minimizing the rows (row minima 15, 3, 3, 4 and 6, in total 31) we get the following matrix:
\[ \begin{bmatrix}
\infty & 15 & 30 & 0 & 1 \\
19 & \infty & 21 & 3 & 0 \\
1 & 5 & \infty & 0 & 3 \\
24 & 5 & 23 & \infty & 0 \\
18 & 0 & 4 & 18 & \infty
\end{bmatrix} \]
Then, we minimize the columns (column minima 1, 0, 4, 0 and 0, in total 5) and we get the following matrix:
\[ \begin{bmatrix}
\infty & 15 & 26 & 0 & 1 \\
18 & \infty & 17 & 3 & 0 \\
0 & 5 & \infty & 0 & 3 \\
23 & 5 & 19 & \infty & 0 \\
17 & 0 & 0 & 18 & \infty
\end{bmatrix} \]
So, the reduced cost = 31 + 5 = 36.
We will expand the nodes and calculate the cost of each node. Note: we further expand only the node whose cost is the minimum among the live nodes.
Figure 6.3. Solution to Travelling Salesman Problem using Branch and Bound Technique.
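Node A: reduced cost of the matrix = 36.
In the costs below, C(i, j) is the entry (i, j) of the parent's reduced matrix, r (or C(parent)) is the cost of the parent node, and ř is the cost of reducing the child's matrix. The matrix of a child node (i, j) is obtained from the parent's reduced matrix by setting row i and column j to ∞ and, while unvisited vertices remain, also the entry (j, 1), so that the tour cannot return to the starting vertex prematurely.

Node B (1, 2):
\[ \begin{bmatrix}
\infty & \infty & \infty & \infty & \infty \\
\infty & \infty & 17 & 3 & 0 \\
0 & \infty & \infty & 0 & 3 \\
23 & \infty & 19 & \infty & 0 \\
17 & \infty & 0 & 18 & \infty
\end{bmatrix} \]
Cost of Node B = C(1, 2) + r + ř = 15 + 36 + 0 = 51

Node C (1, 3):
\[ \begin{bmatrix}
\infty & \infty & \infty & \infty & \infty \\
18 & \infty & \infty & 3 & 0 \\
\infty & 5 & \infty & 0 & 3 \\
23 & 5 & \infty & \infty & 0 \\
17 & 0 & \infty & 18 & \infty
\end{bmatrix} \]
We have to minimize this matrix first, and the cost of reduction will be 17:
\[ \begin{bmatrix}
\infty & \infty & \infty & \infty & \infty \\
1 & \infty & \infty & 3 & 0 \\
\infty & 5 & \infty & 0 & 3 \\
6 & 5 & \infty & \infty & 0 \\
0 & 0 & \infty & 18 & \infty
\end{bmatrix} \]
Cost of Node C = C(1, 3) + r + ř = 26 + 36 + 17 = 79

Node D (1, 4):
\[ \begin{bmatrix}
\infty & \infty & \infty & \infty & \infty \\
18 & \infty & 17 & \infty & 0 \\
0 & 5 & \infty & \infty & 3 \\
\infty & 5 & 19 & \infty & 0 \\
17 & 0 & 0 & \infty & \infty
\end{bmatrix} \]
Cost of Node D = C(1, 4) + r + ř = 0 + 36 + 0 = 36

Node E (1, 5):
\[ \begin{bmatrix}
\infty & \infty & \infty & \infty & \infty \\
18 & \infty & 17 & 3 & \infty \\
0 & 5 & \infty & 0 & \infty \\
23 & 5 & 19 & \infty & \infty \\
\infty & 0 & 0 & 18 & \infty
\end{bmatrix} \]
The cost of reducing this matrix is 3 + 5 = 8 (rows 2 and 4).
Cost of Node E = C(1, 5) + r + ř = 1 + 36 + 8 = 45

Now, since Node D has the minimum cost, this node will be explored further.

Node F (4, 2):
\[ \begin{bmatrix}
\infty & \infty & \infty & \infty & \infty \\
\infty & \infty & 17 & \infty & 0 \\
0 & \infty & \infty & \infty & 3 \\
\infty & \infty & \infty & \infty & \infty \\
17 & \infty & 0 & \infty & \infty
\end{bmatrix} \]
Cost of Node F = C(4, 2) + C(D) + ř = 5 + 36 + 0 = 41

Node G (4, 3):
\[ \begin{bmatrix}
\infty & \infty & \infty & \infty & \infty \\
18 & \infty & \infty & \infty & 0 \\
\infty & 5 & \infty & \infty & 3 \\
\infty & \infty & \infty & \infty & \infty \\
17 & 0 & \infty & \infty & \infty
\end{bmatrix} \]
First, we have to minimize this matrix. The following is the result after minimizing the rows:
\[ \begin{bmatrix}
\infty & \infty & \infty & \infty & \infty \\
18 & \infty & \infty & \infty & 0 \\
\infty & 2 & \infty & \infty & 0 \\
\infty & \infty & \infty & \infty & \infty \\
17 & 0 & \infty & \infty & \infty
\end{bmatrix} \]
Then the columns are minimized. The following is the matrix after minimizing the columns, and the cost of minimization is 3 + 17 = 20:
\[ \begin{bmatrix}
\infty & \infty & \infty & \infty & \infty \\
1 & \infty & \infty & \infty & 0 \\
\infty & 2 & \infty & \infty & 0 \\
\infty & \infty & \infty & \infty & \infty \\
0 & 0 & \infty & \infty & \infty
\end{bmatrix} \]
Cost of Node G = C(4, 3) + C(D) + ř = 19 + 36 + 20 = 75

Node H (4, 5):
\[ \begin{bmatrix}
\infty & \infty & \infty & \infty & \infty \\
18 & \infty & 17 & \infty & \infty \\
0 & 5 & \infty & \infty & \infty \\
\infty & \infty & \infty & \infty & \infty \\
\infty & 0 & 0 & \infty & \infty
\end{bmatrix} \]
We minimize this matrix; the cost of reduction is 17, and the following is the matrix after reduction:
\[ \begin{bmatrix}
\infty & \infty & \infty & \infty & \infty \\
1 & \infty & 0 & \infty & \infty \\
0 & 5 & \infty & \infty & \infty \\
\infty & \infty & \infty & \infty & \infty \\
\infty & 0 & 0 & \infty & \infty
\end{bmatrix} \]
Cost of Node H = C(4, 5) + C(D) + ř = 0 + 36 + 17 = 53

Among nodes F, G and H, the cost of Node F is the minimum, so this node will be explored further.

Node I (2, 3):
\[ \begin{bmatrix}
\infty & \infty & \infty & \infty & \infty \\
\infty & \infty & \infty & \infty & \infty \\
\infty & \infty & \infty & \infty & 3 \\
\infty & \infty & \infty & \infty & \infty \\
17 & \infty & \infty & \infty & \infty
\end{bmatrix} \]
We minimize this matrix also, and the cost of reduction is 20. The following is the matrix after reduction:
\[ \begin{bmatrix}
\infty & \infty & \infty & \infty & \infty \\
\infty & \infty & \infty & \infty & \infty \\
\infty & \infty & \infty & \infty & 0 \\
\infty & \infty & \infty & \infty & \infty \\
0 & \infty & \infty & \infty & \infty
\end{bmatrix} \]
Cost of Node I = C(2, 3) + C(F) + ř = 17 + 41 + 20 = 78

Node J (2, 5):
\[ \begin{bmatrix}
\infty & \infty & \infty & \infty & \infty \\
\infty & \infty & \infty & \infty & \infty \\
0 & \infty & \infty & \infty & \infty \\
\infty & \infty & \infty & \infty & \infty \\
\infty & \infty & 0 & \infty & \infty
\end{bmatrix} \]
Cost of Node J = C(2, 5) + C(F) + ř = 0 + 41 + 0 = 41

The cost of Node J is the minimum, so this node will be explored further.

Node K (5, 3):
\[ \begin{bmatrix}
\infty & \infty & \infty & \infty & \infty \\
\infty & \infty & \infty & \infty & \infty \\
0 & \infty & \infty & \infty & \infty \\
\infty & \infty & \infty & \infty & \infty \\
\infty & \infty & \infty & \infty & \infty
\end{bmatrix} \]
Cost of Node K = C(5, 3) + C(J) + ř = 0 + 41 + 0 = 41

So, the final path of nodes is A – D – F – J – K, i.e., the tour 1 – 4 – 2 – 5 – 3 (returning to 1), with cost 41.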
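The reduction and expansion steps above can be sketched in Python. This is a sketch under stated assumptions, not the authors' program: the names `reduce_matrix` and `tsp_branch_and_bound` are hypothetical, vertices are numbered from 0 internally and reported from 1, and a complete tour is keyed by its true length taken from the original matrix so that the first complete tour removed from the priority queue is a shortest one.

```python
import heapq
import math

INF = math.inf


def reduce_matrix(m):
    """Return (reduced copy of m, total amount subtracted from rows and columns)."""
    m = [row[:] for row in m]
    n = len(m)
    total = 0
    for i in range(n):                       # row reduction
        low = min(m[i])
        if 0 < low < INF:
            total += low
            m[i] = [x - low for x in m[i]]
    for j in range(n):                       # column reduction
        low = min(m[i][j] for i in range(n))
        if 0 < low < INF:
            total += low
            for i in range(n):
                m[i][j] -= low
    return m, total


def tsp_branch_and_bound(dist):
    n = len(dist)
    root, bound = reduce_matrix(dist)
    counter = 0                              # tie-breaker for equal costs
    heap = [(bound, counter, [0], root)]
    while heap:
        cost, _, path, m = heapq.heappop(heap)
        if len(path) == n:                   # complete tour with least cost
            return cost, [v + 1 for v in path]
        i = path[-1]
        for j in range(n):
            if j in path or m[i][j] == INF:
                continue
            new_path = path + [j]
            if len(new_path) == n:
                # Leaf: use the actual tour length, including the return edge.
                legs = zip(new_path, new_path[1:] + [0])
                key, child = sum(dist[a][b] for a, b in legs), m
            else:
                child = [row[:] for row in m]
                for k in range(n):
                    child[i][k] = INF        # leave vertex i only once
                    child[k][j] = INF        # enter vertex j only once
                child[j][0] = INF            # no premature return to the start
                child, extra = reduce_matrix(child)
                key = cost + m[i][j] + extra
            counter += 1
            heapq.heappush(heap, (key, counter, new_path, child))
    return INF, []


A = [
    [INF, 30, 45, 15, 16],
    [22, INF, 24, 6, 3],
    [4, 8, INF, 3, 6],
    [28, 9, 27, INF, 4],
    [24, 6, 10, 24, INF],
]
print(reduce_matrix(A)[1])
print(tsp_branch_and_bound(A))
```

For the matrix A of the example, the reduction cost of the root is 36 and the search returns the tour 1, 4, 2, 5, 3 with cost 41, expanding the nodes in the order A, D, F, J, K.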
Chapter 7
Introduction to the Analysis of Algorithms
It is now widely acknowledged that computer science has an experimental side, with a strong rise in empirical analysis, especially of algorithms. In its teaching, a lot of focus is laid on creating good laboratories which can supplement as well as complement the theory covered in the lectures. The analysis of algorithms can be employed to verify the theory experimentally, much as in physics and chemistry laboratories. Yet the opportunity to link theory to experiment is generally not used to the fullest in pedagogy. What can be called a good algorithm depends mainly on two factors: the running time of the algorithm and the space used by it. The efficiency of an algorithm in terms of running time is assessed in three scenarios, viz. best case, average case, and worst-case running time. The three cases depend on the input given to the program. For example, in the case of insertion sort, the best case occurs when the input array is already sorted, as no swapping operation is required in this scenario. The worst case occurs when the array is reverse sorted, as both swapping and comparison are required for every element of the array. The average case can be calculated by running the algorithm on every possible input of a given size and dividing the total time by the number of inputs considered. Finding the average case of an algorithm is often very difficult.
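The best and worst cases of insertion sort mentioned above can be observed by counting operations. The following Python sketch is illustrative only; the function name `insertion_sort_counts` is chosen here, and "shifts" stands for the element moves that the text calls swapping.

```python
def insertion_sort_counts(values):
    """Sort a copy of `values`; return (sorted list, comparisons, shifts)."""
    a = list(values)
    comparisons = shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] <= key:        # key is already in place relative to a[j]
                break
            a[j + 1] = a[j]        # move the larger element one step right
            shifts += 1
            j -= 1
        a[j + 1] = key
    return a, comparisons, shifts


print(insertion_sort_counts([1, 2, 3, 4, 5]))   # already sorted: best case
print(insertion_sort_counts([5, 4, 3, 2, 1]))   # reverse sorted: worst case
```

For five elements the sorted input needs 4 comparisons and no shifts, while the reverse-sorted input needs 10 comparisons and 10 shifts, i.e., n - 1 against n(n - 1)/2.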
7.1. Asymptotic Analysis
Asymptotic analysis is a very powerful method which simplifies the analysis of running time by eliminating minute details, for example by rounding, and by approximating the run time in terms of the input size. It estimates how the running time of an algorithm varies with the size of the input in the limit. Three different notations are used to measure the time complexity of an algorithm asymptotically: the big-oh notation (O), the theta notation (Θ) and the big-omega notation (Ω).
7.1.1. Big – Oh
It is the formal way to represent the upper bound of an algorithm's run time, and it measures the longest amount of time an algorithm can take to complete. For a function f(n), we write f(n) = O(g(n)) if there exist positive constants $c$ and $n_0$ such that, for all $n \ge n_0$,
\[ 0 \le f(n) \le c\, g(n). \]
This implies that the run time of an algorithm represented by f(n) cannot grow faster than a constant multiple of g(n).
Precedence of Functions
\[ 1 < \log n < \sqrt{n} < n < n \log n < n^2 < n^3 < \cdots < 2^n < 3^n < n^n \]
Example: f(n) = 2n + 3
\[ 2n + 3 \le c\, n \quad \text{for all } n \ge 1 \]
Here, f(n) = 2n + 3 and g(n) = n. The inequality holds for any constant c greater than or equal to 5, since 3 ≤ 3n for n ≥ 1 and therefore 2n + 3 ≤ 2n + 3n = 5n. So, we can say f(n) = O(n), i.e., the upper bound on the run time of the above function is of order n, and hence it cannot grow faster than a constant multiple of n. Figure 7.1 below shows the graphical representation of a function when bounded by the big-O notation.
Figure 7.1. Big – O bound of a function.
7.1.2. Big - Omega
It is the formal way to represent the lower bound of an algorithm's run time, and it measures the least amount of time an algorithm can take to complete. For a function f(n), we write f(n) = Ω(g(n)) if there exist positive constants $c$ and $n_0$ such that, for all $n \ge n_0$,
\[ 0 \le c\, g(n) \le f(n). \]
Example: f(n) = 2n + 3
\[ 2n + 3 \ge c\, n \quad \text{for all } n \ge 1 \]
Here, f(n) = 2n + 3 and g(n) = n. The inequality holds for any positive constant c less than or equal to 2 (for instance c = 1), as the right side c n will then always be less than 2n + 3. So, we can say f(n) = Ω(n), i.e., the lower bound on the run time of the above function is of order n, and hence it cannot grow slower than a constant multiple of n. Figure 7.2 below shows the graphical representation of a function when bounded by the big-Ω notation.
Figure 7.2. Big – Ω bound of a function.
7.1.3. Theta
It is the formal way to represent both the lower bound and the upper bound of an algorithm's run time, and so it gives a tight bound on the amount of time an algorithm can take to complete. For a function f(n), we write f(n) = Θ(g(n)) if there exist positive constants $c_1$, $c_2$ and $n_0$ such that, for all $n \ge n_0$,
\[ 0 \le c_1\, g(n) \le f(n) \le c_2\, g(n). \]
Example: f(n) = 2n + 3
\[ c_1\, n \le 2n + 3 \le c_2\, n \quad \text{for all } n \ge 1 \]
which holds, for example, with $c_1 = 2$ and $c_2 = 5$. Hence f(n) = Θ(n).
Figure 7.3 below shows the graphical representation of a function when bounded by the theta notation.
Figure 7.3. Θ bound of a function.
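The constants used in the three examples can be checked numerically over a finite range. The Python sketch below is only a sanity check under stated assumptions: the constants 5 and 2 and the range 1 to 10000 are choices made here, and a finite check illustrates the definitions but does not replace the algebraic argument.

```python
def f(n):
    """Run-time function used in the examples of this section."""
    return 2 * n + 3


N_RANGE = range(1, 10001)   # finite range n = 1, ..., 10000 (an arbitrary choice)

# Big-Oh witness: f(n) <= 5 n for every n >= 1.
upper_ok = all(f(n) <= 5 * n for n in N_RANGE)

# Big-Omega witness: 2 n <= f(n) for every n >= 1.
lower_ok = all(2 * n <= f(n) for n in N_RANGE)

# Theta needs both inequalities with the same n0.
theta_ok = upper_ok and lower_ok

# A constant that is too small for the upper bound fails for large n.
too_small = all(f(n) <= 2 * n for n in N_RANGE)

print(upper_ok, lower_ok, theta_ok, too_small)
```

The first three checks succeed and the last one fails, which matches the statement that c must be at least 5 for the upper bound to hold from n = 1 onwards.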
7.2. Empirical Analysis of Computer Algorithms: Why Statistics?
Empirical analysis of computer algorithms refers to a computer experiment whose response variable could be either the output or it could well be, depending on the investigator’s interest, even the complexity of the underlying algorithm. By complexity, we are implying the resources such as time, space, computational effort consumed in order to execute the algorithm. There are two questions that we shall address in this note:
1. Why should program run time, which is deterministic, be characterized by a statistical model?
2. Why is it relevant to design and analyze a computer experiment statistically especially when the response variable is the complexity of the underlying algorithm? The answer to the first question is that algorithmic complexity is never expressed as a function of the input; rather it is expressed as a function of the input parameter (s) characterizing the input size. Now, what is deterministic for a fixed input may be taken as stochastic for a fixed input size and randomly varying input elements which is the traditional argument advocated by computer scientists (see e.g., Mahmoud, 2000). While this argument holds good for algorithms such as sorting, where fixing the input size does not fix all the computing operations, it may not hold for algorithms such as classical matrix multiplication where fixing the input size does fix all the computing operations. In such cases, there is a stronger untraditional argument defending stochastic modelling which says that since our experiment is a computer experiment, so we are looking for a predictor that is cheap and efficient. We want to predict the response without having to feed the input and without having to run the code and that too for such huge volume of the input for which it is computationally cumbersome to run the code, not to speak of the time lost in feeding the input. This untraditional argument is credited to the statistician Prof. Jerome Sacks and his research team (Sacks et al., 1989). Now, we shall proceed to answer the second question which demands a detailed discussion. Nowadays experiments are performed almost everywhere as a tool for studying and optimizing processes and systems with an objective to improve process yields; improve the quality of products, such as to reduce variability and increase the reliability; reduce the development time; and reduce the overall costs. Design refers to the method of conducting an experiment. 
A good experimental design should minimize the number of runs needed to acquire as much information as possible. Experimental design, a branch of statistics, has enjoyed a long history of theoretical development as well as applications (Koehler and Owen (1996), Fang and Lin (2003)). Experiments can be classified as: Physical Experiments, or Computer (or simulation) Experiments. Traditionally, an experiment is implemented in a laboratory, a factory, or an agricultural field. This is called physical experiment or actual experiment, where the experimenter physically carries out the experiment. There always exist random errors in physical experiments so that we might obtain different outputs under the identical experimental setting. Existence of random errors creates complexity in data analysis and
modelling. Therefore, the experimenter may choose one or few factors in the experiment so that it is easy to explore the relationship between the output and input; alternatively, the experimenter may use some powerful statistical experimental designs. The statistical approach to the design of experiments is usually based on a statistical model. A good design is one which is optimal with respect to a statistical model under our consideration. There are many statistical models for physical experiments, among which the fractional factorial design – based on an ANOVA (Analysis of Variance) model and the optimum design – based on a regression model, are the most widely used in practice. These models involve unknown parameters such as main effects, interactions, regression coefficients, and variance of random error. Good designs (e.g., orthogonal arrays and various optimal designs) may provide unbiased estimators of the parameters with smaller or even the smallest variance-covariance matrix under a certain sense. Many physical experiments can be expensive and time consuming because physical processes are often difficult or even impossible to study by conventional experimental methods. As computing power is rapidly increasing and is accessible, it has become possible to model some of these processes by sophisticated computer code (Santner, et. al., (2003)). In the past decades computer experiments or computer-based simulation have become topics in statistics and engineering that have received a lot of attention from both practitioners and the academic community. The underlying model in a computer experiment is deterministic and given, but it is often too complicated to manage and to analyze. One of the goals of computer experiment is to find an appropriate model (“metamodel” for simplicity) that is much simpler than the true one. Simulation experiments study the underlying process by simulating the behavior of the process on a computer. 
The true model is deterministic and given in a computer experiment, but errors on the inputs are considered and assumed. The simulation of the random process is conducted by incorporating random inputs into the deterministic model (Fang, Li and Sudjianto (2006)). The unique characteristics of computer experiments compared to physical experiments are as follows:
Computer experiments often involve larger number of variables compared to those of typical physical experiments. Larger experiment domain or design space is frequently employed to explore complicated nonlinear functions.
Computer experiments are deterministic. That is, samples with the same input setting will produce identical outputs.
7.2.1. Computer Experiments and Algorithmic Complexity A computer experiment is a series of runs of a code for various inputs. Most computer experiments are deterministic in the sense that the response is identical when the code is re-run with the same inputs. This is true irrespective of whether the response variable represents an output (such as the product matrix in a matrix multiplication algorithm) or a complexity such as run time. The nature of the response depends on the investigator’s interest. While complexity bounds are asymptotically true, they may not hold over a finite range. We strongly feel that the finite range concept in algorithm analysis is important as it is over a finite range of the input size only that a computer experiment can be conducted. Is it not true, in addition to this, that the computer scientists have used the mathematical bounds to determine average complexity whereas the term “average” is a statistical term? This challenging science could have been assigned far greater meaning if only we realized that just as arithmetic mean (the most popular average) is only a special case of weighted mean where the weights are frequencies, similarly the bounds in average case complexity ought to be based on operation weights rather than operation counts and that all of them should have been collectively taken into consideration for determining the (asymptotic) bound (because, when an algorithm is implemented, the computing operations perform collectively). This is the concept which gives rise to a non-trivial and conceptual statistical bound.
7.2.2. Statistical (Complexity) Bound (Definition)
If $w_{ij}$ is the weight of a computing operation of type i in the j-th repetition (generally time is taken as a weight) and y is a “stochastic realization” (which may or may not be stochastic***) of the deterministic $T = \sum 1 \cdot w_{ij}$, where we count one for each operation repetition irrespective of the type, then the statistical bound of the algorithm is the asymptotic least upper bound of y expressed as a function of n, where n is the input parameter characterizing the algorithm’s input size. If an interpreter is used, the measured time will involve both the translation time and the execution time, but the translation time being
independent of the input will not affect the order of complexity. The deterministic model in that case is $T = \sum 1 \cdot w_{ij}$ + translation time. For parallel computing, summation should be replaced by maximum. Empirical O (written as O with a subscript emp) is an empirical estimate of the statistical bound over a finite range, obtained by supplying numerical values to the weights obtained by running computer experiments (Chakraborty and Sourabh, 2010).
Empirical-O: $y = O_{emp}(g(n))$ means the following: The simplest model one can fit to the algorithmic time complexity data that prevents ill-conditioning by sacrificing a minimum amount of predictive power, if need be, must have a leading functional term g(n) in the model. (Chakraborty and Sourabh, 2007)
*** Compare the traditional school of Mahmoud (2000) with the untraditional school of Sacks et al. (1989) discussed earlier!
The weight of an operation may be taken as the corresponding time, but this is only a rule of thumb. The meaning of the term conceptual bound is that we conceptualize that there is a statistical bound and then set out to estimate it by supplying numerical values to these weights, obtained by conducting a computer experiment, which has complexity (such as time) rather than the output as its response, and then make an intelligent guess as to where this bound goes albeit confining ourselves to a finite but desirably feasible range [a, b]. The finite range concept for the bound estimate comes into play, as mentioned earlier, as we cannot run a code for infinite input size. This gives rise to another concept, namely, an empirical-O which is the empirical estimate of the non-trivial and conceptual statistical bound. The credibility of this estimate depends on how well our computer experiment was designed and how the same was analyzed. As it happens, this boils down to a staggering of exploratory and confirmatory data analysis (EDA/CDA).
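To make the idea of fitting a model with a leading term g(n) concrete, the following Python sketch fits y = a + b·g(n) by ordinary least squares and compares two candidate leading terms. It is a deliberately simplified illustration and not the authors' procedure: the function name `fit_leading_term` is hypothetical, the data are synthetic values generated from 3 + 2 n log2 n instead of measured run times, and a real empirical-O study would also involve the experimental design and model-checking issues discussed in this section.

```python
import math


def fit_leading_term(ns, ys, g):
    """Least-squares fit of y = a + b * g(n); returns (a, b, R squared)."""
    xs = [g(n) for n in ns]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


# Synthetic "observations" over the finite range n = 100, ..., 1000 (assumed data).
ns = list(range(100, 1001, 100))
ys = [3 + 2 * n * math.log2(n) for n in ns]

a1, b1, r2_nlogn = fit_leading_term(ns, ys, lambda n: n * math.log2(n))
a2, b2, r2_square = fit_leading_term(ns, ys, lambda n: n * n)
print(round(b1, 6), r2_nlogn, r2_square)
```

On these synthetic data the n log n model recovers the slope 2 and fits better than the quadratic model, so one would report n log n as the leading term over this range. With measured run times the values of y would come from repeated runs at each chosen n, for example timed with `time.perf_counter` from the Python standard library.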
Although a subjective estimate, depending on the personal bias of empirical model selection by the statistician, the very presence of an empirical- O nevertheless suggests that there must be something that we are estimating for an estimate is only an intelligent guess and surely, we cannot guess for nothing! Since in real implementation operations perform collectively and since time can be arguably taken as a weight in some sense, we claim that empirical-O is an estimate of the statistical bound which is expected to be more realistic than the mathematical bounds in that unlike the former, the latter are obtained from operation counts where a separate bound is supplied to a specific operation type. For simple codes, in case of average time complexity, we expect (i) The statistical bound to coincide with the mathematical expectation of that
operation or portion of the code which is “pivotal” (e.g., comparisons in Quick sort) provided further that (ii) The probability distributional assumption while taking expectation is realistic over the problem domain. Since either or both of these conditions may fail in the case for a complex code in which it is hard to guess the pivotal region or operation and it is the complex and arbitrary codes that are of concern, we strongly propose to apply the concept of statistical bound and its empirical estimate, empirical O, for determining average case complexity of complex codes such as the ones in partial differential equations. Although statistical bounds were initially created to make average complexity a better science (computer scientists consider only worst-case complexity to be a useful science as the bounds have a guarantee), they are also useful in giving a certificate on the level of conservativeness of the guarantee giving mathematical bounds in worst case. In this way, worst case complexity, an acknowledged strong area of theoretical computer science, can be made more meaningful. Finally, statistical bounds can easily nullify a tall mathematical claim in best case (for examples, see Chakraborty and Sourabh, 2010). The moral is that we must channelize our research in a direction so as to increase the credibility of the empirical-O, the bound estimate of the realistic statistical bound which will continue to hold conceptually when the idea of identifying the “pivotal” operation and applying mathematical expectation (not to speak of the probability distribution’s realistic appeal being questioned over the problem domain) may fail. 
Such a channeling amounts to a proper design and analysis of our computer experiment, whose response variable is a complexity (i.e., some resource consumed to get the output) such as time. Time can be interpreted as the weighted sum of the computing operations, the weights being the corresponding times, whose numerical values are supplied while conducting the computer experiment. (A computer experiment is defined as a series of runs of a code for various inputs; a deterministic computer experiment is one in which re-running the code with the same inputs leads to identical observations.) Note that most computer experiments are deterministic irrespective of whether the response is the output or a complexity. This in turn amounts to asking (in our case):
At what points of n should the observations be taken? (Algorithmic complexity is never expressed for a specific input. It is expressed as a function of the input parameter(s) characterizing the size of the input.) This is a question of optimal design.
How many observations (equal or unequal) are taken at each point of n? As a rule of thumb, the greater the “noise” at each point of n, the more observations should be taken there. Some optimal designs require an equal number of observations at each point of n, e.g., D-optimality, which minimizes the generalized variance of the least squares estimates in (say) polynomial regression. However, the numbers of observations are not equal if only the variance of one specific least squares estimate (such as the estimate of the coefficient of the highest polynomial term, which affects the big O) is to be minimized.

How many data sets should we have, one or more than one? (For example, one strategy involves splitting the data into three random groups: one for pattern recognition in the data, one for the actual fitting, and one for verifying the model assumptions.)

How do we analyze the data once the observations are taken? This is the analysis problem. More specifically, should research move in a traditional or an untraditional direction? Here is our suggestion: divide algorithms into two broad groups, one in which fixing n does not fix the computing operations (e.g., sorting) and the other in which it does (e.g., n × n classical matrix multiplication). Both involve computer experiments with complexity as the response, but research clearly flows in a traditional direction in the first case and in an untraditional direction in the second. In the first case, we use the stochastic model y = f(n) + error term, while in the second case y is treated as a “stochastic realization” (it is not itself stochastic) only to achieve cheap and efficient prediction. The second idea holds, in fact, in both situations and is therefore the stronger one.

How do we ensure that the range [a, b] is feasible? A range is feasible if it covers a large number of data sets, where b should be large enough to make the bound estimate look like a bound estimate. (Remember that all bounds extend to -∞ or ∞, from below or above respectively, irrespective of whether they are mathematical or statistical. Further, the bound we are estimating is also asymptotic, i.e., true for higher n only; it is the estimate that is observed over a finite range. We hope there will be no confusion.)
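The weighted-sum reading of the response, described before the list of questions above, can be written compactly. The notation below is an assumption introduced only for illustration and is not the authors' own: $c_i(n)$ is the number of operations of type $i$ performed for input size $n$, $w_i$ is the time one such operation takes on the system at hand, $m$ is the number of operation types, and $\varepsilon$ is the error term of the stochastic model used for the first group of algorithms.

```latex
\begin{align*}
  T(n) &= \sum_{i=1}^{m} w_i \, c_i(n)
       && \text{(time as a weighted sum of operation counts)} \\
  y    &= f(n) + \varepsilon
       && \text{(model fitted to the observed times)}
\end{align*}
```

Under this reading a mathematical bound constrains a single $c_i(n)$, whereas the empirical-O is read off the fitted $f(n)$, in which all the weights have already been mixed.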
Table 7.1. Comparison table: mathematical vs statistical complexity bounds

1. Mathematical bounds: Operations are counted.
   Statistical bounds: Operations are weighed.

2. Mathematical bounds: A separate bound is provided for a specific operation type (mixing operations of different types is not permitted).
   Statistical bounds: Weighing permits collective consideration of all operations for determining the bound (mixing different types of operations is permitted).

3. Mathematical bounds: Theoretically derivable.
   Statistical bounds: They are only conceptual. However, their empirical estimates (the so-called empirical-O) can certainly be obtained in every system.

4. Mathematical bounds: They may be unrealistic at times.
   Statistical bounds: Guaranteed to be realistic.

5. Mathematical bounds: They are system independent.
   Statistical bounds: They can at best be system invariant (as suggested by identical empirical-O in different systems) and may vary from system to system depending on the language and translator, which may suit some specific operations and inputs.

6. Mathematical bounds: They are ideal for analyzing worst case behavior.
   Statistical bounds: They are particularly suited to studying average case behavior, especially for complex codes. They are arguably ideal for parallel computing since, when a processor is changed, it is the weight of the operation that changes. Hence a bound that is itself based on weights should be ideal for parallel computing.

7. Mathematical bounds: They are exact.
   Statistical bounds: They are exact provided only that they are system invariant.

8. Mathematical bounds: Determining a precise mathematical bound for a complex and arbitrary code can be difficult.
   Statistical bounds: The basic methodology does not change no matter how arbitrary or complex the code may be. It is taken as a problem of designing and analyzing a computer experiment in which the response is a complexity such as time. Selection of weights is another problem.
But this is only a sample of a juicy set of questions. The statistician must select their own set of questions and seek answers to them, which involves interleaving exploratory and confirmatory data analysis including, but not limited to, applied regression analysis (including optimal design), probabilistic analysis (for the first category of algorithms), bootstrapping and spatial statistics (if required). An open and challenging research problem is whether Bayesian analysis is possible in a computer experiment in which the response is a complexity. Note that Bayesian analysis in computer experiments is nothing new, but so far only with the output as the response.
The comparison table (Table 7.1) compares mathematical and statistical algorithmic complexity bounds.

Remark: So far as average case complexity is concerned, we expect the mathematical bound and the statistical bound to coincide provided only that the following conditions are met:
a. The statistical bound has the system invariance property.
b. The algorithm does have a pivotal operation, or there is some portion of the code which has been successfully identified as pivotal for applying the expectation.
c. The probability distribution over which the expectation is taken is realistic over the problem domain. This means that inputs are likely to come from this probability distribution only or, even if they do not, the expression derived mathematically will still stand.

If you are applying statistical bounds in the worst case, their estimates can surely clash with the mathematical bound if the statistical bound lacks the system invariance property. Moreover, the mathematical bound itself can be a conservative one, and the algorithm can perform better than what the bound says over an appreciably large but finite n, as indicated by empirical-O. Note that there is enough room for this to happen: if $T(n)$ represents the worst case time complexity, we define $T(n) = O(f(n))$ if there are positive constants $c$ and $N$ such that $T(n) \le c\,f(n)$ for all $n > N$. Since no further restriction is placed on $c$ and $N$, they can take values beyond the scope of practical interest, whereas empirical-O, the practical estimate of the conceptual statistical bound, is guaranteed to be realistic. Similar arguments can be given for the best case, where a tall mathematical claim can easily be nullified statistically. In the case of parallel computing, none of the mathematical bounds are meaningful. It is very easy to see why. Constants such as the $c$ in big O depend on the processor, but you are using multiple processors and, as in a distributed system, sometimes processors of different types!
In contrast, statistical bounds are ideal as they are based on operation weights, and the weights take care of the change of processor. For further literature on applications of statistical bounds and empirical-O, the reader is referred to Chakraborty and Sourabh (2010). For example, an empirical average $O(n^2)$ complexity is certainly obtainable in the multiplication of two square matrices of order $n \times n$ using Amir Schoor’s algorithm for two dense matrices, although this algorithm was developed for sparse matrices. This is because (i) the $n^2$ comparisons are “heavier” than the multiplications, whose order is $O(d_1 d_2 n^3)$, $d_1$ and $d_2$ being the densities of the pre-factor and the post-factor matrix respectively (Chakraborty and Sourabh, 2010), and further because (ii) it is faster to work with rows only, as in Schoor’s algorithm, than with both rows and columns, as in $n \times n$ classical matrix multiplication (Chakraborty and Sourabh, 2007). These concepts have recently been applied successfully to cryptographic algorithms (AES-128 and RSA) by Pranav et al. (2021). For a seminal paper on the design and analysis of computer experiments the reader is referred to Sacks et al. (1989).
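To make the earlier remark about the unrestricted constants $c$ and $N$ concrete, consider two hypothetical running times. The functions $T_A$ and $T_B$ and the constant $10^6$ are invented purely for illustration and do not come from the text.

```latex
\begin{align*}
  T_A(n) &= 10^{6}\, n, & T_B(n) &= n^{2}, \\
  T_A(n) &= O(n) \ \text{with } c = 10^{6},\ N = 1, &
  T_B(n) &= O(n^{2}) \ \text{with } c = 1,\ N = 1, \\
  T_A(n) &< T_B(n) \iff n > 10^{6}.
\end{align*}
```

Algorithm A has the better mathematical bound, yet B is the faster one for every input size below one million. The bound alone does not reveal this because the definition places no restriction on how large $c$ may be.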
Chapter 8
Randomized Algorithms

Among the many available algorithm design approaches, the use of random numbers in the steps of an algorithm, which may help reduce its time or space complexity, is widely used these days. An algorithm that employs random variables anywhere in its steps is called a Randomized Algorithm. The most prominent use of a random variable can be seen in the quick sort algorithm, in which the next pivot element can be selected using a random number.

Random variables are numerical descriptions of the outcome of an experiment whose values are unpredictable. For example, tossing a coin is a random experiment, since whether the outcome is heads or tails depends entirely on chance. There are two types of random variables, namely “Discrete Random Variables” and “Continuous Random Variables.” Discrete random variables can take a countable number of values, while continuous ones can take any value in a defined interval. Random variables are unpredictable, but probability theory can be used to deal with such indeterminacy. Probability theory makes use of the fact that, although the result of an experiment cannot be determined before the experiment is conducted, a long sequence of such experiments reveals a stability that can then be used to make a prior and precise statement about the outcome of the random experiment. The concept of relative frequency (the ratio of the number of times the event of interest occurred to the total number of times the experiment was conducted) facilitates such a prior statement. With the use of a specific probability distribution, which gives the probabilities of occurrence of the events of an experiment, the randomness of the result can be described in a stable way. For example, suppose the experiment is the tossing of a fair coin. If we take a random variable X to mark the outcome of the experiment, then the probability distribution of X assigns probability 0.5 to each of heads and tails. A probability distribution is a statistical function that describes all the possible values that a random variable can take within a given range, together with their likelihoods. The plotting of the possible values on the specific probability distribution depends on factors such as mean, variance,
skewness, and kurtosis of the distribution. Probability distributions are of two types, viz. discrete probability distributions, which deal with discrete random variables, and continuous probability distributions, which deal with continuous random variables.

Many existing algorithms also admit a randomized version. The time complexity of such randomized algorithms generally falls into two classes. The first class consists of Monte Carlo algorithms, in which the time complexity of the randomized algorithm is deterministic (the output, however, may be incorrect with some small probability), so that computing the worst case run time is comparatively easy. The other class consists of Las Vegas algorithms, which always return a correct result but whose time complexity depends on the values taken by the random variables. Such algorithms are typically analyzed for the expected worst case, which can be computed by considering all possible values of the random variables used in the worst case scenario and noting the time taken for each instance. The expected worst case can then be calculated by taking the average of all the evaluated times. In this chapter we will analyze the time complexity of the following two randomized algorithms empirically (more on the empirical complexity of an algorithm in Chapter 10):
1. Randomized Quick Sort
2. Randomized Binary Search

The execution time estimate of an algorithm is a theoretical classification that measures and tries to predict the overall increase in the run time of a specific algorithm as the input size increases. A given problem can have many different algorithmic solutions and, depending on the specific algorithm implemented in a programming language to solve the problem, the solution can have a run time measured in seconds, hours, days or even, in some cases, years. Run time analysis of an algorithm can be classified into two categories, viz. theoretical analysis and empirical analysis. The theoretical analysis approximates the overall execution time of the algorithm for arbitrarily large inputs and places a theoretical bound on its complexity. The bound can be stated for the best case, average case, and worst-case scenarios. Computer scientists in general have not been much interested in finding the best-case time complexity of an algorithm, and there has been a lot of debate around the average case scenario.
Hence, it is the worst-case time complexity of an algorithm, which measures its run time in the worst-case scenario, that is of most interest to computer scientists and researchers. The worst-case scenario here means the worst possible input combination that can be given to the algorithm. In an empirical analysis, the program implementing the algorithm is run for varying and increasing input sizes, and the time complexity of the algorithm is then predicted by fitting a suitable statistical model. Although this approach is platform dependent, the statistical analysis yields an equation that fits the observed complexity of the algorithm well. In a real scenario, we are not interested in an input size approaching infinity; rather, we are interested in realistic inputs for which the algorithm will be used in a specific application. Theoretical analysis of an algorithm gives its time complexity for infinitely large input sizes. Empirical analysis predicts a statistical model through which the time complexity can be estimated for real and achievable inputs.
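The empirical procedure just described (run the program for increasing input sizes, repeat each run a few times, and average) can be sketched as a small timing harness. This is a minimal illustration and not the authors' experimental code: the function names `time_once` and `run_experiment`, the use of uniformly random floats as input, and the default of three trials are all assumptions made for the example.

```python
import random
import time


def time_once(func, data):
    """Return the wall-clock time (in seconds) of one call func(data)."""
    start = time.perf_counter()
    func(data)
    return time.perf_counter() - start


def run_experiment(func, sizes, trials=3, seed=0):
    """Time func on random inputs of each size.

    Returns a list of (n, mean_time) pairs, where mean_time is the
    arithmetic mean over `trials` independent runs at input size n.
    """
    rng = random.Random(seed)  # fixed seed so the inputs are reproducible
    results = []
    for n in sizes:
        times = []
        for _ in range(trials):
            data = [rng.random() for _ in range(n)]  # fresh input per trial
            times.append(time_once(func, data))
        results.append((n, sum(times) / trials))
    return results
```

Calling `run_experiment(sorted, range(500, 5001, 500))` would produce one mean time per input size, which is the kind of table a statistical model is subsequently fitted to. The measured values depend on the machine and on the system load, so they will differ from run to run.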
8.1. Randomized Quick Sort

Quick Sort is one of the most used sorting algorithms and is often preferred over Merge Sort because it sorts in place and therefore requires comparatively less memory. The average time complexity of Quick Sort is $O(n \log n)$. In Quick Sort, we partition the array in place using a pivot element. After partitioning, the elements to the left of the pivot are less than or equal to the pivot and those to its right are greater. The selection of the pivot element ultimately decides the time complexity of the algorithm. The pivot element can be selected in many ways. Two of the most popular ways of partitioning the array are Lomuto’s partitioning, in which the pivot is taken to be the last element of the array, and Hoare’s partitioning, in which partitioning is done using two indexes that start at opposite ends of the array. The two indexes move towards each other until an inversion is found. When the algorithm finds an inversion, the two values at the two indexes are swapped and the process is repeated. Lomuto’s partitioning is comparatively easier to implement. Choosing the pivot element at random makes the $O(n^2)$ worst case unlikely for every input, so that the expected run time is $O(n \log n)$ regardless of the order of the input. Below is the algorithm for Randomized Quick Sort using Lomuto’s partitioning scheme.
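As a hedged illustration of the scheme described above, the following Python sketch combines Lomuto partitioning with a randomly chosen pivot. It is written for this explanation and is not taken from the authors' implementation; the function names and the optional `rng` parameter are assumptions.

```python
import random


def lomuto_partition(arr, low, high):
    """Partition arr[low..high] around the pivot arr[high]; return its final index."""
    pivot = arr[high]
    i = low  # next position for an element that is <= pivot
    for j in range(low, high):
        if arr[j] <= pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[high] = arr[high], arr[i]  # move the pivot into place
    return i


def randomized_partition(arr, low, high, rng=random):
    """Swap a randomly chosen element into the last slot, then apply Lomuto."""
    r = rng.randint(low, high)  # inclusive on both ends
    arr[r], arr[high] = arr[high], arr[r]
    return lomuto_partition(arr, low, high)


def randomized_quick_sort(arr, low=0, high=None, rng=random):
    """Sort arr in place using randomized Quick Sort."""
    if high is None:
        high = len(arr) - 1
    if low < high:
        p = randomized_partition(arr, low, high, rng)
        randomized_quick_sort(arr, low, p - 1, rng)
        randomized_quick_sort(arr, p + 1, high, rng)
```

The only difference from the deterministic version is the swap in `randomized_partition`: because the pivot position is random, no fixed input order can reliably force the unbalanced partitions that lead to quadratic behavior.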
partition(arr[], low, high)
    pivot = arr[high]
    i = low // place for swapping
    for j := low to high - 1 do
        if arr[j] […]

    If array[pivot] > key
        Set right = pivot - 1
        Pivot = RandBetween(left, pivot - 1)
    Else If array[pivot] < key
        Set left = pivot + 1
        Pivot = RandBetween(pivot + 1, right)
    If left > right
        Return element is not found

We will analyze the empirical complexity of Randomized Binary Search through the Fundamental Theorem of Finite Difference. We implemented the algorithm and noted the execution time for varying and increasing input sizes. Three different trials were taken to reduce the effect of cache hits on the measurements, as shown in Table 8.4 below. The mean of the execution times of the three trials represents the execution time of the algorithm at each input size. The first differences of the execution time are shown in Table 8.5 below. As can be seen from the table, the first difference is approximately constant, so a first-degree polynomial can adequately fit the empirical run time of the Randomized Binary Search algorithm. Consequently, the empirical complexity of Randomized Binary Search can be termed $O_{emp}(n)$.
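The randomized binary search outlined by the pseudocode above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the input list is sorted in ascending order, the function returns -1 when the key is absent, and the name `randomized_binary_search` and the `rng` parameter are illustrative rather than the authors'.

```python
import random


def randomized_binary_search(arr, key, rng=random):
    """Return an index of key in the sorted list arr, or -1 if it is absent.

    Instead of probing the midpoint, each step probes a position chosen
    uniformly at random from the current search window [left, right].
    """
    left, right = 0, len(arr) - 1
    while left <= right:
        pivot = rng.randint(left, right)  # inclusive on both ends
        if arr[pivot] == key:
            return pivot
        if arr[pivot] > key:
            right = pivot - 1  # key can only be to the left of the pivot
        else:
            left = pivot + 1   # key can only be to the right of the pivot
    return -1  # window is empty: element is not found
```

For example, with `data = list(range(0, 100, 2))`, the call `randomized_binary_search(data, 42)` returns index 21 whatever random choices are made, because the values are distinct and the search stops only at a matching position.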
Table 8.4. Execution time of randomized binary search

Input   Trial 1   Trial 2   Trial 3   Mean (Execution Time)
500     0.01720   0.0296    0.0242    0.07100
1000    0.04789   0.0579    0.0552    0.16099
1500    0.08389   0.0846    0.0876    0.25609
2000    0.11778   0.1379    0.1152    0.37088
2500    0.15257   0.1574    0.1368    0.44677
3000    0.18536   0.1725    0.1745    0.53236
3500    0.21895   0.1899    0.2167    0.62555
4000    0.25485   0.2476    0.2656    0.76805
4500    0.28562   0.2787    0.2926    0.85692
5000    0.31949   0.3348    0.2944    0.94869

Note: the values tabulated in the last column equal the sum of the three trials, i.e., three times their arithmetic mean. This constant factor does not affect the linear trend analyzed below.
Table 8.5. Difference table of randomized binary search

Input   Mean (Execution Time)   First Difference Δy
500     0.07100                 0.08999
1000    0.16099                 0.09510
1500    0.25609                 0.11479
2000    0.37088                 0.07589
2500    0.44677                 0.08559
3000    0.53236                 0.09319
3500    0.62555                 0.14250
4000    0.76805                 0.08887
4500    0.85692                 0.09177
5000    0.94869

Each first difference is the value in the next row minus the value in the current row.
Using the principle of least squares, we will fit a first-degree polynomial of the form $y = a + bx$, as shown in Figure 8.2 below. The estimates are obtained by minimizing the residual sum of squares $S = \sum (y - \hat{y})^2$, where $\hat{y} = \hat{a} + \hat{b}x$. Table 8.6 below shows the values of $\hat{a}$ and $\hat{b}$ in the equation $\hat{y} = \hat{a} + \hat{b}x$.
Figure 8.2. Fitted Line Plot and Residual Plot of the Execution Time of Randomized Binary Search.
Table 8.6. Residual analysis of randomized binary search

Linear model Poly1: $\hat{y} = \hat{a} + \hat{b}x$, where x is normalized by mean 2750 and std 1514
Coefficients (with 95% confidence bounds):
    $\hat{a}$ (intercept) = 0.5037 (0.4925, 0.5149)
    $\hat{b}$ (slope) = 0.2969 (0.2851, 0.3087)
Goodness of fit:
    SSE: 0.00188
    R-square: 0.9976
    Adjusted R-square: 0.9973
    RMSE: 0.01533
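The least squares fit on a normalized predictor can be reproduced with a few lines of standard-library Python. The sketch below is an illustration, not the fitting tool used by the authors; the function name `fit_line_normalized` is an assumption, and the data are the input sizes and the last column of Table 8.4. Because the normalized predictor has mean zero, the fitted intercept equals the mean of the response values.

```python
from statistics import mean, stdev


def fit_line_normalized(xs, ys):
    """Least squares fit of y = intercept + slope * z, with z = (x - mean) / std.

    The sample standard deviation (n - 1 in the denominator) is used for
    the normalization. Returns (intercept, slope).
    """
    mx, sx = mean(xs), stdev(xs)
    zs = [(x - mx) / sx for x in xs]
    my = mean(ys)
    slope = sum(z * (y - my) for z, y in zip(zs, ys)) / sum(z * z for z in zs)
    intercept = my  # z has mean zero, so the fitted line passes through (0, my)
    return intercept, slope


sizes = list(range(500, 5001, 500))
times = [0.07100, 0.16099, 0.25609, 0.37088, 0.44677,
         0.53236, 0.62555, 0.76805, 0.85692, 0.94869]
intercept, slope = fit_line_normalized(sizes, times)
```

With these data the intercept comes out at approximately 0.5037 and the slope at approximately 0.2969 per standard deviation of the input size (about 1514).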
Chapter 9
Master Theorem

Algorithms can be categorized as either iterative or recursive. The Master Theorem concerns recursive algorithms, in which a function calls itself until a base condition is satisfied. For analyzing the recurrence relations that such algorithms give rise to, the simplest and most effective tool is the Master Theorem. It gives a formula for solving recurrence relations of the form

$$T(n) = a\,T\!\left(\frac{n}{b}\right) + f(n),$$

where
$n$ = size of the input,
$a$ = number of subproblems in the recursion,
$n/b$ = size of each subproblem (all subproblems are assumed to have the same size),
$f(n)$ = cost of the work done outside the recursive calls, which includes the cost of dividing the problem and the cost of merging the solutions.
Here $a \ge 1$, $b > 1$ and $f(n)$ is an asymptotically positive function.

The three cases used to solve a recurrence relation using the Master Theorem are as follows:

Case 1: If $f(n) < n^{\log_b a}$, then $T(n) = \Theta(n^{\log_b a})$
Case 2: If $f(n) = n^{\log_b a}$, then $T(n) = \Theta(n^{\log_b a} \log n)$
Case 3: If $f(n) > n^{\log_b a}$, then $T(n) = \Theta(f(n))$

The comparisons between $f(n)$ and $n^{\log_b a}$ are asymptotic; in Cases 1 and 3 the two functions must differ by a polynomial factor.
Solved Examples of Master Theorem

Example 1
$T(n) = 4T(n/2) + n$
Solution: $a = 4$, $b = 2$, $f(n) = n$
Therefore, $n^{\log_b a} = n^{\log_2 4} = n^2$
That is, $f(n) < n^{\log_b a}$ (case 1)
Therefore, $T(n) = \Theta(n^{\log_b a})$
Hence, $T(n) = \Theta(n^2)$

Example 2
$T(n) = 2T(n/2) + n$
Solution: $a = 2$, $b = 2$, $f(n) = n$
Therefore, $n^{\log_b a} = n^{\log_2 2} = n$
That is, $f(n) = n^{\log_b a}$ (case 2)
Therefore, $T(n) = \Theta(n^{\log_b a} \log n)$
Hence, $T(n) = \Theta(n \log n)$

Example 3
$T(n) = aT(n/b) + f(n)$, where $a \ge 1$, $b > 1$ and $f(n) = \Theta(n^k \log^p n)$
We need to find two things: $\log_b a$ and $k$.
Case 1: If $\log_b a > k$, then $T(n) = \Theta(n^{\log_b a})$
Case 2: If $\log_b a = k$, then we have three sub cases:
    If $p > -1$, then $T(n) = \Theta(n^k \log^{p+1} n)$
    If $p = -1$, then $T(n) = \Theta(n^k \log \log n)$
    If $p < -1$, then $T(n) = \Theta(n^k)$
Case 3: If $\log_b a < k$, then we have two sub cases:
    If $p \ge 0$, then $T(n) = \Theta(n^k \log^p n)$
    If $p < 0$, then $T(n) = O(n^k)$

Example 4
$T(n) = 2T(n/2) + 1$
$a = 2$, $b = 2$, $f(n) = 1$
$\log_b a = \log_2 2 = 1$, $k = 0$, $p = 0$
Falls in the 1st case as $\log_b a > k$
So, $T(n) = \Theta(n^{\log_b a}) = \Theta(n^1) = \Theta(n)$

Example 5
$T(n) = 4T(n/2) + n$
$a = 4$, $b = 2$, $f(n) = n$
$\log_b a = \log_2 4 = 2$, $k = 1$, $p = 0$
Falls in the 1st case as $\log_b a > k$
So, $T(n) = \Theta(n^{\log_b a}) = \Theta(n^2)$

Example 6
$T(n) = 2T(n/2) + n$
$a = 2$, $b = 2$, $f(n) = n$
$\log_b a = \log_2 2 = 1$, $k = 1$, $p = 0$
Falls in the 2nd case as $\log_b a = k$
As $p > -1$, $T(n) = \Theta(n^k \log^{p+1} n) = \Theta(n \log^{0+1} n) = \Theta(n \log n)$

Example 7
$T(n) = 4T(n/2) + n^2 \log n$
$a = 4$, $b = 2$, $f(n) = n^2 \log n$
$\log_b a = \log_2 4 = 2$, $k = 2$, $p = 1$
Falls in the 2nd case as $\log_b a = k$
As $p > -1$, $T(n) = \Theta(n^k \log^{p+1} n) = \Theta(n^2 \log^{1+1} n) = \Theta(n^2 \log^2 n)$

Example 8
$T(n) = 2T(n/2) + n / \log n$
$a = 2$, $b = 2$, $f(n) = n / \log n$
$\log_b a = \log_2 2 = 1$, $k = 1$, $p = -1$
Falls in the 2nd case as $\log_b a = k$
As $p = -1$, $T(n) = \Theta(n^k \log \log n) = \Theta(n \log \log n)$

Example 9
$T(n) = 2T(n/2) + n \log^{-2} n$
$a = 2$, $b = 2$, $f(n) = n \log^{-2} n$
$\log_b a = \log_2 2 = 1$, $k = 1$, $p = -2$
Falls in the 2nd case as $\log_b a = k$
As $p < -1$, $T(n) = \Theta(n^k) = \Theta(n)$

Example 10
$T(n) = T(n/2) + n^2$
$a = 1$, $b = 2$, $f(n) = n^2$
$\log_b a = \log_2 1 = 0$, $k = 2$, $p = 0$
Falls in the 3rd case as $\log_b a < k$
As $p \ge 0$, $T(n) = \Theta(n^k \log^p n) = \Theta(n^2)$

$T(n) = 2T(n/2) + n^2$
$T(n) = 2T(n/2) + n^2 / \log n$
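The case analysis used in the solved examples can be mechanized. The sketch below is an illustrative helper, not part of the original text: the function name `master_theorem`, its string output format and the helper `_poly_log` are assumptions. It takes $a$, $b$, $k$ and $p$ for a recurrence $T(n) = aT(n/b) + \Theta(n^k \log^p n)$ and returns the bound given by the three cases and their sub cases.

```python
import math


def _poly_log(k, p):
    """Format n^k log^p n, omitting the log factor when p is zero."""
    text = f"n^{k:g}"
    if p != 0:
        text += f" log^{p:g} n"
    return text


def master_theorem(a, b, k, p=0):
    """Bound for T(n) = a*T(n/b) + Theta(n^k log^p n), with a >= 1 and b > 1."""
    if a < 1 or b <= 1:
        raise ValueError("the theorem requires a >= 1 and b > 1")
    c = math.log(a, b)  # the critical exponent log_b(a)
    if math.isclose(c, k, abs_tol=1e-12):  # Case 2
        if p > -1:
            return f"Theta({_poly_log(k, p + 1)})"
        if p == -1:
            return f"Theta(n^{k:g} log log n)"
        return f"Theta(n^{k:g})"
    if c > k:  # Case 1
        return f"Theta(n^{c:g})"
    # Case 3: log_b(a) < k
    if p >= 0:
        return f"Theta({_poly_log(k, p)})"
    return f"O(n^{k:g})"
```

For instance, `master_theorem(4, 2, 1)` reproduces Example 5 and `master_theorem(2, 2, 1, -1)` reproduces Example 8. The comparison of $\log_b a$ with $k$ uses a tolerance because the logarithm is computed in floating point.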
9.1. Master Theorem for Decreasing Function

As we know,
$T(n) = T(n-1) + 1$ has the solution $O(n)$
$T(n) = T(n-1) + n$ has the solution $O(n^2)$
$T(n) = T(n-1) + \log n$ has the solution $O(n \log n)$

Also, we know that
$T(n) = 2T(n-1) + 1$ has the solution $O(2^n)$
$T(n) = 3T(n-1) + 1$ has the solution $O(3^n)$
$T(n) = 2T(n-1) + n$ has the solution $O(n 2^n)$

So, the general form of the recurrence relation is
$$T(n) = aT(n-b) + f(n),$$
where $a > 0$, $b > 0$ and $f(n) = O(n^k)$ with $k \ge 0$.

The three cases we get from the above are as follows:

Case 1: If $a < 1$, then $O(n^k)$ or $O(f(n))$
Case 2: If $a = 1$, then $O(n^{k+1})$ or $O(n \cdot f(n))$
Case 3: If $a > 1$, then $O(n^k \cdot a^{n/b})$ or $O(f(n) \cdot a^{n/b})$
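The bounds listed above for decreasing recurrences can be checked numerically by unrolling the recurrence. The sketch below is illustrative only: the function name `solve_recurrence` and the base condition $T(m) = 0$ for $m \le 0$ are assumptions, since the text does not state a base case.

```python
def solve_recurrence(a, b, f, n, base=0):
    """Evaluate T(n) = a*T(n - b) + f(n), taking T(m) = base for m <= 0."""
    # Collect the arguments n, n - b, n - 2b, ... that are still positive.
    args = []
    m = n
    while m > 0:
        args.append(m)
        m -= b
    # Rebuild T from the smallest argument upwards.
    t = base
    for m in reversed(args):
        t = a * t + f(m)
    return t
```

With this base case, `solve_recurrence(1, 1, lambda m: m, n)` equals $n(n+1)/2$, in line with $O(n^2)$, and `solve_recurrence(2, 1, lambda m: 1, n)` equals $2^n - 1$, in line with $O(2^n)$. For $T(n) = 2T(n-1) + n$ the unrolled value is $2^{n+1} - n - 2$, which lies within the stated $O(n 2^n)$ bound.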
9.2. Limitations of Master Theorem

The Master Theorem can only be used for recurrence relations of the form
$T(n) = aT(n/b) + f(n)$, where $f(n) = O(n^k \log^p n)$, for dividing recurrence relations,
or
$T(n) = aT(n-b) + f(n)$, where $f(n) = O(n^k)$, for decreasing recurrence relations.

The Master Theorem cannot be used to solve a recurrence relation if:
1. $T(n)$ is not a monotone function. For example: $T(n) = \cos(n)$
2. $a$ is not a constant. For example: $T(n) = nT(n/3) + f(n)$
3. $f(n)$ is not a polynomial. For example: $f(n) = 2^n$
Chapter 10
A Note on Empirical Complexity Analysis

The sole reason for analyzing an algorithm with respect to its complexity is to make a prior decision about its run time behavior, so as to form a clearer picture of how the algorithm works in practice. Algorithm analysis is done so as to characterize the algorithm according to its suitability for different applications or, from a broader viewpoint, to compare it with another comparable algorithm for the same applications. Analyzing an algorithm can help one understand it better and can make room for improvement, since during the analysis one tends to replace a more complex part of the algorithm with a simpler one, resulting in a better, simpler and more compact algorithm. Theoretical computer scientists term the analysis of algorithms computational complexity, where the main objective is to classify algorithms according to their complexity and computational problems according to their difficulty. Developing an algorithm with minimal asymptotic execution time, and developing one with the least possible number of operations for computing a given procedure, are two distinct approaches that are combined to compute the overall complexity of the algorithm in terms of execution time. In general, the complexity order of an algorithm can be computed by counting the minimal number of operations needed to compute it. In a practical situation such orders make little sense, as the sole focus is on the order of growth of the worst-case behavior of the algorithm. A better and novel approach to computing the real-time performance of an algorithm is to compute its empirical computational complexity.

The empirical analysis is done by running the program (implementing the algorithm) on a computer, measuring its run time for inputs of several sizes, and then fitting the observed performance to a statistical model that predicts performance as a function of input size. Empirical complexity is a real-time complexity of an algorithm and is generally represented by empirical-O. It is the empirical estimate of the non-trivial and conceptual weight-based statistical bound. A statistical bound, unlike a mathematical bound, weighs the computing operations instead of counting them, and it takes all the operations collectively and mixes them for assessing the bound. First running a program implementing the algorithm and then fitting a statistical model gives a practical view of the algorithm’s complexity in empirical form.

The theoretical complexity estimates a bound on the run time (the worst-case run time, to be precise) of the algorithm as a function of the input size, say n, where n is generally taken to approach infinity. In practice, we do not deal with infinitely large inputs; rather, we are interested only in a portion of that range. A better approach for dealing with inputs that suit the application on which the algorithm is to be run is the empirical complexity analysis of the algorithm, empirical-O ($O_{emp}$). In the empirical analysis, we try to give an empirical complexity bound on the run time of the algorithm by actually running it on a machine for increasing input sizes. This predicts the actual behavior of the algorithm. We estimate a model to fit the run time and then show, through a statistical approach, that the predicted model properly fits the empirical complexity of the algorithm.

Furthermore, the theoretical complexity works on the number of operations, and in the case of compound operations the most dominant operation turns out to be the determining factor in the run time complexity. This might not always be the case when the algorithm is actually run on a machine. In compound operations, the operation that seems to be the most dominant in the theoretical complexity may not be that dominant in practice, and some other specific operation may turn out to be the determining factor. So, we must impose a statistical similarity test for compound operations. Although empirical complexity is machine dependent, it is the broader concept for inputs of finite, practical size. An empirical-O is the empirical estimate of the non-trivial and conceptual weight-based statistical bound.
A statistical bound, unlike a mathematical bound, weighs the computing operations instead of counting them, and takes all the operations collectively and mixes them for assessing the bound. The estimate is obtained empirically by giving numerical values to the weights, and its credibility depends on how well our computer experiment was designed and how it was analyzed. We can use a statistical approach in our work because the theoretical complexity analysis of an algorithm only gives an assumption about the run time behavior of the algorithm as the input varies, and that prediction may fail for smaller inputs. We can take the run time of the algorithm as the “weight” by supplying inputs of different sizes. This predicts the actual behavior of the algorithm. To justify the empirical estimate, the “Fundamental Theorem of Finite Difference” can be used.
10.1. The Fundamental Theorem of Finite Difference

The empirical run time can be estimated using the Fundamental Theorem of Finite Difference, and more precisely its converse, which states that if the $n$th difference of a tabulated function is constant, with higher differences being almost zero, then the function is a polynomial of degree $n$. Mathematically,

$$\Delta^n P_n(x) = C = \text{constant}$$

The $n$th difference of an $n$th-degree polynomial is constant.

$$\Delta^{n+r} P_n(x) = 0, \quad r = 1, 2, 3, \ldots$$

Higher differences of an $n$th-degree polynomial are zero.
Figure 10.1. Basic flow of the Fundamental Theorem of Finite Difference.
Its converse is also true. This means that if the $n$th difference of a tabulated function is constant and the higher differences are zero, then the function is a polynomial of degree $n$. The whole flow of the process is depicted in the chart in Figure 10.1. In the following sections we empirically analyze the run time complexity of some sorting algorithms such as Merge Sort, Quick Sort, Bubble Sort and Selection Sort.
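The theorem and its converse are easy to check on a tabulated function. The following sketch is an illustration with an assumed function name, `forward_differences`; it computes forward differences of any order for values tabulated at equally spaced points and applies them to the second-degree polynomial $x^2 + 1$.

```python
def forward_differences(values, order=1):
    """Return the forward differences of the given order.

    `values` must be tabulated at equally spaced points. Each pass replaces
    the list by the differences of consecutive entries, so the result has
    len(values) - order entries.
    """
    diffs = list(values)
    for _ in range(order):
        diffs = [later - earlier for earlier, later in zip(diffs, diffs[1:])]
    return diffs


# Tabulate the second-degree polynomial x^2 + 1 at x = 0, 1, ..., 5.
ys = [x ** 2 + 1 for x in range(6)]   # [1, 2, 5, 10, 17, 26]
first = forward_differences(ys, 1)    # [1, 3, 5, 7, 9]
second = forward_differences(ys, 2)   # [2, 2, 2, 2]  (constant)
third = forward_differences(ys, 3)    # [0, 0, 0]     (zero)
```

The second differences are constant and the third differences vanish, as the theorem states for a polynomial of degree 2. With measured run times the differences are only approximately constant, so the degree is chosen as the lowest order at which the differences stop showing a trend.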
10.2. Empirical Complexity of Merge Sort

Merge sort adopts a Divide and Conquer strategy. It divides the input array to be sorted into two equal halves and then applies the recursive sorting procedure to the two halves separately. After each half is sorted, the algorithm applies the merging operation to merge the two sorted halves. Below is the algorithm for merge sort:

MergeSort(arr[], l, r)
If r > l, do:
    Step 1: Divide the array into two halves by finding the middle point of the unsorted array as: middle m = (l + r) / 2
    Step 2: Apply procedure MergeSort on the 1st half of the divided array as: Call MergeSort(arr, l, m)
    Step 3: Apply procedure MergeSort on the 2nd half of the divided array as: Call MergeSort(arr, m + 1, r)
    Step 4: Apply the Merge procedure on the two halves of Steps 2 and 3 as: Call Merge(arr, l, m, r)

The theoretical time complexity of Merge Sort in the worst case (to be precise, in all cases) is $\Theta(n \log n)$. We now compute the empirical complexity of Merge Sort by running the program implementing the algorithm for varying and increasing input sizes on a machine and then examining the results through the Fundamental Theorem of Finite Difference. To reduce the effect of cache hits on the measurements, several trials have to be taken for the same input size. In this case, three different trials have been taken. The result is shown in Table 10.1.
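The four steps of the pseudocode above translate directly into Python. The sketch below is one possible rendering, not the program the authors timed; the names `merge_sort` and `merge` mirror the pseudocode, and integer division is assumed for the middle point.

```python
def merge(arr, l, m, r):
    """Merge the sorted runs arr[l..m] and arr[m+1..r] in place."""
    left = arr[l:m + 1]
    right = arr[m + 1:r + 1]
    i = j = 0
    k = l
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # <= keeps equal elements in their original order
            arr[k] = left[i]
            i += 1
        else:
            arr[k] = right[j]
            j += 1
        k += 1
    # Exactly one of the two runs may still hold elements; copy them over.
    arr[k:r + 1] = left[i:] + right[j:]


def merge_sort(arr, l=0, r=None):
    """Sort arr[l..r] in place following Steps 1 to 4 of the pseudocode."""
    if r is None:
        r = len(arr) - 1
    if r > l:
        m = (l + r) // 2           # Step 1: middle point
        merge_sort(arr, l, m)      # Step 2: sort the first half
        merge_sort(arr, m + 1, r)  # Step 3: sort the second half
        merge(arr, l, m, r)        # Step 4: merge the two halves
```

Calling `merge_sort(data)` on a list sorts it in place. The temporary lists `left` and `right` created in `merge` are the extra memory that Merge Sort needs.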
Table 10.1. Execution time of merge sort

Input   Trial 1   Trial 2   Trial 3   Mean (Execution Time)
500     0.007     0.006     0.0060    0.00633
1000    0.014     0.023     0.0130    0.01667
1500    0.032     0.020     0.0200    0.02400
2000    0.340     0.027     0.0280    0.13167
2500    0.035     0.037     0.0436    0.03853
3000    0.055     0.047     0.0430    0.04833
3500    0.075     0.053     0.0490    0.05900
4000    0.061     0.062     0.0790    0.06733
4500    0.071     0.067     0.0660    0.06800
5000    0.076     0.089     0.0750    0.08000
Next, the first differences of the execution time are calculated. From the difference table (Table 10.2), it can be seen that the first differences are almost constant, apart from the anomalous value at input size 2000, which stems from the 0.340 reading in Trial 1. Hence, by the converse of the Fundamental Theorem of Finite Differences, a 1st-degree polynomial will approximate the execution time of merge sort. Consequently, the empirical complexity of merge sort can be termed Oemp(n).

Table 10.2. Difference table of merge sort
(Each Δy entry is the mean in its row minus the mean in the preceding row.)

Input   Mean (Execution Time)   First Difference Δy
500     0.00633
1000    0.01667                 0.01033
1500    0.02400
2000    0.13167                 0.10767
2500    0.03853
3000    0.04833                 0.00980
3500    0.05900
4000    0.06733                 0.00833
4500    0.06800
5000    0.08000                 0.01200
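The finite-difference argument rests on the fact that, for a polynomial of degree k sampled at equally spaced points, the k-th differences are constant; the converse is what is applied to the timing data. A small sketch of the computation is given below. The helper name forward_differences is hypothetical, and the code computes the differences between every pair of consecutive means, not only the entries listed in the table.

```python
def forward_differences(values, order=1):
    """Return the forward differences of the given order of a sequence."""
    diffs = list(values)
    for _ in range(order):
        diffs = [later - earlier for earlier, later in zip(diffs, diffs[1:])]
    return diffs


# Mean execution times of merge sort for inputs 500, 1000, ..., 5000.
merge_means = [0.00633, 0.01667, 0.02400, 0.13167, 0.03853,
               0.04833, 0.05900, 0.06733, 0.06800, 0.08000]

first_differences = forward_differences(merge_means)
for size, delta in zip(range(1000, 5001, 500), first_differences):
    print(size, round(delta, 5))
```

For an exactly quadratic sequence such as 0, 1, 4, 9, 16 the same helper with order=2 returns the constant sequence 2, 2, 2.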
The empirical approach deals only with the specific data set at hand as input. One may, however, be interested in the whole universe (population) from which the data come. A linear regression approach, which guards against overfitting the data, can therefore be used to draw conclusions about the whole population of inputs. The use of a nonlinear regression model is not suggested in this case, as the run time of Merge Sort
Soubhik Chakraborty, Prashant Pranav, Naghma Khatoon et al.
depends only on a single independent variable, i.e., the input size, and this dependence is linear in this case. Even if we square a predictor attached to the independent variable, the final function used for fitting the model is still linear in its coefficients. Using the principle of least squares, we will fit a first-degree polynomial of the form y = a + bx.
Figure 10.2. Fitted line plot and residual plot of the execution time of merge sort.
$S = \sum (y - \hat{y})^2$ is the residual sum of squares, whose minimization yields the estimates in $\hat{y} = \hat{a} + \hat{b}x$. Table 10.3 below shows the values of $\hat{a}$ and $\hat{b}$ in the equation $\hat{y} = \hat{a} + \hat{b}x$.

Table 10.3. Residual analysis of merge sort

Linear model Poly1: $\hat{y} = \hat{a} + \hat{b}x$, where x is normalized by mean 2750 and std 1514
Coefficients (with 95% confidence bounds):
    $\hat{a}$ = 0.05399 (0.0299, 0.07807)
    $\hat{b}$ = 0.01891 (-0.006471, 0.0443)
Goodness of fit:
    SSE: 0.008725
    R-square: 0.2696
    Adjusted R-square: 0.1782
    RMSE: 0.03302
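The least squares fit reported in Table 10.3 can be approximated with a few lines of standard-library Python. The sketch below is not the tool used to produce the table; the function name fit_line and the returned dictionary keys are hypothetical. It assumes, as stated in the table, that x is normalized by its mean and (sample) standard deviation before fitting, and that RMSE is computed with n - 2 degrees of freedom.

```python
import math
from statistics import mean, stdev


def fit_line(xs, ys):
    """Least squares fit of y = a + b*z, where z is x normalized by mean and std."""
    x_bar, x_std = mean(xs), stdev(xs)
    zs = [(x - x_bar) / x_std for x in xs]
    y_bar = mean(ys)
    # The normalized predictor has mean 0, so the intercept is the mean of y.
    b = sum(z * (y - y_bar) for z, y in zip(zs, ys)) / sum(z * z for z in zs)
    a = y_bar
    fitted = [a + b * z for z in zs]
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # residual sum of squares S
    sst = sum((y - y_bar) ** 2 for y in ys)               # total sum of squares
    return {
        "a": a,
        "b": b,
        "sse": sse,
        "r_square": 1 - sse / sst,
        "rmse": math.sqrt(sse / (len(xs) - 2)),
        "residuals": [y - f for y, f in zip(ys, fitted)],
    }


sizes = list(range(500, 5001, 500))
merge_means = [0.00633, 0.01667, 0.02400, 0.13167, 0.03853,
               0.04833, 0.05900, 0.06733, 0.06800, 0.08000]
result = fit_line(sizes, merge_means)
print({key: value for key, value in result.items() if key != "residuals"})
```

The residuals returned by the function are the quantities whose plot is inspected for a horizontal pattern.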
The residual table is constructed to check the goodness of fit of the fitted model. If the residual plot is more or less horizontal, one can say that the fitted model is indeed the empirical estimate of the run time of the algorithm. From Figure 10.2 above, one can observe that the residual plot
of the merge sort algorithm shows an almost horizontal pattern. So, we can say that the empirical run time of merge sort is Oemp(n). The residual table of merge sort is shown below:

Table 10.4. Residual table of merge sort

Input   y         ŷ          Residual (y - ŷ)
500     0.00633   0.05411    -0.04778
1000    0.01667   0.054305   -0.03764
1500    0.02400   0.054444   -0.03044
2000    0.13167   0.056480    0.07519
2500    0.03853   0.054719   -0.01619
3000    0.04833   0.054904   -0.00657
3500    0.05900   0.055106    0.00389
4000    0.06733   0.055263    0.01207
4500    0.06800   0.055276    0.01272
5000    0.08000   0.055503    0.02450
Adopting a similar strategy, one can compute the empirical complexity of other algorithms. To illustrate the approach further, the empirical complexity of three more sorting algorithms, viz. Quick Sort, Bubble Sort and Selection Sort, is computed in the following sections.
10.3. Empirical Complexity of Quick Sort
The worst-case theoretical complexity of the Quick Sort algorithm is O(n²). Let us compute the empirical complexity of this algorithm following a procedure similar to the one discussed in the preceding section. Taking three trials for each input size and using the mean of these three trials as the execution time of the Quick Sort algorithm, we get Table 10.5. Subsequently, the first differences are computed, as shown in Table 10.6; they are more or less constant. So, in this case also, the empirical run time can be thought to be linear in the input size, i.e., Oemp(n).
Table 10.5. Execution time of quick sort

Input   Trial 1   Trial 2   Trial 3   Mean (Execution Time)
500     0.007     0.004     0.005     0.0053333
1000    0.011     0.013     0.011     0.0116667
1500    0.017     0.02      0.017     0.0180000
2000    0.022     0.022     0.023     0.0223333
2500    0.028     0.028     0.028     0.0280000
3000    0.035     0.034     0.04      0.0363333
3500    0.039     0.043     0.041     0.0410000
4000    0.046     0.045     0.045     0.0453333
4500    0.062     0.054     0.052     0.0560000
5000    0.056     0.058     0.058     0.0573333
Table 10.6. Difference table of quick sort
(Each Δy entry is the mean in its row minus the mean in the preceding row.)

Input   Mean (Execution Time)   First Difference Δy
500     0.0053333
1000    0.0116667               0.0063333
1500    0.0180000
2000    0.0223333               0.0043333
2500    0.0280000
3000    0.0363333               0.0083333
3500    0.0410000
4000    0.0453333               0.0043333
4500    0.0560000
5000    0.0573333               0.0013333
The statistical model fitting and the residual plot of the Quick Sort algorithm are shown below in Figure 10.3.
Figure 10.3. Fitted line plot and residual plot of the execution time of quick sort.
We leave the calculation of the residual values as an exercise for interested readers. The residual plot of the Quick Sort algorithm shows an almost horizontal pattern, and hence the fitted model can be termed the best fit. Hence, we can say that the empirical computational complexity of the Quick Sort algorithm is Oemp(n).
10.4. Empirical Complexity of Bubble Sort
Bubble Sort follows a simple sorting strategy: it repeatedly swaps adjacent elements that are in the wrong order. The theoretical worst-case execution time of the Bubble Sort algorithm is O(n²). Let us compute the empirical run time of the Bubble Sort algorithm by following a similar procedure. The execution time of each trial and the mean execution time are shown in Table 10.7, and the differences are shown in Table 10.8. In the case of Bubble Sort, one can see that it is not the first differences but the second differences that are almost constant. So, a 1st-degree polynomial cannot estimate the execution time of Bubble Sort, whereas a 2nd-degree polynomial can. Hence, we can say that the empirical execution time of Bubble Sort is Oemp(n²). Let us check the validity of this estimate by fitting a statistical model of the form y = ax² + bx + c and checking its goodness of fit using a residual plot.

Table 10.7. Execution time of bubble sort

Input   Trial 1   Trial 2   Trial 3   Mean (Execution Time)
500     0.03      0.032     0.033     0.03267
1000    0.14      0.137     0.137     0.13733
1500    0.31      0.309     0.307     0.30833
2000    0.56      0.566     0.552     0.55767
2500    0.86      0.872     0.869     0.86567
3000    1.25      1.253     1.245     1.25067
3500    1.73      2.261     1.697     1.89500
4000    2.61      2.466     2.237     2.43833
4500    2.82      2.814     2.824     2.81900
5000    4.30      3.465     4.233     3.99933
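The swapping strategy described at the start of this section can be sketched as follows. This is a minimal illustration, not the program that produced Table 10.7; the function name bubble_sort is hypothetical, and the early exit when a pass makes no swap is an optional refinement added in this sketch.

```python
def bubble_sort(arr):
    """Sort arr in place by repeatedly swapping adjacent out-of-order elements."""
    n = len(arr)
    for end in range(n - 1, 0, -1):
        swapped = False
        # After this pass, the largest remaining element sits at index `end`.
        for i in range(end):
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
                swapped = True
        if not swapped:   # optional refinement: stop once a pass makes no swap
            break
    return arr
```

In the worst case the two nested loops perform about n(n - 1)/2 comparisons, which is the source of the O(n²) bound quoted above.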
Table 10.8. Difference table of bubble sort
(Each Δy entry is the mean in its row minus the mean in the preceding row; each Δ²y entry is the Δy in its row minus the preceding Δy entry.)

Input   Mean (Execution Time)   First Difference Δy   Second Difference Δ²y
500     0.03267
1000    0.13733                 0.10467
1500    0.30833
2000    0.55767                 0.24933               0.14467
2500    0.86567
3000    1.25067                 0.38500               0.13567
3500    1.89500
4000    2.43833                 0.54333               0.15833
4500    2.81900
5000    3.99933                 1.18033               0.63700
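The entries of the difference table for bubble sort can be checked with a short computation. Judging from the arithmetic, the tabulated first differences appear to be taken over the pairs of inputs (500, 1000), (1500, 2000) and so on, and the second differences are the differences of those values; this pairing is an inference from the numbers, not something stated in the text. The variable names below are hypothetical.

```python
# Mean execution times of bubble sort for inputs 500, 1000, ..., 5000.
bubble_means = [0.03267, 0.13733, 0.30833, 0.55767, 0.86567,
                1.25067, 1.89500, 2.43833, 2.81900, 3.99933]

# First differences over the pairs (500, 1000), (1500, 2000), ...
first_differences = [later - earlier
                     for earlier, later in zip(bubble_means[0::2], bubble_means[1::2])]

# Second differences: differences of successive first differences.
second_differences = [later - earlier
                      for earlier, later in zip(first_differences, first_differences[1:])]

print([round(d, 5) for d in first_differences])
print([round(d, 5) for d in second_differences])
```

The first differences grow steadily, while the first three second differences stay close to one another, which is the pattern that points to a 2nd-degree polynomial.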
The fitted line plot and the residual plot of the estimated empirical bound are shown in Figure 10.4. It can be observed from the figure that a 2nd-degree polynomial estimates the empirical execution time of Bubble Sort well; the goodness of this fit is supported by the horizontal pattern of the residual plot. Hence, one can argue that the empirical complexity of the Bubble Sort algorithm is Oemp(n²).
Figure 10.4. Fitted line plot and residual plot of the execution time of bubble sort.
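A model of the form y = ax² + bx + c can be fitted by least squares by solving the three normal equations. The sketch below does this with plain Python; it is an illustration under stated assumptions and not the software used for Figure 10.4. The function names solve_linear_system and fit_quadratic are hypothetical, and the input sizes are expressed in thousands purely to keep the power sums well scaled.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_quadratic(xs, ys):
    """Return (a, b, c) minimizing the sum of (y - (a*x^2 + b*x + c))^2."""
    s = [sum(x ** p for x in xs) for p in range(5)]                   # sums of x^p
    t = [sum((x ** p) * y for x, y in zip(xs, ys)) for p in range(3)]  # sums of x^p * y
    normal_matrix = [[s[0], s[1], s[2]],
                     [s[1], s[2], s[3]],
                     [s[2], s[3], s[4]]]
    c, b, a = solve_linear_system(normal_matrix, t)   # unknowns ordered (c, b, a)
    return a, b, c


sizes_in_thousands = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
bubble_means = [0.03267, 0.13733, 0.30833, 0.55767, 0.86567,
                1.25067, 1.89500, 2.43833, 2.81900, 3.99933]
a, b, c = fit_quadratic(sizes_in_thousands, bubble_means)
residuals = [y - (a * x * x + b * x + c)
             for x, y in zip(sizes_in_thousands, bubble_means)]
print(a, b, c)
```

Plotting residuals against the input size and looking for a roughly horizontal band is the check described in the text.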
10.5. Empirical Complexity of Selection Sort
The Selection Sort algorithm sorts an array by repeatedly finding the minimum value in the unsorted part of the array and putting it at the beginning. Theoretically, the time complexity of performing the sorting operation in the worst case using
the Selection Sort algorithm is O(n²). Let us compute the empirical run time of the Selection Sort algorithm by following a similar procedure. The execution time of each trial and the mean execution time are shown in Table 10.9, and the differences are shown in Table 10.10.

Table 10.9. Execution time of selection sort

Input   Trial 1   Trial 2   Trial 3   Mean (Execution Time)
500     0.030     0.027     0.0240    0.0270
1000    0.094     0.116     0.1520    0.1207
1500    0.235     0.114     0.1320    0.1603
2000    0.429     0.391     0.3700    0.3967
2500    0.429     0.572     0.5710    0.5240
3000    0.940     0.887     0.8710    0.8993
3500    1.170     1.324     1.4110    1.3017
4000    1.487     1.648     1.6540    1.5963
4500    2.055     1.864     1.7230    1.8807
5000    2.155     2.283     2.1590    2.1990
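The selection strategy described at the start of this section, repeatedly moving the minimum of the unsorted part to the front, can be sketched as follows. This is a minimal illustration and not the program used for Table 10.9; the function name selection_sort is hypothetical.

```python
def selection_sort(arr):
    """Sort arr in place by repeatedly moving the minimum of the unsorted part forward."""
    n = len(arr)
    for start in range(n - 1):
        smallest = start
        # Find the index of the minimum value in arr[start..n-1].
        for i in range(start + 1, n):
            if arr[i] < arr[smallest]:
                smallest = i
        # Put that minimum at the beginning of the unsorted part.
        if smallest != start:
            arr[start], arr[smallest] = arr[smallest], arr[start]
    return arr
```

The inner loop always scans the whole unsorted part, so the number of comparisons is n(n - 1)/2 regardless of the input order.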
Table 10.10. Difference table of selection sort
(Each Δy entry is the mean in its row minus the mean in the preceding row.)

Input   Mean (Execution Time)   First Difference Δy
500     0.0270
1000    0.1207                  0.0937
1500    0.1603
2000    0.3967                  0.2363
2500    0.5240
3000    0.8993                  0.3753
3500    1.3017
4000    1.5963                  0.2947
4500    1.8807
5000    2.1990                  0.3183
In the case of Selection Sort also, one can observe that the first differences are almost constant. So, a 1st-degree polynomial is enough to estimate the empirical complexity of the algorithm. Hence, one can say that the empirical complexity of the Selection Sort algorithm is Oemp(n). Let us check the validity of this estimate by fitting a statistical model of the form y = ax + b and checking its goodness of fit using a residual plot.
The Fitted Line Plot and the Residual plot of the estimated empirical bound are shown below in Figure 10.5:
Figure 10.5. Fitted line plot and residual plot of the execution time of selection sort.
As depicted above, a 1st-degree polynomial estimates the empirical execution time of the Selection Sort algorithm reasonably well. The residual plot, however, does not show a horizontal pattern, so the fitted model cannot be regarded as the best fit for the execution time of Selection Sort. Nevertheless, we argue that its empirical complexity is Oemp(n).
About the Authors
Soubhik Chakraborty
Professor
Department of Mathematics, Birla Institute of Technology Mesra, Ranchi, Jharkhand, India
Dr. Soubhik Chakraborty is currently a professor and former Head of the Department of Mathematics, Birla Institute of Technology, Mesra, Ranchi, India. His research interests include algorithm analysis, music analysis and statistical computing, in which, apart from publishing several books, monographs and research papers, he has guided several scholars to their PhDs. He is also an acknowledged reviewer associated with ACM, IEEE and AMS. He has received several awards in both teaching and research.
Email: [email protected] (Corresponding Author)
Prashant Pranav
Assistant Professor
Department of Computer Science and Engineering, Birla Institute of Technology Mesra, Ranchi, Jharkhand, India
Dr. Prashant Pranav is currently working as an Assistant Professor in the Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India. His research areas include Analysis of Algorithms, Cryptography, Random Numbers, Computational Musicology and Cloud Computing. He has many research papers published in well-known journals and conferences. He has authored two reference books.
Email: [email protected]
Naghma Khatoon
Assistant Professor
Faculty of Computing and Information Technology, Usha Martin University, Ranchi, Jharkhand, India
Dr. Naghma Khatoon is currently working as an Assistant Professor in the Department of Computing and Information Technology at Usha Martin University, Ranchi, India. She received her B.Sc. IT from Ranchi University and her M.Sc. IT and Ph.D. from Birla Institute of Technology, Mesra, in 2010, 2012 and 2018 respectively. Her research interests are WSN, MANET and IoT.
Email: [email protected]
Sandip Dutta
Professor
Department of Computer Science and Engineering, Birla Institute of Technology Mesra, Ranchi, Jharkhand, India
Dr. Sandip Dutta is currently a Professor and Head of the Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India. His research areas include Cryptography, Biometric Security, MANET Security, Cyber Security and Security issues in Cloud Computing. He has many research papers published in well-known journals and conferences. He has authored one reference book.
Email: [email protected]