427 56 3MB
English Pages 232 Year 2019
LINEAR AND INTEGER PROGRAMMING
·..
Sanaullah Khan Abdul Bari Mohammad Faisal Khan
Linear and Integer Programming
Linear and Integer Programming By
Sanaullah Khan, Abdul Bari and Mohammad Faisal Khan
Linear and Integer Programming By Sanaullah Khan, Abdul Bari and Mohammad Faisal Khan This book first published 2019 Cambridge Scholars Publishing Lady Stephenson Library, Newcastle upon Tyne, NE6 2PA, UK British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library Copyright © 2019 by Sanaullah Khan, Abdul Bari and Mohammad Faisal Khan All rights for this book reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner. ISBN (10): 1-5275-3913-X ISBN (13): 978-1-5275-3913-6
CONTENTS
Preface
vii
Chapter 1. Introduction 1.1 Numbers 1.2 Sets 1.3 Logarithms 1.4 Sequences and Series 1.5 Matrices 1.6 Finding the Inverse of a Matrix 1.7 Vectors 1.8 Linear Independence of Vectors 1.9 Solution of Systems of Simultaneous Linear Equations 1.10 Solution of Non-Homogeneous System 1.11 Differentiation 1.12 Maxima and Minima 1.13 Convex Sets 1.14 Convexity and Concavity of Function 1.15 Optimization 1.16 Mathematical Programming 1.17 Linear Programming Techniques 1.18 Nonlinear Programming Techniques 1.19 Integer Programming 1.20 Applications of Mathematical Programming
1
Chapter 2. Linear Programming 2.1 Introduction 2.2 The Linear Programming (L-P) Model 2.3 Graphical Presentation of L-P Model 2.4 Properties of Feasible Region of an LPP 2.5 Basic and Non-Basic Variables 2.6 The Simplex Method 2.7 The Simplex Algorithm 2.8 Degeneracy 2.9 Finding an Initial Solution: Artificial Variables 2.10 The Revised Simplex Method
69
vi
Contents
2.11 2.12 2.13 2.14 2.15 2.16
Duality Theory Dual Simplex Method Sensitivity Analysis Column Simplex Tableau Lexicographic Dual Simplex Method Problem with Equality Constraints
Chapter 3. Integer Programming 3.1 Introduction 3.2 Methods for Solving IP Problems 3.3 Graphical Representation of an IPP 3.4 Formulating Integer Programming Problems 3.5 Branch and Bound Enumeration 3.6 Search Enumeration
149
Chapter 4. Cutting Plane Techniques 4.1 Introduction 4.2 Basic Approach of Cutting Plane Methods 4.3 Gomory Cut 4.4 Other Properties of Fractional cuts 4.5 Dantzig Cut 4.6 Mixed Integer Cuts 4.7 Dual All Integer Method 4.8 Primal All Integer method 4.9 NAZ Cut for Integer Programming
183
References
217
PREFACE
Linear Programming (LP) is a process in which a real-life problem is transformed into a mathematical model consisting of a linear objective function of several decision variables to be maximized (or minimized) through an algorithmic procedure in the presence of a finite number of linear constraints. If the decision variables are required to be integers, the LP model is called an integer linear programming (ILP) model. A wide variety of real-life problems arising in economic optimization, management science, strategic planning and other areas of knowledge can be easily transformed into the format of linear programming or integer linear programming. Besides their importance in a large number of practical applications, the algorithms of LP and ILP can be visualized through simple geometrical concepts, thus making the algebraic derivations easy to understand. The first chapter of the book consists of the definitions of the terms used in the remainder. The basic concepts are illustrated through simple prototype examples. The reader is required to know intermediate level mathematics. The understanding of the basic concepts present in the development of algorithms for LP and ILP is essential as many real-life problems demand specific adaptations of standard techniques. In Chapter 2 we start by developing the basic concept of linear programming. Dantzig’s simplex method is then developed, and the simple iterative steps of the algorithm are illustrated by solving a numerical example. The concepts of duality and sensitivity analysis are also developed in this chapter. We explain how perturbations in the cost and constraint coefficients affect the shadow prices. In Chapter 3 we discuss integer linear programming. In order to observe the breadth of its applicability, some important situations are first formulated as ILP problems. Then in this chapter we develop the enumerative procedures for solving ILP problems. The well-known branch and bound technique is discussed and illustrated with numerical examples.
viii
Preface
A search enumeration technique suitable for (0-1) IP problems developed by Balas is also discussed in this chapter. In Chapter 4 we develop the cutting plane techniques for solving ILP problems. Gomory’s mixed integer cut is also developed. The primal and dual all integer methods are discussed in detail. The column tableau presentation of an LPP is explained (parallel to the row representation) as it simplifies the addition of new cutting planes when solving an ILP problem by cutting plane techniques. All suggestions for further improvement of the book will be received thankfully by us, to serve the cause of imparting good, correct and useful information to the students. The book is basically written as a textbook for graduate students in operations research, management science, mathematics and various engineering fields, MBA and MCA. It can be used as a semester course in linear and integer programming. Sanaullah Khan Abdul Bari Mohammad Faisal Khan
CHAPTER 1 INTRODUCTION
Linear and integer programming techniques have brought tremendous advancements in the field of optimization. Optimization is the science of selecting the best of many possible decisions in a complex real-life situation. Thus it is required in almost all branches of knowledge today. The development of optimization techniques, and specially the linear and integer linear programming methods, requires some mathematical background at undergraduate level. In order to provide a self-contained treatment in the book, we first explain the important terms used in subsequent chapters. 1.1 NUMBERS Natural Numbers The natural numbers are the numbers 1,2,3,…. with which we count. Integers The integers are the numbers 0, േ1, േ2 etc. These may be considered as particular points along a straight line (the real line) on either side of 0 thought of as distances, measured from an origin at zero (0). It is conventional to place positive numbers to the right of 0 and negative numbers to the left. The successive integers are one unit apart.
-3
-2
-1
0
1
2
3
Whole Numbers The whole numbers are the natural numbers including 0, namely 0,1,2,3 etc.
2
Chapter 1
Prime Numbers Included in the natural numbers are the prime numbers that are divisible by 1 and themselves only, e.g. 1,3,5,7 etc. Rational Numbers If p is an integer and q is a natural number and p and q have no common factor then is a rational number. Integers are rational numbers with q=1.
For example, -5/4 is a rational number since -5 is an integer and 4 is a natural number, and there is no common factor. A rational number can be represented either as a terminating decimal or a non-terminating but repeating (recurring) decimal, e.g. 2/7= .285714285714…. Fraction The term fraction is sometimes used to express numbers of the form p/q where p is an integer and q is a natural number. The numbers may have common factors. For example, -4/6 is a fraction since -4 is an integer and 6 is a natural number. There is a common factor, 2, so that -4/6=-2/3 which is a rational number. Irrational Numbers Numbers such as ߨ = 3.14159…, √2=1.41421…, √3, 2+√3 etc., which are non-repeating or non-terminating decimals, are called irrational numbers. These can only be written down approximately as numbers. (It is not possible to find any integers, p & q such that p/q represents such numbers. e.g. 22/7 is not an exact representation of ߨ, it is only an approximation.) Real Numbers Rational and irrational numbers taken together are the real numbers. e.g. 1, 2.13, 4.2,ߨ, √2 etc. Complex Numbers The numbers such as 2+3i, 4-2i, where i2=-1, are known as complex numbers. (By introducing complex numbers, it is possible to write down the solution of equations such as ݔଶ = -1). For example
െξെͻ ൌ ݅ξͻ ൌ ͵݅: ͵݅ ଶ ൌ െͻ
Introduction
3
Equality and Inequality Real numbers may be thought of as being distributed along a line. In order to perform operations with them, we need certain symbols that enable us to compare two numbers. If a and b are real numbers then: a=b means a is equal to b a>b means a is greater than b (a lies to the right of b on the real line) a3 when multiplied on both sides by 2 gives 8>6; If we add -8 on both sides we get -4>-5; When divided by 4 we get the unchanged inequality 1>3/4. However, if we multiply both sides by -1, the inequality gets reversed as -40 then OA represents –n and OB represents +n The modulus of +n =│+n│=n=length of OB>0.
4
Chapter 1
The modulus of –n =│-n│=n=length of OA =length of OB>0 Alternatively, the modulus of n may be thought of as the positive square root of n2 so that │n│=ξ݊ଶ . Interval A section of the real line is called an interval. The interval a d x d b is said to be closed since the end points are included in it. This closed interval is denoted by [a,b]. The interval a < x < b is said to be open as it does not include the end points and is denoted by (a,b). Factorial The product of all the integers from 1 to n, where n is a whole number, is called n factorial and is denoted by n! Thus 4! = 4.3.2.1 = 24 n! = n (n-1) (n-2) … 2.1 = n (n-1)! 0! = 1 Reciprocal ଵ
The reciprocal of a real number ܽ is b = provided ܽ ് Ͳ. An alternative notation of the reciprocal is ܽ ିଵ . The reciprocal of േλis 0 and the reciprocal of 0 is േ∞.
Proportional and Inversely Proportional A quantity a is proportional to another quantity b (written as a ןb) if a=kb where k is a constant and ܽ is inversely proportional to ܾ written as (ܽ ן ͳȀܾ). Indices If a quantity is written in the form ܣ then b is the index of A.
Introduction
5
Rules for operating with indices are: i. ܣ ൈ ܣ ൌ ܣା
iii. ܣ ൌ ͳǡ Ͳ ് ܣ
ii. ሺܣ ሻ ൌ ܣൈ ൌ ܣ
iv. ିܣ ൌ ͳȀܣ
Field A system of numbers ܨis called a field if knowing that a,b ;ܨ אa+b, a-b, a.b, a/b, ka etc. also belong to ܨ. 1.2 SETS A set is defined as a collection or aggregate of objects. There are certain requirements for a collection or aggregate of objects to constitute a set. These requirements are: (a) The collection or aggregate of objects must be well defined; i.e. we must be able to determine equivocally whether or not any object belongs to this set. (b) The objects of a set must be distinct; i.e. we must be able to distinguish between the objects and no object may appear twice. (c) The order of the objects within the set is immaterial; i.e. the set (a,b,c) is the same as the set (b,c,a). Example- The collection of digits 0,1,2,…,9 written as {0,1,2,3,4,5,6,7,8,9} is a set. Example- The letters in the word MISSISSIPPI satisfy all the requirements for a set as the 4 letters in the word are M, I, S and P well defined, distinct and the order of the letters is immaterial. We use curly brackets or braces “{ }” to designate a set. It is customary to name a set using capital letters such as A, C, S, X, etc. Elements or Members of a Set The objects which belong to a set are called its elements or members. The elements or the members of the set are designated by one of two methods: (1) the roster method (tabular form), or (2) the descriptive method (set builder form).
6
Chapter 1
The roster method involves listing within braces all members of the set. The descriptive method consists of describing the membership of the set in a manner such that one can determine if an object belongs in the set. E.g. in the roster method, the set of digits would appear as D={0,1,2,3,4,5,6,7,8,9} while in the descriptive method it would appear as D={ݔȁ ݔൌ Ͳǡͳǡʹǡ͵ǡͶǡͷǡǡǡͺǡͻ}. The Greek letter ( אepsilon) is customarily used to indicate that an object belongs to the set. If D represents the set of digits, then 2 אD means that 2 is an element of D. The symbol ( בepsilon with slashed line) represents non-membership. i.e. “is not an element of”, or “does not belong to”. Finite and Infinite Sets A set is termed finite or infinite depending upon the number of elements in the set. The set D above is finite, since it has only ten digits. The set N of positive integers or natural numbers is infinite, since the process of counting continues infinitely. Equal Sets Two sets A and B are said to be equal, written A=B, if every element of A is in B and every element of B is in A. For example, the set A={0,1,2,3,4} and B={1,0,2,4,3} are equal. Subset A set A is a subset of another set B if every element in A is in B. For example, If B={0,1,2,3} and A={0,1,2}, then every element in A is in B, and A is a subset of B. The symbol used for subset is ;كA is a subset of B is written as AكB. The set of rational numbers generally denoted by Q has the following important subsets: N = {1,2,3,…}; the set of all counting numbers or natural numbers. W = {0,1,2,3,…}; the set of whole numbers. I = {…-5,-4,-3,-2,-1,0,1,2,…}; the set of integers. F = {a/b│a,b….אN}; the set of all fractions.
Introduction
7
A is a “proper subset” of another set B if all the elements in A are in B, but not all the elements in B are in A. It is designated by the symbol ؿ. For A to be a proper subset of B, B must have all elements that are in A plus at least one element that is not in A; we write AؿB. The Set N of natural numbers is a proper subset of the set I of integers. Universal Set The term “universal set” is used for the set that contains all the elements the analyst will wish to consider. For example, if the analyst were interested in certain combinations of the English letters then the universal set would be all the letters of the English alphabet. Null Set The set which contains no element is called the null set or empty set. It is designated by the Greek letter ߮ (phi). Thus an empty set ߮ = {} is a null set as there is no element in it. It should be noted that {0} is not a null set. It is a set containing 0. The universal set and the null set are subsets of the universal set. Venn Diagram The Venn diagram, named after the English logician John Venn (1834–83), consists of a rectangle that conceptually represents the universal set. Subsets of the universal set are represented by circles drawn within the rectangle, see Fig. 1.1.
A
b
U
Fig 1.1 Fig 1.1 A Venn diagram
8
Chapter 1
Operations on Sets There are certain operations on sets which combine the given sets to yield another set. Such operations are easily illustrated through the use of the Venn diagram, as will be seen below. Complementation Let A be any subset of a universal set U. The complement of A is the subset of elements of U that are not members of A and is denoted by ܣᇱ (read ‘A complement’). Fig. 1.2 shows the Venn diagram for the complement ܣᇱ of the set A. ܣ′ is represented by the shaded region.
Ac A
U Fig 1.2 Complement of a set
Example: For the universal set D = {0,1,2,3,4,5,6,7,8,9}, the complement of the subset A = {0,1,3,5,7,9}, is ܣ′ = { 2,4,6,8 }. Intersection If A and B are any two sets, then the intersection of A and B is the set of all elements which are in A and also in B and is denoted by AתB (read ‘A intersection B’). See Fig. 1.3.
Introduction
9
U B
A
AB
Fig 1.3 Intersection of two sets.
Example: For the universal set D = {0,1,2,3,4,5,6,7,8,9} with subsets A = {0,1,2,5,7,9} and B = {0,3,4,5,9}, the intersection of A and B is given by AתB = {0,5,9}. From the definition it follows that AתU =A. Union The union of the sets A and B is the set of all elements of A together with all the elements of B and is denoted by AB ( read as A union B ) See Fig. 1.4.
A
B
U
AB
Fig 1.4 Union of two sets
Example: For the universal set D = {0,1,2,3,4,5,6,7,8,9} with subset A = {0,1,2,5,7,9} and B = {0,3,4,5,9}, AB = {0,1,2,3,4,5,7,9}.
10
Chapter 1
From the definitions, it follows that AU = U A ܣ′ = U. Disjoint Sets Two sets A and B are said to be disjoint (or mutually exclusive) if there is no element in common, i.e. if AתB =߮. For example, the sets A = {1,2,3} and B = {4,5,6} are disjoint. Cartesian Product If A and B are two sets, then the cartesian product of the sets, designated by AxB, is the set containing all possible ordered pairs (a,b) such that a A and bB. If the set A contains the elements a1,a2 and a3 and the set B contains the elements b1 and b2, then the cartesian product AxB is the set AxB = {(a1,b1), (a1,b2),(a2,b1),(a2,b2),(a3,b1),(a3,b2)} or AxB = ቄ ሺܽǡ ܾሻȁܽ ܣ אǡ ܾ ܤ אቅ 1.3 LOGARITHMS Definition: If for a positive real number a (a>0), a m = b, we say that m is the logarithm of b to the base a. In symbols we write m = logab e.g. log39 = 2 (because 32 =9). The logarithm to base 10 of A (=10m) is defined to be the index m so that log10A=m, The basic concept used in working with logarithms to base 10 is that of expressing a real positive number A in the form A =10 m. By writing another number B as 10n, we have AB = (10m)(10n) = 10m+n. Thus multiplication of A and B is transformed by this process into the addition of their indices m and n. log10 (AB) = m+n
Introduction
11
These concepts may be generalized to the logarithm to any base ܽ, where ܽ Ͳ. Let ܣൌ ܽெ and ܤൌ ܽே , Then ݈݃ ܣൌ ܯand ݈݃ ܤൌ ܰ From the laws of indices, we note that (i) ݈݃ ͳ ൌ Ͳ as ܽ ൌ ͳ (ii) ݈݃ ܽ ൌ ͳ as ܽଵ ൌ ܽ (iii) For ܽ ͳǡ ݈݃ Ͳ ൌ െλ as ܽ ି∞ ൌ Ͳ (iv) For ܽ ൏ ͳ,
log a 0
f
as ܽ∞ ൌ Ͳ
(v) ݈݃ଵ ͳ ͲͲ ൌ ʹ as ͳͲଶ ൌ ͳͲͲ (vi) ݈݃ଵ Ǥ ͲͲͳ ൌ െ͵ as ͳͲିଷ ൌ ǤͲͲͳ (vii) ݈݃ ሺ ܤܣሻ ൌ ݈݃ ܣ ݈݃ ܤ
(viii) ݈݃ ቀ ቁ ൌ ݈݃ ܣെ ݈݃ ܤ
(ix) ݈ܣ ݃ ൌ ܣ ݈݃ ܤ (x) ݈݃ ܣൌ ݈݃ ܣǤ ݈݃ ܾ (xi) ݈݃ଵ ܣൌ ݈݃ ܣൈ ʹǤ͵Ͳʹ. The most commonly occurring bases are 10 and the irrational number e = 2.71828…… The number e gives rise to natural or Napierian logarithms with loge x denoted by ln x. 1.4 SEQUENCES AND SERIES Sequences A sequence {xn} is an ordered set of symbols of the form x1, x2,……, xn ,…,
12
Chapter 1
where n is a number, and is such that each value of x is associated in turn with the natural numbers. Thus x1 is associated with the number 1, x2 with the number 2, and in general the term xn is associated with number n. Some simple examples of sequences are: {2n}, for which x1 = 2, x2 = 4,…,xn = 2n,…and {xn}, which stands for the sequence x1,x2,x3,..., xn,... Series Given a sequence ሼݔ ሽ, it is possible to form a second sequence denoted by {sn} with terms sn that are the sums of the first n terms of the sequence {xn}. That is, Sn = x1 + x2 + … +xn-1 + xn = Sn-1 + xn The terms sn are sometimes referred to as finite series. For any given sequence {xn}, it is possible to write down an infinite series S Where S = x1+ x2+ x3+ …+ xn+… The Arithmetic Series The sequence {xn} with xn = a + ( n-1) d, where ܽ and ݀ are some constants is called an arithmetic progression (AP). A sequence ሼܵ ሽ may be constructed by summing the terms of the sequence {xn}, as Sn = a +(a +d) + (a + 2d) + … + [a + (n-1)d]. The quantity ሼܵ ሽ is called a arithmetic series of order n. The Geometric Series The sequence {xn} defined by xn =ܽ ݎିଵ , where ܽ is a number and ݎ Ͳ is the common ratio, is called a geometric progression (GP). The sequence ሼܵ ሽwhere ܵ ൌ ܽሺͳ ݎ ڮ ݎିଵ ሻ ൌ ܽ geometric series.
ିଵ ିଵ
for ͳ ് ݎǤ is called a
Introduction
13
1.5 MATRICES Matrix The matrix is a rectangular array of numbers enclosed by brackets [ ] or ( ). The number of rows and columns determine the dimension or order of the matrix. A matrix with ݉rows and ݊ columns is called an ݉ ൈ ݊ matrix. If ܽ is the element in the ݅ ௧ row and ݆௧ column of a given matrix A then we write ܣൌ ሾܽ ሿǡ ͳ ݅ ݉ǡ ͳ ݆ ݊Ǥ ൈ
A matrix in which the number of rows equals to the number of columns (m=n) is called a square matrix. A square matrix in which at least one diagonal element is non-zero and the rest of the elements are zero is called a diagonal matrix. A diagonal matrix having all its diagonal elements equal to one is called an identity (or a unit) matrix and is denoted by I or In. ͳ For example, I3 =Ͳ Ͳ
Ͳ ͳ Ͳ
Ͳ Ͳ൩ ͳ
Two matrices ܣൌ ൫ܽ ൯ and ܤൌ ሺܾ ሻ of the same order, say ݉ ൈ ݊, are equal if ܽ ൌ ܾ for all ݅ ൌ ͳǡ ǥ ǡ ݊ and ݆ ൌ ͳǡ ǥ ǡ ݊Ǥ A matrix is called a null matrix if all its elements are equal to zero. A square matrix is called an upper triangular matrix if all the entries below the diagonal are zero. Similarly, a lower triangular matrix is one in which all the elements above the diagonal are zero. Addition and Subtraction The sum of matrices ܣൌ ሾܽ ሿ and ܤൌ ሾܾ ሿ is defined only when A and B have the same order. The sum is then A+B= [aij + bij] = ሾܿ ሿ A-B is defined as A+(-B). It is called the difference of A and B.
14
Chapter 1
Multiplication by a Scalar A matrix may be multiplied by a scalar quantity. Thus, ifߣ is a scalar and A is an ݉ ൈ ݊matrix, then ܣis a matrix, each element of which is λ times the corresponding element of A. Matrix Multiplication Two matrices may be multiplied together only if the number of columns in the first matrix equals the number of rows in the second matrix. Let A be an ݉ ൈ ݊ matrix and B be an ݊ ൈ matrix. Then the product is defined as AB=C=ሺܿ ሻ௫ , where ܿ ൌ σୀଵ ܽ ܾ for i=1,2,…m; j=1,2,….p Laws of Algebra for Multiplication (i) A (BC)=(AB) C : Associative law holds. (ii) A (B+C)=AB + AC : Distributive law holds. Transpose of a Matrix The transpose of a matrix A of order ݉ ൈ ݊is the matrix of order ݊ ൈ ݉ , denoted by ்ܣor ܣᇱ , such that the (i,j)th element of A is the (j,i)th element of ܣ′ . The following properties can be shown to hold: ′
(i) ሺ ܣ ܤሻ′ ൌ ܣ′ ܤ′ ൫ ܣ ܤ൯ ൌ ܣ′ ܤ′ (ii) ሺܤܣሻ′ ൌ ܤ′ ܣ′ (iii) ሺܣ′ ሻ′ ൌ ܣ (iv) ሺ݇ܣሻ′ ൌ ݇ܣ′ Symmetric Matrix A square matrix ܣis said to be symmetric if it is equal to its transposed matrix, i.e. if ܣ= ܣᇱ . A symmetric matrix can be constructed by multiplying the given matrix by its transpose. Thus ܣᇱ ܣand ܣܣ′ are both symmetric.
Introduction
15
Skew Symmetric Matrix A matrix ܣis said to be skew symmetric if ܣൌ െܣᇱ . It is always a square matrix whose diagonal elements are zero. It can be shown that any matrix ܣ can be written in the form ܣൌ
ͳ ͳ ሺ ܣ ܣ′ ሻ ሺ ܣെ ܣ′ ሻ ʹ ʹ
Orthogonal Matrix A square matrix ܣis said to be orthogonal if ܣܣ′ ൌ ܫ Trace of a Matrix The trace of a square matrix ܣ, denoted by trܣ, is defined as the sum of its diagonal elements, i.e. tr A =σ ܽ . (i) If A and B are square matrices of the same order then tr(A+B)=trA + trB (ii) If C and D are such that CD and DC are both defined, then CD and DC are both square, and tr CD=tr DC. Non-Singular Matrices A matrix A is called non-singular if there exists a matrix B, such that AB=BA=I. It follows that both A and B must be square matrices of the same order. Inverse of a Matrix Given a matrix A, if there exists a square matrix B such that AB=BA =I, then B is called the inverse of A. It is denoted byAെͳ. B is the left inverse of A if BA =I. C is the right inverse of A if AC =I. The following properties can be shown to hold: (i). ሺܤܣሻିଵ ൌ ି ܤଵ ିܣଵ (ii). ሺିܣଵ ሻ′ ൌ ሺܣ′ ሻିଵ (iii). An inverse matrix is unique
16
Chapter 1
(iv). ሺିܣଵ ሻିଵ ൌ ܣ (v) A zero matrix has no inverse. Remark: If a matrix does not have an inverse it is said to be singular. Determinant of a Matrix Associated with any square matrix ܣൌ ሾܽ ሿ having ݊ଶ elements is a number called the determinant of ܣand denoted by ȁܣȁ or det. ܣgiven by ȁܣȁ ൌ σሺേሻܽଵ ܽଶ ǥ ܽ , the sum being taken over all permutations of the second subscripts. The assignment of the + or – sign is explained below. The determinant of a 2x2 matrix ܣଶ is given by ܽ ȁܣଶ ȁ ൌ ቚܽଵଵ
ଶଵ
ܽଵଶ ܽଶଶ ቚ ൌ ܽଵଵ Ǥ ܽଶଶ െ ܽଶଵ Ǥ ܽଵଶ
The determinant of a 3x3 matrix ܣଷ is given by ܽଵଵ ܽଵଶ ܽଵଷ ܽ ȁܣଷ ȁ ൌ อ ଶଵ ܽଶଶ ܽଶଷ อ ൌ ܽଵଵ ܽଶଶ ܽଷଷ ܽଵଶ ܽଶଷ ܽଷଵ ܽଵଷ ܽଶଵ ܽଷଶ െ ܽଷଵ ܽଷଶ ܽଷଷ ܽଷଵ ܽଶଶ ܽଵଷ െ ܽଷଶ ܽଶଷ ܽଵଵ െ ܽଷଷ ܽଶଵ ܽଵଶ Ǥ The determination of a plus or minus sign is easily done by employing the following scheme. First repeat the first two columns of the matrix; next draw diagonals through the components as shown below:
The products of the components on the diagonals from upper left to lower right are summed, and the products of the components on the diagonals from lower left to upper right are subtracted from this sum. ȁܣȁ ൌ ܽଵଵ ܽଶଶ ܽଷଷ ܽଵଶ ܽଶଷ ܽଷଵ ܽଵଷ ܽଶଵ ܽଷଶ െ ܽଷଵ ܽଶଶ ܽଵଷ െ ܽଷଶ ܽଶଷ ܽଵଵ െ ܽଷଷ ܽଶଵ ܽଵଶ
Introduction
17
Note that, if the square matrix is of order 1, ȁܣȁ is not the absolute value of the number A. E.g. for the matrix of order 1 given by A= (-3), ȁܣȁ ൌ െ͵Ǥ ͳ Example: find the determinant of ܣൌ ൭ʹ ͵
͵ െͳ Ͳ
െͶ ൱ െʹ
ȁܣȁ ൌ ʹ ͷͶ Ͳ െ ͳʹ െ Ͳ ͳʹ ൌ ͷ Determinants have the following properties: (i). หܣ′ ห ൌ ȁܣȁ (ii). ȁܣȁ ൌ Ͳ if and only if A is singular. (iii). ȁܤܣȁ ൌ ȁܣȁȁܤȁ (iv). If two rows (or two columns) are interchanged then the value of the determinant changes only in sign. (v). If each element in a row (or column) is multiplied by a scalar then the value of the determinant is also multiplied by the scalar. It follows that for an ݊ ൈ ݊ matrix ܣ, ȁ݇ܣȁ ൌ ݇ ȁܣȁǤ, where k is a scalar. (vi). If a multiple of one row is added to another, the value of the determinant remains the same (a similar result also holds for the columns). (vii). The value of the determinant of an upper triangular matrix is given by the product of the elements on the leading diagonal.
Because of this last property, an efficient way of calculating the value of a determinant is by reducing it to triangular form. The reduction to a triangular
18
Chapter 1
form is done by using the properties (iv), (v) and (vi) , known as elementary transformations. Minors The determinant of the sub-matrix formed by deleting one row and one column from a given square matrix is called the minor associated with the element lying in the deleted row and deleted column. Consider a ͵ ൈ ͵ matrix ܣgiven by ܽଵଵ ܣൌ ൭ܽଶଵ ܽଷଵ
ܽଵଶ ܽଶଶ ܽଷଶ
ܽଵଷ ܽଶଷ ൱. ܽଷଷ
Then the minor associated with ܽଵଵ will be ܽଶଶ ܽଶଷ ܯଵଵ ൌ ቚܽ ܽଷଷ ቚ , which is obtained by deleting the first row and first ଷଶ column of A. Cofactors A cofactor is a minor with its proper sign, where the sign is determined by the subscripts of the element with which the cofactor is associated. If the sum of the subscripts is even then it will have a plus sign, and if the sum is odd then it will have a minus sign. ܽଵଵ ܽ For example, in the matrix ܣൌ ൭ ଶଵ ܽଷଵ
ܽଵଶ ܽଶଶ ܽଷଶ
ܽଵଷ ܽଶଷ ൱, ܽଷଷ
ܽଶଶ the cofactor of ܽଵଵ is ቚܽ ଷଶ
ܽଶଷ ܽଷଷ ቚ while the cofactor of ܽଵଶ is ܽଶଵ ܽଶଷ െ ቚܽ ܽଷଷ ቚ. ଷଵ
The Cofactor Matrix Associated with a matrix A is another matrix whose elements are the cofactors of the corresponding elements of A. Such a matrix is called the cofactor matrix. The Adjoint Matrix The transpose of the cofactor matrix is called the adjoint matrix.
Introduction
19
݆݀ܣǤ ܣൌ ሾܿݎݐ݂ܿܽǤ ܣሿ′ 1.6 FINDING THE INVERSE OF A MATRIX Let ܣbe a non-singular matrix, i.e. ȁܣȁ ≠ 0. It can be shown that the inverse of ܣis given by ିܣଵ =
ௗ ȁȁ
where ܣ݆݀ܣis the adjoint matrix of ܣ. Example: Let ܣൌ ቀ Ͷ ܣ݆݀ܣൌ ቀ െ͵
Ͷ െ͵ ቁǤ Thenȁܣȁ ൌ ͵ǡ ܣݎݐ݂ܿܽܥൌ ቀ ͵ Ͷ
͵
ସ
͵ ଷ ቁ. Thus ିܣଵ ൌ ቌିଷ ଷ
ଷ
െ͵ ቁǡ
ଷ ቍ ଷ
Computing the inverse directly by using this formula or by using the definition of the inverse is quite lengthy as can be seen in the following simple case: consider the ʹ ൈ ʹ matrix ܣൌቀ
Ͷ ͳ
ʹ ቁ ͵
We know that the inverse of A is another matrix ିܣଵ such that ିܣܣଵ ൌ ܫ. Let ܤൌ ିܣଵ ൌ ൬
ܾଵଵ ܾଶଵ
ܾଵଶ ൰ ܾଶଶ
Then by definition, ିଵ ൌ ቀ
Ͷ ͳ
ʹ ܾଵଵ ቁ൬ ͵ ܾଶଵ
ܾଵଶ ͳ ൰ൌቀ ܾଶଶ Ͳ
Ͳ ቁ ͳ
From the definition of the equality of matrices, we get Ͷܾଵଵ ʹܾଶଵ ൌ ͳ Ͷܾଵଶ ʹܾଶଶ ൌ Ͳ ͳܾଵଵ ͵ܾଶଵ ൌ Ͳ ͳܾଵଶ ͵ܾଶଶ ൌ ͳ
20
Chapter 1
These are four equations in four unknowns, which on solution give ܾଵଵ ൌ
͵ ͳ ʹ ǡ ܾଵଶ ൌ െ ǡ ܾଶଵ ൌ െͳǡ ܾଶଶ ൌ ͳͲ ͷ ͷ
Hence, ିଵ
ܣ
ܾ ൌ ൬ ଵଵ ܾଶଵ
ଷ
ܾଵଶ ൰ ൌ ቌ ଵଵ ܾଶଶ െ
ଵ
െ െ
ଵ ହ ଶቍ ହ
We can check our computations as
ିܣܣଵ ൌ ቀ
Ͷ ʹ
͵ ʹ ͳͲ ቁ൮ ͵ െͳ ͳͲ
െͳ ͷ ൲ ൌ ቀͳ ʹ Ͳ ͷ
Ͳ ቁ ͳ
It is obvious that finding the inverse of a matrix of order higher than 2x2 would be a rather cumbersome task using the direct method. The Gaussian Elimination procedure described below is a systematic procedure for higher order matrices. Finding the Inverse by the Gaussian Elimination Method Consider an ݊ ൈ ݊ matrixܣ. The procedure starts by augmenting the matrix A with the identity matrix ܫ . The ݊ ൈ ʹ݊augmented matrix ܣȁܫ then undergoes the following row operations until the matrix ܣis transformed to ܫ and the augmented identity matrix ܫ is transformed to a matrix ܤ. This new matrix ܤwill be the inverse of the matrixܣ. The row operations are: (i) adding a multiple of one row to another; (ii) multiplying a row by a nonzero constant; (iii) interchanging the rows. The row operations on (ܣȁܫሻare carried out such that ܣis first made upper triangular; then the elements above the diagonal are made zero; and finally the diagonal elements are made to equal one. If A is singular, this method results in a zero row being produced in the left hand part of the augmented matrix.
Introduction
ͳ Let us find the inverse of A =൭ͳ ʹ
ͳ ʹ
21
ʹ ͵൱ , if it exists. ͻ
The Augmented matrix is ͳ ൭ͳ ʹ
ͳ ʹ
ʹ ͳ ͵ อͲ ͻ Ͳ
Ͳ ͳ Ͳ
Ͳ Ͳ൱ ͳ
Step 1. Subtract the first row from the second row. This operation is denoted as ܴଶ ൌ ܴଶ െ ܴଵ . Also subtract twice the first row from the third. This is denoted as ܴଷ ൌ ܴଷ െ ʹܴଵǤ The two operations make the off-diagonal elements in the first column equal to zero. The new matrix is ͳ ൭Ͳ Ͳ
ͳ ͳ Ͷ
ʹ ͳ ͳ อെͳ ͷ െʹ
Ͳ ͳ Ͳ
Ͳ Ͳ൱ ͳ
The other steps are given below. Step 2. ܴଷ ൌ ܴଷ െ Ͷܴଶ ǡ ͳ ൭Ͳ Ͳ
ͳ ͳ Ͳ
ʹ ͳ ͳ อെͳ ͳ ʹ
Ͳ ͳ െͶ
Ͳ Ͳ൱ ͳ
Step 3. ܴଵ ൌ ܴଵ െ ʹܴଷ ǡ ܴଶ ൌ ܴଶ െ ܴଷ ǡ ͳ ൭Ͳ Ͳ
ͳ ͳ Ͳ
Ͳ െ͵ Ͳ อെ͵ ͳ ʹ
ͺ ͷ Ͷ
െʹ െͳ൱ ͳ
22
Chapter 1
Step 4. ܴଵ ൌ ܴଵ െ ܴଶ ͳ ൭Ͳ Ͳ
Ͳ ͳ Ͳ
Ͳ Ͳ Ͳ อെ͵ ͳ ʹ
͵ ͷ െͶ
െͳ െͳ൱ ͳ
The ܣmatrix has now been transformed to ܫଷ . Ͳ Therefore, ିܣଵ ൌ ൭െ͵ ʹ
͵ ͷ Ͷ
ͳ െͳ െͳ൱, which is the inverse of A =൭ͳ ͳ ʹ
ͳ ʹ
ʹ ͵൱. ͻ
Check: ͳ ൭ͳ ʹ
ͳ ʹ
ʹ Ͳ ͵൱ ൭െ͵ ͻ ʹ
͵ ͷ െͶ
ͳ െͳ െͳ൱ ൌ ൭Ͳ ͳ Ͳ
Ͳ ͳ Ͳ
Ͳ Ͳ൱. ͳ
1.7 VECTORS A vector is a row (or column) of numbers in a given order. The position of each number within the row (or column) is meaningful. Naturally a vector is a special case of a matrix in that it has only a single row (or a single column). It is immaterial whether a vector is written as a row or a column. Vectors are generally denoted by lower case bold letters. The components bear suffixes for specifying the order. A vector is usually thought of as a column of numbers. The row representation will be denoted by its transpose: ݔଵ ݔൌ ൭ ڭ൱ ǡ ் ݔൌ ሺݔଵ ǡ ǥ ǡ ݔ ሻ ݔ Thus in general, a row vector is a ͳ ൈ ݊matrix, where n=2,3,….. and a column vector is an ݉ ൈ ͳmatrix, where m=2,3….. The numbers in a vector are referred to as the elements or components of the vector. A vector consists of at least two components.
Introduction
23
Unit Vectors A unit vector is a vector in which one element is unity while the rest of the elements are zeros. A unit vector in which the ݅ denoted by ݁ .
݄ݐ
element is unity is
Thus ݁ଵ ൌ ሺͳǡͲǡͲǡ ǥ ǡͲሻǡ ݁ଶ ൌ ሺͲǡͳǡͲǡͲǡ ǥ ǡͲሻǡ ǥ ǡ ݁ ൌ ሺͲǡͲǡ ǥ ǡͲǡͳሻ. Null Vector A null vector is a vector in which all the elements are zero. Ͳ Ͳ Ͳ ൌ ሺͲǡͲǡ ǥ ǡͲሻ or Ͳ ൌ ൮ ൲. A null vector is also called a zero vector. ڭ Ͳ Graphical Representation of Vectors A vector can be represented as a directed line segment on a coordinate system. To illustrate this, let us consider a three-component vector ܽ ൌ ሺܽଵ ǡ ܽଶ ǡ ܽଷ ሻ.We can represent it geometrically on a 3-dimensional rectangular coordinate system as shown in Fig. 1.5.
Fig 1.5 Geometrical representation of a vector
The directed line segment joining the origin of the coordinate system to point ሺܽଵ ǡ ܽଶ ǡ ܽଷ ሻ is the geometric representation of the vector ܽ.
24
Chapter 1
This representation of a vector has an analogy with the concept of a vector in physics or engineering, defined for representing quantities which have a magnitude and a direction, such as force. The magnitude of the vector is the length of the line segment from the origin to the point ሺܽଵ ǡ ܽଶ ǡ ܽଷ ሻ, given byඥܽଵଶ ܽଶଶ ܽଷଶ . The direction is described by the angles ߠଵ ǡɅଶ Ʌଷ that this line makes, respectively, with the three axes ݔଵ ǡଶ ଷ Ǥ Operations with Vectors Two n-݊ െcomponent vectors ܽ ൌ ሺܽଵ ǡ ܽଶ ǡ ǥ ǡ ܽ ሻܾ ൌ ሺܾଵ ǡ ܾଶ ǡ ǥ ǡ ܾ ሻܽ ൌ ሺܽଵ ǡ ܽଶ ǡ ǥ ǡ ܽ ሻܾܽ݊݀ ൌ ሺܾଵ ǡ ܾଶ ǡ ǥ ǡ ܾ ሻ are said to be equal if and only if ܽ ൌ ܾ ǡ ݅ ൌ ͳǡ ǥ ǡ ݊. Further, ܽ ܾ means that ܽ ܾ ǡ ݅ ൌ ͳǡ ǥ ǡ ݊. and ܽ ܾ means thatܽ ܾ ǡ ݅ ൌ ͳǡ ǥ ǡ ݊. and ܽ ܾ for at least one ݆. Addition Two given vectors can be added only if both have the same number of components. The sum of the two vectors ܽ ൌ ሺܽଵ ǡ ܽଶ ǡ ǥ ǡ ܽ ሻܾ ൌ ሺܾଵ ǡ ܾଶ ǡ ǥ ǡ ܾ ሻܽ ൌ ሺܽଵ ǡ ܽଶ ǡ ǥ ǡ ܽ ሻܾܽ݊݀ ൌ ሺܾଵ ǡ ܾଶ ǡ ǥ ǡ ܾ ሻ is defined to be the vector ܽ ܾ ൌ ሺܽଵ ܾଵ ǡ ܽଶ ܾଶ ǡ ǥ ǡ ܽ ܾ ሻ Multiplication by a Scalar Let ߣ be a scalar and ܽ ൌ ሺܽଵ ǡ ܽଶ ǡ ǥ ǡ ܽ ሻ. The product of ߣ and ܽ is given by ߣܽ ൌ ሺߣܽଵ ǡ ߣܽଶ ǡ ǥ ߣܽ ሻ Linear Combination Given two vectors ݑଵ ܽ݊݀ݑଶ , we can always perform the operations of addition and multiplication by a scalar to obtain vectors of the form ݑଵ ݑଶ ǡ ͵ݑଵ ʹݑଶǡ ʹݑଵ െ ͵ݑଶ etc. Such vectors are called linear combinations of ݑଵ ݑଶ ݑଵ ܽ݊݀ݑଶ .
Introduction
25
In general, given m n-component vectors ݑଵ ǡ ݑଶ ǡ Ǥ Ǥ Ǥ Ǥ ǡ ݑ , the n-component vector ݑൌ σ ୀଵ ߣ ݑ ൌ ߣଵ ݑଵ ڮ ߣ ݑ is called a linear combination of ݑଵ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ ǡ ݑ where ߣ ǡ i = 1,2,…,m are scalars (real numbers). If the scalars ߣ ߣ ǡ satisfy ߣ Ͳǡ i = 1,2,…,m and σ ୀଵ ߣ ൌ ͳ, then ݑൌ σ ߣ ݑ will be called a convex linear combination of ݑଵ ǡ ݑଶ ǡ Ǥ Ǥ Ǥ Ǥ ǡ ݑ . ୀଵ Scalar Product or Inner Product Any two vectors ݒݑݒݑcan be multiplied. The result of this multiplication is a real number called the inner product of the two vectors. It is defined as follows: The scalar product or the inner product of two n-component vectors ݑൌ ሺݑଵ ǡ ǥ ݑ ሻ and ݒൌ ሺݒଵ ǡ ǥ ݒ ሻ is defined to be the scalar ݑଵ ݒଵ ڮ ݑ ݒ ൌ σୀଵ ݑ ݒ If the two vectors ݒݑݒݑare written as column vectors then we denote the above scalar product by ݑᇱ Ǥ ݒ. If ݒݑݒݑboth are row vectors then the scalar product will be written as ݒݑ′ . For example, if ݑൌ ሺʹ
͵
ͳ Ͷሻ and ݒൌ ൭െʹ൱, then ݑǤ ݒൌ ͳʹ. Ͷ
Length or Norm of a Vector The length or magnitude of a vector ݑwritten as หݑห is defined as หݑห ൌ ξݑ′ Ǥ ݑൌ ඥݑଵଶ ڮ ݑଶ Distance between two Vectors The distance from the vector ݑto the vector ݒwritten, ห ݑെ ݒห, is defined as ห ݑെ ݒห ൌ ඥሺ ݑെ ݒሻ′ ሺ ݑെ ݒሻ ൌ ඥσୀଵሺݑ െ ݒ ሻଶ
26
Chapter 1
Distance has the following properties: (i) ห ݑെ ݒห Ͳ unless ݑെ ݒൌ Ͳ, which means that the distance between two distinct point is always positive. (ii) ห ݑെ ݒห ൌ ห ݒെ ݑห. (iii) ห ݑെ ݒห ห ݒെ ݓห ห ݑെ ݓหǤ This is called triangular inequality as in two and three dimensional spaces, the sum of the length of two sides of a triangle is not less than the length of the third side. Angle between two Vectors The angle ߠ between two n-dimensional vectors ݑand ݒ, where ݒ ് ݑis computed from ܿ ߠ ݏൌ
௨′ ௩ ห௨หห௩ห
ൌ
σ సభ ௨ ௩ మ మ ටσ సభ ௨ ටσసభ ௩
Orthogonality Two n-dimensional vectors ݑand ݒ൫ݑǡ Ͳ ് ݒ൯ are orthogonal if their scalar product is zero, i.e. ݑᇱ ݒൌ Ͳ. In this case then for the angle ߠ between them, గ గ ܿ ߠ ݏൌ Ͳǡ ݅Ǥ ݁Ǥߠ ൌ ܿ ߠ ݏൌ Ͳǡ ݅Ǥ ݁Ǥ ǡ ߠ ൌ . Note that in two dimensions, ଶ
ଶ
గ
two non-null vectors are called orthogonal if the angle between them is Ǥ ଶ
Vector Space A vector space ܸover a field ܨis a collection of vectors which is closed under the operations of addition and of multiplication by a scalar (called scalar multiplication) – i.e. a set of vectors ܸ forms a vector space if (1) The sum of any two vectors in ܸis also in ܸ. (2) For any vector ߙ in ܸ and any scalar ܿfrom ܨ, the vector ܿߙ is also in ܸ. Definition: For any positive integer n and any field F, the vector space denoted by ܸ ሺܨሻ is the set of all n-tuples ߙ ൌ ሺܽଵ ǡ ǥ ܽ ሻ and ߚ ൌ ሺܾଵ ǡ ǥ ܾ ሻ where the components
ai
and
bi
, ݅ ൌ ͳǡ ǥ ǡ ݊ are in F.
Introduction
27
The set of all linear combinations of given vectors ߙଵ ǡ ߙଶ ǡ ǥ ߙ is said to be a vector space spanned by ߙଵ ǡ ߙଶ ǡ ǥ ߙ . Euclidean Vector Space An ݊dimensional Euclidean vector space, denoted by ܧ , is the collection of all the vectors of dimension ݊, so that the addition of any two vectors, the multiplication by a scalar and the distance between any two vectors are defined as above. Subspace A subspace ܵ of a vector space ܸ is a subset of ܸ which itself is a vector space with respect to the operations of addition and scalar multiplication in ܸ. For example, in the three dimensional vector space ܸଷ ሺܴሻ, the vectors which lie in a fixed plane through the origin form by themselves a two dimensional vector space ܸଶ ሺܴሻ, which is a part of three dimensional space. Result: The set of all linear combinations of any set of vectors in a vector space ܸ is a subspace of ܸ. Proof: Let the given vectors be ߙଵ ǡ ߙଶ ǡ ǥ ߙ . For given scalars ܿଵ ǡ ǥ ǡ ܿ and ′ , consider the linear combinations ܿଵ′ ǡ ǥ ǡ ܿ ܿଵ ߙଵ ܿଶ ߙଶ ڮ ܿ ߙ ′ ߙ and ܿଵ′ ߙଵ ܿଶ′ ߙଶ ڮ ܿ
Adding the two we get, ′ ሻߙ ܸ א. ሺܿଵ ܿଵ′ ሻߙଵ ሺܿଶ ܿଶ′ ሻߙଶ ڮ ሺܿ ܿ
Multiplication by a scalar ݇ gives ݇ሺܿଵ ߙଵ ܿଶ ߙଶ ڮ ܿ ߙ ሻ ൌ ݇ܿଵ ߙଵ ݇ܿଶ ߙଶ ڮ ݇ܿ ߙ which also belongs to ܸ as each coefficient ݇ܿ is a scalar.
28
Chapter 1
1.8 LINEAR INDEPENDENCE OF VECTORS If one vector in the set of vectors from ܧ can be written as a linear combination of some of the other vectors in the set, then we say that the given vector is linearly dependent on the others, and in this case the set of vectors is said to be linearly dependent. If no vector can be written as a linear combination of the others then the set is said to be linearly independent. More precisely, a set of vectors ܽଵ ǡ ǥ ǡ ܽ is said to be linearly dependent if there exist scalars ߣଵ ǡ ǥ ǡ ߣ , not all simultaneously zero, such that ߣଵ ܽଵ ڮ ߣ ܽ ൌ Ͳ, where Ͳ represents a zero vector with ݊ components. If the above relationship holds only when all scalars ߣଵ ǡ ǥ ǡ ߣ are zeros then the set of vectors ܽଵ ǡ ǥ ǡ ܽ is said to be linearly independent. ʹ Ͷ To illustrate this, let us test the vectors ܽଵ ൌ ቀ ቁ and ܽଶ ൌ ቀ ቁfor linear ͵ ʹ independence. If possible, let there exist ߣଵ and ߣଶ , not both zero, such that ߣଵ ܽଵ ߣଶ ܽଶ ൌ Ͳ i.e.
Ͳ ʹ Ͷ ߣଵ ቀ ቁ ߣଶ ቀ ቁ ൌ ቀ ቁ, which gives Ͳ ͵ ʹ
ʹߣଵ Ͷߣଶ ൌ Ͳ and ͵ߣଵ ʹߣଶ ൌ Ͳ There is no unique solution to these equations unless ߣଵ ൌ Ͳ and ߣଶ ൌ Ͳ. Thus the two vectors are independent. Result 6.2: The non-zero vectors ߙଵ ǡ ߙଶ ǡ ǥ ߙ in a space are linearly dependent if and only if one of the vectors, say ߙ , is a linear combination (l.c.) of the preceding ones. Proof: Necessity: Suppose that ߙ can be expressed as a l.c. of ߙଵ ǡ ߙଶ ǡ ǥ ߙିଵ . Then
ߙ ൌ ܿଵ ߙଵ ܿଶ ߙଶ ڮ ܿିଵ ߙିଵ
or ܿଵ ߙଵ ܿଶ ߙଶ ڮ ܿିଵ ߙିଵ ሺെͳሻߙ ൌ Ͳ
Introduction
29
with at least one coefficient, -1, not zero. Hence the vectors are dependent. Sufficiency: Suppose the vectors are dependent. Then there exist scalars ݀ଵ ǡ ǥ ݀ not all zero, such that ݀ଵ ߙଵ ݀ ڮ ߙ ൌ Ͳ Now choose the last subscript ݇ for which ݀ ് Ͳ. Then we can solve for ߙ as a linear combination of ߙଵ ǡ ߙଶ ǡ ǥ ߙିଵ as ߙ ൌ ቀെ
ௗభ ௗೖ
ቁ ߙଵ ڮ ቀെ
ௗೖషభ ௗೖ
ቁ ߙିଵ
Thus we have expressed ߙ as a linear combination of the preceding vectors, except in the case ݇ ൌ ͳ. But in this case ݀ଵ ߙଵ ൌ Ͳ with ݀ଵ ് Ͳǡ so that ߙଵ ൌ Ͳ, which is contrary to our hypotheses that none of the given vectors equals zero. Two corollaries to this result follow. Corollary 1: A set of vectors is linearly dependent if and only if it contains a proper (i.e. smaller) subset generating the same. Corollary 2: Any set of finite vectors from a vector space contains a linear independent subset which generates the same space. Spanning Set Given an n-dimensional Euclidean space ܧ , a set of vectors which can be used to express any arbitrary vector in ܧ is called a spanning set. A spanning set of vectors need not necessarily be linearly independent. Dimension The minimum number of vectors spanning a space (or subspace) is called the dimension of the space (or sub space) Basis If the set of vectors ߙଵ ǡ ߙଶ ǡ ǥ ǡ ߙ is linearly independent and spans a vector space V, then this is said to be the basis of V. i.e. ߙଵ ǡ ߙଶ ǡ ǥ ǡ ߙ is said to form a basis of V if (i) ߙଵ ǡ ߙଶ ǡ ǥ ǡ ߙ are independent
30
Chapter 1
(ii) Each ߠ ܸ אcan be represented as a linear combination of ߙଵ ǡ ǥ ǡ ߙ . Remarks: (i) A vector space is said to be finite dimensional iff it has a finite basis. (ii).There can be more than one basis to a vector space. Result 6.3: The n-unit vectors ݁ ൌ ሺͳǡ ǥ ǡͲሻǡ ǥ ǡ ݁ ൌ ሺͲǡ ǥ ǡͳሻ form a basis for ܧ Ǥ Proof: Take any vector ߙ ൌ ሺݔଵ ǡ ǥ ݔ ሻ ܧ א Ǥ We can write ሺݔଵ ǡ ǥ ݔ ሻ ൌ ݔଵ ݁ଵ ڮ ݔ ݁ . We are now left to show that ݁ଵ ǡ ǥ ǡ ݁ are independent. Suppose that ݁ଵ ǡ ǥ ǡ ݁ are dependent. Then there exist scalars ݇ଵ ǡ ǥ ݇ , not all zero, such that ݇ଵ ݁ଵ ڮ ݇ ݁ ൌ Ͳ, or ሺ݇ଵ ǡ ǥ ǡ ݇ ሻ ൌ Ͳ which is contrary to the hypothesis. Result 6.4: If ߙଵ ǡ ߙଶ ǡ ǥ ǡ ߙ is a basis for V then every vector V can be uniquely expressed in terms of ߙଵ ǡ ߙଶ ǡ ǥ ǡ ߙ Ǥ Result 6.5: Every basis of a vector space contains the same number of vectors. Remarks: (i) A non-zero vector ܽ divided by its length ȁܽȁ is a vector of unit length. Let ܽ ൌ
หห
, then ܽ′ ܽ ൌ
′ ′
หห
Ǥ
หห
ൌ
′ Ǥ ห′ ห
σ మ
ൌ
ൌͳ
ඨσሺమ ሻටσ మ
(ii) Given any two vectors ܽ and ܾ of the same size, then หܽ′ ܾห หܽหหܾห. This is called the Schwartz inequality. Extracting a Basis out of a Set of Vectors Let the given vectors be ߙଵ ൌ ሺܽଵଵ ǡ ǥ ǡ ܽଵ ሻǡ ǥ ǡ ߙ ൌ ሺܽଵ ǡ ǥ ǡ ܽ ሻǡ where ݉ ݊.
Introduction
31
Procedure: (i) Write the vectors in m rows. (ii) Choose a row with a non-zero element in its first column. Divide all the elements in this row by the first element and write them in the (m+1)st row as ሺͳǡ ܽଶାଵ ǡ ǥ ǡ ܽାଵ ሻ. We call this the first pivotal row, and the ‘1’ in its position as the pivotal element. (iii) Make the transformations in the remaining m-1 rows by the pivotal ൈ ᇱ condensation procedure, i.e. ܽǡା ൌ ܽǡା െ భ భ ǡ ݅ ൌ ʹǡ ǥ ǡ ݉, భభ
and write these rows in the ݉ ʹ݊݀ ǡ ǥ ǡʹ݉ݐℎ row. (iv) Repeat steps (i) and (ii), by choosing a non-zero element in the second column from among the ݉ െ ͳ transformed rows. (v) After ݊ iterations, we obtain the ݊ pivotal rows which correspond to the independent vectors in the set. Example: A vector space V is generated by the vectors (1,0,1), (1,1,0), (3,2,1), (6,3,3), (0,2,2). Extract a basis for V. We write the given vectors in five rows as shown in the below table. The first pivotal row is obtained by dividing the first row by 1. The next two pivotal rows are obtained by repeating steps (ii) and (iii) of the procedure twice.
The basis corresponding to the set of given vectors obtained from three pivotal rows is given as
32
Chapter 1
ͳ ൭Ͳ Ͳ
Ͳ ͳ ͳ െͳ൱ Ͳ ͳ
1.9 SOLUTION OF SYSTEMS OF SIMULTANEOUS LINEAR EQUATIONS Consider a system of m linear equations in n unknowns ܽଵଵ ݔଵ ܽଵଶ ݔଶ ڮ ܽଵ ݔ ൌ ܾଵ ܽଶଵ ݔଵ ܽଶଶ ݔଶ ڮ ܽଶ ݔ ൌ ܾଶ ڭ ܽଵ ݔଵ ܽଶ ݔଶ ڮ ܽ ݔ ൌ ܾ We can write these equations in matrix form as ܽଵଵ ݔܣൌ ܾ, where ܣൌ ൭ ڭ ܽଵ
ڮ ڮ
ܽଵ
ܾଵ ൱ and ܾ ൌ ൭ ڭ൱ ܽ ܾ
or in vector notation as ܽଵ ߙଵ ݔଵ ߙଶ ݔଶ ڮ ߙ ݔ ൌ ܾ, where ߙ ൌ ൭ ڭ൱ ǡ ݆ ൌ ͳǡ ǥ ǡ ݊ ܽ We call the above system of linear equations homogeneous if ܾ ൌ Ͳ and non-homogeneous if ܾ ് Ͳ A system of homogeneous linear equations ݔܣൌ Ͳ has the trivial solution ݔൌ Ͳ. If A is non-singular, the only solution to ݔܣൌ Ͳ is ݔൌ Ͳ. When A is singular, there can also be many non-trivial solutions to ݔܣൌ ͲǤ Solution to Non-Homogeneous Systems Result-1. The non-homogeneous system ݔଵ ߙଵ ݔଶ ߙଶ ڮ ݔ ߙ ൌ ܾ
(7.1)
Introduction
33
has a solution iff ܾ is dependent on ߙଵ ǡ ߙଶ ǡ ڮǡ ߙ . In other words the system (7.1) has a solution iff the maximum number of independent vectors among ߙଵ ǡ ߙଶ ǡ ڮǡ ߙ is the same as the maximum number of independent vectors in ߙଵ ǡ ߙଶ ǡ ڮǡ ߙ ܾ. (This condition is called the condition of consistency or compatibility of non-a homogeneous system (7.1)). Proof: Necessity: If (7.1) has a solution, let it be ܿ ൌ ሺܿଵ ǡ ǥ ǡ ܿ ሻ. Then ܿଵ ߙଵ ܿଶ ߙଶ ڮ ܿ ߙ ൌ ܾ, showing that ܾ is a linear combination of ߙଵ ǡ ߙଶ ǡ ڮǡ ߙ . Suff: Let ܾ be linearly dependent on ߙଵ ǡ ߙଶ ǡ ڮǡ ߙ . Then for some ܽ ൌ ሺܽଵ ǡ ǥ ǡ ܽ ሻ ് Ͳǡ ܾ ൌ ܽଵ ߙଵ ܽଶ ߙଶ ڮ ܽ ߙ , showing that (a1,a2,---,an) is a solution of (7.1). Result-2. If ܿ כis any given solution of the non-homogeneous system ݔܣൌ ܾ, then a vector ܿis also a solution of the non-homogeneous system iff ܿis of the form ܿ ൌ ܿଵ כ ℎwhere ℎis any solution of the corresponding homogeneous system. Proof: Nec. : Let ܿ כbe the given solution of (7.1) and ݄ any solution of ݔܣൌ Ͳ. Then כ ܿܣൌ ܾ and ℎ ൌ Ͳ On addition we get ܣ൫ܿ כ ݄൯ ൌ ܾ showing that ܿ כ ℎ is a solution of (7.1). Suff : Let ܿ be an arbitrary solution of (7.1). Then ܿܣൌ ܾ. Also we have כ ܿܣൌ ܾ. On subtraction we get ܣሺܿ െ ܿ כሻ ൌ Ͳ, showing that ܿ െ ܿ כൌ ݄, say, is a solution to the homogeneous system ݔܣൌ Ͳ. Result-3. Consider the system ݔܣൌ ܾ and the augmented matrix [A,ܾ] with m rows and n+1 columns. If the rank of [A,ܾ] is greater than the rank of A,
34
Chapter 1
then ܾcannot be represented as a linear combination of ߙଵ ǡ ߙଶ ǡ ڮǡ ߙ , and hence in this case there is no solution to the system ݔܣൌ ܾ. We can conclude that 1.
If rank [A,ܾ] > rank [A] , then ݔܣൌ ܾ has no solution.
2. If rank [A,ܾ] = rank [A] = n, then there exists a unique solution to the system ݔܣൌ ܾ. 3. If rank [A,ܾ] = rank [A] = k < n, then we have an infinite number of solutions to the system ݔܣൌ ܾ. 1.10 SOLUTION OF NON-HOMOGENEOS SYSTEMS 1.10.1 Solution by Cramer’s Rule: Consider the system ݔܣൌ ܾ, where A=(aij) is of full rank. Since A is of full rank, A-1 exists. Pre multiplying both ଵ భభ భ െ െ െ ȁȁ ଶ ȁȁ ۇǣ ۇ ۊǣ ۊ ౠ ିଵ -1 ۋ sides by A , we get ൌ ൌ ȁȁ ൌ ۈ ۈۋ ǣ ۈǣ ۋ భ ǣ ۉȁȁ െ െ െ ȁȁ ی ۉ ی ୬ or
ଵ
ݔ ൌ ȁȁ ሺܣଵ ܾଵ ڮ ܣ ܾ ሻǡ ݅ ൌ ͳǡ ǥ ǡ ݊.
Consider the matrix A(i) constructed from A by replacing column i by the column ܾ. ଵଵ Ǥ Ǥ Ǥ Ǥ ଵǡ୧ିଵ Ǥ ሺሻ ൌ ൮ Ǥ ୬ଵ Ǥ Ǥ Ǥ Ǥ ୬ǡ୧ିଵ
ଵ
ଵǡ୧ାଵ Ǥ Ǥ Ǥ Ǥ ଵ୬ ൲
୬
୬ǡ୧ାଵ Ǥ Ǥ Ǥ Ǥ ୬୬
The expression ሺܣଵ ܾଵ Ǥ Ǥ Ǥ Ǥ ܣ ܾ ሻ may be considered as the expansion of A(i) by the elements of the ith column. Thus ݔ ൌ
ȁሺሻȁ ȁȁ
, i=1,2,---,n.
This is known as Cramer’s rule for solving a system of linear equations with non singular Coefficient matrix.
Introduction
35
Example: Consider the following non-homogeneous system x+y+z=0 2x – y + z = -3 x + 2y
=5
ͳ ܣൌ ൭ʹ ͳ
ͳ െͳ ʹ
Then ͳ Ͳ ͳ൱, ൌ ൭െ͵൱. First we find ȁܣȁ ൌ Ͷ Ͳ ͷ
By applying the Cramer’s rule we get ݔൌ
ȁሺଵሻȁ ȁȁ
Ͳ ଵ ൌ อെ͵ ସ ͷ
ͳ െͳ ʹ
ͳ ସ ͳอ ൌ ସ ൌ ͳ Ͳ
ͳ Ͳ ͳ ଵ ଼ ൌ อ ʹ െ͵ ͳอ ൌ ସ ൌ ʹ ȁȁ ସ ͳ ͷ Ͳ Ͳ ȁܣሺ͵ሻȁ ͳ ͳ ͳ െͳʹ ݖൌ ൌ อʹ െͳ െ͵อ Ǥ ൌ ൌ െ͵ ȁܣȁ Ͷ Ͷ ͳ ʹ ͷ ݕൌ
ȁሺଶሻȁ
1.10.2. Gaussian Elimination Procedure Consider n linear equations in n unknowns ܽଵଵ ݔଵ ڮ ܽଵ ݔ ൌ ܾଵ ڭ ܽଵ ݔଵ ڮ ܽ ݔ ൌ ܾ
(1.10.1)
We can assume ܽଵଵ z 0, since otherwise the equations can be re-arranged. Solving the first equation for ݔଵ , we get ݔଵ ൌ െ
భమ భభ
ݔଶ െǤ Ǥ Ǥ Ǥ െ
భ భభ
ݔ
భ భభ
(1.10.2)
Dividing the first equation of system (1.10.1) by ܽଵଵ and eliminating ݔଵ from the remaining n-1 equations by using (8.2), we get
36
Chapter 1
ݔଵ భమ ݔଶ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ భ ݔ ൌ భ భభ భభ భభ ܽଵଶ ܽଵ ܾଵ ܽଶଶ െ ܽଶଵ ൬ ൰൨ ݔଶ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ ܽଶ െ ܽଶଵ ൬ ൰൨ ݔ ൌ ܾଶ െ ܽଶଵ ܽଵଵ ܽଵଵ ܽଵଵ ǣ ǣ ܽଵଶ ܽଵ ܾଵ ܽଶ െ ܽଵ ൬ ൰൨ ݔଶ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ ܽ െ ܽଵ ൬ ൰൨ ݔ ൌ ܾ െ ܽଵ ܽଵଵ ܽଵଵ ܽଵଵ or ′ ′ ݔଶ Ǥ Ǥ Ǥ Ǥ Ǥ ܽଵ ݔ ൌ ܾଵ′ ݔଵ ܽଵଶ ′ ′ ݔଶ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ ܽଶ ݔ ൌ ܾଶ′ ܽଶଶ ǣ ǣ ′ ′ ܽଶ ݔଶ Ǥ Ǥ Ǥ Ǥ Ǥ ܽ ݔ ൌ ܾ′
(1.10.3)
′ Where ܽ ൌ ܽ Ȁܽଵଵ ǡ ݆ ൌ ͳǡ ǥ ǡ ݊ǡ ܾଵ′ ൌ ܾଵ Ȁܽଵଵ ᇱ ൌ ܽ െ ܽଵ ቀ ܽ
భೕ భభ
ቁ and ܾ′ ൌ ܾ െ ܽଵ ቀ
భ భభ
ቁ ǡ ݅ǡ ݆ ൌ ʹǡ ǥ ǡ ݊Ǥ
ᇱ (i,j=2,3,…,n) differs from zero, we can assume, If at least one of the ܽ ᇱ ് Ͳ. The reduction process is continued without loss of generality, that ܽଶଶ ᇱ by dividing the second equation of (1.10.3) by ܽଶଶ and by using this equation to eliminate x2 in equations ͵ǡͶǡ ǥ ǡ ݊. Then x3 is eliminated from equations Ͷǡͷǡ ǥ ǡ ݊ and so on.
Finally we obtain the system in the form ݔଵ ܽଵଶ ݔଶ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ ܽଵ ݔ ൌ ܾଵ ݔଶ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ ܽଶ ݔ ൌ ܾଶ : : ݔିଵ ܽ ڮିଵǡ ݔ ൌ ܾିଵ The Last equation then gives ݔ ൌ ܾ .
Introduction
37
The (n-1)th equation gives ݔିଵ ൌ ܾିଵ െ ܽିଵǡ ܾ By continuing this process of back substitution, we obtain all the xi values. This procedure is called the Gaussian Elimination procedure. Instead of eliminating xk only in equations ܭ ͳǡ ܭ ʹǡ ǡ ǥ ǡ ݊ǡ we could equally well eliminate xk in equations 1,2,…,k-1 also, so that xk would appear only in the kth equation. Now back substitution is not needed. This modification of Gaussian Elimination is called the Gauss–Jordan method. The procedure of eliminating a particular variable from all but one equation is called a pivot operation. A pivot operation consists of two elementary operations, viz: (i) Any equation is multiplied by a non-zero element. (ii) An equation multiplied by a constant is added to another. So the system of equations obtained after these pivot operations has exactly the same solution as the original set. 1.10.3. Gauss Doolittle Method Consider the system of equations (8.1). Write down the augmented matrix of coefficients – (i) Choose a row having a non-zero value for its first element. Let it be the first row. Divide all its elements by the 1st so that the resulting row is of the form (1,c2,---,cn,d). Call this row ଵ . (ii) From every other row is subtracted a row obtained by multiplying ଵ by its first element. The resulting rows, except the one chosen above, have zeros as their first element. The first column is said to be swept out by the row ଵ . ଵ is called a pivotal row.
(iii) Omission of the first pivotal row and first column results in a reduced matrix on which operations (i) and (ii) are
38
Chapter 1
repeated until a single non-zero row remains, which is regarded as the last pivotal row. The computations for a system of 3 equations in 3 unknowns are presented in tabular form in table 1.10.1. A general step of the pivotal operation on the augmented matrix is given below. Let akl z 0. We eliminate xl from all the equations except the kth. This can be accomplished by dividing the kth equation by akl and subtracting ajl times the result from each of the other equations ݆ ൌ ͳǡʹǡ ڮǡ ݇ െ ͳǡ ݇ ͳǡ ǥ ǡ ݊Ǥ
Table 1.10.1: Gauss do-little Method in tabular form for the 3 x 3 system in which ܿଵଵ ൌ ቀܽଶଶ െ
ܽଵଵ ǣ ۇ ǣ ܽۈ ۈଵ ۈǣ ǣ ܽۉଵ
ǤǤǤǤǤ ǤǤǤǤǤǤ
ܽଵ ܽ
Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ ܽ
భమ మభ భభ
ǤǤǤǤǤ
ቁ and so on.
ܽଵ
ܾଵ
ܽ
ۊ ܾ ۋ ۋ ۋ
Ǥ Ǥ Ǥ Ǥ Ǥ ܽ
ܾ ی
ǤǤǤǤǤ
The resulting system of equations has a coefficient matrix of the form
Introduction ᇱ ܽଵଵ ǣ ۇǣ ܽ ۈᇱ ۈଵ ۈǣ ǣ ᇱ ܽ ۉଵ
ǤǤǤǤ
ᇱ ܽଵǡିଵ
Ͳ
ᇱ ܽଵǡାଵ
ᇱ ܽǡିଵ
ǤǤǤǤ
ᇱ Ǥ Ǥ Ǥ Ǥ ܽǡିଵ
39
ǤǤǤǤ
ᇱ ܽଵ
ܾଵᇱ
ͳ
ᇱ ܽǡାଵ
ǤǤǤǤ
ᇱ ܽ
ܾᇱ
Ͳ
ᇱ ܽǡାଵ
ᇱ Ǥ Ǥ Ǥ Ǥ ܽ
ۊ ۋ ۋ ۋ
ܾᇱ ی
where ′ ܽ ൌ
ೖೕ ೖ
, ݆ ൌ ͳǡ ǥ ǡ ݊.
ᇱ ܽ ൌ ܽ െ
ೕ ೖೕ ೖ
; i z k, ݆ ൌ ͳǡ ǥ ǡ ݊
This elimination step is called a pivot operation. The resulting system of equations obtained after a pivot operation has exactly the same solution as the original system. Next, if we perform a new pivot operation by eliminating xs, s z l. in all the equationa except the tth t z k, the zeros or ones in the lth column will not be disturbed. The pivot operation can thus be repeated using different pivotal elements. Finally we get a matrix of the form ͳ ۇǣ ǣ Ͳۉ
ǤǤǤǤǤǤ
Ͳ ǣ ǣ ǤǤǤǤǤǤǤ ͳ
ܾଵ ǣۊ ǣ ܾ ی
Then ݔ ൌ ܾ , ݅ ൌ ͳǡ ǥ ǡ ݊ is the solution vector. 1.10.4. Pivotal Reduction of a General System Consider a consistent system of ݉ linear equations in ݊ unknowns ሺ݉ ൏ ݊ሻ (having at least one solution) a11 x1 +.. .. .. + an1 xn = b1 :
:
:
:
am1 x1 + .. .. .. + amn xn = bm
(1.10.4)
40
Chapter 1
If pivotal operations with respect to any m variable, say x1,x2,---,xm , are carried out, the resulting set of equations reduces to the following canonical form ′′ ′′ ͳݔ Ǥ Ǥ Ǥ Ǥ Ͳݔ ܽଵǡାଵ ݔାଵ Ǥ Ǥ Ǥ Ǥ Ǥ ܽଵ ݔ ൌ ܾଵ′′
: ڰ
ڭ
ڭ
(1.10.5)
′′ ′′ ᇱᇱ ݔାଵ Ǥ Ǥ Ǥ Ǥ Ǥ ܽ ݔ ൌ ܾ Ͳݔ Ǥ Ǥ Ǥ Ǥ Ǥ ͳݔ ܽǡାଵ
One particular solution which can always be deduced from the system (1.10.5) is ݔ ൌ ܾ′′ ݅ ൌ ͳǡʹǡ ڮǡ ݉ǣ ݔ ൌ Ͳǡ ݅ ൌ ݉ ͳǡ ݉ ʹǡ ڮǡ ݊Ǥ This solution is called a basic solution as the solution vector contains no more than m non-zero terms. The pivotal variables are called basic variables and the other variables non-basic. It is possible to obtain other basic solutions from the system (1.10.5). We can perform an additional pivotal operation on the canonical system (1.10.5) ′′ by using ܽ ് Ͳ as the pivotal term, q>m. The new system will still be in canonical form in which xq will be the pivotal variable in place of xp. This new canonical system yields a new basic solution. Example: Let us find a basic solution to the system ʹݔଵ ͵ݔଶ െ ʹݔଷ െ ݔସ ൌ ͳ ݔଵ ݔଶ ݔଷ ͵ݔସ ൌ ݔଵ െ ݔଶ ݔଷ ͷݔସ ൌ Ͷ The augmented matrix is ሺʹሻ ൭ ͳ ͳ
͵ ͳ െͳ
െʹ ͳ ͳ
െ ͳ ͵ ڭ൱ ͷ Ͷ
The various pivot operations are as follows: The operation ܴଶ ՞ ܴଵ gives
Introduction
ͳ ൭ʹ ͳ
ͳ ͵ െͳ
ͳ െʹ ͳ
41
͵ െ อͳ൱ ͷ Ͷ
The operations ܴଶ ՜ ܴଶ െ ʹܴଵ ǡ ܴଷ ՜ ܴଷ െ ܴଵ yield
ͳ ൭Ͳ Ͳ
ͳ ͳ െʹ
ͳ െͶ Ͳ
͵ െͳ͵ อെͳͳ൱ ʹ െʹ
The operations ܴଵ ՜ ܴଵ െ ܴଶ ǡ ܴଷ ՜ ܴଷ ʹܴଶ yield
ͳ ൭Ͳ Ͳ The operation ܴଷ ՜
ோయ ି଼
Ͳ ͳ Ͳ
ͷ െͶ െͺ
ͳ ͳ െͳ͵ อെͳͳ൱ െʹͶ െʹͶ
Ͳ ͳ Ͳ
ͷ െͶ ͳ
ͳ ͳ െͳ͵ อെͳͳ൱ ͵ ͵
gives
ͳ ൭Ͳ Ͳ
The operations ܴଵ ՜ ܴଵ െ ͷܴଷ ǡ ܴଶ ՜ ܴଶ Ͷܴଷ give
ͳ ൭Ͳ Ͳ
Ͳ ͳ Ͳ
Ͳ Ͳ ͳ
ͳ ʹ െͳ อͳ൱ ͵ ͵
From the last matrix we can write the solution of x1, x2, and x3 in terms of x4 as x1 = 2 - x4, x2 = 1 + x4, x3 = 3 – 3x4. The basic solution is obtained by putting x4 = 0. Then x1 = 2, x2 = 1, and x3 = 3. The other basic solutions can be obtained by bringing x4 into the basis in place of the other variables in turn.
42
Chapter 1
1.10.5. Tabular form of Jordan Elimination Consider the following system of m linear equations in ݊ unknowns x1, x2, …, xn. ݕ ൌ ܽଵ ݔଵ ܽଶ ݔଶ Ǥ Ǥ Ǥ ܽ ݔ Ǣ ݅ ൌ ͳǡʹǡ Ǥ Ǥ Ǥ ǡ ݉
(1.10.6)
This system can be written in tabular form as ݕଵ ڭ ݕ ڭ ݕ
ൌ
ݔଵ ܽଵଵ
ǤǤǤǤǤ ǤǤǤǤǤ
ൌ
ܽଵ
ൌ
ܽଵ
ݔ௦ ܽଵ௦
ǤǤǤǤǤ ǤǤǤǤǤ
ݔ ܽଵ
Ǥ Ǥ Ǥ Ǥ Ǥ ሺܽ௦ ሻ
ǤǤǤǤǤ
ܽ
ǤǤǤǤǤ
Ǥ Ǥ Ǥ Ǥ Ǥ ܽ
ܽ௦
Table 1.10.2 Jordan Elimination over table (1.10.2) with pivot element ars z 0 means the operation of switching the roles of the departing variable yr and the independent variable xs, i.e. the operation of solving the equation ݕ ൌ ܽଵ ݔଵ Ǥ Ǥ Ǥ Ǥ Ǥ ܽ௦ ݔ௦ Ǥ Ǥ Ǥ Ǥ Ǥ ܽ ݔ for xs, substituting the solution into all the remaining equations of system (1.10.6) and writing the system in the form of a new table. It will be seen that the new table is of the form ݕଵ ڭ ݔ௦ ڭ ݕ
ൌ
ݔଵ ܾଵଵ
ǤǤǤǤǤ ǤǤǤǤǤ
ݕ ܽଵ௦
ǤǤǤǤǤ ǤǤǤǤǤ
ൌ
ܽଵ
ǤǤǤǤǤ
ܽ௦
Ǥ Ǥ Ǥ Ǥ Ǥ ܽ
ൌ
ܾଵ
Ǥ Ǥ Ǥ Ǥ Ǥ ܽ௦
ǤǤǤǤǤ
ݔ ܾଵ
ܾ
Introduction
where
ܾ ൌ ܽ െ
ೞ ೝೕ ೝೞ
43
(1.10.7)
ሺ݅ ് ݎǡ ݆ ് ݏሻ.
An iteration of Jordan Elimination with pivotal element ܽ௦ is obtained by the following steps: 1) The pivotal element is replaced by
ଵ ೝೞ
2) The remaining elements of pivotal column (sth) are divided by ars. 3) The remaining elements of pivotal row (r th) are divided by –ars. 4) For the remaining portion, the ሺ݅ǡ ݆ሻ௧ℎ element is replaced by ܽ െ
ೞ ೝೕ ೝೞ
For solving the system of non-homogeneous equations (1.10.4) by the application of Jordan Elimination, we write the coefficients in tabular form as Ͳ ڭ Ͳ
ൌ
ݔଵ ܽଵଵ
ǤǤǤǤǤǤǤ ǤǤǤǤǤǤǤ
ݔ ܽଵ
ͳ െܾଵ
ൌ
ܽଵ
Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ Ǥ ܽ
െܾ
We make n Jordan eliminations by choosing the pivot columns which correspond to the non-slack variables, and the final solution is obtained as the entries in the last column. Let us solve the following system of three equations in four variables by the Jordan Elimination procedure: ʹݔଵ ͵ݔଶ െ ʹݔଷ െ ݔସ ൌ ͳ ݔଵ ݔଶ ݔଷ ͵ݔସ ൌ ݔଵ െ ݔଶ ݔଷ ͷݔସ ൌ Ͷ We may consider the non-slack variables as ݔଵ ǡ ݔଶ ݔଷ Ǥ ݔଵ ǡ ݔଶ ݔଷ Ǥ The tabular form for this system is written as
44
Chapter 1
Ͳ Ͳ Ͳ
ൌ ൌ ൌ
ݔଵ ሺʹሻ ͳ ͳ
ݔଶ ͵ ͳ െͳ
ݔଷ െʹ ͳ ͳ
ݔସ െ ͵ ͷ
ͳ െͳ െ െͶ
We choose the (1,1)th element “2” as the pivotal element. An iteration of Jordan Elimination gives
ݔଵ Ͳ Ͳ
ൌ ൌ ൌ
ͳȀʹ ͳȀʹ െʹ
െ͵Ȁʹ ሺെͳȀʹሻ െͷȀʹ
ͳ ʹ ʹ
Ȁʹ ͳ͵Ȁʹ ͳȀʹ
ͳȀʹ െͳͳȀʹ െȀʹ
For the next iteration we choose the (2,2)th element “-1/2” as pivot. The next table is ݔଵ ݔଶ Ͳ
ൌ ൌ ൌ
െͳ ͳ െʹ
͵ െʹ ͷ
െͷ Ͷ ሺെͺሻ
െͳ ͳ͵ ʹͶ
ͳ െͳͳ ʹͶ
The third iteration by choosing (3,3)th element “-8” as pivot gives the final solution appearing in the last column of the transformed table. ݔଵ ݔଶ ݔଷ
ൌ ൌ ൌ
ʹ ͳ ͵
1.11 DIFFERENTIATION We consider a function y = f(x), which is assumed to be continuous (see Fig. 1.6). Let us consider the change of ݕwith respect to ݔbetween the two
Introduction
45
points P(x0,y0) and Q(x0+h,y0+k). Denote by ܴ the ratio of the increase in the value of y to the increase in the value of x between P and Q. Rൌ
ሺ௬బ ାሻି௬బ ሺ௫బ ାℎሻି௫బ
ൌ . ℎ
As y = f(x) for all the points on the curve, we may also write Rൌ
ሺ௫బ ାℎሻିሺ௫బ ሻ ℎ
ൌ
ொௌ ௌ
.
When ݄ ՜ Ͳǡ ܴ gives the rate of change of ݕw.r.t. ݔat the point ܲ.
This average rate of change R also measures the slope of the curve at ܲas it may be noted in the triangle PSQ that ߠ ݊ܽݐൌ
ொௌ ௌ
.
The instantaneous rate of change is obtained by allowing h (and k) to approach zero while x0 and y0 are kept constant. Indeed, when Q approaches P, the chord PQ approaches the tangent to the curve at P. We write for ܴ in the limit ሺ௫బ ାℎሻିሺ௫బ ሻ
ݎൌ ݈݅݉ ቄ ℎ՜
ℎ
ቅ
As this rate of change may be computed for any point x0 for which f(x) is defined, it is conventional to drop the subscript 0 and to write
46
Chapter 1 ሺ௫ାℎሻିሺ௫ሻ
ݎൌ ݈݅݉ ቄ ℎ՜
ℎ
ቅ ൌ ݂ ′ ሺݔሻ.
(1.11.1)
The rate of change of y with respect to x at P is usually referred to as the derivative of f(x) at the point x. The derivative of a function y=f(x) is commonly denoted by such symbols ௗ௬ ௗሺ௫ሻ , ݕᇱ , or ݂ ′ ሺݔሻ as ǡ ௗ௫
ௗ௫
The process of finding a derivative is called differentiation. In defining the derivative of y=f(x), it was assumed that y is continuous. A discontinuous function is not differentiable at a point of discontinuity, and a continuous function also may not be differentiable at some unusual points. For example, the function ݕൌ ȁݔȁ is not differentiable at x=0. The curve has a sharp turn (corner) at the point ݔൌ Ͳ , and so a unique tangent cannot be drawn. Thus every differentiable function is continuous, but every continuous function may not differentiable, see Fig. 1.7.
Fig 1.7. A continuous function (y = |x|, with branches y = -x and y = x meeting at P(0,0)) not differentiable at x = 0
Higher Derivatives

The derivative $f'(x)$ of a function y = f(x) is sometimes called the first derivative. As $f'(x)$ is also a function of x, it may be possible to differentiate it again to obtain $f''(x) = \frac{d}{dx}\{f'(x)\}$. The derivative of the first derivative is known as the second derivative of the function y = f(x). It is also denoted by such symbols as $\frac{d^2 y}{dx^2}$, $\frac{d^2 f(x)}{dx^2}$, or $f''(x)$.

The second derivative then indicates the rate of change of the first derivative with respect to x. In general, the nth derivative of y with respect to x is denoted by $\frac{d^n y}{dx^n} = f^{(n)}(x)$.
Result 1: The derivative of a constant is zero. When y = c, a constant, the graph is a horizontal line and the gradient is always zero. See Fig. 1.8. Thus

    $\dfrac{dy}{dx} = \dfrac{d(c)}{dx} = 0$,

as $f(x) = f(x + \Delta x) = c$.

Fig 1.8. $y = f(x) = c$
Result 2: The derivative of the function $f(x) = x^n$, where n is a number, is $f'(x) = n x^{n-1}$. The derivative of the function $f(x) = k x^n$, where k is a constant, is $f'(x) = k n x^{n-1}$.
Result 3: The derivative of the sum (or difference) of two functions equals the sum (or difference) of their derivatives. If $f(x) = u(x) \pm v(x)$, then $\dfrac{dy}{dx} = f'(x) = \dfrac{du}{dx} \pm \dfrac{dv}{dx}$.
Result 4: The derivative of the product of two functions equals the first function multiplied by the derivative of the second function plus the second function multiplied by the derivative of the first function. If $y(x) = u(x) \cdot v(x)$, then $\dfrac{dy}{dx} = u \dfrac{dv}{dx} + v \dfrac{du}{dx}$.
Result 5: The derivative of the quotient (i.e. a fraction) of two functions $u(x)$ and $v(x)$, where $y(x) = \dfrac{u(x)}{v(x)}$, is given by $\dfrac{dy}{dx} = \dfrac{v \frac{du}{dx} - u \frac{dv}{dx}}{v^2}$.
Result 6: If y is a function of u, which is a function of x, i.e. y = f[u(x)], then the derivative of y may be obtained by the so-called chain rule as

    $\dfrac{dy}{dx} = \dfrac{dy}{du} \cdot \dfrac{du}{dx}$.

For example: Let $y = (4x^2 + 2x)^2$ and let $u = 4x^2 + 2x$; then $y = u^2$ and $\dfrac{dy}{dx} = 2(4x^2 + 2x)(8x + 2)$.
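As a quick sanity check (ours, not part of the book), the limit definition (1.11.1) can be approximated with a small h and compared against the chain-rule answer for the example above; the helper name numerical_derivative is our own.

```python
def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x), based on definition (1.11.1)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: (4 * x**2 + 2 * x) ** 2                  # y = (4x^2 + 2x)^2
df = lambda x: 2 * (4 * x**2 + 2 * x) * (8 * x + 2)    # chain-rule derivative

x0 = 1.5
print(numerical_derivative(f, x0))   # ~ 336.0
print(df(x0))                        # 336.0
```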
Result 7: If the variable y is a function of two independent variables u and v [i.e. y = f(u, v)], then the partial derivative of y with respect to u is obtained by differentiating y with respect to u treating v as if it were a constant. In a similar manner, the partial derivative of y with respect to v is obtained by differentiating y with respect to v while u is treated as a constant.

Result 8: If f(x) is a continuous function and has continuous first derivatives over some interval in x, then for any two points x1 and x2 in this interval, where $x_2 = x_1 + h$, there exists a $\theta$, $0 \le \theta \le 1$, such that
$f(x_2) = f(x_1) + h f'[\theta x_1 + (1-\theta) x_2]$.

If, in addition, f(x) has continuous second derivatives, then there exists a $\theta$, $0 \le \theta \le 1$, such that

$f(x_2) = f(x_1) + h f'(x_1) + \dfrac{h^2}{2!} f''[\theta x_1 + (1-\theta) x_2]$.

This result is known as Taylor’s theorem. Taylor’s theorem can be extended to any order if f(x) has continuous derivatives of the requisite order. The general statement of Taylor’s theorem then becomes that there exists a $\theta$, $0 \le \theta \le 1$, such that

$f(x_2) = f(x_1) + h f'(x_1) + \dfrac{h^2}{2!} f''(x_1) + \cdots + \dfrac{h^n}{n!} f^{(n)}[\theta x_1 + (1-\theta) x_2]$.
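For a concrete feel of Taylor's theorem, the short sketch below (ours, with a sample function chosen for illustration) compares f(x2) with first- and second-order Taylor estimates built at x1.

```python
import math

f   = math.exp        # sample function: f(x) = e^x
df  = math.exp        # f'
d2f = math.exp        # f''

x1, x2 = 1.0, 1.3
h = x2 - x1

first_order  = f(x1) + h * df(x1)
second_order = f(x1) + h * df(x1) + h**2 / 2 * d2f(x1)

print(f(x2))          # 3.6693...
print(first_order)    # 3.5337...  (the error shrinks as h does)
print(second_order)   # 3.6560...
```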
1.12 MAXIMA AND MINIMA

If a function y = f(x) is continuous, then a necessary condition for f to attain a maximum or a minimum value is $\dfrac{dy}{dx} = f'(x) = 0$. (Note: as the first derivative at a point gives the slope of the line tangent to the curve at that point, at a maximum or a minimum value the curve turns its direction and the slope of the tangent line there is zero.) The sufficient condition for a maximum is $y'' = f''(x) < 0$ and for a minimum is $f''(x) > 0$. The function y = f(x) may have an inflection point where $y'' = f''(x) = 0$.

Consider the function represented by the curve given in Fig. 1.9. The tangent to the curve makes an acute angle with the positive x-axis for the points on the sections of the curve AB, CD and DE, where y is increasing with x, and it makes an obtuse angle ($\tan\theta < 0$) on the section BC, where y decreases with x. The points B, C and D at which the slope is zero are called stationary points. If $\dfrac{dy}{dx}$ changes its sign for increasing x from positive to negative, then the point is a maximum (Point B); if $\dfrac{dy}{dx}$ changes its sign from negative to positive, then the point is a minimum (Point C). If $\dfrac{dy}{dx}$ is zero at some point but does not change its sign with increasing x, then the point is called a point of inflection (Point D).
Fig 1.9. Points of maximum, minimum and a point of inflection
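The first- and second-derivative tests above are easy to automate for a concrete function. The sketch below is ours, not the book's; it uses SymPy and a sample polynomial chosen purely for illustration, and it classifies the stationary points by the sign of f''.

```python
import sympy as sp

x = sp.symbols('x')
f = x**3 - 3*x**2 + 4          # sample function

fp  = sp.diff(f, x)            # f'
fpp = sp.diff(f, x, 2)         # f''

for p in sp.solve(fp, x):      # stationary points: f'(x) = 0
    curvature = fpp.subs(x, p)
    if curvature > 0:
        kind = "minimum"
    elif curvature < 0:
        kind = "maximum"
    else:
        kind = "possible inflection point (second-derivative test inconclusive)"
    print(p, kind)
# prints: 0 maximum, 2 minimum
```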
1.13 CONVEX SETS

Definition: In $E_n$, the line through two points $x_1$ and $x_2$, $x_1 \ne x_2$, is defined to be the set of points
$X = \{ x \mid x = \lambda x_1 + (1-\lambda) x_2,\ \lambda \text{ real} \}$.

Definition: The line segment joining the points $x_1$ and $x_2$ is defined to be the set of points
$X = \{ x \mid x = \lambda x_1 + (1-\lambda) x_2,\ 0 \le \lambda \le 1 \}$.

Definition: A convex linear combination (c.l.c.) of a finite number of points $x_1, \ldots, x_m$ is defined as a point $x = \sum_{i=1}^{m} \lambda_i x_i$, $\lambda_i \ge 0$, $i = 1, \ldots, m$, $\sum_i \lambda_i = 1$.

Definition: A hyperplane in $E_n$ is a set of points $x$ satisfying $c'x = z$ $(c \ne 0)$, where $c' = (c_1, c_2, \ldots, c_n)$ is a vector of constants and $z$ is a given scalar.

Definition: For a hyperplane $z = c'x$, the vector $c$ is called normal to the hyperplane. The two vectors $c/|c|$ and $-c/|c|$ are the unit normals to the hyperplane.

Definition: Two hyperplanes are said to be parallel if they have the same unit normals. Thus the hyperplanes $c_1'x = z_1$ and $c_2'x = z_2$ are parallel if $c_1 = \lambda c_2$, $\lambda \ne 0$.
Definition: A hyperplane $c'x = z$ divides $E_n$ into the following three mutually exclusive and exhaustive sets:
$X_1 = \{ x \mid c'x < z \}$, $X_2 = \{ x \mid c'x = z \}$, $X_3 = \{ x \mid c'x > z \}$.
The sets $X_1$ and $X_3$ are called open half spaces. The hyperplane is a closed set. The sets $X_4 = \{ x \mid c'x \le z \}$ and $X_5 = \{ x \mid c'x \ge z \}$ are closed half spaces.

Definition: In $E_n$, the set of points $S = \{ x \mid |x - a| = r \}$ is called a hypersphere with centre at $a$ and radius r.

Definition: The set of points $I_S = \{ x \mid |x - a| < r \}$ in $E_n$ is called the interior of the hypersphere S. The interior of a hypersphere with centre at $a$ and small radius $\epsilon$ is called an $\epsilon$-nbh of $a$ ($\epsilon$-neighbourhood of $a$). A point $a$ is an interior point of the set S if there exists an $\epsilon$-nbh of $a$ which contains only points of S.

Definition: A point $b$ is a boundary point of S if every nbh of $b$, however small, contains both points that are in S and points that are not in S.

Definition: A set S is an open set if it contains only interior points, e.g. $\{ x \mid x_1^2 + x_2^2 < 1 \}$.

Definition: A set is closed if it contains all its boundary points, e.g. $\{ x \mid x_1^2 + x_2^2 \le 1 \}$.

Definition: A set X is strictly bounded if there exists a positive number r such that for every $x \in X$, $|x| \le r$. Definition: A set X is bounded from above if there exists an $r$ such that for all $x \in X$, $x \le r$.
Definition: An open set is connected if any two points in the set can be joined by a polygonal path (sum of line segments) lying entirely within the set.

Definition: A set X is convex if for any two points $x_1, x_2$ in the set, the line segment joining these points belongs to X – i.e. if $x_1, x_2 \in X$, then every point $x = \lambda x_2 + (1-\lambda) x_1$, $0 \le \lambda \le 1$, also belongs to X.

Definition: A point $x \in X$ is an extreme point of the convex set X iff there do not exist points $x_1, x_2$, $x_1 \ne x_2$, in the set such that $x = (1-\lambda) x_1 + \lambda x_2$, $0 < \lambda < 1$.

Theorem 1.13.1: An extreme point is a boundary point of the set.

Proof: Let $x_0$ be an interior point of X. Then there exists an $\epsilon > 0$ such that every point in the $\epsilon$-nbh of $x_0$ is in X. Let $x_1 \ne x_0$ be a point in the $\epsilon$-nbh of $x_0$. Consider the point $x_2 = -x_1 + 2x_0$; since $|x_2 - x_0| = |x_1 - x_0|$, the point $x_2$ is also in the $\epsilon$-nbh. Furthermore $x_0 = \frac{1}{2}(x_1 + x_2)$, and hence $x_0$ is not an extreme point.

Theorem 1.13.2: A hyperplane is a convex set.

Proof: Consider the hyperplane $z = c'x$. Let $x_1$ and $x_2$ be on this hyperplane. Then $c'x_1 = z = c'x_2$. Now the convex linear combination $x_3 = \lambda x_2 + (1-\lambda) x_1$, for $0 \le \lambda \le 1$, is also on the hyperplane, since
$c'x_3 = c'[\lambda x_2 + (1-\lambda) x_1] = \lambda c'x_2 + (1-\lambda) c'x_1 = \lambda z + (1-\lambda) z = z$.
Similarly, it can be seen that open and closed half spaces are convex sets.

Theorem 1.13.3: The intersection of two convex sets is also convex.
Proof: Consider two convex sets $X_1$ and $X_2$, and let $x_1, x_2$ be any two distinct points in the set $X_3 = X_1 \cap X_2$. (If there is only one point in $X_3$, then we know that a single point is convex.) Since $x_1, x_2 \in X_1$, $\hat{x} = \lambda x_2 + (1-\lambda) x_1 \in X_1$, $0 \le \lambda \le 1$. Similarly $\hat{x} \in X_2$, $0 \le \lambda \le 1$. Thus $\hat{x} \in X_1 \cap X_2 = X_3$ and $X_3$ is convex.

Theorem 1.13.4: The set C of all convex combinations of a finite number of points $x_1, x_2, \ldots, x_n$ is convex.

Proof: $C = \{ x \mid x = \sum_{i=1}^{n} \lambda_i x_i,\ \lambda_i \ge 0,\ i = 1, 2, \ldots, n,\ \sum_i \lambda_i = 1 \}$. Let $x^{(1)}$ and $x^{(2)} \in C$. We have to show that $Y = \lambda x^{(1)} + (1-\lambda) x^{(2)} \in C$ for all $\lambda$, $0 \le \lambda \le 1$. Let

$x^{(1)} = \sum_{i=1}^{n} \alpha_i x_i$; $\alpha_i \ge 0$; $i = 1, 2, \ldots, n$; $\sum_i \alpha_i = 1$,
$x^{(2)} = \sum_{i=1}^{n} \beta_i x_i$; $\beta_i \ge 0$; $i = 1, 2, \ldots, n$; $\sum_i \beta_i = 1$.

Then
$Y = \lambda x^{(1)} + (1-\lambda) x^{(2)} = \lambda \left[ \sum_{i=1}^{n} \alpha_i x_i \right] + (1-\lambda) \left[ \sum_{i=1}^{n} \beta_i x_i \right] = \sum_{i=1}^{n} [\lambda \alpha_i + (1-\lambda) \beta_i] x_i = \sum_{i=1}^{n} r_i x_i$,

where $r_i = \lambda \alpha_i + (1-\lambda) \beta_i \ge 0$, $i = 1, 2, \ldots, n$, and $\sum_{i=1}^{n} r_i = \lambda \sum_{i=1}^{n} \alpha_i + (1-\lambda) \sum_{i=1}^{n} \beta_i = \lambda + (1-\lambda) = 1$, so that $Y$ is a c.l.c. of $x_1, x_2, \ldots, x_n$. Thus $Y \in C$ and hence C is convex.

Definition: The convex hull of a set A is the intersection of all convex sets which contain A. (It is indeed the smallest convex set containing A.)

Definition: The convex hull of a finite number of points is called the convex polyhedron of these points (sometimes also called a convex polytope). It follows that the convex hull of a finite number of points $x_1, x_2, \ldots, x_n$ is the set of all convex combinations of $x_1, x_2, \ldots, x_n$. It is given by
$X = \{ x \mid x = \sum_i \lambda_i x_i,\ \lambda_i \ge 0,\ \sum_i \lambda_i = 1 \}$; conversely, any point in a convex polyhedron can be represented as a convex combination of the extreme points of the polyhedron.

Definition: The convex polyhedron spanned by n+1 points in $E_n$ which do not lie in a hyperplane is called a simplex. In $E_2$, a triangle and its interior form a simplex.

Definition: The first partial derivative of $f(x)$ with respect to any component $x_j$ of $x$ is defined by
$\lim_{\Delta x_j \to 0} \dfrac{f(x + \Delta x_j) - f(x)}{\Delta x_j}$
and is denoted by $\dfrac{\partial f}{\partial x_j}$.

Definition: If the first partial derivatives of $f$ exist, then the gradient of $f$ at $x$ is defined to be the vector
$\operatorname{grad} f = \nabla f(x) = \left( \dfrac{\partial f(x)}{\partial x_1}, \ldots, \dfrac{\partial f(x)}{\partial x_n} \right)'$.

Definition: If the second partial derivatives of $f$ exist, we define the Hessian matrix of f at x to be the $n \times n$ matrix H(x) whose (i,j)th element is given by $\dfrac{\partial^2 f(x)}{\partial x_i \partial x_j}$, that is
$H(x) = \left( \dfrac{\partial^2 f(x)}{\partial x_i \partial x_j} \right)$.

If the first partial derivatives of f exist and are continuous at x, we say $f$ is continuously differentiable at x. Similarly, if the second partial derivatives of $f$ exist and are continuous at x, we say f is twice continuously differentiable. Note that for a twice continuously differentiable f the Hessian matrix H(x) is a symmetric matrix.

Definition: A differentiable function $f: E_n \to E_1$ is said to be pseudoconvex if for any $x, y \in E_n$,
$\nabla f(x)'(y - x) \ge 0 \;\Rightarrow\; f(y) \ge f(x)$.

Definition: A function $f: E_n \to E_1$ is said to be quasi-convex if given $x_1$ and $x_2$ in $E_n$ and any $\theta$, $0 \le \theta \le 1$,
$f[\theta x_1 + (1-\theta) x_2] \le \max[f(x_1), f(x_2)]$.

Definition: Let $x \in E_n$ and f(x) be a differentiable real valued function. Let $y$ be a unit vector in $E_n$. Then $ty$ and $x + ty$ are also vectors in $E_n$, where $t \in R$.
The directional derivative of f(x) in the direction of $y$ is defined as
$\lim_{t \to 0} \dfrac{f(x + ty) - f(x)}{t} = y' \nabla f(x)$.
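To make these definitions concrete, here is a small numerical sketch (ours, not the book's) that approximates the gradient, the Hessian and a directional derivative of a sample function by finite differences; the function and point are chosen only for illustration.

```python
import numpy as np

def gradient(f, x, h=1e-6):
    """Finite-difference approximation of grad f(x)."""
    x = np.asarray(x, float)
    g = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x); e[j] = h
        g[j] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(f, x, h=1e-4):
    """Finite-difference approximation of the Hessian H(x)."""
    x = np.asarray(x, float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

f = lambda x: x[0] ** 2 + 3 * x[0] * x[1] + 2 * x[1] ** 2   # sample function
x0 = np.array([1.0, 2.0])
y = np.array([1.0, 0.0])                                    # unit direction

print(gradient(f, x0))            # ~ [ 8. 11.]
print(hessian(f, x0))             # ~ [[2. 3.], [3. 4.]]  (symmetric, as noted above)
print(y @ gradient(f, x0))        # directional derivative ~ 8.0
```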
1.14 CONVEXITY AND CONCAVITY OF FUNCTION

Definition: A function f(x) is said to be convex over a convex set X in $E_n$ if for any two points $x_1$ and $x_2$ in X and for all $\lambda$, $0 \le \lambda \le 1$,
$f[\lambda x_2 + (1-\lambda) x_1] \le \lambda f(x_2) + (1-\lambda) f(x_1)$.

Definition: A function f(x) is said to be strictly convex if, for any two distinct points $x_1$ and $x_2 \in X$ and for every scalar $\lambda$, where $0 < \lambda < 1$,
$f[\lambda x_1 + (1-\lambda) x_2] < \lambda f(x_1) + (1-\lambda) f(x_2)$.

When f is a function of a single variable x, the above inequality can be illustrated by the graph of a strictly convex function drawn in Fig. 1.10 below.

Fig 1.10: A strictly convex function

Definition: A function f(x) is said to be concave over a convex set $X$ if, for any two points $x_1$ and $x_2 \in X$ and for all $\lambda$, $0 \le \lambda \le 1$,
$f[\lambda x_2 + (1-\lambda) x_1] \ge \lambda f(x_2) + (1-\lambda) f(x_1)$,
i.e., if the line segment joining the two points lies entirely below or on the graph of f(x), as shown in Fig. 1.11.
Fig. 1.11. A strictly concave function
The function f(x) is strictly concave if strict inequality holds in the above for any $x_1 \ne x_2$.

Result 1. A linear function is both convex and concave on $R^n$.

Proof: Let $f(x) = cx + d$ and let $x_0, x_1 \in R^n$. Then
$f[\lambda x_0 + (1-\lambda) x_1] = \lambda[c x_0 + d] + (1-\lambda)[c x_1 + d] = \lambda f(x_0) + (1-\lambda) f(x_1)$
for all $\lambda$.

Result 2. $y = |x|$ is convex.

Proof: We must show that $|\lambda x_1 + (1-\lambda) x_2| \le \lambda |x_1| + (1-\lambda) |x_2|$ for $0 \le \lambda \le 1$, or
$|\lambda x_1 + (1-\lambda) x_2| \le |\lambda x_1| + |(1-\lambda) x_2|$, since $\lambda, (1-\lambda) \ge 0$,
which is true from the inequality $|a + b| \le |a| + |b|$.

Result 3. $y = x^2$ is convex.
Proof:
We want to show that
$[\lambda x_1 + (1-\lambda) x_2]^2 \le \lambda x_1^2 + (1-\lambda) x_2^2$,
or
$\lambda^2 x_1^2 + (1-\lambda)^2 x_2^2 + 2\lambda(1-\lambda) x_1 x_2 \le \lambda x_1^2 + (1-\lambda) x_2^2$,
or
$\lambda(1-\lambda) x_1^2 + \lambda(1-\lambda) x_2^2 - 2\lambda(1-\lambda) x_1 x_2 \ge 0$,
or
$\lambda(1-\lambda)(x_1 - x_2)^2 \ge 0$,
which is true since $\lambda, (1-\lambda) \ge 0$.

Result 4. $y = x_1^2 + x_2^2$ is convex (the sum of two convex functions is convex).

Result 5. The set of all solutions to the system $g_i(x) \le 0$, $i = 1, 2, \ldots, m$, is a convex set when each $g_i(x)$ is a convex function.

Proof: Let $x_1$ and $x_2$ satisfy the system; then $g_i(x_1) \le 0$ and $g_i(x_2) \le 0$, $i = 1, 2, \ldots, m$. But as the $g_i(x)$ are convex for all i, we have $g_i(\hat{x}) \le \lambda g_i(x_1) + (1-\lambda) g_i(x_2)$, where $\hat{x} = \lambda x_1 + (1-\lambda) x_2$. Thus $g_i(\hat{x}) \le \lambda \cdot 0 + (1-\lambda) \cdot 0 = 0$.

Result 6: If $f_k(x)$ $(k = 1, 2, \ldots, r)$ are convex functions defined on a convex set X and $\lambda_k \ge 0$ $(k = 1, 2, \ldots, r)$, then $\sum_{k=1}^{r} \lambda_k f_k(x)$ is also a convex function on X.

Proof: We have, for $x_1, x_2 \in X$ and $0 \le t \le 1$,
$\sum_{k=1}^{r} \lambda_k f_k[t x_1 + (1-t) x_2] \le \sum_k \lambda_k [t f_k(x_1) + (1-t) f_k(x_2)] = \sum_k [t \lambda_k f_k(x_1) + (1-t) \lambda_k f_k(x_2)] = t \sum_k \lambda_k f_k(x_1) + (1-t) \sum_k \lambda_k f_k(x_2)$.
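A quick numerical check of the defining inequality is sometimes handy. The sketch below (ours, not from the book) samples random point pairs and values of λ and looks for violations of the convexity inequality; it can only refute convexity, never prove it, but it illustrates the definition directly.

```python
import random

def seems_convex(f, sample, trials=10_000, tol=1e-9):
    """Return False if a sampled pair violates
    f(lam*x2 + (1-lam)*x1) <= lam*f(x2) + (1-lam)*f(x1); True otherwise."""
    for _ in range(trials):
        x1, x2, lam = sample(), sample(), random.random()
        xm = lam * x2 + (1 - lam) * x1
        if f(xm) > lam * f(x2) + (1 - lam) * f(x1) + tol:
            return False
    return True

sample = lambda: random.uniform(-10, 10)
print(seems_convex(lambda x: x * x, sample))    # True  (x^2 is convex, Result 3)
print(seems_convex(lambda x: x ** 3, sample))   # False (x^3 is not convex on R)
```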
Result 7: A differentiable function $f(x)$ is convex over a convex set $X$ if, for any two points $x_1$ and $x_2 \in X$, we have
$f(x_2) \ge f(x_1) + \left( \dfrac{df(x)}{dx} \right)_{x = x_1} (x_2 - x_1)$.

Proof: As f(x) is convex, we have for any two points $x_1$ and $x_2 \in X$,
$f[\lambda x_2 + (1-\lambda) x_1] \le \lambda f(x_2) + (1-\lambda) f(x_1)$,
or
$f[x_1 + \lambda(x_2 - x_1)] \le f(x_1) + \lambda [f(x_2) - f(x_1)]$,
or
$f(x_2) - f(x_1) \ge \left[ \dfrac{f(x_1 + \lambda(x_2 - x_1)) - f(x_1)}{\lambda(x_2 - x_1)} \right] (x_2 - x_1)$.

By putting $h = \lambda(x_2 - x_1)$, the inequality can be written as
$f(x_2) - f(x_1) \ge \dfrac{f(x_1 + h) - f(x_1)}{h} (x_2 - x_1)$.

By taking the limit as $h \to 0$, the above inequality becomes
$f(x_2) - f(x_1) \ge \left( \dfrac{df(x)}{dx} \right)_{x = x_1} (x_2 - x_1)$.
Result 8: Let f be differentiable on $E_n$. Then f defined over a convex set X is convex iff for any $x_1$ and $x_2 \in X$,
$f(x_2) - f(x_1) \ge \nabla f(x_1)(x_2 - x_1)$,
where $\nabla f(x) = \left( \dfrac{\partial f}{\partial x_1}, \ldots, \dfrac{\partial f}{\partial x_n} \right)$.

Proof: Necessity: Let f be convex. Then for all $x_1$ and $x_2$ and $0 \le \lambda \le 1$,
$f[\lambda x_2 + (1-\lambda) x_1] \le \lambda f(x_2) + (1-\lambda) f(x_1)$,
i.e.
$\dfrac{1}{\lambda} \left[ f(x_1 + \lambda(x_2 - x_1)) - f(x_1) \right] \le f(x_2) - f(x_1)$.    (1.14.1)
On expanding $f[x_1 + \lambda(x_2 - x_1)]$ by Taylor’s theorem, we get
$f[x_1 + \lambda(x_2 - x_1)] = f(x_1) + \lambda \nabla f[x_1 + \lambda\theta(x_2 - x_1)](x_2 - x_1)$.
Then inequality (1.14.1) becomes
$\nabla f[x_1 + \lambda\theta(x_2 - x_1)](x_2 - x_1) \le f(x_2) - f(x_1)$.
On taking the limit as $\lambda \to 0$, we get
$\nabla f(x_1)(x_2 - x_1) \le f(x_2) - f(x_1)$.    (1.14.2)

Sufficiency: Let $\hat{x} = \lambda x_1 + (1-\lambda) x_2$.

Substituting $\hat{x}$ for $x_1$ in (1.14.2), we get
$f(x_2) \ge f(\hat{x}) + \nabla f(\hat{x})(x_2 - \hat{x}) = f(\hat{x}) + \nabla f(\hat{x})\,\lambda (x_2 - x_1)$.    (1.14.3)

Substituting $\hat{x}$ for $x_1$ and $x_1$ for $x_2$ in (1.14.2), we get
$f(x_1) \ge f(\hat{x}) + \nabla f(\hat{x})(x_1 - \hat{x}) = f(\hat{x}) + \nabla f(\hat{x})(1-\lambda)(x_1 - x_2)$.    (1.14.4)

Multiplying (1.14.3) by $(1-\lambda)$ and (1.14.4) by $\lambda$, we get
$(1-\lambda) f(x_2) + \lambda f(x_1) \ge (1-\lambda) f(\hat{x}) + \lambda(1-\lambda) \nabla f(\hat{x})(x_2 - x_1) + \lambda f(\hat{x}) - \lambda(1-\lambda) \nabla f(\hat{x})(x_2 - x_1) = f(\hat{x})$.

Result 9: A twice differentiable function $f(x)$ is convex if the Hessian matrix $H(x) = \left[ \dfrac{\partial^2 f(x)}{\partial x_i \partial x_j} \right]$ is positive semi-definite.

Proof: From Taylor’s theorem, we have
$f(x^* + h) = f(x^*) + \sum_{j=1}^{n} h_j \dfrac{\partial f}{\partial x_j} + \dfrac{1}{2!} \sum_i \sum_j h_i h_j \left. \dfrac{\partial^2 f}{\partial x_i \partial x_j} \right|_{x = x^* + \theta h}$,    (1.14.5)

where $0 < \theta < 1$. By letting $x^* = x_1$, $x^* + h = x_2$ and $h = x_2 - x_1$, (1.14.5) can be written as
$f(x_2) = f(x_1) + \nabla f'(x_1)(x_2 - x_1) + \dfrac{1}{2}(x_2 - x_1)' H\{x_1 + \theta(x_2 - x_1)\}(x_2 - x_1)$.

Since $H(x)$ is positive semi-definite, the third term on the RHS is $\ge 0$, and thus we have $f(x_2) - f(x_1) \ge \nabla f'(x_1)(x_2 - x_1)$.

Result 10. Any local minimum of a convex function f(x) is a global minimum.

Proof: We prove this result by contradiction. Suppose that there are two different local minima, say $x_1$ and $x_2$, and let $f(x_2) < f(x_1)$. As $f(x)$ is convex, we have $f(x_2) - f(x_1) \ge \nabla f'(x_1)(x_2 - x_1)$. Putting $x_2 - x_1 = s$, we get $\nabla f'(x_1)\, s \le f(x_2) - f(x_1) < 0$. This indicates that the value of the function f can be decreased further by moving in the direction $s = x_2 - x_1$ from the point $x_1$. This contradicts the assumption that $x_1$ is a local minimum.

Result 11. Let $x \in E_n$. If the quadratic form $f(x) = x'Ax$ is positive semidefinite, then $f(x)$ is a convex function.

Proof: Let $x_1$ and $x_2$ be any two points in $E_n$, and let $\hat{x} = (1-\lambda) x_1 + \lambda x_2$, $0 \le \lambda \le 1$. As $f(x) = x'Ax$ is positive semidefinite, we have $x'Ax \ge 0$ for any $x \in E_n$; then
$\hat{x}' A \hat{x} = [\lambda x_2 + (1-\lambda) x_1]' A [\lambda x_2 + (1-\lambda) x_1]$
$= [x_1 + \lambda(x_2 - x_1)]' A [x_1 + \lambda(x_2 - x_1)]$
$= x_1' A x_1 + 2\lambda (x_2 - x_1)' A x_1 + \lambda^2 (x_2 - x_1)' A (x_2 - x_1)$
$\le x_1' A x_1 + 2\lambda (x_2 - x_1)' A x_1 + \lambda (x_2 - x_1)' A (x_2 - x_1)$
since $A$ is positive semidefinite and $0 \le \lambda \le 1$,
$= x_1' A x_1 + 2\lambda (x_2 - x_1)' A x_1 + \lambda (x_2 - x_1)' A x_2 - \lambda (x_2 - x_1)' A x_1$
$= x_1' A x_1 + \lambda x_2' A x_1 - \lambda x_1' A x_1 + \lambda x_2' A x_2 - \lambda x_1' A x_2$
$= \lambda x_2' A x_2 + (1-\lambda) x_1' A x_1$.

Result 12. Let $A = ((a_{ij}))$. Then the quadratic form $x'Ax$ is positive definite iff

$a_{11} > 0, \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0, \quad \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} > 0, \quad \ldots, \quad \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} > 0,$

i.e. if all the leading principal minors are positive.
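Result 12 gives a direct computational test. The sketch below (ours, not the book's; it uses NumPy) checks positive definiteness of a symmetric matrix by evaluating its leading principal minors.

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Check Result 12: all leading principal minors of A must be positive."""
    A = np.asarray(A, float)
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > tol for k in range(1, n + 1))

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])        # a classic positive definite matrix

print(is_positive_definite(A))                        # True (minors 2, 3, 4)
print(is_positive_definite(np.array([[1.0, 2.0],
                                     [2.0, 1.0]])))   # False (second minor is -3)
```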
1.15 OPTIMIZATION

The systematic approach to decision-making (optimization) generally involves three closely interrelated stages. The first stage towards optimization is to express the desired benefits and the required efforts, together with the other relevant data, as functions of certain variables that may be called “decision variables”. The second stage continues the process with an analysis of the mathematical model and selection of the appropriate numerical technique for finding the optimal solution. The third stage consists of finding an optimal solution, generally with the help of a computer.

The existence of optimization problems can be traced back to the middle of the eighteenth century. The work of Newton, Lagrange and Cauchy in solving certain types of optimization problems arising in physics and geometry by using differential calculus methods and the calculus of variations was pioneering. These optimization techniques are known as classical optimization techniques and can be generalized to handle cases in which the variables are required to be non-negative and the constraints may be inequalities, but these generalizations are primarily of theoretical value and do not usually constitute computational procedures. However, in some simple situations they can provide solutions which are practically acceptable. The optimization problems that have been posed and solved in recent years have tended to become more and more elaborate, not to say abstract. Perhaps the most outstanding example of the rapid development of
optimization techniques occurred with the introduction of dynamic programming by Bellman in 1957 and of the maximum principle by Pontryagin in 1958; these techniques were designed to solve the problems of the optimal control of dynamic systems. The simply stated problem of maximizing or minimizing a given function of several variables has attracted the attention of many mathematicians over the past 60 years or so in developing solution techniques using mathematical programming (see below). 1.16 MATHEMATICAL PROGRAMMING A large number of real-life optimization problems that are usually not solvable by classical optimization methods are formulated as mathematical programming problems. There has been considerable advancement in the development of the theory and algorithms for solving various types of mathematical programming problems. The first mathematical programming problem was considered by economists (such as Von Neumann) in the early 1930s as the problem of optimum allocation of limited resources. Leontief in 1951 showed a practical solution method for linear type problems when he demonstrated his input-output model of an economy. These economic solution procedures did not provide optimal solutions, but only a feasible solution, given the model’s linear constraints. In 1941, Hitchcock formulated and solved the transportation type problem, which was also accomplished by Koopmans in 1949. In 1942, Kantorovitch also formulated the transportation problem but did not solve it. In 1945, the economist G. J. Stigler formulated and solved the “minimum cost diet” problem. During World War II, a group of researchers under the direction of Marshall K. Wood sought to solve an allocation type problem for the United States Air Force team SCOOP (Scientific Computation of Optimum Programs). One of the members of this group, George B. Dantzig, formulated and devised a solution procedure in 1947 for Linear Programming Problems (LPP). This solution procedure, called the Simplex method, marked the beginning of the field of study called mathematical programming. During the 1950s, other researchers such as David Gale, H.W. Kuhn and A.W. Tucker contributed to the theory of duality in LP. Others such as Charnes and Cooper contributed numerous LP applications illustrating the use of mathematical programming in managerial decision-making.
A general mathematical programming problem can be stated as follows:

    Max (or Min) $Z = f(X)$    (1.16.1)
    Subject to $g_i(X)\ \{\le\ \text{or}\ =\ \text{or}\ \ge\}\ b_i$,  $i = 1, 2, \ldots, m$,    (1.16.2)

where

    $X$ = vector of unknown variables that are subject to the control of the decision maker,
    $Z$ = value of the objective function, which measures the effectiveness of the decision choice,
    $g_i(X)$ = the function representing the $i$th constraint, $i = 1, \ldots, m$,
    $b_i$ = the available $i$th productive resource in limited supply, $i = 1, \ldots, m$.

The objective function (1.16.1) is a mathematical equation describing a functional relationship between the various decision variables and the outcome of the decisions. The outcome of managerial decision-making is the index of performance, and is generally measured by profits, sales, costs or time. Thus, the value of the objective function in mathematical programming is expressed in monetary, physical or some other terms, depending on the nature of the problem. The objective function $f(X)$ and the constraining functions $g_i(X)$ may be either linear or nonlinear functions of the variables. The objective of the decision maker is to select the values of the variables so as to optimize the value of the objective function $Z$ under the given constraints.

If $f(X)$ and $g_i(X)$ are both linear functions of $X$, then the problem (1.16.1)–(1.16.2) represents a linear programming problem (LPP). When the objective function to be minimized (maximized) is convex and the set defined by the constraining inequalities (1.16.2) is also convex, the problem is called a convex programming problem (CPP); otherwise it is a non-convex programming problem. If some or all the components of the vector X are required to be integers, then we call it an integer programming problem (IPP). The methods of linear programming, non-linear programming and integer programming are discussed below.
1.17 LINEAR PROGRAMMING TECHNIQUES The general approach to the modelling and solution of linear mathematical models, and more specifically those models that seek to optimize a linear measure of performance, under linear constraints is called linear programming or more appropriately linear optimization. The general form of the (single objective) Linear Programming Problem (LPP) is given as finding the decision variablesݔଵ ǡ ݔଶ ǡ Ǥ Ǥ Ǥ ǡ ݔ which maximize or minimize a linear function subject to some linear constraints and non-negativity restrictions on the decision variables. The mathematical model for a general linear programming problem is stated as follows:
    Max (or Min) $Z = \sum_{j=1}^{n} c_j x_j$
    Subject to $\sum_{j=1}^{n} a_{ij} x_j\ \{\le\ \text{or}\ =\ \text{or}\ \ge\}\ b_i$,  $i = 1, 2, \ldots, m$,
    $x_j \ge 0$,  $j = 1, 2, \ldots, n$,

where $c_j$, $a_{ij}$ and $b_i$ (called parameters of the LPP) are known constants for all $i$ and $j$.

Linear programs have turned out to be appropriate models for solving practical problems in a wide variety of fields. G. B. Dantzig first conceived the linear programming problem in 1947. Koopmans and Dantzig coined the name ‘Linear Programming’ in 1948, and Dantzig proposed an effective ‘Simplex method’ for solving linear programming problems in 1949. Dantzig’s simplex method solves a linear program by examining the extreme points of a convex feasible region. Linear programming is often referred to as a uni-objective constrained optimization technique. Uni-objective refers to the fact that linear programming problems seek either to maximize an objective such as profit or to minimize the cost. The maximization of profit or minimization of cost is always constrained by the real-world limitations of finite resources. Linear programming gives decision makers an opportunity to combine the constraint limitations of the decision environment with the interaction of the variables they are seeking to optimize. New techniques for solving LPPs are still being developed. Decades of work on Dantzig’s Simplex method had failed to yield a polynomial-time variant. The first polynomial-time LP algorithm, called the Ellipsoid algorithm and
developed by Khachiyan (1979), opened up the possibility that noncombinatorial methods might beat combinatorial ones for linear programming. Karmarkar (1984) developed a new polynomial-time algorithm, which often outperformed the Simplex method by a factor of 50 on real world problems. Some other polynomial-time algorithms developed by Reneger (1988), Gonzaga (1989), Monteiro and Adler (1989), Vaidya (1990), Reha and Tutun (2000) are faster than Karmarkar’s algorithm. 1.18 NONLINEAR PROGRAMMING TECHNIQUES: A mathematical model that seeks to optimize a non-linear measure of performance is called a non-linear program. Every real-world optimization problem always has a non-linear form that becomes a linear programming problem after a slight modification. Non-linear programming emerges as an increasingly important tool in economic studies and in operations research. Non-linear programming problems arise in various disciplines, such as engineering, business administration, physical sciences and mathematics – or in any other area where decisions must be taken in some complex situation that can be represented by a mathematical model: ݂݁ݖ݅݉݅݊݅ܯሺݔሻǡ ܵ݃ݐݐ݆ܾܿ݁ݑ ሺݔሻ Ͳǡ ݅ ൌ ͳǡʹǡ Ǥ Ǥ Ǥ ǡ ݉ where all or some of the functions ݂ሺݔሻ and݃ ሺݔሻǡ ݅ ൌ ͳǡ Ǥ Ǥ Ǥ ǡ ݉ are non-linear. Interest in nonlinear programming problems developed simultaneously with the growing interest in linear programming. In the absence of general algorithms for the nonlinear programming problem, we can explore the possibilities of approximating the solution by linearization. The nonlinear functions of a mathematical programming problem are replaced by piecewise linear functions. These approximations may be expressed in such a way that the whole problem is turned into linear programming. Kuhn and Tucker (1951) developed the necessary conditions (which became sufficient also under special circumstances) satisfied by an optimal solution of a non-linear programming problem. These conditions, known as K-T conditions, laid the foundations for a great deal of later research and development in non-linear programming techniques. To date, no single technique is available which can provide an optimal solution to every NLPP like Simplex method for LPP. However, different methods are available for special types of NLPPs. Using the K-T conditions, Wolfe (1959) transformed the convex quadratic programming problem into an equivalent LPP to which the Simplex method could be applied with some additional restriction on the vectors entering the
66
Chapter 1
basis at various iterations. Some other techniques for solving quadratic programming problems are due to Beale (1959), Lemke (1962), Graves (1967), Fletcher (1971), Aggarwal (1974a, 1974b), Finkbeiner and Kall (1978), Arshad, et al. (1981), Ahsan et al. (1983), Todd (1985) and Fukushima (1986). Some further work is due to Ben-Daya and Shetty (1990), Yuang (1991), Wei (1992), Benzi (1993), Fletcher (1993), Ansteichere and Terlaky (1994) , Phi (2001), Anstreicher and Brixius (2001) and others. Among other NLPP methods, there are gradient methods and gradient projection methods. Like the Simplex method of LPP, these are iterative procedures in which at each step we move from one feasible solution to another in such a way that the value of the objective function is improved. Rosen (1960, 1961), Kelley (1960), Goldfarb (1969), Du and Zhang (1990), Lai et al. (1993) and others gave gradient projection methods for non-linear programming with linear and non-linear constraints. Zangwill (1969) gave a unified approach for solving NLP problems. He has shown that the methods of NLPP differ only in choosing the direction and the step length towards the optimum. Most of the methods of NLPP converge easily to an optimal solution when the functions involved are convex. Fiacco and McCormick (1968) developed penalty methods for convex programming problems. The methods for non-convex programming problems are intricate as these may possess a large number of local minima, and it may not to be easy to enumerate all of these in order to find the global minimum. The pioneering work in this field was done by Tui (1964), who developed a cutting plane method for minimizing a concave function with linear constraints. The other works among many in this direction are due to Ritter (1966), Zwart (1974), Hoffman (1976), Arshad et al. (1981), Khan and Faisal (2001) and Shakeel et.al. (2009). 1.19 INTEGER PROGRAMMING Any decision problem (with an objective to be maximized or minimized) in which the decision variables must assume non-fractional or discrete values may be classified as an integer optimization problem. In general, an integer problem may be constrained or unconstrained, and the functions representing the objective and constraints may be linear or non-linear. An integer problem is classified as linear if by relaxing the integer restriction on the variables, the resulting functions are strictly linear.
The general mathematical model of an integer programming problem can be stated as:

    Maximize (or Minimize) $Z = f(X)$
    Subject to $g_i(X)\ \{\le\ \text{or}\ =\ \text{or}\ \ge\}\ b_i$,  $i = 1, 2, \ldots, m$,
    $x_j \ge 0$,  $j = 1, 2, \ldots, n$,
    $x_j$ is an integer for $j \in J \subseteq I = (1, 2, \ldots, n)$,

where $X = (x_1, \ldots, x_n)$ is an n-component vector of decision variables. If J = I – that is, all the variables are restricted to be integers – we have an all (or pure) integer programming problem (AIPP). Otherwise, if $J \subset I$, i.e. not all the variables are restricted to be integers, we have a mixed integer programming problem (MIPP). In most practical situations, the values of the decision variables are required to be integers. Dantzig, Fulkerson and Johnson (1954), Markowitz and Manne (1957), Dantzig (1958, 1959) and others discussed the integer solutions to some special purpose LPPs. Gomory (1956) first suggested a systematic method to obtain an optimal integer solution to an AIPP. Later, Gomory (1960, 1963) extended the method to deal with the more complicated case of MIPP, when only some of the variables are required to be integers. These algorithms are proved to converge to the optimal integer solution in a finite number of iterations making use of the familiar dual Simplex method. These are called the “cutting plane algorithms” because they mainly introduce the clever idea of constructing “secondary” constraints which, when added to the optimum (non-integer) solution, will effectively cut the solution space towards the required result. Cutting plane techniques are discussed in detail in Chapter 4. Another important approach, called the “branch and bound” technique for solving the all-integer, mixed-integer and 0-1 integer problems, originated from the straightforward idea of enumerating implicitly all feasible integer solutions. A general algorithm for solving all-integer and mixed integer LPPs was developed by Land and Doig (1960). Later, Dakin (1965) proposed another interesting variation of the Land and Doig algorithm. Also, Balas (1965) introduced an interesting enumerative algorithm for LPP
with the variables having the value zero or one and named it the zero-one programming problem. These methods are the subject matter of Chapter 3. 1.20 APPLICATIONS OF MATHEMATICAL PROGRAMMING Mathematical programming (MP) is widely used in solving a number of problems arising in military, economics, industry, business management, engineering, inventory management, etc. The early applications for mathematical programming were concerned with military planning and coordination among various projects and efficient utilization of scarce resources. During the last six decades, mathematical programming techniques have been applied successfully to almost every possible sphere of human activity. Some of the applications for mathematical programming, especially linear and integer linear programming, are summarized below. There are many problems that require optimum allocation of limited resources – e.g. raw material, manpower, machine time or money – to attain the given objective. Some of the numerous contributors to mathematical programming techniques in solving resources allocation problems are Clark (1961), Everett (1963), Takahashi (1966), Arabeyre, Fearnley et al. (1969), Lee (1972), Silverman, Steuer and Wishman (1988), Namikawa and Ibaraki (1991) and Miettinen (2002). Mathematical programming techniques have been successfully applied to solve optimization problems arising in several industries such as oil, textiles and sugar. Some of the industrial applications for MP-techniques are due to Charnes, Cooper and Mellon (1952), Manne (1956), Garvin et al. (1957), Charnes and Cooper (1961), Catchpole (1962), Kapur (1963), Camm, Dearing and Tadisina (1987), Sukhorrukov (1988), Serafini and Speranza (1990), Rya (2004) and others. In the field of production scheduling, planning and inventory control, mathematical programming techniques have been applied by Bowman (1956), Kantrovich (1958), Deboer and Vandersloot (1962), Jones and Rope (1964), Silver (1967), Von Lanzenauer (1970), Florian (1988), Magnanti and Vachani (1990), Alidaee, Kochenberger and Ahmadian (1994), Van Hoesal et al. (1994) and several others.
CHAPTER 2 LINEAR PROGRAMMING
2.1 INTRODUCTION Linear programming is a process of transforming a real-life situation into a mathematical model consisting of a linear objective function which has to be maximized or minimized under a finite number of linear constraints and then developing algorithms with which the mathematical model can be solved. The great variety of situations to which linear programming can be applied is remarkable. Most problems dealing with the allocation of scarce resources among various activities (products) in an optimal manner can be transformed into a linear programming format. Some examples are the allocation of production facilities to various products, the allocation of airplane fuel to bomber runs, the problem of production scheduling and so on. The adjective “linear” in linear programming means that all the mathematical functions used in the model are required to be linear functions. The word “programming” is essentially a synonym for planning. Thus linear programming involves the planning of activities so as to get the best return among all feasible alternatives. 2.2 THE LINEAR PROGRAMMING (LP) MODEL The general form of mathematical model representing a linear programming problem is the following. Find x1, x2,…, xn which maximize or minimize the linear function Z=c1x1+c2x2+...+cnxn
(2.2.1)
under the linear conditions. a11x1+a12x2+...+a1nxn {≤ = ≥} b1 a21x1+a22x2+...+a2nxn {≤ = ≥} b2 . .
(2.2.2)
. am1x1+am2x2+...+amnxn {≤ = ≥} bm and the non-negativity restrictions x1 ≥ 0, x2 ≥ 0,..., xn ≥ 0
(2.2.3)
where the cj, aij and bi are given constants. The variables x1,x2,...,xn being solved for are called decision variables, with n being a finite positive number. The linear function in (2.2.1) is called the objective function. It may be noted that minimization of Z is equivalent to maximization of -Z. The conditions in (2.2.2) are called constraints. One and only one of the three signs ≤, = and ≥ holds in a constraint. A constraint with a ≥ sign may be converted to a ≤ sign simply by multiplying on both sides by -1. A maximizing LP model, with only ‘≤’ constraints and nonnegativity restrictions, can be written as Max Z=c1x1+c2x2+...+cnxn s.t. a11x1+a1x2+...+a1nxn≤b1 . . . am1x1 + am2x2 + ... + amnxn ≤ bm x1≥0,..., xn≥0 Using the summation sign “∑”, this can also be written as
Max $Z = \sum_{j=1}^{n} c_j x_j$
s.t. $\sum_{j=1}^{n} a_{ij} x_j \le b_i$,   i = 1, ..., m,
$x_j \ge 0$,   j = 1, ..., n.
A more convenient way is to express it in matrix notation. Let the symbol ‘′’ denote the transpose of a matrix. Then ‘′’ transposes a row vector into a column vector. Let c = (c1, c2, …, cn)′, b = (b1, b2, …, bm)′ and

$A = \begin{bmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \ldots & a_{mn} \end{bmatrix}$.

The LP model can now be written as

Max z = c′x
s.t. Ax ≤ b
(2.2.4)
x≥0 We call this the standard form of an LP model. The matrix A is called the technology or coefficient matrix, c the objective vector and b the right-hand side vector of the model. An interpretation of this LP model in terms of the optimal allocation of limited resources to given competing activities can be seen as follows. Suppose that n items are to be produced by a firm. The decision variables x1, x2,…, xn represent the number of units of the respective items to be produced during a given period of time. For the jth product, cj represents the profit obtained per unit over the given period. The number of relevant search resources is m, so that each of the m linear inequalities corresponds to a restriction on the availability of one of these resources. bi is the amount of the ith resource available to the n products. aij is the amount of resource i consumed by each unit of product j. The non-negativity restrictions (xj≥0) assure that there cannot be negative production. A value of (x1, x2,…, xn) for which all of the restrictions (2.2.2) and (2.2.3)
are satisfied is called a feasible solution. A feasible solution which maximizes Z in (2.2.1) (or minimizes it in the minimization case) is called an optimal solution. 2.3 GRAPHICAL PRESENTATION OF LP MODEL An LP problem with only two variables can be solved by presenting it on a graph. This graphical solution will also clarify certain geometrical characteristics of LP problems. Consider the following example in two decision variables x1 and x2. P1 :
Max Z= 2x1+3x2 s.t. x1+3x2 ≤ 18
(i)
x1+ x2 ≤ 8
(ii)
x1 ≤ 6
(iii) (2.3.1)
x2 ≤ 7
(iv)
x1 ≥ 0
(v)
x2 ≥ 0
(vi)
The inequalities (i) to (iv) and the non-negativities (v) and (vi) give the common shaded area ‘S’ shown in Fig. 2.1
Fig 2.1 Feasible region for the problem P1
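As a quick cross-check (ours, not part of the book's graphical development), an off-the-shelf LP solver applied to P1 reports the same optimum that the level-line argument below identifies; the sketch assumes SciPy is available.

```python
from scipy.optimize import linprog

# Problem P1: Max Z = 2x1 + 3x2 subject to constraints (i)-(iv) of (2.3.1).
c = [-2, -3]                       # linprog minimizes, so negate the objective
A_ub = [[1, 3],                    # x1 + 3x2 <= 18
        [1, 1],                    # x1 +  x2 <= 8
        [1, 0],                    # x1       <= 6
        [0, 1]]                    #       x2 <= 7
b_ub = [18, 8, 6, 7]

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (0, None)], method="highs")
print(res.x, -res.fun)             # x1 = 3, x2 = 5, Z = 21
```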
Fig 2.2 The various level lines showing the increasing values of the objective function
We first draw the line x1+3x2=18 and then shade the side of the line in which the points satisfy x1+3x2 ≤ 18. This is done for all the constraints and the non-negativities. The common area is enclosed by the points e1e2e3e4e5, which is the feasible region of the problem. The points e1, e2,…,e5 are the vertices of the feasible region, which are easily calculated as e1=(0,6), e2=(3,5), e3=(6,2), e4=(6,0), e5=(0,0). It is also clear from the figure that the constraint (iv) can be ignored without changing the feasible region. Now we explore the points in S which maximize the value of Z. To this end we draw a number of level lines for the objective function Z= 2x1+3x2. Five such level lines are drawn in Fig. 2.2, corresponding to Z=1, 6, 12, 18 and 30. The arrow in the figure points in the direction of increasing values of the objective function Z. To find the optimal solution, we move the level line in the direction of the arrow, so that the value of Z increases. We stop moving the level line when it reaches the boundary of S. Further movement means that no point of the level line is in S. The last point beyond which the level line leaves the feasible region is e2=(3, 5). So the optimal solution of the given problem is x1=3, x2=5, with the optimal objective value Z=21. The fact that a vertex of the feasible region is optimal plays an important role in the development of the Simplex method for solving an LP problem. It may further be noted that the objective level line corresponding to Z=30, though higher than the optimal value Z=21, cannot be accepted to give a solution as no point on it is in the feasible region. Now if we move the level line back in the reverse direction of the arrow, the first feasible point to be
met is again at e2. This fact will be utilized while developing the dual Simplex method in Section 2.12.

2.4 PROPERTIES OF FEASIBLE REGION OF AN LPP

A feasible region is empty if the constraints are inconsistent. For example, if an LP problem contains the constraints x1 ≥ 5, x1 ≤ 4, then its feasible region is empty. A non-empty feasible region is called bounded if all the variables are bounded on the feasible region. A non-empty feasible region is called unbounded if at least one of the variables can take on arbitrarily large values on the feasible region. Examples of bounded and unbounded feasible regions along with an objective level surface are shown below.
Fig. 2.4 (a) Bounded feasible region
Fig. 2.4 (b) Unbounded feasible region
Fig. 2.4 (c) Unbounded feasible region
Fig 2.4 Bounded and unbounded feasible regions
In Fig. 2.4(a) the feasible region is bounded. In Figs. 2.4(b) and 2.4(c) the feasible regions are unbounded. It can be noted that in the case of Fig. 2.4(b), an optimal solution to the problem with maximization objective function ݖൌ െʹݔଵ ͵ݔଶ does not exist: the value of the objective function goes on increasing for increasing values of x2. However, in the case of Fig. 2.4(c) an optimal solution of this objective function exists even though the feasible region is unbounded. The general form of an LPP is as given in (2.2.1) through (2.2.3). In general it is more convenient to work with equalities rather than with inequalities in (2.2.2). So it is desirable to convert inequalities into equations. This is carried out by introducing some additional variables in the problem called slack or surplus variables. It is also advantageous to have all b i≥0. Let us suppose that we have
$\sum_{j=1}^{n} a_{ij} x_j \le b_i, \quad i = 1, \ldots, m'$
$\sum_{j=1}^{n} a_{ij} x_j \ge b_i, \quad i = m' + 1, \ldots, m''$
$\sum_{j=1}^{n} a_{ij} x_j = b_i, \quad i = m'' + 1, \ldots, m$

We introduce non-negative variables $x_{n+i}$ $(i = 1, \ldots, m'')$ as follows:

$\sum_{j=1}^{n} a_{ij} x_j + x_{n+i} = b_i, \quad i = 1, \ldots, m'$
$\sum_{j=1}^{n} a_{ij} x_j - x_{n+i} = b_i, \quad i = m' + 1, \ldots, m''$
Let $n + m'' = \bar{n}$. Then the above set of m constraints in $\bar{n}$ variables can be written as $\bar{A}\bar{x} = b$, where $\bar{x} = (x_1, \ldots, x_{\bar{n}})'$, $b = (b_1, \ldots, b_m)'$ and

$\bar{A} = \begin{bmatrix}
a_{11} & \cdots & a_{1n} & 1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
a_{m'1} & \cdots & a_{m'n} & 0 & \cdots & 1 & 0 & \cdots & 0 \\
a_{m'+1,1} & \cdots & a_{m'+1,n} & 0 & \cdots & 0 & -1 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
a_{m''1} & \cdots & a_{m''n} & 0 & \cdots & 0 & 0 & \cdots & -1 \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
a_{m1} & \cdots & a_{mn} & 0 & \cdots & 0 & 0 & \cdots & 0
\end{bmatrix}$
The variables $x_{n+i}$, $i = 1, \ldots, m'$, are called slack variables and the variables $x_{n+i}$, $i = m' + 1, \ldots, m''$, are called surplus variables. Let us assign a price of zero to the variables $x_{n+i}$, $i = 1, \ldots, m''$. Then
$z = c_1 x_1 + \cdots + c_n x_n + 0 \cdot x_{n+1} + \cdots + 0 \cdot x_{\bar{n}} = \bar{c}'\bar{x}$,

where $\bar{c}' = (c_1, \ldots, c_{\bar{n}})$. The LPP (2.2.4) reduces to the form

Max $Z = \bar{c}'\bar{x}$
s.t. $\bar{A}\bar{x} = b$, $\bar{x} \ge 0$.

Thus we observe that any LP problem can be transformed to the standard form

Max $z = c'x$
s.t. $Ax = b$  (A of order m × n)
(2.4.1)
x≥0 in which n>m and b ≥ 0. A vector x satisfying Ax=b, x ≥ 0 is called a feasible solution. A feasible solution with no more than m positive components of x is called a basic feasible solution. A basic feasible solution with exactly m positive components is called a non-degenerate basic feasible solution and a basic feasible solution with strictly less than m positive components will be called a degenerate basic feasible solution. A feasible solution which also maximizes (or minimizes) Z is called an optimal solution. In the following we give a few theorems needed for the development of the Simplex procedure for solving an LP problem in the next section. Theorem 2.4.1: Associated with every set of k linearly independent columns of A there is a basic feasible solution to ݔ ܣൌ ܾǡ ݔ Ͳ
ൈ
(2.4.2)
Proof: We may assume that the first k rows of A are independent. (Since
otherwise the rows may be rearranged). If we select any k independent columns from A, say Ak, the variables associated with these columns can be put in terms of the remaining n-k variables: Akxk = bk - An-k xn-k
(2.4.3)
where ܾଵ ݔ ܾ ൌ ൭ ǣ ൱ ǡ ܣൌ ሺܣ ǡ ܣି ሻǡ ݔൌ ൬ ൰ ݔି ܾ We can assign arbitrary values to the variables in xn-k to obtain a solution to (2.4.3). If we set xn-k = 0, we obtain xk = Ak-1 bk ݔ as Ak is non-singular. Thus we obtain a solution ݔൌ ൬ Ͳ ൰of (2.4.2) with only k components positive. Note: Since there are ncm sets (at most) of m linearly independent vectors from the given set n, the value ncm is an upper bound to the possible basic solutions. To obtain all these solutions, every set of m columns from A should be independent, i.e. every Am should be non-singular. The ncm solutions so obtained may not all be non-degenerate. All the solutions will be non-degenerate if every set of m columns from (A, b) is independent. A given basic solution xm=Am-1 b will be non-degenerate if every set of (m-1) columns from Am augmented by b is independent. Theorem 2.4.2: The set of all solutions to an LPP is a convex set. Proof: The theorem holds obviously if there is a unique feasible solution. Assume there are two feasible solutions x1 and x2. We have Ax1=b, Ax2= b, x1≥ 0, x2≥ 0. Let x = λ x1+ (1- λ) x2 for 0≤ λ≤1. x is a convex linear combination (c.l.c.) of x1 and x2. We note that Ax = λAx1+ (1-λ) Ax2 = λb+(1-λ)b= b , and also x ≥ 0,
showing that x is also a feasible solution. Theorem 2.4.3: The objective function of an LPP assumes its maximum (or minimum) at an extreme point of the convex set S of feasible solutions. If it assumes its maximum at more than one extreme point then it takes on the same value for every (clc.) of these particular points. Proof: Assume that S is neither void nor unbounded in any direction (but it is a convex polyhedron). Let x1,…, xp be the extreme points of S, and let x* be the maximum solution, i.e. f(x*)≥f(x) for all xאS. If x* is an extreme point, the first part of the theorem is proved. Suppose x* is not an extreme point. Then there exist αi, i= 1,…, p such that
αi ≥ 0, σୀଵ
αi =1 and x* = σୀଵ
f(x*) = f(σୀଵ
αixi) = σୀଵ
αi xi. Therefore,
αif(xi) (as f is linear).
Let f (ݔො m) = max f(xi). Then f(x*) ≤ α1f(ݔො m) +…+ αpf(ݔො m) = f(ݔො m) Since we assumed f(x*) ≥ f(x) for all xאS, we must have f(x*) = f(ݔො m). Thus x* must be an extreme point. To prove the second part, let f(x) assume its maximum at ݔො 1,…, ݔො q We will have f(ݔො 1) = …= f(ݔො q)= m, say. Now consider a (c.l.c.) of ݔො 1,…, ݔො q given by x= σଵ Then
αiݔො i, αi≥0, σଵ ߙ =1.
f(x)= f(σଵ ߙ ݔොi) = σଵ ߙ f(ݔො i)=∑αim=m, and the proof is completed. Theorem 2.4.4: Every basic feasible solution is an extreme point of the given feasible set. Proof: Let x = (x1,…, xk,0,…,0) be a basic feasible solution of S. Suppose
Linear Programming
79
that x is not an extreme point. Since x is feasible, we can write for x(1),x(2) אS, x=αx(1) + (1-α)x(2), where 00, the last n-k elements of x(1) and x(2) must equal 0; i.e. x(1)= (x1(1),…,xk(1),0,…,0) x(2)= (x1(2),…,xk(2),0,…,0). As x(1) and x(2) are feasible, we have Ax(1)=b i.e. a1x1(1)+…+akxk(1)= b and Ax(2)=b i.e. a1x1(2)+…+akxk(2)= b, where (a1,…,ak,…,an) are the columns of A. Subtracting the two equations we get a1(x1(1)-x1(2)) +…+ak(xk(1)-xk(2))=0. But from theorem (2.4.1), a1,…,ak are independent (as by assumption (x1,…,xk,0,…,0) is a basic solution). It follows that xi(1)-xi(2)=0, i=1,…,k. Thus x cannot be expressed as a convex linear combination of two distinct points of S and must, therefore, be an extreme point. Theorem 2.4.5: Every extreme point of S is a basic feasible solution. Proof: Let x be an extreme point and let the first k components x1,…,xk of x be positive. Then σୀଵ
xiai = b
(i)
Suppose x = (x1,…, xk, 0,…, 0) ′ is not basic. This implies that a1,…,ak are dependent, i.e. there exist d1,…, dk, not all zero,
80
Chapter 2
such that d1a1+…+dkak = 0
(ii)
For some d>0, multiply (ii) by d and add and subtract the result from (i) to obtain the two equations σଵ
xiai + d.σଵ
diai=b
σଵ
xiai – d.σଵ
diai=b
We than have two solutions to (i): x(1)= (x1+dd1,…,xk+ddk,0,…,0) x(2)= (x1-dd1,…,xk-ddk,0,…,0) We can choose d small enough to make all the components of x(1) and x(2) positive. Thus x(1) and x(2) are feasible. But x= (x(1)+x(2))/2, which contradicts the hypothesis that x is an extreme point of S. Thus a1,…, am are independent. Since no more than m vectors can be independent, it follows that the solution associated with them is basic. Exercise 2.4.1: The feasible region of an LPP is defined by the inequalities 2x1-x2≤4, x1-x2≥ -3, 3x1-2x2≤6, x1, x2≥0. Find a basic feasible solution in which x1 is in the basis. Solution: By introducing slack variables, the inequalities get transformed into the following set of equations: 2x1 - x2 + x3
=4
- x1+ x2
=3
+ x4
3x1 - 2x2
+x5
=6
An extreme point solution (initial) is obtained as x3=4, x4=3, x5=6, x1=x2=0. If we denote the vectors of the coefficient matrix by a1,…,a5 and of the R.H.S by a0, the above initial solution is expressed as 4a3+3a4 +6a5=a0
(i)
We wish to introduce a1 to obtain another extreme point solution. The representation of a1 in terms of the basis vectors is
2a3-a4+3a5=a1
(ii)
Multiplying (ii) by θ and subtracting from (i), we get (4-2θ)a3+(3+θ)a4+(6-3θ)a5+θa1=a0 Choosing θ= min (4/2,6/3)=2, we get 5a4+2a1=a0. Or 2a1+0a2+0a3+5a4+0a5=a0 Comparing with the initial set of equations σହଵ ajxj=a0, the required basic solution is obtained as x1=2, x4=5, x2=x3=x5=0. In a similar manner we may obtain the basic feasible solutions corresponding to the extreme points consisting of only the variable x2 and of both the variables. 2.5 BASIC AND NON-BASIC VARIABLES Consider the LPP in the form Max Z= c′x s.t. Ax=b, x ≥0,
(2.5.1)
where m=0.
Fig 2.4.. The constraints of the Problem P 1
There are ቀ ቁ =15 possible ways to choose such matrices B in A. Only five Ͷ of them correspond to a vertex of the feasible region. These five vertices are determined by taking two non-basic variables equal to zero as follows: Non-basic variables
Basic variables
e1:
(x1=0, x3=0)
(x2, x4, x5, x6)
e2:
(x4=0, x3=0)
(x1, x2, x5, x6)
e3:
(x4=0, x5=0)
(x1, x2, x3, x4)
e4:
(x2=0, x5=0)
(x1, x3, x4, x5)
e5:
(x2=0, x1=0)
(x3, x4, x5, x6)
For instance, e2 is the intersection of the lines x1+x2=8 and x1+3x2=18, which corresponds to slack variables x4 and x3. A non-feasible vertex can be seen, for example, when the non-basic variables are taken as x5 and x6. This is the point of intersection of lines x5=0 and x6=0. Then the basic variables are x1, x2, x3, x4, with the corresponding basis as
84
ͳ ͳ ܤൌ൮ ͳ Ͳ
Chapter 2
͵ ͳ Ͳ ͳ
ͳ Ͳ Ͳ Ͳ
Ͳ ͳ ൲ with Ͳ Ͳ
B-1b = ൮ ൲ െͻ െͷ
This gives the basic solution (6, 7, -9, -5, 0, 0), which is infeasible. It is worth noting that the vertices that are adjacent on the boundary of the feasible region differ in precisely one non-basic variable. For instance, the adjacent vertices e1 and e2 share the zero entry x3 but differ in the remaining non-basic entry (x1and x4). The non-adjacent vertices e1 and e4 differ in two non-basic entries. These observations are crucial in the development of the Simplex method in the next section. 2.6 THE SIMPLEX METHOD We start with the LP problem given in the form (2.4.1) Max Z = c′x s.t. ܣx=b, ൈ
x≥0
(2.6.1) (2.6.2) (2.6.3)
It can be assumed that the matrix A= (a1,...,an) always contains a set of m independents vectors. For if this is not the case, the original set of vectors is augmented by a set of m independent vectors, and then we seek a solution to the extended problem. As discussed in the previous section there are only ncm different extreme points (at most) that require investigation. The Simplex method starts by finding an (initial) extreme point of the feasible region and continues by choosing a vertex adjacent to it. In each iteration (step), the Simplex method finds, except for degeneracy, a vertex adjacent to the previous one at which the value of the objective Z is improved. The method stops when the objective value cannot be improved by choosing a new adjacent vertex or there is an indication of an unbounded solution. Suppose that x°=(x1°,... ,xm°,0,... ,0) ′ is an extreme point, i.e. a basic feasible solution. Associated with this basic feasible solution there is a set of independent vectors, say a1,...,am such that a1x1°+…+ amxm° = b
(2.6.4)
Linear Programming
85
where xi° ≥ 0, i=1,…,m. The value of the objective function at this point is c1 x1°+…+cmxm° = Z(°) say
(2.6.5)
Since a1,...,am are independent, we can express any vector from the set a1,...,an in terms of a1,...,am . Let a1x1j +…+ amxmj = aj, j=1,…,n,
(2.6.6)
and define c1x1j +... +cmxmj=Zj, j=1 ,...,n.
(2.6.7)
Theorem 2.6.1: If zj – cj < 0 for some j, then we can find x(1) at which value of the objective function Z(x(1)) >Z(o) Proof: Multiply (2.6.6) by some number θ>0 and subtract from (2.6.4), and similarly multiply (2.6.7) by the same θ and subtract from (2.6.5) for j=1,...,n to get (x1o- θx1j)a1+... +(xmo- θxmj)am+ θaj =b
(2.6.8)
(x1o- θx1j)c1+... +(xmo- θxmj)cm+ θcj=Z(o)- θ(zj-cj)
(2.6.9)
It is possible to choose θ>0 so small that all the coefficients (Xio- θxij), i=1,... ,m are non-negative. Two cases may arise. Case 1: For fixed j, at least one xij>0 for i=1,... ,m. In this case the largest value of θ for which all the coefficients in (2.6.8) remain non-negative is given by θ0= ݉݅݊ ቀ
௫ ௫ ೕ
ቁ Ͳǡ for all xij >0.
(2.6.10)
Let us assume here that all the basic solutions of the problem have exactly m positive elements. This is the assumption of non-degeneracy. (The degenerate case will be considered later). Under this assumption, the minimum in (2.6.10) will be attained for a unique i. We have thus constructed a new basic feasible solution by substituting this θ o in (2.6.8) and (2.6.9) at which the value of the objective function is
86
Chapter 2
Z(1) =Z(o) - θ(zj-cj) which is >Z(o),since(zj-cj)0, or of the form ai1x1+... +ainxn ≤ bj with bj < 0, we find the initial basic feasible solution by introducing artificial variables into the problem as explained below. Suppose we have an LP problem in the form Maximize Z =c′x′, s.t. Ax = b′ , x ≥ 0 , b ≥ 0 ,
(2.9)
in which the initial basic feasible solution is not available trivially. There are two procedures for obtaining the initial basic solution, in both of which the introduced artificial variables are all first eliminated: the Big M procedure and the Two Phase procedure. 2.9.1. The Big M Procedure Augment the problem (2.9) as follows: Maximize c1x1+...+cnxn- Mxn+1-…-Mxn+m s.t. ai1x1+...+ainxn+xn+j= bj, i=1,...,m and xj ≥ 0, j=1,... ,n+m. where M is a large positive number. The variables xn+j, i=1,..., m introduced into the problem are called artificial variables. A starting solution of the augmented problem is x1=... =xn=0, xn+i=bi, i=1 ,..,m
Linear Programming
99
Since xn+j, i=1,..., m have large negative objective coefficients, the Simplex algorithm applied to the above maximization LP problem will drive all these artificial variables out of the set of basic variables. To shorten the calculations, once an artificial variable leaves the basis (and therefore becomes zero), it need not be selected to re-enter the basis and so the corresponding column can be deleted. If the Simplex algorithm terminates with at least one artificial variable in the final solution, then the original problem is not feasible. Theorem 2.6.2 tells us that there are no other feasible solutions whose value of the objective function is larger than the final solution. Another possibility of terminating with the presence of an artificial variable is that M was not chosen large (big) enough. The above method of finding an initial solution is called the Big M procedure. A drawback of the Big M procedure is the possible computational error resulting from assigning a very large value to M. To illustrate this, let M=99 999. Consider for instance the LP problem Max 3x1 + 2x2 + x3 - x4 s.t.
3x1 + 2x2 + x3
=30
5x1 + x2 + 2 x3
=40
x1 + 2x2 +
=20
x3 + x4
(2.9.1)
x1, . . . , x4 ≥ 0 As one unit vector is already present in the coefficient matrix, we need to introduce only two artificial variables y1 and y2. The problem becomes Max 3x1+2x2+x3-x4-My1-My2 s.t.
3x1+2x2+ x3
+y1
=30
5x1+ x2+2x3
+y2
=40
x1+2 x2+ x3 + x4
=20
x1, x2, x3, x4,y1 ,y2 ≥ 0 The starting solution is y1=30,y2=40,x4=20,x1=0,x2=0,x3=0.
100
Chapter 2
Expressing the objective function in terms of the non-basic variables x1, x2, x3, yields Z = -70M-20+ (4+8M)x1+(4+3M)x2+(2+3M)x3. The influence of the original coefficients 3, 2 and 1 in (4+8M), (4+3M), (2+3M) is now very small relative to the multiples of M. While doing computations, the computer, having rounding errors inherent in the representation of real numbers, may become insensitive to the original objective coefficients 3, 2 and 1. Consequently, x1, x2 and x3 may be treated as having zero coefficients in the objective function. The Two Phase procedure discussed in the next section alleviates the difficulty of the too large M in the Big M procedure. 2.9.2. The Two Phase Procedure After introducing the artificial variables in problem (2.9), let the problem to be considered be
Maximize Z = c1x1 + ... + cnxn - Mxn+1 - ... - Mxn+m
s.t.
a11x1 + ... + a1nxn + xn+1           = b1
  :                                     :
am1x1 + ... + amnxn          + xn+m  = bm
and xj ≥ 0, j = 1,...,n, n+1,...,n+m. Writing the problem in tableau form as in Fig. 2.7.1, we obtain Fig. 2.9.1.
The zj-cj values for this initial solution are given by

zj - cj = c′BB-1aj - cj = -M(a1j + ... + amj) - cj ,   for j = 1,...,n,
zj - cj = -M - (-M) = 0 ,                              for j = n+1,...,n+m,

and Z0 = -M(b1 + ... + bm). In this tableau we write the zj-cj values in two rows designated by 0′ and 0′′. Note that we have placed the M-component and the non-M component of zj-cj, for each j, in the 0′th and 0′′th rows respectively of that column. The variable to be introduced into the basis will correspond to the most negative element in the 0′th row (since the part of zj-cj in the 0′′th row in each column is negligible as compared to that in the 0′th row). The variable to be eliminated from the basis is chosen in the usual manner. However, once an artificial variable is eliminated from the basis, it should never be selected to re-enter the basis. Hence we do not have to transform the last m columns of the tableau, which all correspond to the artificial variables. (If we are interested in the inverse of the final basis, then these last columns should be transformed as usual.) We continue to select a variable to be introduced into the basis, using the elements in the 0′th row as criterion, until either (i) all the artificial variables are eliminated from the basis, in which case all the elements in the 0′th row are equal to 0, and we have obtained a basic feasible
solution to the original problem. (It should be noted that even if there are artificial variables in the basis, an iteration may not eliminate one of them.) The second alternative is that (ii) no negative 0′th element exists, in which case, if the Z0 element in the 0′th row is negative (i.e. the artificial part of the corresponding value of the objective function is < 0), then the original problem is not feasible. Theorem 2.6.2 tells us that there are no other feasible solutions whose value of the objective function is larger than this final solution. With no negative element in the 0′th row, if the Z0 part is equal to 0 then we have a degenerate feasible solution to the original problem which contains at least one artificial variable with value 0. Phase I of the procedure is now complete. We have not, however, necessarily reached the maximum feasible solution. In Phase II we continue the iterations by introducing a variable that corresponds to the most negative element in the 0′′th row which is below a zero element in the 0′th row. This criterion is used until the maximum solution has been reached, i.e. until there is no more negative element in the 0′′th row below a zero in the 0′th row. The final solution may or may not contain artificial variables with values equal to zero. We will illustrate the Two Phase procedure by means of the example of Section 2.9.1. After introducing the artificial variables y1 and y2, the problem is Max Z = 3x1+2x2+x3-x4-My1-My2 s.t.
3x1 + 2x2 + x3 + y1          = 30
5x1 + x2 + 2x3       + y2    = 40          (2.9.2)
x1 + 2x2 + x3 + x4           = 20
x1, x2, x3, x4, y1, y2 ≥ 0

Writing the problem in tableau form as in Fig. 2.9.1, we get the tableau 2.9.2. For the next iteration the entering variable x1 corresponds to the most negative M-component of zj - cj.
Table 2.9.2: Initial tableau for the numerical problem
Table 2.9.3: First iteration of the problem
As y2 is removed from the basis (i.e. it enters the set of non-basic variables), it should not be selected to re-enter the basis, so the column corresponding to y2 is deleted from further transformations. We find that in the second iteration all the M-components of zj - cj are 0. So we have obtained a feasible solution.
Table 2.9.4: Second iteration
For the next iteration we choose the most negative element in the 0′′th row.
Table 2.9.5: Third and final iteration
The optimum solution is x1 = x2 = x3 = 5, x4 = 0. The corresponding objective function value is Z = 30. 2.9.3 Condensed Tableau Representation The columns for the basic variables in a Simplex tableau are unit vectors, and these can be eliminated with no danger of ambiguity arising. In the condensed tableau representation, the leaving basic variable is placed in the position of the entering non-basic variable. The transformations corresponding to the elements in the pivotal column are done by the following rule at each iteration: "The new elements for the pivotal column are obtained by simply dividing the elements of the pivotal column by the negative of the pivotal element. The entry for the pivotal element is just the reciprocal of the pivotal element. The transformations in the columns corresponding to the non-basic variables are done as earlier."
Consider for instance the table 2.9.4 corresponding to the first feasible solution of the numerical example (2.9.2)
Note that the only non-basic variable is x3. The leaving basic variable x4 is placed in the position of the entering variable x3. The final tableau is written as

basis            x4
Z0, zj-cj   30    1
x2           5    1/6
x1           5   -1/2
x3           5    7/6
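The quoted pivot rule can be stated in a few lines of code. The sketch below (assuming Python with numpy; the function name is ours, and the entries of Table 2.9.4 are recomputed here from the data of (2.9.2), since the table itself is not reproduced above) carries Table 2.9.4 into the final tableau just shown.

import numpy as np
from fractions import Fraction

def condensed_pivot(T, r, k):
    """One condensed-tableau pivot on row r, column k.
    Row 0 of T is the (Z0, zj-cj) row; column 0 holds the current values."""
    p = T[r, k]
    new = T.copy()
    # columns other than the pivotal one: ordinary row operations
    for j in range(T.shape[1]):
        if j == k:
            continue
        new[r, j] = T[r, j] / p
        for i in range(T.shape[0]):
            if i != r:
                new[i, j] = T[i, j] - T[i, k] * T[r, j] / p
    # pivotal column: divide by the negative of the pivot; the pivot itself becomes 1/p
    new[:, k] = -T[:, k] / p
    new[r, k] = 1 / p
    return new

F = Fraction
# Condensed Table 2.9.4, rows ordered Z, x1, x2, x4; the single non-basic column is x3.
T = np.array([[F(180, 7), F(-6, 7)],
              [F(50, 7),  F(3, 7)],
              [F(30, 7),  F(-1, 7)],
              [F(30, 7),  F(6, 7)]], dtype=object)
print(condensed_pivot(T, r=3, k=1))   # values become 30, 5, 5, 5 and the x4 column 1, -1/2, 1/6, 7/6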
2.10 REVISED SIMPLEX METHOD In the revised Simplex method, the number of arithmetic operations needed when executing the algorithm is kept to a minimum by updating at every iteration only those entries of the Simplex tableau which are required for determining the entering and the leaving basic variables. The computational savings are greater when the number of variables is large compared to the number of constraints and when the technology matrix contains many zeros. Consider the LP problem in the form Max c′x s.t. Ax = b, x ≥ 0, with A an (m, n)-matrix. Let us partition the matrix A as A = (B, N), where B is the basis matrix. The constraints are written as BxB + NxN = b, where x′ = (x′B, x′N). A basic feasible solution is given by xB = B-1b, xB ≥ 0. The corresponding partition of the vector c is c′ = (c′B, c′N). The vector c′BB-1 is called the pricing vector. Now to determine the entering basic
variable, we require the quantities zj = c′BB-1aj, j = 1,...,n, where aj are the columns of A. The index k of the entering non-basic variable corresponds to the most negative zj - cj. The index l of the leaving basic variable corresponds to the minimum of the ratios

(B-1b)i / (B-1N)ik  over  { i : (B-1N)ik > 0 },  i = 1,...,m.
In order to obtain the above information it is unnecessary to transform all of the B-1aj, xB, zj-cj, and Z0 at each iteration. The required quantities can be computed directly from their definitions if B-1 is known; i.e. only the inverse of the basis matrix needs to be transformed. Due to this observation, in the revised Simplex method only B-1 is transformed at each iteration. Also B-1b (= xB), c′BB-1 and c′BB-1b (= Z0) are transformed at each iteration, since this requires less computational effort than their computation by definition. It follows that there is no need to update and store the complete Simplex tableau (consisting of m+1 rows and n+1 columns) at each iteration. Only the updating of a tableau consisting of m+1 rows and m+1 columns is required. Below we compare the entries of the standard Simplex tableau (as given in Fig. 2.7.1) and those of the revised Simplex tableau.

Standard Simplex tableau:
c′BB-1N - c′N     0′          c′BB-1b
B-1N              B-1B = I    B-1b

Revised Simplex tableau:
c′BB-1     c′BB-1b
B-1        B-1b
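To make the bookkeeping concrete, here is a rough sketch (assuming Python with numpy; the function name is ours) of the quantities a revised Simplex iteration actually needs, computed from B-1 rather than from a full tableau. For brevity it re-inverts the basis, whereas an implementation would update B-1 as described next.

import numpy as np

def revised_step(A, b, c, basis):
    """One pricing / ratio-test step for Max c'x s.t. Ax = b, x >= 0 (A, b, c numpy arrays).
    Returns (entering column k, leaving basis position l), or None if the basis is optimal."""
    B_inv = np.linalg.inv(A[:, basis])       # kept and updated in a real implementation
    xB = B_inv @ b                           # current basic solution B^{-1}b
    pricing = c[basis] @ B_inv               # the pricing vector c'_B B^{-1}
    reduced = pricing @ A - c                # z_j - c_j for every column
    reduced[basis] = 0.0
    k = int(np.argmin(reduced))
    if reduced[k] >= -1e-9:
        return None                          # all z_j - c_j >= 0: the current basis is optimal
    yk = B_inv @ A[:, k]                     # only the entering column is updated
    ratios = np.where(yk > 1e-9, xB / np.where(yk > 1e-9, yk, 1.0), np.inf)
    if not np.isfinite(ratios).any():
        raise ValueError("unbounded problem")
    l = int(np.argmin(ratios))               # minimum-ratio rule picks the leaving position
    return k, l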
In the revised Simplex procedure, the amount of new information the computer is required to record at each iteration is that of the inverse and the solution vector only. The recording is further reduced if the method of the product form of the inverse is used. 2.10.1. Using the Product Form of the Inverse Let us first see how the inverse of a new basis at each iteration can be obtained from the preceding basis by the application of elimination formulas. Let the m columns of a basis matrix B be given by a1,…, am. Denote
B = (a1,…,al-1, al, al+1,…,am). Let us replace al by ak and denote the new matrix by B* = (a1,…,al-1, ak, al+1,…,am). Define

xj = (x1j,…,xmj)′ = B-1aj ,  j = 1,…,n.

We then have

B-1B = B-1(a1,…,al,…,am) = I ,

the m×m identity matrix, and

B-1B* = B-1(a1,…,ak,…,am) ,

which is the identity matrix with its lth column replaced by the vector xk = (x1k,…,xmk)′. Let B-1 = (bij) and B*-1 = (bij*). Then the bij* are given by the elimination formulas

bij* = bij - (blj/xlk) × xik   for i ≠ l,
blj* = blj/xlk.                                                    (1)

The validity of this transformation can be verified by direct multiplication of B*-1B*. For example, the inner product of the first row of B*-1 and the first column of B* is given by

b1*′a1 = (b11 - (bl1/xlk) × x1k)a11 + ... + (b1m - (blm/xlk) × x1k)am1
       = (b11a11 + ... + b1mam1) - (x1k/xlk)(bl1a11 + ... + blmam1)
The first term in this expression is the inner product of the first row of B-1 and the first column of B, which is equal to 1. The second term contains the inner product of the lth row of B-1 and a1, which is 0. Hence, b1*′a1 = 1. In a similar way we can verify the other terms and find that B*-1B* = I, and hence the formulas (1) do generate B*-1. This procedure of obtaining the inverse of the current basis, in which only one column is changed from the previous basis, uses the idea of the so-called product form of the inverse. 2.10.2 Computational Format Consider the LP problem (2.6.1) to (2.6.3). An equivalent form of this problem is Max xn+m+1
(2.10.1)
s.t.  a11x1 + ... + a1nxn + xn+1                     = b1
        :                                              :
      am1x1 + ... + amnxn          + xn+m            = bm
      am+1,1x1 + ... + am+1,nxn            + xn+m+1  = 0
(2.10.2)
x1, ..., xn+m ≥ 0
(2.10.3)
where am+1,j = -cj, j = 1,…,n, and xn+1,…,xn+m are artificial variables. Note that xn+m+1 is not restricted in sign. If the starting basis consists of some artificial variables then as usual the computations are done in two phases. In phase I we find a first basic feasible solution (if one exists) and then in phase II we find the optimal solution. For ease of computation in phase I, we introduce a redundant equation: am+2,1x1 + ... + am+2,nxn + xn+m+2 = bm+2
(2.10.4)
where am+2,j = -(a1j + ... + amj), j = 1,…,n; bm+2 = -(b1 + ... + bm).
It follows from (2.10.2) and (2.10.4) that for the artificial variables we have xn+1+…..+xn+m= -xn+m+2
(2.10.5)
The redundant variable xn+m+2 is the negative of the sum of the artificial variables. Since xn+i ≥ 0, i = 1,…,m, it is clear that for any non-negative solution, xn+m+2 cannot be positive.
The problem in revised form consists of m+2 equations ((2.10.2) and (2.10.4)) in n+m+2 variables. Thus a basic feasible solution will contain m+2 variables from the set (x1,…,xn, xn+m+1, xn+m+2). The signs of the last two variables are unrestricted, and these will always be in the solution. The variables from the set (x1,…,xn) that are in the optimal solution represent a corresponding optimal solution to the problem (2.6.1) to (2.6.3), with xn+m+1 = c1x1 + … + cnxn as the corresponding value of the objective function. Let us arrange the coefficients from (2.10.2) and (2.10.4) in an (m+2) × n matrix as follows:

A* =  a11       ...   a1j       ...   a1n
       :               :               :
      am1       ...   amj       ...   amn
      am+1,1    ...   am+1,j    ...   am+1,n
      am+2,1    ...   am+2,j    ...   am+2,n

   = (a*1,...,a*n).
It may be noted that the columns in the matrix A* are just the first n columns of the original artificial-basis tableau, except that rows m+1 and m+2 of A* are the non-M and M parts of the zj-cj values. Since our starting basis consists of an identity matrix, its inverse is also an identity matrix. The inverse of the starting basis, along with two more rows and two columns, is recorded as the (m+2) × (m+2) identity matrix

U =  1   ...   0   0   0        u′1
     :         :   :   :         :
     0   ...   1   0   0        u′m
     0   ...   0   1   0        u′m+1
     0   ...   0   0   1        u′m+2

whose rows are denoted u′1,…,u′m+2.
We arrange below in tabular form the matrix U and the initial solution: (xn+i=bi, i=1,….,m,xn+m+1=0 and xn+m+2=bm+2)
i      variables in     values of           U     vik = u′i a*k
       the solution     the variables
1      xn+1             b1 = u1,0                  u′1 a*k
:       :                :                  :       :
m      xn+m             bm = um,0                  u′m a*k
m+1    xn+m+1           0 = um+1,0                 u′m+1 a*k
m+2    xn+m+2           bm+2 = um+2,0              u′m+2 a*k

Table 2.10.1 Initial tableau for revised Simplex technique
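The elimination formulas (1) of Section 2.10.1 are what an implementation would use to keep B-1 (or the matrix U above) up to date from one basis to the next. A minimal sketch (assuming Python with numpy; the function name and the random test data are ours, not the book's) is given below; the closing check confirms that the updated inverse agrees with a direct inversion.

import numpy as np

def product_form_update(B_inv, a_k, l):
    """Return the inverse of B*, obtained from B by replacing column l with a_k."""
    xk = B_inv @ a_k                     # x^k = B^{-1} a_k, the updated entering column
    new = B_inv.copy()
    new[l, :] = B_inv[l, :] / xk[l]      # b*_lj = b_lj / x_lk
    for i in range(B_inv.shape[0]):
        if i != l:
            new[i, :] = B_inv[i, :] - (B_inv[l, :] / xk[l]) * xk[i]   # b*_ij = b_ij - (b_lj/x_lk) x_ik
    return new

# quick check on random data
rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4)); a_k = rng.normal(size=4); l = 2
B_star = B.copy(); B_star[:, l] = a_k
print(np.allclose(product_form_update(np.linalg.inv(B), a_k, l), np.linalg.inv(B_star)))  # True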
2.10.3. Algorithm
Step 1: If xn+m+2 …

x*j > 0 implies a′jy* = cj, j = 1,…,n
a′jy* > cj implies x*j = 0, j = 1,…,n
The four relations must hold true for every pair of optimal solutions. However, it may happen in some cases that the strict inequalities in the above relations are also equalities, i.e. both yi* = 0 and a′ix* = bi are true, or both a′jy* = cj and x*j = 0 are true. The theorem of strong complementary slackness stated below stresses that there exists at least one pair of optimal solutions for which both yi* = 0 and a′ix* = bi cannot hold together, and similarly both a′jy* = cj and x*j = 0 cannot hold together. Theorem of strong complementary slackness: Let both primal and dual programs, PI and PII, have feasible solutions. Then there exists at least one pair of optimal solutions x* and y* which satisfy y* + (b - Ax*) > 0 and x* + (A′y* - c) > 0. This theorem can be proved by using a very strong result on systems of inequalities called Farkas' Lemma. Farkas' lemma: If A is an m×n matrix and c is an n-vector then exactly one of the two systems below has a solution
Ax ≤ 0, c′x > 0
(i)
A′y = c, y ≥ 0
(ii)
Proof of Farkas' lemma: Consider the following pair of primal and dual problems: Max c′x s.t. Ax ≤ 0
PI
Min 0′y s.t. A′y = c, y ≥ 0
PII
First assume that the system (ii) has a solution, say y*, i.e. A′y* = c, y* ≥ 0. Then y* is also an optimal solution of PII, as all its feasible points have the objective value 0. If x* is an optimal solution of PI then from the duality theorem we should have c′x* = 0. Since the maximum objective value on the feasible region of PI is zero, it follows that the system (i) has no solution. Now assume that the system (ii) has no solution. Then PII is infeasible. On the other hand PI is feasible, since 0 satisfies its constraints. However, if PI had an optimal solution, PII would have to be feasible. But as PII is infeasible, c′x is unbounded above on {x | Ax ≤ 0}, so that c′x* > 0 for some x* with Ax* ≤ 0. Hence the system (i) has a solution.
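On small instances the alternative can be checked numerically. The following is a rough sketch (assuming Python with scipy): system (ii) is tested as a feasibility LP, and system (i) is tested by maximizing c′x over Ax ≤ 0 intersected with a box; the box merely normalizes the ray, since any solution of (i) can be rescaled. The matrix A and vector c are hypothetical data, not taken from the book.

import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
c = np.array([1.0, -1.0])              # hypothetical data

# System (ii): A'y = c, y >= 0  (pure feasibility, zero objective)
r2 = linprog(np.zeros(A.shape[0]), A_eq=A.T, b_eq=c, bounds=(0, None), method="highs")

# System (i): Ax <= 0, c'x > 0, checked on the box -1 <= x <= 1
r1 = linprog(-c, A_ub=A, b_ub=np.zeros(A.shape[0]), bounds=(-1, 1), method="highs")

print("system (ii) feasible:", r2.status == 0)
print("system (i) solvable :", -r1.fun > 1e-9)   # exactly one of the two lines should print True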
2.12 DUAL SIMPLEX METHOD 2.12.1. Introduction In a maximization problem, we know that a basic feasible solution is optimal if all the zj - cj = c′BB-1aj - cj values are ≥ 0. However, a basic solution with all zj - cj ≥ 0 need not be feasible (since the zj - cj are independent of the right hand side vector b for all j). The problems for which a starting basic (infeasible) solution with all zj - cj ≥ 0 is readily available can be easily solved by the so-called Dual Simplex method. Consider an LPP in the form Min z = b′y s.t. A′y ≥ c, y ≥ 0, with b ≥ 0. Call this problem PI. The dual to PI, given by Max c′x s.t. Ax ≤ b, x ≥ 0, is feasible (as b ≥ 0, the trivial solution x = 0 with slacks s = b is feasible).
PI may be written as Max z = (-b′)y s.t. –A′y+ Is = -c, (y, s) ≥ 0 PI in tableau form is presented in the following table:
The linear programming model to be solved by the Dual Simplex method is recognized by first expressing the constraints in the form (≤) and the objective function in the maximization form. After adding the slack variables and putting the problem in tableau form, if any of the right hand side elements is negative and if all the zj - cj are ≥ 0 (i.e. the optimality condition is satisfied), then the problem is solved by the Dual Simplex method. The only difference between the regular Simplex method and the Dual Simplex method lies in the criterion used for selecting the variable to enter the basis and the variable to leave the basis. For the Dual Simplex method we write the problem in the primal form, but the criteria used for the entering and leaving variables are those for its dual. This is the reason why this method is called the Dual Simplex method. Also note that in the Dual Simplex method, we first determine the variable to leave the basis and then the variable to enter the basis. 2.12.2. Algorithm Step 0: Find an initial solution to the primal problem PI. We know that if a starting feasible solution to an LPP is not available we use artificial variables. If the given LP problem is such that a feasible solution to its dual is available, we can find a solution to the primal by solving the dual. The initial solution for the application of the Dual Simplex method to PI requires that for this solution all zj - cj ≥ 0. If the slack and/or surplus
variables are introduced in each constraint, then the conditions zj - cj ≥ 0 are equivalent to -cj ≥ 0, i.e. cj ≤ 0 for all j. So for a maximization problem in which all the cj are ≤ 0, we can easily write an initial basic (infeasible and optimal) solution. For problems in which not all the cj are ≤ 0, it becomes difficult to find an initial basic solution with all zj - cj ≥ 0.
Step 1: Find the leaving and entering variables
We remove from the basis the variable xr for which
(sr) = xBr = min { xBi | xBi < 0 }
(2.12.1)
(this is indeed the criterion by which a variable is chosen to enter the basis in the dual of the problem PI). Thus the basic variable having the most negative value leaves the basis. If all the basic variables are non-negative, the process ends and the (feasible) optimal solution has been reached. The variable xk entering the basis is determined by
(zk - ck)/ark = max { (zj - cj)/arj | arj < 0 }
(2.12.2)
(This is indeed the criterion by which the variable xk leaves the basis, if we were solving the dual problem to PI.) If all arj ≥ 0, there is no feasible solution to the problem, so halt. Step 2: Construct the transformed table. Perform an iteration using row operations transforming -ark to unity and -aik (i ≠ r) to zero. Rearrange the variables in the current tableau for PI so that y are the non-basic variables, b is the vector of the residual costs (zj - cj), -c is the vector of the values of the new basic variables and -A is the updated matrix of coefficients. The formulas used to transform from one tableau to the next are indeed those used for the Simplex method. The difference in the algorithm is manifested in the selection of the variables to enter and leave the basis. 2.12.3 Numerical Examples Example 1: Let us use the Dual Simplex method for solving the following problem:
Max -3x1-x2 s.t.
2x1 + 3x2 ≥ 2
x1 + x2 ≥ 1
x1, x2 ≥ 0
Multiplying the constraints by -1 and then adding the slack variables, the problem is transformed to
Max Z = -3x1 - x2 - 0x3 - 0x4
s.t.
-2x1 - 3x2 + x3        = -2
-x1 - x2         + x4  = -1
x1,…,x4 ≥ 0. As cj ≤ 0 for all j, we can represent the problem in Dual Simplex tableau form as follows:
Iteration 1
Step I: The leaving basic variable is determined by choosing r such that
xBr = min { xBi | xBi < 0 } = Min (x3, x4) = Min (-2, -1) = -2.
So x3 is removed from the basis. The entering basic variable is determined by choosing k given by (2.12.2):
(zk - ck)/ark = Max (-3/2, -1/3) = -1/3, so x2 enters the basis.
The pivot element is -3. Step II: The transformed table is obtained as below
The solution is still infeasible but it remains optimal. So we proceed to the next iteration.
Iteration 2
Step I: The leaving basic variable is clearly x4. The entering basic variable corresponds to
Max ( (7/3)/(-1/3) , (1/3)/(-1/3) ) = Max (-7, -1) = -1.
So x3 enters the basis and the pivot element is -1/3, in the second row and third column. Step II: The transformed table is obtained as below.
        xB    a1    a2    a3    a4
zj-cj   -1     2     0     0     1
x2       1     1     1     0    -1
x3       1     1     0     1    -3
Now the solution has become feasible, and hence it is the required optimal solution given by x1= 0, x2= 1, x3= 1 and x4= 0 with the objective function value -1.
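The two iterations above can be reproduced mechanically. Here is a compact sketch (assuming Python with numpy; the function name and the tableau layout are ours) of the Dual Simplex rules (2.12.1) and (2.12.2), applied to the data of Example 1.

import numpy as np

def dual_simplex(T, basis):
    """Rows of T: constraint rows then the zj-cj row; last column: right hand sides, with Z0 in the corner."""
    T = T.astype(float).copy()
    m = T.shape[0] - 1
    while True:
        rhs = T[:m, -1]
        if (rhs >= -1e-9).all():
            return T, basis                      # feasible, hence optimal
        r = int(np.argmin(rhs))                  # leaving row: most negative basic value, rule (2.12.1)
        row = T[r, :-1]
        if (row >= -1e-9).all():
            raise ValueError("no feasible solution")
        cand = np.where(row < -1e-9)[0]
        k = cand[np.argmax(T[-1, cand] / row[cand])]   # entering column: max (zj-cj)/arj over arj < 0, rule (2.12.2)
        T[r, :] /= T[r, k]                       # pivot on (r, k)
        for i in range(T.shape[0]):
            if i != r:
                T[i, :] -= T[i, k] * T[r, :]
        basis[r] = k

# Example 1: Max -3x1 - x2 with the two transformed constraints
T0 = np.array([[-2.0, -3.0, 1.0, 0.0, -2.0],
               [-1.0, -1.0, 0.0, 1.0, -1.0],
               [ 3.0,  1.0, 0.0, 0.0,  0.0]])    # last row: zj - cj, corner entry Z0 = 0
T, basis = dual_simplex(T0, [2, 3])
print(basis, T[:2, -1], T[-1, -1])               # basis [1, 2] i.e. x2, x3; values 1, 1; Z = -1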
Example 2: Min x1 + x2 + x3
s.t.
x1 - x2 + x3 ≥ 2
-x1 - x2 + x3 ≤ 1
x1, x2, x3 ≥ 0

The problem is equivalent to (in primal form)
Max Z = -x1 - x2 - x3
s.t.
-x1 + x2 - x3 + x4        = -2
-x1 - x2 + x3        + x5 = 1
x1,…,x5 ≥ 0
The initial tableau for the Dual Simplex method is constructed below.
x4 leaves the basis and x1 enters. The pivot element -1 corresponds to the first row and first column. The transformed table is as given below.
The solution, being feasible, becomes optimal with x1 =2 and Z = -2. Note that the solution to the dual problem to PI is y1 =1, y2= 0 which is read from the zj – cj values in the final tableau corresponding to the initial basic variables x4 and x5.
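Assuming the dual_simplex sketch given after Example 1 is available, the same code reproduces this result, including the dual values read from the zj - cj entries of x4 and x5.

import numpy as np

T0 = np.array([[-1.0,  1.0, -1.0, 1.0, 0.0, -2.0],
               [-1.0, -1.0,  1.0, 0.0, 1.0,  1.0],
               [ 1.0,  1.0,  1.0, 0.0, 0.0,  0.0]])   # zj - cj row for Max Z = -x1 - x2 - x3
T, basis = dual_simplex(T0, [3, 4])
print(basis, T[:2, -1], T[-1, -1])    # basis [0, 4] i.e. x1, x5; values 2, 3; Z = -2
print(T[-1, 3], T[-1, 4])             # zj - cj under x4, x5: 1, 0, i.e. the dual solution y = (1, 0)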
2.13 SENSITIVITY ANALYSIS In general, the parameters of an LP problem are not known with certainty. In many instances it is very useful for management to know how sensitive the optimal solution is to perturbations of the parameter values of the LP problem under consideration. In the LP model: Max c′x s.t. Ax ≤ b, x ≥ 0, perturbation is possible in the objective coefficients (the components of c), the right hand sides (the component of b), the technological coefficients (the entries of A) and the right hand side of the non-negativities (the vector 0 in x ≥ 0). The rate at which the optimal objective value would change for small perturbations of the right hand side of a constraint is called the shadow price of that constraint. 2.13.1 Perturbing Right Hand Sides of Constraints We prove below the fact that the shadow price of a constraint (i.e. the rate of change of objective value w.r.t. its right hand side) is equal to the optimal value of the corresponding dual variable (in the case of binding constraints, the optimal solution is assumed to be non-degenerate). Theorem of Shadow Prices and Dual Decision Variable: Given an LP problem having a non-degenerate optimal solution, the shadow price of any constraint is equal to the optimal value of the corresponding dual decision variable. Proof: Consider the LP problem Max z = c′x s.t. Ax ≤ b, x ≥ 0 , where A is an m×n matrix. After adding m slack variables xs, the constraints are transformed to Ax + Ixs = b or BxB + NxN = b, where I is the m×m unit matrix and (x′, x′s) have been partitioned together as (x′B, x′N). Let B* be a basis matrix corresponding to the optimal solution xB* = B*-1b, xN* = 0. From the duality theorem in Section 2.11.2, we know that
y*′ = c′B*B*-1 is an optimal solution of the dual problem: Min y′b s.t. A′y ≥ c, y ≥ 0. Note that we have assumed non-degeneracy at the optimal solution, so that a small change δb in the vector b keeps B* as the optimal basis matrix. (In the degenerate case, a small change in the vector b may give rise to a different optimal basis matrix.) Now for a change δb in the vector b, let the corresponding change in the solution vector xB* be δxB*. Note that δxN* = 0, as it is assumed that for a small change δb in b, the matrix B* stays as the optimal basis matrix. Hence B*(xB* + δxB*) = b + δb, or δxB* = B*-1δb. The corresponding increment in the objective function is δZ = c′B*δxB* = c′B*B*-1δb = y*′δb, which shows that the vector y* gives the sensitivity of the optimal objective value with respect to the perturbations in the vector b. Now suppose that we perturb only the ith constraint by a small change δbi while keeping δbk = 0 for k ≠ i. Then the shadow price of the constraint i is δz/δbi = y*i, where y*i is the ith component of the dual optimal vector y*, which corresponds to the ith constraint of the primal problem. It may be noted that although the constraints a′ix ≤ bi and bi - a′ix ≥ 0 are the same, the shadow price of the constraint a′ix ≤ bi is the negative of the shadow price of the constraint bi - a′ix ≥ 0. We illustrate the theorem by perturbing the right hand side of a constraint in the following LP model, in which two types of items are manufactured in quantities x1 and x2 (in boxes of 10 000).

"P"   Max Z = 2x1 + 3x2
      s.t.  x1 + 3x2 ≤ 18     (i)
            x1 + x2  ≤ 8      (ii)
            x1       ≤ 6      (iii)
            x1, x2 ≥ 0
(2.13.1)
The right hand side of the constraint (ii) of P denotes the maximum number of boxes that the firm may manufacture per month. This number may be varied in the search for increasing the profit – for instance by working overtime – to the extent that the other constraints are not violated.
If we perturb the right hand side of the constraint (ii) by an amount α, we get x1 + x2 ≤ 8 + α. Fig. 2.7 shows the feasible regions and the corresponding optimal solutions for α = -3, -2, -1, 0, 1 and 2. The corresponding objective values for various values of α are listed in Table 2.13.1.

R.H.S.    α     Solution       Z
5        -3     (0, 5)       15.0
6        -2     (0, 6)       18.0
7        -1     (3/2, 11/2)  19.5
8         0     (3, 5)       21.0
9         1     (9/2, 9/2)   22.5
10        2     (6, 4)       24.0
11        3     (6, 4)       24.0

Table 2.13.1 Objective values
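Table 2.13.1 can be reproduced with an off-the-shelf solver. The following rough sketch (assuming Python with scipy) sweeps the right hand side of constraint (ii) and prints Z(α), whose slope near α = 0 is the shadow price 3/2 derived below.

import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0])                   # objective of problem "P" (maximized)
A = np.array([[1.0, 3.0],
              [1.0, 1.0],
              [1.0, 0.0]])
b0 = np.array([18.0, 8.0, 6.0])

for alpha in range(-3, 4):
    b = b0 + np.array([0.0, alpha, 0.0])   # perturb only constraint (ii)
    res = linprog(-c, A_ub=A, b_ub=b, bounds=(0, None), method="highs")
    print(alpha, res.x, -res.fun)          # reproduces the Z column of Table 2.13.1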
Fig 2.6 Perturbation function
In Fig. 2.6 we draw the perturbation function of the constraint (ii), which is seen to be a piecewise linear function. Note that the increase of Z in the neighbourhood of α = 0 for a small change α in the right hand side of the constraint (ii) is (3/2)α. We observe below that this rate of change of Z at α = 0 is equal to the optimal value of the dual variable corresponding to the constraint (ii). The LP problem "P", after adding slack variables x3, x4, x5, is

Max Z = 2x1 + 3x2
s.t.  x1 + 3x2 + x3            = 18
      x1 + x2        + x4      = 8
      x1                  + x5 = 6
x1,…,x5 ≥ 0 The optimal solution is obtained in three iterations as follows:
Fig 2.7 Effect of right hand perturbations (one panel for each of α = -3, -2, -1, 0, 1, 2, with optimal points (0, 5), (0, 6), (3/2, 11/2), (3, 5), (9/2, 9/2) and (6, 4) respectively)
The dual to the problem P is given by
Min 18y1 + 8y2 + 6y3
s.t.
y1 + y2 + y3 ≥ 2
3y1 + y2     ≥ 3
y1, y2, y3   ≥ 0
We know from the duality theorem that the optimal solution to the dual is given by y* = (c′BB-1)′. The optimal basis corresponding to the variables x1, x2, x5 is
B =    1   3   0      ← x1
       1   1   0      ← x2
       1   0   1      ← x5

Its inverse, read from the optimal solution table, is given by

B-1 =  -1/2    3/2   0      ← x1
        1/2   -1/2   0      ← x2
        1/2   -3/2   1      ← x5
Then c′BB-1 = (2, 3, 0)B-1 = (1/2, 3/2, 0), giving y*2 = 3/2. Note that the optimal dual solution can also be read from the zj - cj values in the final primal tableau which correspond to the slack variables. The slack variable added to the second constraint is x4, and the z4 - c4 value is 3/2. It may be noted that the shadow price of a non-binding constraint remains zero within certain limits. Let the jth constraint of an LP problem be non-binding at the optimum. This means that the slack variable corresponding to the jth constraint is non-zero there (i.e. x*n+j > 0). From the weak complementary slackness theorem of Section 2.11.3, this implies that the optimal value of the dual variable corresponding to xn+j is zero, i.e. y*j = 0, from which we conclude that the jth constraint has zero shadow price. 2.13.2 Perturbing the Zeros of the Non-Negativities The rate at which the objective value changes for a small perturbation of the right hand side zero of a non-negativity is called the shadow cost of that non-negativity condition (also called the reduced cost). If the perturbed non-negativity is binding at the optimal solution then the solution is assumed to be non-degenerate. The shadow cost is indeed the shadow price of the non-negativity considered as a constraint. Similarly, the shadow cost of a non-positivity xi ≤ 0 is the shadow price of xi ≤ 0 considered as a constraint. Also note that the shadow cost of xi ≤ 0 is equal to the negative of the shadow cost of -xi ≤ 0, i.e. of 0 ≤ xi. It was shown in the theorem of shadow prices and dual variables in Section 2.13.1 that the shadow prices of binding constraints (in the case of a non-degenerate optimal solution) are the optimal values of the dual decision variables. In the theorem below we show that the shadow costs are the negatives of the optimal values of the dual slack variables.
Theorem of Shadow Costs and Dual Slack Variables Consider the primal LP problem Max Z = c′x s.t. Ax ≤ b, x ≥ 0
(PI)
The shadow cost of any non-negativity is equal to the negative of the optimal value of the corresponding dual slack variable. Furthermore, if the shadow cost of a non-negativity xj ≥ 0, where xj is a non-basic variable, is non-zero then it is the maximum amount by which the value of the objective coefficient cj can be reduced before xj becomes a basic variable. Proof: Let the zero in the right hand side of the non-negativity xj ≥ 0 of PI be perturbed by δxj, j=1,…,n. Hence x ≥ 0 is replaced by x ≥ δx, where δx is the n-vector (δx1,…,δxn). Substituting v = x – δx, PI is transformed into Max (c′ δx) + c′v s.t. Av ≤ b –Aδx, v ≥ 0. Note that the right hand side vector b has been perturbed by the vector – Aδx. It follows from the proof of the theorem of shadow prices that δZ = c′δx + y*′(-Aδx) =c′δx – (δx)′A′y* where y* is the optimal solution of the dual to PI. Now let ys be the corresponding vector of values of slack variables. From the constraints of the dual problem A′y* ≥ c, we have A′y* - c = y*s. Substituting A′y*′ = c+y*s in δz, we get δz = c′δx – (δx)′(c + y*s) = c′δx – (δx)′c – (δx)′y*s = – (δx)′y*s = - (δx1,…,δxn)y*s Let ݕ௦ כൌ ሺכ ݕ௦భ ǡ ڮǡ כ ݕ௦ ሻǤ Then we have ߜݖȀߜݔ ൌ െכ ݕ௦ೕ ߜݖȀߜݔ ൌ െכ ݕ௦ೕ Hence the shadow cost of xj ≥ 0 is equal to the negative of the optimal of the complementary dual variable ݕ௦ೕ .
To illustrate the second part of the theorem, we consider the non-negativity xj ≥ 0, where xj is a non-basic variable. The corresponding dual slack variable y*sj is then basic. The shadow cost of xj is -y*sj < 0, so that
a1j y*1 + ... + amj y*m = cj + y*sj = cj + dj ,  with dj > 0.
Now, if we replace cj by cj - e with e < -dj, the new zj - cj value, namely dj + e = y*sj + e, becomes negative, and xj has to enter the set of basic variables in order to keep the tableau optimal. It is due to this part of the theorem that the shadow costs are also called reduced costs.
Note: The shadow costs of non-binding non-negativities are zero.
2.13.3 Perturbing the Coefficients in the Objective Function
In order to study the change in the solution under perturbation of the objective coefficients, we consider again the problem "P" of Section 2.13.1, viz.

"P"   Max Z = 2x1 + 3x2
      s.t.
      x1 + 3x2 ≤ 18     (i)
      x1 + x2  ≤ 8      (ii)
      x1       ≤ 6      (iii)
      x1, x2 ≥ 0

The optimal solution obtained in Table 2.13.2 is x1 = 3, x2 = 5 with objective value 21. Let us perturb the coefficient of x1 from 2 to 2 + d. It is seen in Fig. 2.8 that the optimal solution (3, 5) remains unchanged for values of d from -1 to 1. However, for d ≥ 1 the solution changes to (6, 2).
Fig 2.8 Optimal solution and level line for problem ′P′
Fig 2.9 Canting level lines for different values of d (level lines through the points (3, 5) and (6, 2) for d = -2, -1, 0, 1, 2)
Fig 2.10 Perturbation function for changes in C1 (Z plotted against d for d = -2,...,2)
The values of the objective function for different values of d are given in Table 2.13.2.
c1     d    Solution    Z = c1x1 + 3x2
0     -2    (0, 6)      18
1     -1    (3, 5)      18
2      0    (3, 5)      21
3      1    (3, 5)      24
4      2    (6, 2)      30
5      3    (6, 2)      36

Table 2.13.2 Objective values for perturbations in C1
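The same kind of sweep as before reproduces this table; a brief sketch (assuming Python with scipy, and reusing the data of problem "P") is:

import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 3.0], [1.0, 1.0], [1.0, 0.0]])
b = np.array([18.0, 8.0, 6.0])

for c1 in range(0, 6):                      # c1 = 2 + d for d = -2,...,3
    res = linprog([-float(c1), -3.0], A_ub=A, b_ub=b, bounds=(0, None), method="highs")
    print(c1, res.x, -res.fun)              # Z matches Table 2.13.2; at c1 = 1 and c1 = 3 the optimal vertex is not unique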
In Fig. 2.10 we draw the perturbation function for changes in C1. Again it is seen that the perturbation function of an objective coefficient is a piecewise linear function.
2.13.4 Concavity of Perturbation Function In Section 2.13.1 we noted that for the numerical example the perturbation function of the second constraint is concave. We will show below that in a maximization (minimization) problem, the perturbation function of a constraint or a non-negativity is always concave (convex). Consider the LP problem Max Z = c′x s.t. Ax = b, x ≥ 0. Let us perturb the right hand side of the ith constraint by α Є R. Define (calling the perturbed problem Q)
Z(α) = Max c′x s.t. Ax = b + αei, x ≥ 0
(2.13.2)
where ei is the ith unit vector, i = 1,…,m. For any α1 and α2 in R, let x*1 and x*2 be optimal solutions of Q with respective objective values Z(α1) and Z(α2). It is known that, for fixed α, the set of all n-vectors x defined by Sn = {x | Ax = b + αei, x ≥ 0} is convex. Similarly, the set of all vectors (x, α) defined by Sn+1 = {(x, α) | Ax = b + αei, x ≥ 0} is convex. Now consider the vectors (x*1, α1) and (x*2, α2) in Sn+1. From the convexity of Sn+1 we have, for 0 ≤ λ ≤ 1,

λ(x*1, α1) + (1 - λ)(x*2, α2) Є Sn+1.

Hence λx*1 + (1 - λ)x*2 is a feasible solution of the problem Max {c′x | Ax = b + αei, x ≥ 0} with α = λα1 + (1 - λ)α2, for each λ in [0, 1]. This implies that

Z(λα1 + (1 - λ)α2) ≥ c′(λx*1 + (1 - λ)x*2) = λc′x*1 + (1 - λ)c′x*2 = λZ(α1) + (1 - λ)Z(α2).

This shows the concavity of Z(α). Similarly it can be shown that the function Z(α) = Min {c′x | Ax = b + αei, x ≥ 0} is convex.
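As a quick numerical illustration (assuming Python with numpy), the Z values of Table 2.13.1 indeed have non-increasing increments in α, as concavity requires:

import numpy as np

Z = np.array([15.0, 18.0, 19.5, 21.0, 22.5, 24.0, 24.0])   # Z(alpha) for alpha = -3,...,3 from Table 2.13.1
inc = np.diff(Z)
print(inc, (np.diff(inc) <= 1e-9).all())                    # [3. 1.5 1.5 1.5 1.5 0.]  True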
Remark: The perturbation function of an objective coefficient in a maximization (minimization) LP problem is convex (concave). This we observed in the numerical example ′P′ in Section 2.13.3. 2.14 COLUMN SIMPLEX TABLEAU It was observed in Section 2.11.1 that a strong relationship holds between the primal and dual linear programs. From this relationship, a new tableau form was developed by Beale (1954) for efficient computation of both primal and dual problems. It is known that solving the primal is more efficient if the number of variables is larger than the number of constraints, and in the opposite case solving the dual is more efficient. The column tableau form is applicable to both the primal and the dual Simplex methods. The feasibility and optimality conditions are applied in exactly the same manner as given earlier for both methods. The only minor difference occurs in changing the basis. In the row tableau form, the pivotal row is divided by the pivotal element αrk, but in the column tableau the pivotal column is divided by the negative of the pivotal element (i.e. -αrk). This follows since in the primal problem, the starting solution is dual infeasible, i.e. zk-ck