Nonlinear Optimization: Models and Applications 9780367444150, 9781003009573



English, 417 pages, 2020


Table of contents:
Cover
Half Title
Series Page
Title Page
Copyright Page
Dedication
Table of Contents
Preface: Nonlinear Optimization—Models and Applications
Acknowledgments
Author
1 Introduction to Optimization Models
1.1 Introduction
1.1.1 History
1.1.2 Applications of Optimization
1.1.3 Modeling
1.2 Classifying Optimization Problems
1.3 Review of Mathematical Programming with Excel Technology
1.3.1 Excel Using the Solver
1.3.2 Examples for Integer, Mixed-Integer, and Nonlinear Optimization
1.4 Exercises
1.5 Review of the Simplex Method in Excel Using Revised Simplex
1.5.1 Steps of the Simplex Method
References and Suggested Further Reading
2 Review of Differential Calculus
2.1 Limits
2.2 Continuity
2.3 Differentiation
2.3.1 Increasing and Decreasing Functions
2.3.2 Higher Derivatives
2.4 Convex and Concave Functions
Exercises
References and Suggested Reading
3 Single-Variable Unconstrained Optimization
3.1 Introduction
3.2 Single-Variable Optimization and Basic Theory
3.3 Basic Applications of Max-Min Theory
Exercises
3.4 Applied Single-Variable Optimization Models
Exercises
Projects
References and Suggested Reading
4 Numerical Search Techniques in Single-Variable Optimization
4.1 Single-Variable Techniques
4.1.1 Unrestricted Search
4.1.2 Exhaustive Search
4.1.3 Dichotomous Search
4.1.4 Golden Section Search
4.1.5 Finding the Maximum of a Function Over an Interval with Golden Section
4.1.6 Golden Section Search with Technology
4.1.6.1 Excel Golden Search
4.1.6.2 Maple Golden Search
4.1.6.3 MATLAB Golden Search
4.1.7 Illustrious Examples with Technology
4.1.8 Fibonacci’s Search
4.1.8.1 Finding the Maximum of a Function Over an Interval with the Fibonacci Method
4.2 Interpolation with Derivatives: Newton’s Method
4.2.1 Finding the Critical Points (Roots) of a Function
4.2.2 The Basic Application
4.2.3 Newton’s Method to Find Critical Points with Technology
4.2.4 Excel: Newton’s Method
4.2.5 Maple: Newton’s Method
4.2.6 Newton’s Method for Critical Points with MATLAB
4.2.7 The Bisection Method with Derivatives
Exercises
Projects
References and Suggested Further Readings
5 Review of Multivariable Differential Calculus
5.1 Introduction: Basic Theory and Partial Differentiation
5.2 Directional Derivatives and the Gradient
Exercises
References and Suggested Reading
6 Models Using Unconstrained Optimization: Maximization and Minimization with Several Variables
6.1 Introduction
6.2 The Hessian Matrix
6.3 Unconstrained Optimization
Exercises
6.4 Eigenvalues
Exercises
Reference and Further Suggested Reading
7 Multivariate Optimization Search Techniques
7.1 Introduction
7.2 Gradient Search Methods
7.3 Examples of Gradient Search
7.4 Modified Newton’s Method
7.4.1 Modified Newton with Technology
Exercises
7.5 Comparisons of Methods
7.5.1 Maple Code for Steepest Ascent Method (See Fox and Richardson)
7.5.2 Newton’s Method for Optimization in Maple
Exercises
Projects Chapter 7
References and Suggested Reading
8 Optimization with Equality Constraints
8.1 Introduction
8.2 Equality Constraints Method of Lagrange Multipliers
8.3 Introduction and Basic Theory
8.4 Graphical Interpretation of Lagrange Multipliers
8.5 Computational Method of Lagrange Multipliers
Lagrange Method with Technology
8.6 Applications with Lagrange Multipliers
Exercises
Projects
References and Suggested Reading
9 Inequality Constraints: Necessary/Sufficient Kuhn–Tucker Conditions (KTC)
9.1 Introduction to KTC
9.2 Basic Theory of Constrained Optimization
9.2.1 Necessary and Sufficient Conditions
9.3 Geometric Interpretation of KTC
9.3.1 Spanning Cones (Optional)
9.4 Computational KTC with Maple
9.5 Modeling and Application with KTC
Exercises
Project
Manufacturing
References and Suggested Reading
10 Specialized Nonlinear Optimization Methods
10.1 Introduction
10.1.1 Numerical and Heuristic Methods
10.1.2 Technology
10.2 Method of Feasible Directions
Exercises
10.3 Quadratic Programming
Exercises
10.4 Separable Programming
10.4.1 Adjacency Assumptions
10.4.2 Linearization Property
Exercises
References and Suggested Reading
11 Dynamic Programming
11.1 Introduction: Basic Concepts and Theory
11.1.1 Characteristics of Dynamic Programming
11.1.2 Working Backwards
11.2 Continuous DP
11.3 Modeling and Applications of Continuous DP
Exercises
11.4 Models of Discrete Dynamic Programming
11.5 Modeling and Applications of Discrete DP
Exercises
References and Suggested Readings
12 Data Analysis with Regression Models, Advanced Regression Models, and Machine Learning Through Optimization
12.1 Introduction and Machine Learning
12.1.1 Machine Learning
12.1.1.1 Data Cleaning and Breakdown
12.1.1.2 Engineering
12.1.1.3 Model Fitting
12.2 The Different Curve Fitting Criterion
12.2.1 Fitting Criterion 1: Least Squares
12.2.2 Fitting Criterion 2: Minimize the Sum of the Absolute Deviations
12.2.3 Fitting Criterion 3: Chebyshev’s Criterion or Minimize the Largest Error
Exercises
12.3 Introduction to Simple Linear and Polynomial Regression
12.3.1 Excel
12.3.2 Regression in Maple
12.3.3 MATLAB
Exercises
12.4 Diagnostics in Regression
12.4.1 Example for the Common-Sense Test
12.4.1.1 Exponential Decay Example
12.4.2 Multiple Linear Regression
Exercises
12.5 Nonlinear Regression Through Optimization
12.5.1 Exponential Regression
12.5.1.1 Newton–Raphson Algorithm
12.5.2 Sine Regression Using Optimization
12.5.3 Illustrative Examples
12.5.3.1 Nonlinear Regression (Exponential Decay)
Exercises
12.6 One-Predictor Logistic and One-Predictor Poisson Regression Models
12.6.1 Logistic Regression and Poisson Regression with Technology
12.6.1.1 Logistic Regression with Technology
12.6.1.2 Simple Poisson Regression with Technology
12.6.2 Logistic Regression Illustrious Examples
12.6.3 Poisson Regression Discussion and Examples
12.6.3.1 Normality Assumption Lost
12.6.3.2 Estimates of Regression Coefficients
12.6.4 Illustrative Poisson Regression Examples
12.6.4.1 Maple
Exercises
Projects
12.7 Conclusions and Summary
References and Suggested Reading
Answers to Selected Problems
Index

Nonlinear Optimization

Textbooks in Mathematics
Series Editors: Al Boggess and Ken Rosen

Mathematical Modeling with Excel Brian Albright and William P. Fox

Chromatic Graph Theory, Second Edition Gary Chartrand and Ping Zhang

Partial Differential Equations: Analytical Methods and Applications Victor Henner, Tatyana Belozerova, and Alexander Nepomnyashchy

Ordinary Differential Equations: An Introduction to The Fundamentals Kenneth B. Howell

Algebra: Groups, Rings, and Fields Louis Rowen

Differential Geometry of Manifolds, Second Edition Stephen T. Lovett

The Shape of Space, Third Edition Jeffrey R. Weeks

Differential Equations: A Modern Approach with Wavelets Steven G. Krantz

Advanced Calculus: Theory and Practice John Srdjan Petrovic

Advanced Problem Solving Using Maple™: Applied Mathematics, Operations Research, Business Analytics, and Decision Analysis William P. Fox and William C. Bauldry

Nonlinear Optimization: Models and Applications William P. Fox

https://www.crcpress.com/Textbooks-in-Mathematics/book-series/CANDHTEXBOOMTH

Nonlinear Optimization Models and Applications

William P. Fox

MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book’s use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.

First edition published 2021
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

© 2021 Taylor & Francis Group, LLC. CRC Press is an imprint of Taylor & Francis Group, LLC.

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify it in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

ISBN: 978-0-367-44415-0 (hbk)
ISBN: 978-1-003-00957-3 (ebk)

Typeset in Garamond by codeMantra

Dedicated to my kids: Leslie, James, and Katie, and all my grandchildren, and my wife: Hamilton Dix-Fox, and to Frank R. Giordano, my mentor and friend

Preface: Nonlinear Optimization—Models and Applications

Why Nonlinear Optimization?

For years, most mathematics majors, both pure and applied, have taken multivariable calculus within their core courses. What should be taken next? Although there are many debates about the next course or courses, one course that must be considered is nonlinear optimization. The world is more nonlinear than linear, so the topic makes sense. With the advent of technology in mathematics programs, it makes even more sense.

When I created the original course in 1990 while at the United States Military Academy, I was trying both to educate the students and to introduce them to topics in operations research, applied mathematics, and mathematical economics. My search for an applied text was in vain. There were books on nonlinear optimization and nonlinear programming, but they were highly theoretical and pitched (in my opinion) at the graduate level. I created and wrote a comprehensive study guide for the course, which is the precursor of this book. The courses both at USMA and then at Francis Marion University were very successful. One student called me after he won a Rhodes Scholarship and told me that he won because he was the only person who could explain the role of shadow prices in nonlinear optimization. I used technology and labs within the course to expose the students to technological advances, using mathematics, Maple™, and Excel at different points in the course.

The study of nonlinear optimization is both a fundamental and a key course for applied mathematics, operations research, management science, industrial engineering, and economics at most colleges and universities. Linear programming software for microcomputers has become widely available. Like most tools, however, it is useless unless the user understands its applications and purpose. The user must ensure that the mathematical input accurately reflects the real-world problem to be solved and that the numerical results are correctly used. Therefore, the mathematical modeling framework is critical to setting up and solving mathematical programming problems.

The study of nonlinear optimization has usually been reserved for a graduate course in these same subject areas. Advancements in computer algebra systems and in calculus reform enable the topic to be presented earlier. Most students complete the calculus sequence with multivariable calculus; this course begins where that course leaves off. The world is mostly nonlinear, and students should begin early to learn the techniques for solving problems modeled by nonlinear optimization. Rapidly changing technologies, evidenced by graphing calculators and enhanced computer algebra systems such as Maple, enable students to be exposed to these topics earlier than before.

Approach

Mathematics is its own language: as such, mathematics is numerical, graphical, and analytical. We present the concepts in several ways to foster this understanding. The presentation is fully exposed and entirely self-contained. Students can read and study the fundamentals outside of class, allowing the instructor more freedom to interpret and stress in class those ideas and applications of primary interest, need, and instructor preference.

Geometric interpretation: In one, two, and three dimensions, geometric interpretation is invaluable. We use geometric interpretation to reinforce the concepts and to foster understanding of the mathematical procedures. By gaining geometric insight, students can translate these procedures and both understand and apply them in higher dimensions. We seek to build upon and improve the students’ geometric intuition gained from the calculus sequence. Students also see that many problems can be analyzed, and approximate solutions found, before analytical solution techniques are applied.

Numerical approximations: Numerical solution techniques are used in the first four parts. They are the methods employed when the calculus cannot be readily or easily used. Early on, students are exposed to numerical techniques. These numerical procedures are algorithmic and iterative. Because it is time-consuming for a student to program or iterate these procedures by hand, worksheets are provided in Excel and Maple to facilitate the procedures. Some worksheets are interactive and require the students to make decisions at each iteration.

Algorithms: All algorithms are provided in a step-by-step format to aid the students in learning each method. Examples follow each summary to illustrate uses and applications. Throughout, we emphasize the process and interpretation, not the rote use of formulas.

Computer tools: Technology is critical for nonlinear optimization. We found that no single optimization package is suitable for learning and solving all nonlinear problems. Therefore, we present (where applicable) the use of MATLAB®, Maple, or Excel, as well as combinations of these software packages; illustrations of these are used as necessary. The associated problems are limited in scope only by the abilities of the software packages.

Exercises: Many exercises are provided at the end of each section and chapter so that students can practice the solution techniques and work with the mathematical concepts discussed. Answers to odd-numbered problems are provided at the back of the text. Projects are also provided at the end of each chapter to enhance understanding of the concepts and their applications to real-world problems; they allow students to “put the concepts together” in a coherent applied problem.

Course: This course is designed as a semester-long course in nonlinear optimization. Each section is designed to correspond to one or two hour-long lectures. Optional material and labs are clearly marked.

Content: The nonlinear optimization sections have been used for years in our nonlinear optimization course. Students find the material straightforward and easy to follow. The techniques are strengthened through both visualization and the use of technology.
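To give a flavor of the iterative numerical procedures described above, here is a short sketch of the golden section search covered in Chapter 4, written in Python rather than the Excel and Maple worksheets the text actually provides; the function name, tolerance, and example function are our own illustrative choices, not taken from the book.

```python
import math

def golden_section_max(f, a, b, tol=1e-6):
    """Locate the maximizer of a unimodal function f on [a, b]
    by repeatedly shrinking the bracketing interval using the
    golden ratio, so only one new function evaluation is needed
    per iteration."""
    inv_phi = (math.sqrt(5) - 1) / 2   # ~0.618, reciprocal of the golden ratio
    x1 = b - inv_phi * (b - a)         # interior left point
    x2 = a + inv_phi * (b - a)         # interior right point
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                    # maximizer lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = f(x2)
        else:                          # maximizer lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = f(x1)
    return (a + b) / 2

# Example: f(x) = -(x - 2)**2 + 4 attains its maximum at x = 2
print(golden_section_max(lambda x: -(x - 2) ** 2 + 4, 0, 5))
```

Each iteration shrinks the interval by the same fixed factor, which is why the method's convergence can be predicted in advance from the desired tolerance.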

Why Models and Modeling? Students need to get excited about the material. One way that I have found to do that is through mathematical modeling and applications to real-world-type issues and problems. Model building and its process expose students to realistic problems from business, industry, government, and engineering. Since this book is at an introductory level, it provides a wide exposure to nonlinear optimization topics. To apply nonlinear optimization methods effectively, students require more than just numerical algorithms or lists of procedures; they need to see "how and when" to apply the various topics. We therefore think students gain an important exposure that allows them to explore well beyond this text. Formulating nonlinear optimization problems is both an art and a science. Understanding the importance of assumptions in developing an initial model is essential. The art is developed through practice, as with a serve in tennis: practice makes one better. The science is developed through the understanding of the methods and algorithms presented in this text.

Prerequisite We emphasize that this text is written for the student who has completed multivariable calculus. Since relatively few students master all of the calculus involved in optimization, we briefly review the key calculus concepts as they arise in the text.


Since these reviews are fundamental to the development of the optimization concepts, they are embedded in the text but clearly marked as review. Linear algebra is not a prerequisite, although there are many concepts from linear algebra in the text. The concepts of solving simultaneous equations, linear independence and dependence, determinants, Cramer's rule, the Hessian matrix, the Jacobian matrix, quadratic forms, and matrix definiteness are all defined as they relate to optimization principles. An overview of the topics and skills needed from linear and matrix algebra is provided in a just-in-time fashion. The rise of computer algebra systems and powerful calculators allows students to explore topics that are primarily covered at the graduate level. The depth required of a theoretical graduate course is not present, but the breadth of coverage of topics, models, and applications is. The main teaching challenge we found was finding an undergraduate text that meets our needs: a low-level blend of theory with many applications and projects for students to become immersed in throughout the course. Every topic is addressed with mathematical modeling as a theme. Where possible, a geometric illustration is provided to help visualize both the theory and the algorithm. We also think that a course in linear optimization as either a prerequisite or a co-requisite would be helpful for some of the topics in Chapter 10.

Audience This book was written for undergraduates at junior and senior levels, although we fully believe it can be used at the introductory graduate level for operations research or applied mathematics. Students in the following fields are candidates for this course: applied mathematics, operations research, management science, economics, financial mathematics, actuarial science, and computer science. This text enables a new course beyond linear optimization (linear programming) to be taught at the undergraduate level.

What Is Special Although this is a first edition, it has updates from almost 30 years of teaching experience and materials. Examples, as well as the exercises and projects, are new and robust.


Technology: Many of the examples in this text were solved using MATLAB, Maple, and Excel. Their assistance in solving problems is critical to understanding nonlinear optimization. Programs used to solve problems in this text can be requested from the author at [email protected]. Excel is used for discrete numerical searches and linear programming with the simplex (or revised simplex) method in Chapters 4, 7, and 10–12. Maple, a computer algebra system with excellent graphics, is used in Chapters 3–10 and 12. MATLAB is used in Chapters 4, 7, 10, and 12.

MATLAB® is a registered trademark of The MathWorks, Inc. For product information, please contact:

The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098 USA
Tel: 508-647-7000
Fax: 508-647-7001
E-mail: [email protected]
Web: www.mathworks.com

Acknowledgments First, I thank Dr. Frank Giordano for allowing me to create this course in 1991 while on the faculty at the United States Military Academy (USMA). I thank Dr. Thomas Daula, Department of Economics, at USMA for his insights and economic problems that we used from time to time in our course. I thank those rotating faculty that taught sections of this course with me and added to the richness of the material: Dr. Jeff Appleget, Paul Grim, Dr. A. Johnson, Dr. Pat Driscoll, Dr. Chris Fowler, and all the students that I taught and collaborated with over the years, especially Dr. Todd Combs. I thank Francis Marion University (FMU) for adding this course to our electives for the math majors. I thank Dr. Hank Richardson for all his programming help while we were at Francis Marion University. I also thank all those faculty that kept teaching the course after my departure from FMU. I also thank the editorial and production staff at Taylor & Francis Group for all their help and guidance. William P. Fox


Author William P. Fox is an emeritus professor in the Department of Defense Analysis at the Naval Postgraduate School. He received his PhD in Industrial Engineering (Operations Research) from Clemson University and has taught at the United States Military Academy and at Francis Marion University where he was the chair of mathematics. He has written many publications, including over 20 books and over 150 journal articles. Currently, he is an adjunct professor in the Department of Mathematics at the College of William and Mary. He is the emeritus director of both the High School Mathematical Contest in Modeling and the Mathematical Contest in Modeling.


Chapter 1

Introduction to Optimization Models

1.1 Introduction

Optimization is the act of obtaining the "best" result under given circumstances. In the design, construction, and maintenance of any engineering system, engineers have to make many technological and managerial decisions at several stages. The ultimate goal of all such decisions is to minimize the effort or cost required and to maximize the benefits. Since the effort required and the benefits desired in any practical situation can be expressed as a function of certain decision variables, optimization can be defined as the process of finding the conditions that give the maximum or the minimum value of the function. As seen in Figure 1.1, the point x corresponds to the maximum value of f(x).

There is no single method available for solving all optimization problems efficiently. Hence, a number of optimization methods have been developed for solving different types of optimization problems. The optimum-seeking methods are also known as mathematical programming techniques (specifically, nonlinear programming techniques) and are generally studied as part of operations research or applied mathematics. Operations research is a branch of mathematics that is concerned with the application of scientific methods and techniques to decision-making problems and with establishing the best or optimal solutions. Table 1.1 lists various mathematical techniques used in the areas of operations research. As operations research is ever evolving, the list is always growing.

Mathematical programming techniques are useful in finding the minimum and maximum of a function with or without a prescribed set of constraints. The stochastic techniques can be used to analyze problems that are described by random variables having known probability distributions. The statistical methods enable one to analyze experimental data and build empirical models to obtain a more accurate representation of the physical situation. This book deals with the mathematical programming techniques.

Table 1.1 Some Operations Research Methods

Mathematical Programming: Calculus; Calculus of Variations; Linear Programming; Nonlinear Programming; Geometric Programming; Quadratic Programming; Separable Programming; Integer and Mixed Integer Programming; Dynamic Programming; Networks; Game Theory; Heuristics and other methods such as Tabu Search, Simulated Annealing, and Genetic Algorithms

Stochastic Techniques: Decision Theory; Markov Processes; Queueing Theory; Renewal Theory; Simulation Methods; Reliability Theory; Risk Analysis

Statistical Methods: Regression Analysis; Cluster Analysis; Design of Experiments; Factor Analysis; Taguchi Design; Discriminant Analysis; ANOVA; Hypothesis Testing and Analysis

Source: Modified and updated from Rao (1979, p. 2).

1.1.1 History

The roots of optimization can be traced back to Newton, Leibniz, Cauchy, and Lagrange. The development of differential calculus methods for optimization was possible because of the contributions of Newton and Leibniz. The foundations of the calculus of variations are attributed to Bernoulli, Euler, Lagrange, and Weierstrass. The method of optimization for constrained problems involving equality constraints and multipliers is attributed to Lagrange (hence, Lagrange multipliers). Despite all this initial hard work, it was not until the 20th century, when computers became available, that the study of optimization techniques could continue in earnest. In 1947, George Dantzig developed the simplex method for linear programming. In the early 1950s, Kuhn and Tucker developed the Kuhn–Tucker conditions for inequality-constrained optimization problems. In 1957, Bellman established the principle of optimality for dynamic programming. Numerical methods began to develop in the 1960s as computers became more widely available. Indeed, the history continues, as techniques are still being developed today. TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) was developed in the 1980s at Kansas State University and used by budget analysts in the government. Interior point methods for optimization are still in development.

1.1.2 Applications of Optimization

Optimization, in its broadest sense, can be applied to solve many problems. Here is a short list:

1. Design of space travel for minimum time
2. Design of "star wars" systems for maximum defense
3. Optimal trajectories for space travel
4. Design of bridges and roads
5. Design to withstand earthquakes, hurricanes, and other disasters
6. Optimal design of internal gears in machines
7. Optimal design of networks and routing
8. Optimal scheduling
9. Find shortest routes
10. Maximize profit
11. Minimize costs
12. Maximize efficiency of service and lines

1.1.3 Modeling

Many situations in mathematical modeling require us to determine the "best" or "optimal" solution. It may be the problem of determining the maximum profit a firm can make or that of finding the minimum sum of squared deviations between a fitted model and a set of data points. The process of finding the "best" solution to such problems is known as optimization. As you will see in this book, the solution of a model requiring optimization can be very difficult to obtain. The study of optimization constitutes a large and interesting field of mathematics in which extensive research is currently being conducted. While many optimization problems can be solved by a direct application of elementary single- and multivariable calculus, others require the application of specialized mathematics, which is best studied in a separate course or even a sequence of courses.


In this chapter, we first present an overview of the field of continuous optimization. Later in the chapter, some scenarios are presented that naturally give rise to models requiring optimization. We also present the simplex method for solving linear programs, as well as search techniques for finding numerical approximations to optimization problems. We provide a general classification of optimization problems, and we address situations that lead to models illustrating many types of optimization problems. In both these sections, the emphasis is placed on model formulation. This emphasis will allow you additional practice on the first several steps of the modeling process while simultaneously providing a preview of the kinds of problems you will learn to solve in advanced mathematics courses. We address a special class of problems that can be solved using only elementary calculus. In the illustrative problem of that section, we develop a model for determining an optimal "inventory strategy." The problem is concerned with deciding in what quantities and how often goods should be ordered to minimize the total cost of carrying an inventory. The restrictions imposed on the various intermediate quantities are rather severe, so the sensitivity of the solutions to the assumptions is examined. The emphasis there is on model solution and model sensitivity analysis.

1.2 Classifying Optimization Problems

In order to provide a framework for discussing a class of optimization problems, we offer a basic model for such problems. The problems are classified according to the various characteristics of the basic model that are possessed by the particular problem. We discuss, too, variations from the basic model itself. The basic model is given as equation (1.1):

Maximize or minimize fj(X), for j in J    (1.1)

Subject to:
gi(X) ≤ bi, gi(X) = bi, or gi(X) ≥ bi, for i = 1 to m,

where X = (x1, x2, …, xn).

Now let us explain the rather intimidating notation. To optimize means to maximize or minimize. The subscript j indicates that there may be one or more functions to optimize; the functions are distinguished by integer subscripts that belong to the finite set J. We seek the vector X giving the optimal value for the set of functions. The various components of the vector X = (x1, x2, …, xn) are called the decision variables of the model, while the function f(X) is called the objective function.


By "subject to," we connote that there may be certain "side" conditions that must be met. For example, if the objective is to minimize the costs of producing a particular product, it might be specified that all contractual obligations for the product be met as side conditions. Side conditions are typically called constraints. The integer subscript i indicates that there may be one or more constraint relationships that must be satisfied. A constraint may be an equality (such as precisely meeting the demand for a product) or an inequality (such as not exceeding budgetary limitations, or providing the minimal nutritional requirements in a diet problem). Finally, each constant bi represents the level that the associated constraint function must achieve and, because of the way optimization problems are typically written, is often called the right-hand side of the model. Thus, the solution vector must optimize each of the objective functions and simultaneously satisfy each constraint relationship. We now consider one simplistic problem illustrating the basic ideas.

Example 1.1: Determining a Production Schedule

A manufacturer at a new plant is planning to introduce two new products: a 19-inch stereo color set with a manufacturer's suggested retail price (MSRP) of $339 and a 21-inch stereo color set with an MSRP of $399. The cost to the company is $195 per 19-inch set and $225 per 21-inch set, plus an additional $400,000 in fixed costs for initial parts, initial labor, and machinery. In the competitive market in which the company will sell the sets, the number of sales per year will affect the average selling price. It is estimated that for each type of set, the average selling price drops by one cent for each additional unit sold. Furthermore, the sales of 19-inch sets will affect the sales of 21-inch sets and vice versa. It is estimated that the average selling price for the 19-inch set will be reduced by an additional 0.3 cents for each 21-inch set sold, and the price for the 21-inch set will decrease by 0.4 cents for each 19-inch set sold. We desire to find the optimal number of units of each type of set to produce and the expected profit. Recall that profit is revenue minus cost, P = R − C. Formulate the model to maximize profit. Ensure that you have accounted for all revenues and costs. Define all your variables.

Solution
We set up the problem to be solved. We want to maximize profit by making two types of TV sets.

Let x1 = number of 21-inch TV sets produced
Let x2 = number of 19-inch TV sets produced

Working in dollars, the price reductions of 1 cent, 0.3 cents, and 0.4 cents become 0.01, 0.003, and 0.004:

Maximize profit = (399 − 0.01x1 − 0.004x2)x1 + (339 − 0.01x2 − 0.003x1)x2 − 400,000 − 225x1 − 195x2

To solve this type of problem, we have to learn unconstrained optimization. There are various ways of classifying optimization problems. These classifications are not meant to be mutually exclusive, but to describe certain mathematical characteristics possessed by the problem under investigation. We now describe several of these classifications.
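The profit function in Example 1.1 is a concave quadratic, so its maximum can be found by solving the first-order conditions. The following is a numerical sketch (assuming NumPy is available; we write x19 and x21 for the numbers of 19- and 21-inch sets, with the stated 1-cent, 0.3-cent, and 0.4-cent price reductions converted to dollars):

```python
import numpy as np

# Example 1.1 in dollars: the 1-cent, 0.3-cent, and 0.4-cent price
# reductions become 0.01, 0.003, and 0.004 dollars per set.
# x19 = number of 19-inch sets, x21 = number of 21-inch sets.
def profit(x19, x21):
    p19 = 339 - 0.01 * x19 - 0.003 * x21   # average 19-inch selling price
    p21 = 399 - 0.01 * x21 - 0.004 * x19   # average 21-inch selling price
    revenue = p19 * x19 + p21 * x21
    cost = 400_000 + 195 * x19 + 225 * x21
    return revenue - cost

# Profit is a concave quadratic, so the maximum is where both partial
# derivatives vanish:  144 - 0.02*x19 - 0.007*x21 = 0  and
# 174 - 0.007*x19 - 0.02*x21 = 0,  a 2x2 linear system.
A = np.array([[0.02, 0.007], [0.007, 0.02]])
b = np.array([144.0, 174.0])
x19, x21 = (float(v) for v in np.linalg.solve(A, b))
print(round(x19), round(x21), round(profit(x19, x21)))  # -> 4735 7043 553641
```

Producing about 4,735 19-inch and 7,043 21-inch sets yields a profit of roughly $553,641; the calculus-based methods later in the text confirm these first-order conditions.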


An optimization problem is said to be unconstrained if there are no constraints and constrained if one or more side conditions are present. The production problem just described illustrates an unconstrained problem. Now, consider adding a modification as follows.

Example 1.2: Revisit Production of TVs Problem

We assumed that the company has the potential to produce any number of TV sets per year. Now we realize that there is a limit on production capacity. Consideration of these two products came about because the company plans to discontinue manufacturing its black-and-white sets, thus providing excess capacity at its assembly plants. This excess capacity could be used to increase the production of other existing product lines, but the company feels that these new products will be more profitable. It is estimated that the available production capacity will be sufficient to produce 10,000 sets per year (about 200 per week). The company has an ample supply of 19- and 21-inch color tubes, chassis, and other standard components; however, circuit assemblies are in short supply. In addition, the 19-inch TV requires different circuit assemblies than the 21-inch TV. The supplier can deliver 8,000 boards per year for the 21-inch model and 5,000 boards per year for the 19-inch model. Considering this new information, what should the company now do? Formulate this problem.

Solution
We now have constraints. With x1 = the number of 21-inch sets and x2 = the number of 19-inch sets, this problem will be solved with the constrained techniques discussed in Chapters 8 and 9.

Maximize profit = (399 − 0.01x1 − 0.004x2)x1 + (339 − 0.01x2 − 0.003x1)x2 − 400,000 − 225x1 − 195x2

Subject to:
x1 + x2 ≤ 10,000 (production capacity)
x1 ≤ 8,000 (21-inch circuit boards)
x2 ≤ 5,000 (19-inch circuit boards)
x1, x2 ≥ 0
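The constrained version of the TV problem in Example 1.2 can be explored numerically; a sketch assuming SciPy is available, again writing x19 and x21 for the two set types and converting the stated cent figures to dollars:

```python
import numpy as np
from scipy.optimize import minimize

# Example 1.2 profit in dollars; x19 and x21 are the numbers of 19- and
# 21-inch sets, and the 0.3- and 0.4-cent interactions are 0.003 and 0.004.
def neg_profit(v):
    x19, x21 = v
    p19 = 339 - 0.01 * x19 - 0.003 * x21
    p21 = 399 - 0.01 * x21 - 0.004 * x19
    return -(p19 * x19 + p21 * x21 - 400_000 - 195 * x19 - 225 * x21)

constraints = [
    {"type": "ineq", "fun": lambda v: 10_000 - v[0] - v[1]},  # plant capacity
    {"type": "ineq", "fun": lambda v: 5_000 - v[0]},          # 19-inch boards
    {"type": "ineq", "fun": lambda v: 8_000 - v[1]},          # 21-inch boards
]
res = minimize(neg_profit, x0=[4_000.0, 5_000.0], method="SLSQP",
               bounds=[(0, None), (0, None)], constraints=constraints)
x19, x21 = res.x
print(round(x19), round(x21), round(-res.fun))  # ≈ 3846 6154 532308
```

The capacity constraint (10,000 sets per year) is binding here: it pulls the solution away from the unconstrained optimum of Example 1.1 and lowers the attainable profit.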

Example 1.3: Economic Order Quantity (EOQ)

Suppose a newspaper publisher must purchase three kinds of paper stock, in quantities Q1, Q2, and Q3. The publisher must meet demand but desires to minimize costs in the process. The publisher decides to use an economic lot size model to assist in these decisions. Given an EOQ model with constraints, the total cost is the sum of the individual quantity costs:

C(Q1, Q2, Q3) = C(Q1) + C(Q2) + C(Q3), where C(Qi) = ai·di/Qi + hi·Qi/2,

where di is the order (demand) rate, hi is the holding cost per unit time (storage), Qi/2 is the average amount on hand, and ai is the order cost. The constraint is the amount of storage area available to the publisher so that the three kinds of paper are on hand for use. The items cannot be stacked, but can be laid side by side, and they are constrained by the available storage area, S. The following data are collected:

                 Type I          Type II    Type III
d (demand)       32 rolls/week   24         20
a (order cost)   $25             $18        $20
h (holding)      $1/roll/week    $1.50      $2.00
s (space)        4 sq ft/roll    3          2

There are 200 sq ft of storage space available. We might formulate the problem as an NLP:

Minimize total cost = (25)(32)/Q1 + (1)Q1/2 + (18)(24)/Q2 + 1.5Q2/2 + (20)(20)/Q3 + 2Q3/2

Subject to:
4Q1 + 3Q2 + 2Q3 ≤ 200
Qi > 0 for i = 1, 2, 3
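This constrained EOQ model can be solved numerically; a sketch assuming SciPy is available. Note that the classical unconstrained lot sizes Q = √(2ad/h) = (40, 24, 20) would require 272 sq ft, so the 200 sq ft storage constraint is active:

```python
import numpy as np
from scipy.optimize import minimize

a = np.array([25.0, 18.0, 20.0])   # order costs per order
d = np.array([32.0, 24.0, 20.0])   # demand rates (rolls/week)
h = np.array([1.0, 1.5, 2.0])      # holding costs ($/roll/week)
s = np.array([4.0, 3.0, 2.0])      # floor space (sq ft/roll)

def total_cost(q):
    # ordering cost a*d/Q plus average holding cost h*Q/2 for each type
    return float(np.sum(a * d / q + h * q / 2.0))

storage = {"type": "ineq", "fun": lambda q: 200.0 - s @ q}
res = minimize(total_cost, x0=[20.0, 20.0, 20.0], method="SLSQP",
               bounds=[(1e-6, None)] * 3, constraints=[storage])
q = res.x
print(np.round(q, 1), round(float(s @ q), 1), round(res.fun, 2))
```

The solver shrinks the lot sizes to roughly (27.0, 19.0, 17.6) rolls, using the full 200 sq ft of storage.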

Example 1.4: Cobb–Douglas

Suppose you want to use the Cobb–Douglas function P(L, K) = A·L^a·K^b to predict output in thousands, based upon the amounts of labor and capital used. Suppose you know that the prices of labor and capital per year are $10,000 and $7,000, respectively. Your company estimates the values A = 1.2, a = 0.3, and b = 0.6. Your total cost is assumed to be T = pL·L + pK·K, where pL and pK are the prices of labor and capital, respectively. Initially, consider a funding level of $62,000. We can formulate the problem of determining the mix of labor and capital that yields the best output for your company:

Maximize output P(L, K) = 1.2 L^0.3 K^0.6

Subject to:
10,000L + 7,000K = 62,000
L, K ≥ 0
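For a Cobb–Douglas objective with a single linear budget constraint, the Lagrange multiplier conditions have a well-known closed form: a fraction a/(a + b) of the budget goes to labor and b/(a + b) to capital. A sketch using the $62,000 funding level stated in the example:

```python
# Maximize P(L, K) = A * L**a * K**b subject to pL*L + pK*K = B.
# The Lagrange conditions give the standard Cobb-Douglas rule: spend the
# fraction a/(a + b) of the budget on labor and b/(a + b) on capital.
A, a, b = 1.2, 0.3, 0.6
pL, pK, B = 10_000.0, 7_000.0, 62_000.0   # yearly prices and funding level

L = (a / (a + b)) * B / pL    # optimal labor input
K = (b / (a + b)) * B / pK    # optimal capital input
P = A * L**a * K**b           # predicted output (thousands)
print(round(L, 3), round(K, 3), round(P, 3))  # -> 2.067 5.905 4.33
```

One-third of the budget goes to labor and two-thirds to capital, exactly the exponent ratio a : b = 0.3 : 0.6.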


Often we need to be familiar with linear programming. An optimization problem is said to be a linear program if it satisfies the following properties:

1. There is a unique objective function.
2. Whenever a decision variable appears in either the objective function or one of the constraint functions, it must appear only as a power term with an exponent of 1, possibly multiplied by a constant.
3. No term in the objective function or in any of the constraints can contain products of the decision variables.
4. The coefficients of the decision variables in the objective function and each constraint are constant.
5. The decision variables are permitted to assume fractional as well as integer values.

These properties ensure, among other things, that the effect of any decision variable is proportional to its value.

1.3 Review of Mathematical Programming with Excel Technology

Technology is critical to solving, analyzing, and performing sensitivity analysis on linear programming problems. Technology provides a suite of powerful, robust routines for solving optimization problems, including linear programs (LPs). The technologies that we illustrate include Excel, LINDO, and LINGO, as these appear to be used often in engineering. We also examined GAMS, which we found powerful but too cumbersome to discuss here; other software packages we tested were useful as well. We show the computer chip problem first using the revised simplex method. The revised simplex method is mathematically equivalent to the standard simplex method but differs in implementation. Instead of maintaining a tableau that explicitly represents the constraints adjusted to a set of basic variables, it maintains a representation of a basis of the matrix representing the constraints, as we describe using the computer chip problem.

Example 1.5

Maximize profit Z = 140x1 + 120x2

Subject to:
2x1 + 4x2 ≤ 1,400 (assembly time)
4x1 + 3x2 ≤ 1,500 (installation time)
x1 ≥ 0, x2 ≥ 0




First, we put the problem in standard canonical form into a matrix, M:

        Z     x1     x2    s1    s2     RHS
z       1   −140   −120     0     0       0
s1      0      2      4     1     0   1,400
s2      0      4      3     0     1   1,500

We also need the basis B and its inverse B⁻¹. In the first iteration for this problem, B and B⁻¹ are identical; B is represented by the original columns of {Z, s1, s2}. We present B⁻¹:

        Z    s1    s2
z       1     0     0
s1      0     1     0
s2      0     0     1

We examine the cost coefficients [1, −140, −120, 0, 0] corresponding to {z, x1, x2, s1, s2} and choose the most negative, as our problem is a maximization problem (if the problem were a minimization problem, we would select the most positive). We select x1's coefficient of −140. To see if and where it enters, we perform a minimum positive ratio test between the column associated with x1 and the RHS values. Our choices are {1,400/2 = 700 or 1,500/4 = 375}. The minimum is 375, which implies that s2 is replaced by x1. So, we replace the column of s2 with the column of x1 to get our new B and compute the new B⁻¹ as follows:

        Z    s1     x1
z       1     0    35
s1      0     1   −0.5
x1      0     0    0.25

Now that we have the new B⁻¹, we multiply the new B⁻¹ and the original matrix M to obtain the new tableau, called T:

        Z    x1    x2     s1    s2     RHS
z       1     0   −15      0    35   52,500
s1      0     0    2.5     1   −0.5     650
x1      0     1    0.75    0    0.25    375

We check for optimality by looking at the costs, [1, 0, −15, 0, 35]. Since the cost associated with x2 is negative, we are not optimal. So, we repeat the process, where x2 enters the basis B. The minimum positive ratio values are {650/2.5 = 260, 375/0.75 = 500}, and we select 260. This implies that x2 enters and s1 leaves the basis. We replace the column of s1 in B with the column of x2 and compute B⁻¹ as follows:

        Z     x2     x1
z       1     6     32
x2      0     0.4   −0.2
x1      0    −0.3    0.4

We multiply the new B⁻¹ and the original M to obtain

        Z    x1    x2    s1    s2     RHS
z       1     0     0    6     32   56,400
x2      0     0     1    0.4  −0.2     260
x1      0     1     0   −0.3   0.4     180

We examine our costs [1, 0, 0, 6, 32]. Since they are all greater than or equal to 0, we are optimal and we have our solution. We find our optimal solution as x1 = 180, x 2 = 260, and z = 56,400. We present this method as the reader might find it useful in Chapter 10 when covering quadratic and separable programming methods.
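The iterations above can be reproduced generically in a few lines of NumPy. This is a sketch of the revised simplex loop (assuming NumPy is available), not the text's Excel worksheet:

```python
import numpy as np

# Example 1.5 in standard form: maximize z = 140 x1 + 120 x2
# s.t. 2 x1 + 4 x2 + s1 = 1400 and 4 x1 + 3 x2 + s2 = 1500.
c = np.array([140.0, 120.0, 0.0, 0.0])   # costs for x1, x2, s1, s2
A = np.array([[2.0, 4.0, 1.0, 0.0],
              [4.0, 3.0, 0.0, 1.0]])
b = np.array([1400.0, 1500.0])

basis = [2, 3]                           # start with the slacks s1, s2
while True:
    Binv = np.linalg.inv(A[:, basis])    # basis inverse (the B^-1 above)
    xB = Binv @ b                        # current basic solution
    y = c[basis] @ Binv                  # simplex multipliers
    reduced = c - y @ A                  # reduced costs
    enter = int(np.argmax(reduced))
    if reduced[enter] <= 1e-9:           # no improving column: optimal
        break
    dcol = Binv @ A[:, enter]
    ratios = np.where(dcol > 1e-9,
                      xB / np.where(dcol > 1e-9, dcol, 1.0), np.inf)
    basis[int(np.argmin(ratios))] = enter   # min-ratio test: swap column

x = np.zeros(4)
x[basis] = xB
z = float(c @ x)
print(x[:2], z)   # -> [180. 260.] 56400.0
```

The loop performs exactly the two pivots worked out above and reproduces x1 = 180, x2 = 260, and z = 56,400.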

1.3.1 Excel Using the Solver

Put the problem formulation into Excel. Note that you must have formulas in terms of cells for the objective function and the constraints. This is shown in Figures 1.1 and 1.2.

Figure 1.1 Screenshot for LP formulation in Excel.

Figure 1.2 Screenshot from Excel.

Highlight the objective function in cell C12, open the Solver, and select the solution method as SimplexLP (Figure 1.3).

Figure 1.3 Opening the Solver.


Insert the decision variables into the By Changing Variable Cells; in this case, they are in cells C8 and C9 (Figure 1.4).

Figure 1.4 Entering the decision variable cells.

Enter the constraints by invoking the Add command (Figure 1.5).

Figure 1.5 Opening the constraints command.


Enter the constraints one at a time. Ensure that the equation for each constraint is written in terms of the decision variables in C8 and C9 (Figure 1.6).

Figure 1.6 Entering the constraints and Solve.

Now click on the Solver button, Solve. Save both the answer and sensitivity analysis worksheets (Figure 1.7).


Figure 1.7 Save answers and sensitivity sheets.

View your solution and analysis reports. We see that we obtained our optimal solution as x1 = 180, x 2 = 260, and z = 56,400 (Figures 1.8–1.10).

Figure 1.8 The output.


Answer Report

Figure 1.9 The answer report.

Sensitivity Report

Figure 1.10 The sensitivity report.




As expected, we have the same answers as we found earlier. We present the following example, which we solved with the Solver technology.

Example 1.6

Maximize Z = 25x1 + 30x2

Subject to:
2x1 + 3x2 ≤ 100
5x1 + 4x2 ≤ 120
x1 ≤ 10
x1, x2 ≥ 0

We enter the decision variables in cells B6 and B7 and the objective function in cell E5 as 25∗B6 + 30∗B7.

We enter the constraint formulas in cells D11, D12, and D13. The formula for cell D11 is 2∗B6 + 3∗B7, for cell D12 it is 5∗B6 + 4∗B7, and for cell D13 it is B6.

We open the Solver and enter the objective function, decision cells, and constraints.


We enter the constraints into Solver.

We have the problem set up. We click Solve.


We obtain the answers as x1 = 0, x2 = 30, and Z = 900. Additionally, we can obtain reports from Excel. The two key reports are the answer report and the sensitivity report. The answer report also tells us which constraints are binding. A binding constraint is satisfied with equality at the optimal solution; here, 5x1 + 4x2 ≤ 120 is binding.

Answer Report

Sensitivity Report


We find our solution as x1 = 0, x2 = 30, and P = $900. From the standpoint of sensitivity analysis, Excel is satisfactory in that it provides shadow prices. Limitation: no tableaus are provided, making it difficult to find alternate solutions.
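The LP of Example 1.6, exactly as formulated, can be cross-checked outside of Excel; a sketch assuming SciPy is available:

```python
from scipy.optimize import linprog

# Example 1.6: maximize Z = 25 x1 + 30 x2; linprog minimizes, so negate.
c = [-25.0, -30.0]
A_ub = [[2.0, 3.0],    # 2 x1 + 3 x2 <= 100
        [5.0, 4.0],    # 5 x1 + 4 x2 <= 120
        [1.0, 0.0]]    # x1 <= 10
b_ub = [100.0, 120.0, 10.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2,
              method="highs")
print(res.x, -res.fun)  # optimal: x1 = 0, x2 = 30, Z = 900
```

The installation-type constraint 5x1 + 4x2 ≤ 120 is the binding one at this vertex.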

1.3.2 Examples for Integer, Mixed-Integer, and Nonlinear Optimization

Example 1.7: Emergency Services

Recall this example; here we formulate and present a solution.

Solution
Due to the nature of the problem, a facility location problem, we decide to employ integer programming to solve it.

Decision Variables
yi = 1 if demand node i is covered, and 0 if not
xj = 1 if an ambulance is located at node j, and 0 if not
m = number of ambulances available
hi = the population to be served at demand node i
tij = shortest time from node j to node i in perfect conditions
I = set of all demand nodes
J = set of nodes where ambulances can be located

Model Formulation
Maximize Z = 50,000y1 + 80,000y2 + 30,000y3 + 55,000y4 + 35,000y5 + 20,000y6

Subject to:
x1 + x2 ≥ y1
x1 + x2 + x3 ≥ y2
x3 + x5 + x6 ≥ y3
x3 + x4 + x6 ≥ y4
x4 + x5 + x6 ≥ y5
x3 + x5 + x6 ≥ y6
x1 + x2 + x3 + x4 + x5 + x6 = 3


All variables are binary integers.

Solution and Analysis: We find that we can cover all 270,000 potential patients with three ambulances posted in locations 1, 3, and 6. In fact, we can cover all 270,000 potential patients with only two ambulances posted in locations 1 and 6. If we had only one ambulance, we could cover at most 185,000 potential patients, with the ambulance located in location 3, leaving 85,000 potential patients uncovered. Management thus has several options that meet demand and might use the option that is least costly.

Example 1.8: Optimal Path to Transport Hazardous Material

FEMA is requesting a two-part analysis. It is concerned about the transportation of nuclear waste from the Savannah River nuclear plant to the appropriate disposal site. After the route is found, FEMA requests analysis as to the location and composition of clean-up sites. In this example, we only discuss the optimal path portion of the model using generic data. Consider a model whose requirement is to find the route from node A to node B that minimizes the probability of a vehicle accident. A primary concern is the intersection of the two interstates, I-95 and I-20, which meet and converge in the Florence, SC corridor. To simplify the use of technology, we transform the model to maximize the probability of not having an accident.

Maximize f(x12, x13, …, x9,10) = (1 − p12x12)(1 − p13x13)···(1 − p9,10x9,10)

Subject to:
−x12 − x13 − x14 = −1
x12 − x24 − x26 = 0
x13 − x34 − x35 = 0
x14 + x24 + x34 − x45 − x46 − x48 = 0
x35 + x45 − x57 = 0
x26 + x46 − x67 − x68 = 0
x57 + x67 − x78 − x79 − x7,10 = 0
x48 + x68 + x78 − x8,10 = 0
x79 − x9,10 = 0
x7,10 + x8,10 + x9,10 = 1
non-negativity
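The set-covering model of Example 1.7 can be checked with a mixed-integer solver; a sketch assuming SciPy 1.9+ (which provides scipy.optimize.milp) is available:

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Variables: y1..y6 (node covered) then x1..x6 (ambulance at node), binary.
pop = [50_000, 80_000, 30_000, 55_000, 35_000, 20_000]
cover = [[1, 2], [1, 2, 3], [3, 5, 6],     # cover[i]: sites covering node i+1
         [3, 4, 6], [4, 5, 6], [3, 5, 6]]

n = 12
c = np.zeros(n)
c[:6] = [-p for p in pop]          # milp minimizes, so negate the populations

rows, lb, ub = [], [], []
for i, sites in enumerate(cover):  # y_i - (sum of covering x_j) <= 0
    row = np.zeros(n)
    row[i] = 1.0
    for j in sites:
        row[5 + j] = -1.0
    rows.append(row)
    lb.append(-np.inf)
    ub.append(0.0)
row = np.zeros(n)
row[6:] = 1.0                      # exactly three ambulances
rows.append(row)
lb.append(3.0)
ub.append(3.0)

res = milp(c, constraints=LinearConstraint(np.array(rows), lb, ub),
           integrality=np.ones(n), bounds=Bounds(0, 1))
print(int(round(-res.fun)))  # -> 270000
```

All 270,000 potential patients are covered; changing the last constraint from 3 ambulances to 2 still returns 270,000, confirming the two-ambulance option noted above.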


Example 1.9: Minimum Variance of Expected Investment Returns (Fox et al., 2013) A new company has $5,000 to invest, but the company needs to earn about 12% interest. A stock expert has suggested three mutual funds {A, B, and C} in which the company could invest. Based upon previous year’s returns, these funds appear relatively stable. The expected return, variance on the return, and covariance between funds are shown below:

Expected value:   A: 0.14    B: 0.11    C: 0.10
Variance:         A: 0.20    B: 0.08    C: 0.18
Covariance:       AB: 0.05   AC: 0.02   BC: 0.03

Formulation: We use the laws of expected value, variance, and covariance in our model. Let xj be the number of dollars invested in fund j (j = 1, 2, 3).

Minimize V = Var(Ax1 + Bx2 + Cx3)
= x1²Var(A) + x2²Var(B) + x3²Var(C) + 2x1x2Cov(A,B) + 2x1x3Cov(A,C) + 2x2x3Cov(B,C)
= 0.2x1² + 0.08x2² + 0.18x3² + 0.10x1x2 + 0.04x1x3 + 0.06x2x3

Our constraints include:

1. the expectation to achieve at least the expected return of 12% from the sum of all the expected returns: 0.14x1 + 0.11x2 + 0.10x3 ≥ (0.12 × 5,000), or 0.14x1 + 0.11x2 + 0.10x3 ≥ 600;
2. the sum of all investments must not exceed the $5,000 capital: x1 + x2 + x3 ≤ 5,000.

The optimal solution is x1 = 1,904.80, x2 = 2,381.00, x3 = 714.20, with variance V = 1,880,942.29, or a standard deviation of $1,371.50. The expected return is [0.14(1,904.8) + 0.11(2,381) + 0.1(714.2)]/5,000 = 12%. This example is a typical standard investment strategy model.

Example 1.10: Cable Installation

Consider a small company that is planning to install a central computer with cable links to five new departments, as shown schematically in Figure 1.11. According to the floor plan, the peripheral computers for the five departments will be situated as shown by the dark circles in Figure 1.11. The company wishes to locate the central computer so that a minimal amount of cable will be used to link the five peripheral computers. Assuming that cable may be strung over the ceiling panels in a straight line from a point above any peripheral computer to a point above the central computer, the distance formula may be used to determine the length of cable needed to connect any peripheral computer to the central computer. Ignore all lengths of cable from the computer itself to a point above the ceiling panel immediately over that computer; that is, work only with lengths of cable strung over the ceiling panels. A sketch is provided in Figure 1.11.

24



Nonlinear Optimization

Figure 1.11

The grid for the five departments.

The coordinates of the locations of the five peripheral computers are listed below.

X    Y
15   60
25   90
60   75
75   60
80   25

Assume the central computer will be positioned at coordinates (m, n) where m and n are the integers in the grid representing the office space. Determine the coordinates (m, n) for placement of the central computer that minimize the total amount of cable needed. Report the total number of feet of cable needed for this placement along with the coordinates (m, n). This is an unconstrained optimization model. We want to minimize the sum of the distances from each department to the placement of the central computer system. The distances represent cable lengths assuming that a straight line is the shortest distance between two points. Using the distance formula,


d = √((x − X1)² + (y − Y1)²),

where d represents the distance (cable length in feet) between the location of the central computer (x, y) and the location of the first peripheral computer (X1, Y1). Since we have five departments, we define

Distance = Σ (i = 1 to 5) √((x − Xi)² + (y − Yi)²).

Using the gradient search method in the Excel Solver, we find that the minimum total distance is 157.66 ft when the central computer is placed at coordinates (56.82, 68.07).
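This placement problem is a geometric-median (Fermat-Weber) problem, so the Solver result can be checked without Excel. The sketch below (not from the text) uses Weiszfeld's iteration, which applies here under the assumption that the optimum does not sit exactly on one of the five points:

```python
import math

# Peripheral computer locations from the table above
points = [(15, 60), (25, 90), (60, 75), (75, 60), (80, 25)]

def total_cable(x, y):
    """Total straight-line cable length from (x, y) to all departments."""
    return sum(math.hypot(x - px, y - py) for px, py in points)

# Weiszfeld's iteration: repeatedly re-weight the points by 1/distance
x, y = 51.0, 62.0  # start at the centroid of the five points
for _ in range(200):
    w = [1.0 / math.hypot(x - px, y - py) for px, py in points]
    s = sum(w)
    x = sum(wi * px for wi, (px, py) in zip(w, points)) / s
    y = sum(wi * py for wi, (px, py) in zip(w, points)) / s

print(round(x, 2), round(y, 2), round(total_cable(x, y), 2))
```

The iteration settles near (56.8, 68.1) with a total of about 157.7 ft, agreeing with the Solver result.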

1.4 Exercises

1. Your company is considering four investments. Investment 1 yields a net present value (NPV) of $17,000; investment 2 yields an NPV of $23,000; investment 3 yields an NPV of $13,000; and investment 4 yields an NPV of $9,000. The investments require current cash outflows of $6,000, $8,000, $5,000, and $4,000, respectively. At present, $21,000 is available for investment. Formulate and solve as an integer programming problem, assuming that you can invest at most one time in each investment.
2. Repeat Exercise 1 with the same NPVs, cash outflows, and $21,000 of available capital, but formulate and solve as an integer programming problem assuming that you can invest more than once in any investment.
3. For the cable installation example, assume that we are moving the computers to the following coordinates and re-solve.

X    Y
10   50
35   85
60   77
75   60
80   35


1.5 Review of the Simplex Method in Excel Using Revised Simplex

This section prepares us to do separable and quadratic programming in Chapter 10.

Key formulas:
Costs: C′j = CBV B⁻¹ aj − Cj
RHS: b′ = B⁻¹ b
Updated column: x′j = B⁻¹ aj
Updated Z: Z = CBV B⁻¹ b

Notes: The product form of the inverse is a slick way to update B⁻¹.
Rationale: Pivoting is difficult and time consuming, even for a computer.

Steps in Revised Simplex
Step 0: Note the columns under which B⁻¹ will be read. Initially, B⁻¹ is the identity matrix, I.
Step 1: For the current tableau, compute CBV B⁻¹.
Step 2: Price out. Compute the updated cost C′j for each non-basic variable to determine whether we are optimal or whether one of the non-basic variables should enter the basis.
Step 3: Determine which basic variable leaves when xj enters. Apply the ratio test to the updated b (B⁻¹b) and the updated column for xj (B⁻¹aj). This gives the new set of basic variables.
Step 4: Obtain the new B⁻¹ and return to Step 1.

Example 1.11

Max Z = 3x1 + x2 + 4x3
Subject to:

(1) The original constraints:
x1 + x2 + x3 ≤ 6
2x1 − x3 ≤ 4
x2 + x3 ≤ 2
x1, x2, x3 ≥ 0

(2) Put into standard form:
Z − 3x1 − x2 − 4x3 = 0
x1 + x2 + x3 + s1 = 6
2x1 − x3 + s2 = 4
x2 + x3 + s3 = 2


Because of the above structure, we know BV = {s1, s2, s3} and NBV = {x1, x2, x3}, so B0 = B0⁻¹ = I, the 3 × 3 identity matrix.

Step 1: CBV B⁻¹ = [0 0 0]
Step 2: Compute the price outs for cx1, cx2, and cx3 using C′j = CBV B⁻¹ aj − Cj:
cx1 = −3, cx2 = −1, cx3 = −4* (most negative for a max problem, so x3 enters)
Step 3: Update the column for x3 and b. Because B⁻¹ is currently I, they do not change:
x3 = [1, −1, 1]ᵀ, b = [6, 4, 2]ᵀ
The ratio test winner is 2/1, so s3 leaves. The new BV is {s1, s2, x3}, and the new NBV is {x1, x2, s3}:

B = [1 0 1; 0 1 −1; 0 0 1],  B⁻¹ = [1 0 −1; 0 1 1; 0 0 1]

Step 1: CBV = [0 0 4] and CBV B⁻¹ = [0 0 4]
Step 2: Price outs for cx1, cx2, and cs3:
cx1 = [0 0 4][1 2 0]ᵀ − 3 = −3
cx2 = [0 0 4][1 0 1]ᵀ − 1 = 3
cs3 = [0 0 4][0 0 1]ᵀ − 0 = 4
x1 wants to enter. The new column for x1 is found as B⁻¹ax1 = B⁻¹[1 2 0]ᵀ = [1 2 0]ᵀ, and the updated b is B⁻¹[6 4 2]ᵀ = [4 6 2]ᵀ. The ratio test shows that s2 leaves (ratio value 6/2 = 3). The new BV is {s1, x1, x3} and the new NBV is {x2, s2, s3}:

B = [1 1 1; 0 2 −1; 0 0 1],  B⁻¹ = [1 −0.5 −1.5; 0 0.5 0.5; 0 0 1]


Step 1: CBV = [0 3 4] and CBV B⁻¹ = [0 1.5 5.5]
Step 2: Price outs for {x2, s2, s3}:
cx2 = 4.5, cs2 = 1.5, cs3 = 5.5
Stop; no non-basic variable wants to enter, so we are optimal.

Z = [0 1.5 5.5][6 4 2]ᵀ = 17

Final b = [1 −0.5 −1.5; 0 0.5 0.5; 0 0 1][6 4 2]ᵀ = [1 3 2]ᵀ

So Z = 17, s1 = 1, x1 = 3, x3 = 2, and x2 = s2 = s3 = 0.

Try this one next.

Example 1.12

Max Z = 4x1 + x2
Subject to:
x1 + x2 ≤ 4
2x1 + x2 ≥ 6
2x2 ≥ 6

(Remember that B⁻¹ is always found under the columns corresponding to the starting basis.)

With problems with more than two variables, an algebraic method may be used. This method is called the simplex method. The simplex method, developed by George Dantzig in 1947, incorporates both optimality and feasibility tests to find the optimal solution(s) to a linear program (if an optimal solution exists). An optimality test shows whether or not an intersection point corresponds to a value of the objective function better than the best value found so far. A feasibility test determines whether the proposed intersection point is feasible, that is, whether it violates any of the constraints. The simplex method starts with the selection of a corner point (usually the origin, if it is a feasible point) and then, in a systematic way, moves to adjacent corner points of the feasible region until the optimal solution is found or it can be shown that no solution exists. We will use our computer chip example to illustrate.

Maximize Profit Z = 140x1 + 120x2
2x1 + 4x2 ≤ 1,400 (assembly time)
4x1 + 3x2 ≤ 1,500 (installation time)
x1 ≥ 0, x2 ≥ 0
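Before stepping through the tableau method, the revised simplex bookkeeping of Example 1.11 is easy to verify with numpy: given the final basis {s1, x1, x3}, the key formulas b′ = B⁻¹b, Z = CBV B⁻¹b, and C′j = CBV B⁻¹aj − Cj can be evaluated directly. This is a verification sketch, not part of the original text:

```python
import numpy as np

# Constraint data for Example 1.11 in standard form;
# columns are ordered x1, x2, x3, s1, s2, s3
A = np.array([[1, 1, 1, 1, 0, 0],
              [2, 0, -1, 0, 1, 0],
              [0, 1, 1, 0, 0, 1]], dtype=float)
c = np.array([3, 1, 4, 0, 0, 0], dtype=float)
b = np.array([6, 4, 2], dtype=float)

basis = [3, 0, 2]            # final basic variables: s1, x1, x3
B = A[:, basis]
Binv = np.linalg.inv(B)
cB = c[basis]

xB = Binv @ b                # updated right-hand side b' = B^-1 b
Z = cB @ xB                  # objective value Z = CBV B^-1 b
reduced = cB @ Binv @ A - c  # price outs C'_j = CBV B^-1 a_j - C_j

print(xB, Z, reduced)
```

The output reproduces the hand computation: b′ = [1, 3, 2] (so s1 = 1, x1 = 3, x3 = 2), Z = 17, and nonnegative price outs (4.5, 1.5, and 5.5 for x2, s2, and s3), confirming optimality.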


1.5.1 Steps of the Simplex Method

1. Tableau Format: Place the linear program in tableau format, as explained below:

Maximize Profit Z = 140x1 + 120x2
2x1 + 4x2 ≤ 1,400 (assembly time)
4x1 + 3x2 ≤ 1,500 (installation time)
x1 ≥ 0, x2 ≥ 0

To begin the simplex method, we start by converting the inequality constraints (of the form ≤) to equalities by adding nonnegative slack variables Si ≥ 0. The inequality 2x1 + 4x2 ≤ 1,400 states that the sum 2x1 + 4x2 is at most 1,400. The slack variable "takes up the slack" between the values used for x1 and x2 and the value 1,400. For example, if x1 = x2 = 0, then S1 = 1,400. If x1 = 240 and x2 = 0, then 2(240) + 4(0) + S1 = 1,400, so S1 = 920. A unique slack variable must be added to each inequality constraint.

Maximize Z = 140x1 + 120x2
Subject to:
2x1 + 4x2 + S1 = 1,400 (assembly time)
4x1 + 3x2 + S2 = 1,500 (installation time)
x1 ≥ 0, x2 ≥ 0, S1 ≥ 0, S2 ≥ 0

Adding slack variables makes the constraint set a system of linear equations. We write these with all variables on the left-hand side of the equation and all constants on the right-hand side. We even rewrite the objective function by moving all variables to the left-hand side:

Maximize Z = 140x1 + 120x2 is written as Z − 140x1 − 120x2 = 0


Now, these can be written in the following form:
Z − 140x1 − 120x2 = 0
2x1 + 4x2 + S1 = 1,400
4x1 + 3x2 + S2 = 1,500
x1 ≥ 0, x2 ≥ 0, S1 ≥ 0, S2 ≥ 0
or, more simply, in a matrix. This matrix is called the simplex tableau.

Z    x1     x2    S1   S2        RHS
1   −140   −120    0    0    =      0
0     2      4     1    0    =  1,400
0     4      3     0    1    =  1,500

Because we are working in Excel, we will take advantage of two commands, MINVERSE and MMULT, to update the tableau.

2. Initial Extreme Point: The simplex method begins with a known extreme point, usually the origin (0, 0) for many of our examples. The requirement for a basic feasible solution gives rise to special simplex methods such as Big M and Two-Phase Simplex, which can be studied in a linear programming course. The tableau previously shown contains the corner point (0, 0) and is our initial solution.

Z    x1     x2    S1   S2        RHS
1   −140   −120    0    0    =      0
0     2      4     1    0    =  1,400
0     4      3     0    1    =  1,500

We read this solution as follows:
x1 = 0
x2 = 0
S1 = 1,400
S2 = 1,500
Z = 0


As a matter of fact, we see that the columns for the variables Z, S1, and S2 form a 3 × 3 identity matrix. These three are referred to as basic variables. Let's continue to define a few of these terms further. We have five variables {Z, x1, x2, S1, S2} and three equations, so at most three of the variables can be non-zero in a basic solution. Z will always be one of them by the convention of our tableau. That leaves two non-zero variables among {x1, x2, S1, S2}. These non-zero variables are called the basic variables; the remaining variables are called the non-basic variables. The corresponding solutions are called basic feasible solutions (BFS) and correspond to corner points. Each complete step of the simplex method produces a solution that corresponds to a corner point of the feasible region. These solutions are read directly from the tableau matrix. We also note that the basic variables are the variables whose column consists of a single 1 with 0s elsewhere. We add a column to label these, as shown below:

Basic
Variable    Z    x1     x2    S1   S2        RHS
Z           1   −140   −120    0    0    =      0
S1          0     2      4     1    0    =  1,400
S2          0     4      3     0    1    =  1,500

3. Optimality Test: We need to determine if an adjacent intersection point improves the value of the objective function. If not, the current extreme point is optimal. If an improvement is possible, the optimality test determines which variable currently in the independent set (having value zero) should enter the dependent set as a basic variable and become nonzero. For our maximization problem, we look at the Z-Row (the row marked by the basic variable Z). If any coefficients in that row are negative, then we select the variable whose coefficient is the most negative as the entering variable. In the Z-Row, the coefficients are as follows:

     Z    x1     x2    S1   S2
Z    1   −140   −120    0    0

The variable with the most negative coefficient is x1, with a value of −140. Thus, x1 wants to become a basic variable. We can only have three basic variables in this


example (because we have three equations), so one of the current basic variables {S1, S2} must be replaced by x1. Let's proceed to see how we determine which of the existing basic variables leaves the basis.

4. Feasibility Test: To find a new intersection point, one of the variables in the basic variable set must exit to allow the entering variable from Step 3 to become basic. The feasibility test determines which current basic variable to choose for exiting, ensuring that we stay inside the feasible region. We will use the minimum positive ratio test as our feasibility test: MIN(RHSj/aj for aj > 0), where aj is the entry in row j of the entering variable's column. Make a quotient RHSj/aj for each row, as shown below (the entering column x1 has the most negative coefficient, −140):

                                                     Ratio Test
     Z    x1     x2    S1   S2        RHS            Quotient
Z    1   −140   −120    0    0    =      0
S1   0     2      4     1    0    =  1,400    1,400/2 = 700
S2   0     4      3     0    1    =  1,500    1,500/4 = 375*

Note that we will always disregard all quotients with either 0 or a negative value in the denominator. In our example, we compare {700, 375} and select the smallest non-negative value, 375. This gives the location of the matrix pivot that we would perform. However, matrix pivots in Excel are not easy, so instead we form the updated basis matrix B by replacing the column of the leaving variable S2 with the column of the entering variable x1. Then, we invert B to obtain B⁻¹. Then, we multiply the original tableau by B⁻¹.

In three iterations of the Simplex, we have found our solution.


The final solution is read as follows:
Basic variables: x2 = 260, x1 = 180, Z = 56,400
Non-basic variables: S1 = S2 = 0
The final tableau is important as well.

We look for possible alternate optimal solutions by looking in the Z-Row for costs of 0 for non-basic variables; here, there are none. We also examine the cost coefficients for the non-basic variables and recognize them as reduced costs or shadow prices. In this case, the shadow prices are 6 and 32, respectively. If the cost of an additional unit of each constraint's resource were the same, then adding an additional unit of the second constraint (installation time) would produce the larger increase in Z (32 > 6).
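Both constraints bind at the optimal corner, so the solution and the shadow prices can be checked by solving two small linear systems with numpy (primal: the binding constraints; dual: Aᵀy = c). This is a verification sketch, not part of the original text:

```python
import numpy as np

# Binding constraints at the optimal corner: assembly and installation
A = np.array([[2.0, 4.0],     # 2x1 + 4x2 = 1400
              [4.0, 3.0]])    # 4x1 + 3x2 = 1500
b = np.array([1400.0, 1500.0])
c = np.array([140.0, 120.0])  # profit coefficients

x = np.linalg.solve(A, b)     # primal corner point
Z = c @ x                     # objective value
y = np.linalg.solve(A.T, c)   # shadow prices of the two constraints

print(x, Z, y)
```

This reproduces x1 = 180, x2 = 260, Z = 56,400, and the shadow prices (6, 32) quoted above.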

References and Suggested Further Reading

Albright, B. 2010. Mathematical Modeling with Excel. Jones and Bartlett Publishers, Burlington, MA, Chapter 7.
Apaiah, R. and E. Hendrix. 2006. Linear programming for supply chain design: A case on Novel Protein Foods. Ph.D. Thesis, Wageningen University (Netherlands).
Balakrishnan, N., B. Render, and R. Stair. 2007. Managerial Decision Making, 2nd Ed. Prentice Hall, Upper Saddle River, NJ.
Bazaraa, M.S., J.J. Jarvis, and H.D. Sherali. 1990. Linear Programming and Network Flows. John Wiley & Sons, New York.
Ecker, J. and M. Kupferschmid. 1988. Introduction to Operations Research. John Wiley and Sons, New York.
Fishback, P.E. 2010. Linear and Nonlinear Programming with Maple: An Interactive, Applications-Based Approach. CRC Press, Boca Raton, FL, http://www.mathplace.org/C064X/main.pdf.
Fox, W. 2012. Mathematical Modeling with Maple. Cengage Publishers, Boston, MA, Chapters 7–10.


Fox, W.P. and F. Garcia. 2013. Modeling and linear programming in engineering management. In: Engineering Management, edited by Fausto Pedro García Márquez and Benjamin Lev, InTech, March 3, 2013, ISBN 978-953-51-1037-8.
Giordano, F., W. Fox, and S. Horton. 2014. A First Course in Mathematical Modeling, 5th Ed. Cengage Publishers, Boston, MA, Chapter 7.
Hillier, F. and G.J. Lieberman. 1990. Introduction to Mathematical Programming. McGraw-Hill Publishing Company, New York.
McGrath, G. 2007. Email Marketing for the U.S. Army and Special Operations Forces Recruiting. Master's Thesis, Naval Postgraduate School.
Rao, S.S. 1979. Optimization Theory and Application. John Wiley & Sons, New Delhi, India.
Winston, W.L. 1994. Operations Research: Applications and Algorithms, 3rd Ed. Duxbury Press, Belmont, CA.
Winston, W.L. 2002. Introduction to Mathematical Programming: Applications and Algorithms, 4th Ed. Duxbury Press, Belmont, CA.

Chapter 2

Review of Differential Calculus

2.1 Limits

The idea of a limit is one of the most basic ideas in calculus. The equation lim(x→a) f(x) = c means that as x gets closer to a (but not equal to a), the value of f(x) gets arbitrarily close to c. It is also possible that lim(x→a) f(x) will not exist. Limits can be viewed analytically, graphically, and by numerical tables. Let's illustrate with some examples.

Example 2.1: Consider lim(x→2) (x² − 2x + 4)

a. Analytical. We substitute x = a into f(x) to determine whether or not f(a) exists and is a real value.

lim(x→2) (x² − 2x + 4) = 2² − 2(2) + 4 = 4

Since lim(x→2) (x² − 2x + 4) = 4, the limit exists. As x → 2, f(x) approaches 4.

b. Graphically. Figure 2.1 shows that as x approaches 2, f(x) approaches 4.
c. Numerical table. We see in Table 2.1 that as x approaches 2 from the left, the values get closer to 4, and as x approaches 2 from the right, f(x) approaches 4. We must allow x to approach 2 from both the left and the right in the limiting process to determine whether or not the limit exists.


Figure 2.1

Plot of x2 − 2x + 4 as x approaches 2.

Table 2.1 Numerical Values Approaching the Limit in Example 2.1

X        f(x)          X        f(x)
1.9      3.81          2.1      4.21
1.95     3.9025        2.05     4.1025
1.99     3.9801        2.01     4.0201
1.995    3.990025      2.005    4.010025
1.999    3.998001      2.001    4.002001
1.9999   3.99980001    2.0001   4.00020001
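The numerical-table approach is easy to automate. This short Python sketch (not from the text) regenerates Table 2.1 for f(x) = x² − 2x + 4:

```python
def f(x):
    return x**2 - 2*x + 4

# Approach x = 2 from the left and from the right
left  = [1.9, 1.95, 1.99, 1.995, 1.999, 1.9999]
right = [2.1, 2.05, 2.01, 2.005, 2.001, 2.0001]

for xl, xr in zip(left, right):
    print(f"{xl:<8} {f(xl):<12.8f} {xr:<8} {f(xr):<12.8f}")
```

Both columns tend to 4, illustrating that lim(x→2) f(x) = 4.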

Example 2.2: Consider lim(x→0) 1/x

(a) Analytical. We substitute x = 0 and see that 1/0 is not defined. Therefore, we might conclude that the limit does not exist (LDNE). We should check whether the function can be reduced or simplified before we reach this conclusion, as shown in Example 2.3.


Figure 2.2



Plot of 1/x.

(b) Graphically. In Figure 2.2, we see that as we approach 0 from the left and the right, the function approaches different quantities, ±∞. We conclude the LDNE. (c) Numerical table. In Table 2.2, we clearly see that the values from the left and the right are not tending toward the same values.

Table 2.2 Left and Right Limits for Example 2.2

X         f(x)             X        f(x)
−1        −1               1        1
−0.95     −1.052631579     0.95     1.052631579
−0.9      −1.111111111     0.9      1.111111111
−0.5      −2               0.5      2
−0.1      −10              0.1      10
−0.05     −20              0.05     20
−0.0001   −10000           0.0001   10000


Example 2.3: Consider lim(x→2) (x² − 4)/(x − 2)

(a) Analytical. If we merely substitute, we get 0/0, which is an indeterminate form. We do not want to conclude the LDNE until we exhaust the following rules. If we have an indeterminate form such as 0/0 or ∞/∞, then we might try simplifying the function using algebra, or we might use L'Hopital's rule. Two conditions must be met:
1. Differentiability: for a limit approaching c, the functions f and g must be differentiable on either side of c, but not necessarily at c; likewise, g′(x) must not equal zero on either side of c.
2. The limit lim(x→c) f′(x)/g′(x) must exist.
Then, L'Hopital's rule states that

lim(x→a) f(x)/g(x) = lim(x→a) f′(x)/g′(x).

Therefore, if we employ L'Hopital's rule, we have lim(x→2) 2x/1 = 4. The limit does exist, and it is 4. (Simplifying algebraically gives the same result: (x² − 4)/(x − 2) = x + 2 → 4 as x → 2.)
(b) Graphically. In Figure 2.3, we see that as we approach 2 from the left or the right, f(x) approaches 4.
(c) Numerical table. Table 2.3 shows that as x approaches 2, f(x) approaches 4.

Figure 2.3  Plot of (x² − 4)/(x − 2) as x approaches 2.

Table 2.3 Limit of (x² − 4)/(x − 2) as x Approaches 2 Numerically

X        f(x)       X        f(x)
1.9      3.9        2.1      4.1
1.95     3.95       2.05     4.05
1.99     3.99       2.01     4.01
1.995    3.995      2.005    4.005
1.999    3.999      2.001    4.001
1.9999   3.9999     2.0001   4.0001

2.2 Continuity

We begin with a definition of continuity. A function f(x) is continuous at a point a if lim(x→a) f(x) = f(a), and is continuous on an interval [a, b] if the function is continuous at every point in the interval. If f(x) is not continuous at x = a, we say that f(x) is discontinuous (or has a discontinuity) at a. Recall from your study of functions that discontinuities are often points NOT in the domain of the input variable x. The above definition requires that three conditions hold for f(x) to be continuous at a:
1. The function value f(a) is defined (a is in the domain of f).
2. lim(x→a) f(x) exists.
3. lim(x→a) f(x) = f(a).
Also recall the following two facts from calculus.
a. Polynomials are continuous everywhere; that is, they are continuous on the open interval (−∞, ∞).
b. Rational functions are continuous wherever they are defined; that is, they are continuous over their domain. Sometimes, we might restrict the domain of a function so that the function is continuous over the restricted domain.


Example 2.4: Show that the function f(x) = x² + √(13 − x) is continuous at x = 4

f(4) = 16 + √9 = 19

lim(x→4) (x² + √(13 − x)) = 19

Since f(4) = lim(x→4) f(x) = 19, f(x) is continuous at x = 4.

Example 2.5: Determine if f(x) = ln(x − 2) is continuous at x = 2

Since f(2) = ln(0) is not defined, f(x) is not continuous at x = 2. Recall from pre-calculus that ln(x) is only defined for x > 0.

Example 2.6: Consider the cost function
c(x) = 25x + 5 for 0 ≤ x ≤ 100
c(x) = 15x + 3 for x > 100

Is c(x) continuous at x = 100? We find c(100) = 2,505.

lim(x→100−) c(x) = 2,505
lim(x→100+) c(x) = 1,503

Since the limits from the left and right are not equal, the LDNE as x → 100. The function is not continuous at x = 100.
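A quick numerical check of the one-sided limits in Example 2.6 (a sketch, not from the text):

```python
def c(x):
    """Piecewise cost function from Example 2.6."""
    return 25*x + 5 if x <= 100 else 15*x + 3

# Approach x = 100 from each side
left  = [c(100 - h) for h in (0.1, 0.01, 0.001)]
right = [c(100 + h) for h in (0.1, 0.01, 0.001)]
print(left)   # tends toward 2505
print(right)  # tends toward 1503
```

The two sequences approach different values, so the limit at x = 100 does not exist and c(x) is discontinuous there.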

2.3 Differentiation

The derivative of a function f(x) at x = a is defined as the limit quotient

df/dx = f′(a) = lim(Δx→0) [f(a + Δx) − f(a)]/Δx

If the limit does not exist, then the function has no derivative at x = a. The geometric interpretation of f′(a) is that it represents the slope of the tangent line to f(x) at the point x = a. The derivative is also the instantaneous rate of change. Recall from pre-calculus the concept of the average rate of change between two points a and b:

Average rate of change = [f(b) − f(a)]/(b − a)


As we allow the difference b − a to approach 0, the average rate of change becomes the instantaneous rate of change: letting b − a = Δx, we have the definition of the derivative,

lim(Δx→0) [f(a + Δx) − f(a)]/Δx

Example 2.7: Consider the function f(x) = (0.5 − x)² + 4

a. Determine the average rate of change from x = 2 to x = 8. f(8) = 60.25 and f(2) = 6.25, so the average rate of change is (60.25 − 6.25)/(8 − 2) = 9.
b. Determine the instantaneous rate of change at x = 4. By the chain rule, f′(x) = −2(0.5 − x) = 2x − 1, so f′(4) = 7.
c. Determine what this instantaneous rate of change means.

In Figure 2.4, we see the plot of the function and its tangent line at x = 4. The slope of the tangent line is positive (it was just found that the slope is 7), so the function is increasing at x = 4. We provide some basic rules in Table 2.4 for finding the derivatives of a function.

Figure 2.4  Plot of the function and its tangent line at x = 4.
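The two rates in Example 2.7 can be checked numerically with a central difference (a sketch, not from the text):

```python
def f(x):
    return (0.5 - x)**2 + 4

# Average rate of change from x = 2 to x = 8
avg = (f(8) - f(2)) / (8 - 2)

# Instantaneous rate of change at x = 4 via a central difference
h = 1e-6
inst = (f(4 + h) - f(4 - h)) / (2 * h)

print(avg, inst)   # 9.0 and approximately 7.0
```

The central difference is exact for a quadratic up to rounding, confirming that the slope at x = 4 is +7.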

Table 2.4 Basic Differentiation

d(a)/dx = 0
d(x)/dx = 1
d(au)/dx = a(du/dx)
d(u + v − w)/dx = du/dx + dv/dx − dw/dx
d(uv)/dx = u(dv/dx) + v(du/dx)
d(u/v)/dx = (1/v)(du/dx) − (u/v²)(dv/dx)
d(uⁿ)/dx = nuⁿ⁻¹(du/dx)
d(√u)/dx = [1/(2√u)](du/dx)
d(1/u)/dx = −(1/u²)(du/dx)
d(1/uⁿ)/dx = −(n/uⁿ⁺¹)(du/dx)
d[f(u)]/dx = [d f(u)/du](du/dx)
d(ln u)/dx = (1/u)(du/dx)
d(logₐ u)/dx = (1/u)(logₐ e)(du/dx)
d(eᵘ)/dx = eᵘ(du/dx)
d(aᵘ)/dx = aᵘ ln a (du/dx)
d(uᵛ)/dx = vuᵛ⁻¹(du/dx) + uᵛ ln u (dv/dx)
d(sin u)/dx = cos u (du/dx)
d(cos u)/dx = −sin u (du/dx)
d(tan u)/dx = sec²u (du/dx)
d(cot u)/dx = −csc²u (du/dx)
d(sec u)/dx = sec u tan u (du/dx)
d(csc u)/dx = −csc u cot u (du/dx)


2.3.1 Increasing and Decreasing Functions

If f′(x) > 0 on an interval, then f(x) is increasing on that interval, and if f′(x) < 0 on an interval, then f(x) is decreasing there.

2.4 Convex and Concave Functions

Let f(x) be defined on a convex set S. Then f(x) is convex on S if

f(cx + (1 − c)x′) ≤ cf(x) + (1 − c)f(x′)

holds for all x and x′ in S and all 0 < c < 1, and f(x) is concave on S if

f(cx + (1 − c)x′) ≥ cf(x) + (1 − c)f(x′)

holds for all x and x′ in S and all 0 < c < 1.

To gain some additional insights, let's view these geometrically. Let f(x) be a function of a single variable. In Figure 2.5 and by the definitions above, we find that f(x) is convex if and only if every line segment joining two points on the curve lies on or above the curve. In Figure 2.6 and by the definitions above, we find that f(x) is concave if and only if every line segment joining two points on the curve lies on or below the curve.

Figure 2.5  Example of a convex function.


Figure 2.6  Example of a concave function.

Example 2.9: Determine the Convexity of f(x) = x²

Since f″(x) = 2 > 0 for all x, we see the function is convex.


Example 2.10: Determine the Convexity of eˣ

Since f″(x) = eˣ > 0 for all x, the function is convex everywhere.

Example 2.11: Determine the Convexity of ln(x)

Since f″(x) = −1/x² < 0 on the domain x > 0, the function is concave.


Example 2.12: Determine the Convexity of x³

This function is neither convex nor concave over all of R: it is convex for x > 0 and concave for x < 0, since f″(x) = 6x.

The second derivative gives a general test:
1. Suppose f″(x) exists for all x in a convex set S. Then f(x) is convex on S if and only if f″(x) ≥ 0 for all x in S.
2. Suppose f″(x) exists for all x in a convex set S. Then f(x) is concave on S if and only if f″(x) ≤ 0 for all x in S.
For f(x) = x³, restricting the domain gives a convex region (x > 0) or a concave region (x < 0).

Exercises

Problems 1−5: Find each limit (if it exists):
1. lim(x→4) (x³ + 2x − 21)
2. lim(x→∞) (x² + x)/x
3. lim(t→1) (t² − 1)/(t − 1)
4. lim(y→0) tan(y)/y
5. lim(x→0) |x|/x
6. Given some function f(x), state and give an example (i.e., a graph) of three fundamental types of discontinuities.
7. Differentiate y = sin(x²).
8. Differentiate y = (sin x)².
9. Find all first- and second-order partial derivatives for f(x, y) = exp(xy²).
10. Show that the following functions are convex using the definition of convexity. Verify your result by using the second derivative test.
a. 3x + 4
b. −x^(1/2), for x ≥ 0
11. Characterize the function x³ − 9x² + 24x − 10 in terms of any convexity and/or concavity properties over R, by any method.
12. Show eˣ − x − 1 is convex using the definition of convexity. Verify your result using the second derivative test.


References and Suggested Reading Stewart, J. Calculus, 8th ed. Cengage Publishers, Boston, MA, 2016. Winston, W. Introduction to Mathematical Programming: Applications and Algorithms, 2nd ed. Duxbury Press, Belmont, CA, 1995.

Chapter 3

Single-Variable Unconstrained Optimization

3.1 Introduction

Consider an oil-drilling rig that is 6.5 miles offshore. The drilling rig is to be connected by an underwater pipe to a pumping station on the shoreline. The pumping station is connected by land-based pipe to a refinery, which is 15.7 miles down the shoreline from the drilling rig (see Figure 3.1). The underwater pipe costs $32,575 per mile, and the land-based pipe costs $14,348 per mile. You are to determine where to place the pumping station to minimize the cost of the pipe, as shown in the figure.

Figure 3.1  Oil pumping station location example (rig 6.5 miles offshore; refinery 15.7 miles down the shoreline).


In this chapter, we will discuss models that require single-variable calculus to solve. We will review the calculus concepts for optimization and then apply them to applications. We will use Maple to assist us.

3.2 Single-Variable Optimization and Basic Theory

We want to solve problems of the form:

max (or min) f(x), x ∈ (a, b)

If a = −∞ and b = ∞, then we are looking at R²—the xy-plane. If either a or b (or both) is restricted, then we must consider possible endpoints in our solution. We will examine three cases.

Case 1: Points where a < x < b at which f′(x) = 0 ...

The second derivative of the cost function is > 0, so we found the minimum. Thus, if the pumping station is located at 15.7 − 3.1884 = 12.512 miles from the refinery, we will minimize the total cost, at a cost of approximately $415,355.
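As a check on the pumping-station model, here is a pure-Python golden section search (the method is previewed in Chapter 4) applied to the cost function C(x) = 32575·√(6.5² + x²) + 14348·(15.7 − x), where x is the along-shore distance from the point opposite the rig to the station. This sketch is not part of the original text:

```python
import math

UNDERWATER, LAND = 32575.0, 14348.0
OFFSHORE, ALONG = 6.5, 15.7

def cost(x):
    """Pipe cost if the station sits x miles down-shore from the rig's foot."""
    return UNDERWATER * math.hypot(OFFSHORE, x) + LAND * (ALONG - x)

def golden_min(f, a, b, tol=1e-8):
    """Golden section search for the minimizer of a unimodal f on [a, b]."""
    invphi = (math.sqrt(5) - 1) / 2
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - invphi * (b - a)
        else:
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2

xstar = golden_min(cost, 0.0, ALONG)
print(xstar, cost(xstar))
```

The minimizer is x* ≈ 3.19 miles (about 12.5 miles from the refinery), with a minimum cost of roughly $415,355.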

Exercises (questions 1–6 modified from Giordano et al., 2014)

1. Consider an industrial situation where it is necessary to set up an assembly line. Suppose that each time the line is set up, a cost c is incurred. Assume c is in addition to the cost of producing any item and is independent of the amount produced. Suggest submodels for the production rate. Now assume a constant production rate k and a constant demand rate r. What assumptions are implied by the model in the figure shown below? Assume a storage cost of s (in dollars per unit per day) and compute the optimal length of the production run P* in order to minimize the costs. List all of your assumptions. How sensitive is the average daily cost to the optimal length of the production run?


Figure: inventory level Q versus time in days; the level rises with slope k − r during a production run of length P and falls with slope −r for the remainder of the cycle of length T.

2. Consider a company that allows backordering. That is, the company notifies customers that a temporary stock-out exists and that their order will be filled shortly. What conditions might argue for such a policy? What effect does such a policy have on storage costs? Should costs be assigned to stock-outs? Why? How would we make such an assignment? What assumptions are implied by the model in the figure shown below? Suppose a "loss of goodwill cost" of w dollars per unit per day is assigned to each stock-out. Compute the optimal order quantity Q* and interpret your model. Compare to an economic order quantity (EOQ) model.

Figure: inventory level Q versus time with backordering, over a cycle of length T.


3. In the inventory model discussed in the text, we assumed a constant delivery cost that is independent of the amount delivered. Actually, in many cases, the cost varies in discrete amounts depending on the size of the truck needed, the number of platform cars required, and so forth. How would you modify the model in the text to take into account these changes? We also assumed a constant cost for raw materials. However, oftentimes bulk-order discounts are given. How would you incorporate these discounts into the model? 4. What is the optimal speed and safe following distance that allows the maximum flow rate (cars per unit time)? The solution to the problem would be of use in controlling traffic in tunnels, for roads under repair, or for other congested areas. In the following schematic, l is the length of a typical car and d is the distance between cars.

|— l —|— d —|

Velocity . Distance Let’s assume a car length of 16.3 ft. The safe stopping distance d = 1.1v + 0.054v2 was determined, where d is measured in feet and v in miles per hour. Find the velocity in miles per hour and the corresponding following distance d that maximizes traffic flow. How sensitive is the solution to changes in v? Can you suggest a practical rule? How would you enforce it? 5. Consider an athlete competing in the shot put. What factors influence the length of his or her throw? Construct a model that predicts the distance thrown as a function of the initial velocity and angle of release. What is the optimal angle of release? If the athlete cannot maximize the initial velocity at the angle of release you purpose, should he or she be more concerned with satisfying the angle of release or generating a high initial velocity? What are the trade-offs? 6. The store transportation manager is responsible for periodically buying new trucks to replace older trucks in his company’s fleet of vehicles. He is expected to determine the time a truck should be retained so as to minimize the average cost of owning the truck. Assume the purchase price of a new truck is $20000 with a trade-in. Also assume the maintenance cost (in dollars) per truck for t years can be expressed analytically by the following empirical model: C(t) = 640+180 (t-10)t, where t is the time in years the company owns the truck. Justify that the flow rate is given by f =


Projects

You are a volunteer working with the Peace Corps in a country in Central America. Your group has just finished building a flight strip that will be used to bring supplies to a village. There is another village 100 km away that is inaccessible by vehicle from the flight strip (see the diagram below).

Diagram: Village A and Village B (with the airfield) are 100 km apart; each lies 30 km north of the Old Road, which runs east-west and crosses the river at a bridge located 50 km along the road from the point below each village.

Your group has been asked to build a road connecting the two villages. YOU MUST CROSS THE EXISTING BRIDGE. Currently, there is an unimproved dirt road 30 km south of both villages that connects with the bridge. Assume the Old Road is straight and in an East-West direction. You want to build the most cost effective road connecting the two villages. After surveying the Old Road and the terrain between the two villages, you estimate that the cost of materials and equipment to improve the Old Road is $120,000 per kilometer, whereas to construct a new road will cost $220,000 per kilometer. 1. Draw a diagram that includes definitions of any variables you choose to use. 2. Analysis of the problem should include at least the answers to the following questions (all of this can be neatly handwritten): – What is the cost function for the roads? – What are the lengths of the newly constructed road and the refurbished Old Road that will minimize the cost of the road complex that connects the two villages? – What is the minimum cost of the road complex? – If the intersection (new road to Old Road) closest to Village B cannot be positioned at the site that provides the minimum cost, would it be better to move the intersection toward Village B or toward Village A along the Old Road? And why?

Single-Variable Unconstrained Optimization




References and Suggested Reading

Fox, W. Mathematical Modeling with Maple. Cengage Publishers, Boston, MA, 2013.
Giordano, F., Fox, W., Horton, S. A First Course in Mathematical Modeling, 5th ed. Cengage Publishers, Boston, MA, 2014.
Stewart, J. Calculus, 8th ed. Cengage Publishers, Boston, MA, 2016.
Winston, W. Introduction to Mathematical Programming: Applications and Algorithms, 2nd ed. Duxbury Press, Belmont, CA, 1995.

Chapter 4

Numerical Search Techniques in Single-Variable Optimization

4.1 Single-Variable Techniques

The basic approach of most numerical methods in optimization is to produce a sequence of improved approximations to the optimal solution according to a specific scheme. We will examine both elimination methods (Golden section and Fibonacci) and interpolation methods (Newton's). In numerical methods of optimization, a procedure is used to obtain values of the objective function at various combinations of the decision variables, and conclusions are then drawn regarding the optimal solution. The elimination methods can be used to find an optimal solution even for discontinuous functions. An important assumption must be made to use these elimination methods: the function must be unimodal. A unimodal function is one that has only one peak (maximum) or one valley (minimum). This can be stated mathematically as follows:

A function f(x) is unimodal with a minimum at x* if, for x1 < x2, (1) x2 < x* implies that f(x2) < f(x1), and (2) x1 > x* implies that f(x1) < f(x2). A function f(x) is unimodal with a maximum at x* if, for x1 < x2, (1) x2 < x* implies that f(x2) > f(x1), and (2) x1 > x* implies that f(x1) > f(x2).

Figure 4.1 Graphical examples of unimodal functions.

Some examples of unimodal functions are shown in Figure 4.1a−c. As seen, unimodal functions may or may not be differentiable: a function can be non-differentiable (having corners) or even discontinuous. If a function is known to be unimodal in a given interval, then the optimum (maximum or minimum) can be isolated within a smaller interval. In this section, we will learn many techniques for numerical searches. For the elimination methods, we accept an interval answer. If a single value is required, then we usually evaluate the function at each endpoint of the final interval and at its midpoint, and take the optimum of those three values to approximate our single value. Table 4.1 provides a preview of the numerical methods that we cover in this chapter. It is not a complete list of all numerical methods available.

Table 4.1 Some Numerical Methods

Elimination Methods        Interpolation Methods
Unrestricted Search        Modified Newton's Method
Exhaustive Search
Dichotomous Search
Golden Section Method
Fibonacci's Search

4.1.1 Unrestricted Search

The unrestricted search method is used when the optimum needs to be found but no interval of uncertainty is known. It involves a search with a fixed step size and is not very computationally efficient.

1. Start with a guess, say x1.
2. Find f1 = f(x1).
3. Assume a step size S; find x2 = x1 + S.
4. Find f2 = f(x2).
5. For a minimization problem: if f2 < f1, then the minimum cannot lie in x < x1, so we find points x3, x4, …, xn; this is continued until an increase in the function is found. The search is terminated at xi and xi−1.
6. If f2 > f1, then the search goes in the reverse direction.
7. If f1 = f2, then the minimum lies between x1 and x2.
8. If the search proceeds too slowly, an accelerated step size, a constant c times S, can be used.

Example 4.1: Find the Minimum of the Function

f(x) = −x/2 for x ≤ 2,
f(x) = x − 3 for x > 2,

using an unrestricted search method with an initial guess of 1.0 and a step size of 0.4.

x1 = 1 and f(x1) = f1 = −0.5
x2 = x1 + S = 1 + 0.4 = 1.4, f(1.4) = f2 = −0.7

f1 > f2, so x3 = 1.8, f3 = −0.9, and f2 > f3; then x4 = 2.2, f4 = −0.8, and f4 > f3, so we stop. Thus, the optimum (minimum) must lie between 1.8 and 2.2. If a single value is required, we evaluate:

f(1.8) = −0.9
f(2.2) = −0.8
f((1.8 + 2.2)/2) = f(2) = −1

Since f(2) is the smallest value of f(x), we will use x = 2 as our approximation to the minimum, yielding a value of f(2) = −1.
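The fixed-step search above can be sketched in code. The book's implementations use Excel, Maple, and MATLAB; the Python version below is an illustration only, and the function name is ours.

```python
def unrestricted_search(f, x0, step, max_steps=10_000):
    """Fixed-step unrestricted search for the minimum of a unimodal f.

    Walks in the downhill direction until f increases, then returns the
    last two points, which bracket the minimum (steps 5-6 of the text)."""
    x_prev, f_prev = x0, f(x0)
    x, fx = x0 + step, f(x0 + step)
    if fx > f_prev:                 # uphill at the first step: reverse direction
        step = -step
        x, fx = x_prev, f_prev
    for _ in range(max_steps):
        x_next = x + step
        f_next = f(x_next)
        if f_next > fx:             # f turned upward: the minimum is bracketed
            return min(x, x_next), max(x, x_next)
        x, fx = x_next, f_next
    raise RuntimeError("no minimum bracketed within max_steps")

# Example 4.1: f(x) = -x/2 for x <= 2 and x - 3 for x > 2, start 1.0, step 0.4
f = lambda x: -x / 2 if x <= 2 else x - 3
a, b = unrestricted_search(f, 1.0, 0.4)   # brackets the minimum: (1.8, 2.2)
```

Matching Example 4.1, the call returns the interval (1.8, 2.2); evaluating f at the endpoints and the midpoint then gives x = 2 as the point estimate.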

4.1.2 Exhaustive Search

This method can be used to solve problems where the interval is finite. It might also be used to find the interval where the function is unimodal. In general, define an interval from a to b and divide it into n segments of length s = (b − a)/n. Create a table of values of x from a to b with step size s. Compute f(x) at each value and look for unimodality as well as the interval containing the largest f(x), assuming we are solving a maximization problem.

Assume we are maximizing f(x) = x(1.5 − x) over the interval [0, 1] with a step size of 0.10. Table 4.2 shows the values of x and f(x). Since f(x7) = f(x8), the assumption of unimodality gives the final interval of uncertainty as [0.7, 0.8]. Taking the midpoint, x = 0.75, gives a better approximation to the true optimum; in fact, it is the true optimum.
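The scan can be sketched as follows (Python for illustration; the helper name and the handling of ties are ours).

```python
def exhaustive_search_max(f, a, b, n):
    """Evaluate f at the n + 1 grid points of [a, b], step s = (b - a)/n,
    and return the sub-interval around the grid point with the largest f."""
    s = (b - a) / n
    xs = [a + i * s for i in range(n + 1)]
    best = max(range(n + 1), key=lambda i: f(xs[i]))
    return xs[max(best - 1, 0)], xs[min(best + 1, n)]

# Table 4.2: f(x) = x(1.5 - x) on [0, 1] with step 0.10
lo, hi = exhaustive_search_max(lambda x: x * (1.5 - x), 0.0, 1.0, 10)
# the true maximizer x = 0.75 lies inside the returned interval
```

With the tie f(0.7) = f(0.8) of Table 4.2, this generic version simply returns a bracket containing both tied points; unimodality lets the text narrow the bracket to [0.7, 0.8].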

Table 4.2 Exhaustive Search of f(x) = x(1.5 − x)

x      0     0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9    1
f(x)   0     0.14   0.26   0.36   0.44   0.5    0.54   0.56   0.56   0.54   0.5

4.1.3 Dichotomous Search

The following is the dichotomous algorithm:

(1) Initialize a distinguishable constant 2ε > 0 (ε is a very small number, like 0.01).

(2) Select a length of uncertainty for the final interval, t > 0 (t is also small).
(3) Calculate the number of iterations required, n, using 0.5^n = t/(b − a).
(4) Let k = 1.

Main Steps

1. If (b − a) < t, then stop, because (b − a) is the final interval. If (b − a) > t, then let

   x1 = (a + b)/2 − ε,  x2 = (a + b)/2 + ε.

2. Perform comparisons of the function values at these points (minimization problem):
   If f(x1) < f(x2), set a = a and b = x2.
   If f(x1) > f(x2), set a = x1 and b = b.
   In either case, set k = k + 1 and return to Main Step 1.

Example 4.2: Minimize f(x) = x² + 2x over the Interval [−3, 6] Using Dichotomous Search

Let t = 0.2 and ε = 0.01. Find n using 0.5^n = t/(b − a):

0.5^n = 0.2/(6 − (−3)) = 0.2/9
n ln(0.5) = ln(0.2/9)
n = 5.49, or 6 (rounding up)

k    a          b         x1         x2         f(x1)      f(x2)
1    −3         6          1.49       1.51       5.2001     5.3001
2    −3         1.51      −0.755     −0.735     −0.9399    −0.9297
3    −3        −0.735     −1.8775    −1.8575    −0.2299    −0.2646
4    −1.8775   −0.735     −1.3162    −1.2962    −0.8999    −0.9122
5    −1.31625  −0.735     −1.0356    −1.0156    −0.9987    −0.9997
6    −1.03563  −0.735     −0.8953    −0.8753    −0.9890    −0.9844
     −1.03563  −0.8753

The final interval is [−1.03563, −0.8753]. The length of this interval is 0.1603, which is less than our tolerance of t = 0.2. The value of n refers to the number of pairs of x values observed.
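A minimal Python sketch of the algorithm (illustrative; names are ours) reproduces Example 4.2.

```python
import math

def dichotomous_search_min(f, a, b, t, eps):
    """Dichotomous search for the minimum of a unimodal f on [a, b]:
    test two points eps on either side of the midpoint and keep the half
    that must contain the minimum; run n iterations with 0.5**n = t/(b-a)."""
    n = math.ceil(math.log(t / (b - a)) / math.log(0.5))
    for _ in range(n):
        mid = (a + b) / 2
        x1, x2 = mid - eps, mid + eps
        if f(x1) < f(x2):
            b = x2
        else:
            a = x1
    return a, b

# Example 4.2: minimize x^2 + 2x on [-3, 6] with t = 0.2 and eps = 0.01
a, b = dichotomous_search_min(lambda x: x**2 + 2 * x, -3.0, 6.0, 0.2, 0.01)
# returns the final interval [-1.035625, -0.8753125], length about 0.16
```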

4.1.4 Golden Section Search

Golden section search is a search procedure that utilizes the golden ratio. To better understand the golden ratio, consider a line segment that is divided into two separate regions, as shown in Figure 4.2. The segments are divided in the golden ratio if the length of the whole line is to the length of the larger part as the length of the larger part is to the length of the smaller part. Symbolically, this can be written as

1/r = r/(1 − r).

Algebraic manipulation of this relationship yields r² + r − 1 = 0. Solving for the roots (using the quadratic formula) gives two real solutions:

r1 = (√5 − 1)/2,  r2 = (−√5 − 1)/2.

Only the positive root, r1, satisfies the requirement of residing on the given line segment. The numerical value of r1 is 0.618. This value is known as the golden ratio.

Figure 4.2 Line segment illustration: the unit segment [0, 1] divided at r.

This ratio is, among its other properties, the limiting value of the ratio of consecutive Fibonacci numbers, which we will see in the next method. It is noted here that there is also a Fibonacci search method that could be used in lieu of the Golden section method. In order to use the Golden section search procedure, we must ensure that certain assumptions hold. These key assumptions are as follows:

(1) the function must be unimodal over a specified interval, (2) the function must have an optimal solution over a known interval of uncertainty, and (3) we must accept an interval solution since the exact optimal cannot be found by this method.

Only an interval solution, known as the final interval of uncertainty, can be found using this technique. The length of this final interval is controllable by the user and can be made arbitrarily small by the selection of a tolerance value. The final interval is guaranteed to be less than this tolerance level. Line search procedures use an initial interval of uncertainty to iterate to the final interval of uncertainty. The procedure is based, as shown earlier, on the unique positive root of the quadratic equation r² + r = 1. The positive root from the quadratic formula with a = 1, b = 1, and c = −1 is

r = (−b + √(b² − 4ac))/(2a) = (−1 + √5)/2 ≈ 0.618.

4.1.5 Finding the Maximum of a Function over an Interval with Golden Section

This search procedure to find a maximum is iterative, requiring evaluations of f(x) at experimental points x1 and x2, where x1 = b − r(b − a) and x2 = a + r(b − a). These experimental points lie within the original interval [a, b] and are used to help determine the new interval of search. If f(x1) < f(x2), then the new interval is [x1, b], and if f(x1) > f(x2), then the new interval is [a, x2]. The iterations continue in this manner until the final interval length is less than our imposed tolerance. Our final interval contains the optimum solution. It is the size of this final interval that determines our accuracy in finding the approximate optimum solution. The number of iterations required to

achieve this accepted interval length can be found as the smallest integer greater than k, where k equals [1, 4]:

k = ln(Tolerance/(b − a)) / ln(0.618).
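As a quick numeric check of this iteration-count formula (Python, illustrative): with tolerance 0.1 on [0, 3], eight iterations are required, matching Table 4.3.

```python
import math

def golden_iterations(a, b, tol):
    """Smallest integer k with 0.618**k * (b - a) < tol, i.e. the smallest
    integer greater than ln(tol/(b - a)) / ln(0.618)."""
    return math.ceil(math.log(tol / (b - a)) / math.log(0.618))

k = golden_iterations(0, 3, 0.1)   # k = 8, since ln(0.1/3)/ln(0.618) is about 7.07
```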

Often we are required to provide a point solution instead of the interval solution. When this occurs, the method of selecting a point is to evaluate the function, f(x), at the endpoints of the final interval and at the midpoint of this final interval. For maximization problems, we select the value of x that yields the largest f(x). For minimization problems, we select the value of x that yields the smallest f(x). The Golden section algorithm used is shown in Figure 4.3. Although the Golden section method can be used with any unimodal function to find the maximum (or minimum) over a specified interval, its main advantage comes when normal calculus procedures fail. We begin with an easy problem.

To find a maximum solution given a function, f(x), on the interval [a, b] where the function, f(x), is unimodal.
INPUT: endpoints a, b; tolerance t
OUTPUT: final interval [ai, bi], f(midpoint)
Step 1. Initialize the tolerance, t > 0.
Step 2. Set r = 0.618 and define the test points:
  x1 = a + (1 − r)(b − a)
  x2 = a + r(b − a)
Step 3. Calculate f(x1) and f(x2).
Step 4. Compare f(x1) and f(x2):
  a. If f(x1) < f(x2), then the new interval is [x1, b]: a becomes the previous x1; b does not change; x1 becomes the previous x2; find the new x2 using the formula in Step 2.
  b. If f(x1) > f(x2), then the new interval is [a, x2]: a does not change; b becomes the previous x2; x2 becomes the previous x1; find the new x1 using the formula in Step 2.
Step 5. If the length of the new interval from Step 4 is less than the tolerance specified, then stop. Otherwise go back to Step 3.
Step 6. Estimate x* as the midpoint of the final interval and compute f(x*), the estimated maximum of the function.
STOP

Figure 4.3 Golden section algorithm.
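The steps of Figure 4.3 can be sketched as follows (Python, illustrative; note that one test point, and therefore one function value, is reused at each iteration).

```python
def golden_section_max(f, a, b, tol):
    """Golden section search for the maximum of a unimodal f on [a, b],
    following the steps of Figure 4.3."""
    r = 0.618
    x1, x2 = a + (1 - r) * (b - a), a + r * (b - a)
    f1, f2 = f(x1), f(x2)
    while (b - a) >= tol:
        if f1 < f2:                  # maximum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = f(x2)
        else:                        # maximum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = a + (1 - r) * (b - a)
            f1 = f(x1)
    mid = (a + b) / 2
    return a, b, mid, f(mid)

# Example: maximize -x^2 + 4x + 10 on [0, 3] with tolerance 0.1
a, b, mid, fmid = golden_section_max(lambda x: -x**2 + 4 * x + 10, 0, 3, 0.1)
```

On the easy problem that follows, this reproduces Table 4.3: a final interval of approximately [1.9575, 2.0213] with midpoint 1.9894 and f(midpoint) close to 14.0.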

Table 4.3 Golden Section Results

Iteration   a          b          x1         x2         f(x1)      f(x2)      abs(b−a)
0           0          3          1.146      1.854      13.2707    13.9787    3
1           1.146      3          1.85423    2.29177    13.9788    13.9149    1.854
2           1.146      2.29177    1.58368    1.85409    13.8267    13.9787    1.14577
3           1.58368    2.29177    1.85417    2.02128    13.9787    13.9995    0.70809
4           1.85417    2.29177    2.02134    2.12461    13.9995    13.9845    0.4376
5           1.85417    2.12461    1.95748    2.0213     13.9982    13.9995    0.27044
6           1.95748    2.12461    2.02132    2.06077    13.9995    13.9963    0.16713
7           1.95748    2.06077    1.99694    2.02131    14         13.9995    0.10329
8           1.95748    2.02131    1.98186    1.99693    13.9997    14         0.06383

Maximize f(x) = −x² + 4x + 10 over the interval [0, 3] with a tolerance of 0.1. The first two experimental points are x1 = 1.146 and x2 = 1.854, with f(x1) = 13.2707 < f(x2) = 13.9787, so the new interval is [1.146, 3]. Its length, 1.854, is still greater than 0.1, so we continue finding new values for x1 and x2. The continuation of these iterations until our interval is less than 0.1 is provided in Table 4.3. We continue via technology to obtain the final solution interval.

4.1.6 Golden Section Search with Technology

4.1.6.1 Excel Golden Search

We developed a simple macro in Excel. We enter the interval, the tolerance, and the function.

We execute the macro with our example.

The final interval is [1.95748052, 2.021311]. The midpoint is 1.98939 with a functional value of 13.99988.

4.1.6.2 Maple Golden Search

We created a Proc program in Maple to perform the algorithm listed in Figure 4.3.

>f:=x->-(x^2) + 4 * x +10;
                f := x -> -x^2 + 4 x + 10
>GOLD(f, 0, 3, .1);

The interval [a, b] is [0.00, 3.00], and user-specified tolerance level is 0.10000. The first two experimental endpoints are x1 = 1.146 and x2 = 1.854.

Iteration   x(1)      x(2)      f(x1)     f(x2)     Interval
2           1.8540    2.2918    13.2707   13.9787   [1.1460, 3.0000]
3           1.5837    1.8540    13.9787   13.9149   [1.1460, 2.2918]
4           1.8540    2.0213    13.8267   13.9787   [1.5837, 2.2918]
5           2.0213    2.1245    13.9787   13.9995   [1.8540, 2.2918]
6           1.9573    2.0213    13.9995   13.9845   [1.8540, 2.1245]
7           2.0213    2.0607    13.9982   13.9995   [1.9573, 2.1245]
8           1.9968    2.0213    13.9995   13.9963   [1.9573, 2.0607]
9           1.9818    1.9968    14.0000   13.9995   [1.9573, 2.0213]

The midpoint of the final interval is 1.989315 and f(midpoint) = 14.000. The maximum of the function is 13.998 and the x value = 1.957347.

PROC Program for Maple

>restart;
>GOLD:=proc(f::procedure,a::numeric,b::numeric,T::numeric)
>local x1,x2;
>x1:=a+0.382*(b-a);
>x2:=a+0.618*(b-a);
>printf("The interval [a,b] is [% 4.2f,% 4.2f] and user specified tolerance level is % 6.5f.\n",a,b,T);
>printf("The first 2 experimental endpoints are x1 = % 6.3f and x2 = % 6.3f.\n",x1,x2);
>printf(" \n");
>printf(" \n");
>N:=ceil((ln(T/(b-a))/ln(0.618)));
>printf(" Iteration   x(1)   x(2)   f(x1)   f(x2)   Interval \n");
>iterate(f,a,b,N,x1,x2);
>val:=f(mdpt);
>printf(" \n");
>printf(" \n");
>printf("The midpoint of the final interval is % 9.6f and f(midpoint) = % 7.3f.\n",mdpt,val);
>printf(" \n");
>printf(" \n");
>printf("The maximum of the function is % 7.3f and the x value = % 9.6f \n",fkeep,xkeep);
>printf(" \n");
>printf(" \n");
>end:
>iterate:=proc(f::procedure,a::numeric,b::numeric,N::posint,x1::numeric,x2::numeric)
>local x1n,x2n,an,bn,i,fx1,fx2,j;
>global mdpt,fkeep,xkeep;
>i:=1;
>x1n(1):=x1;
>x2n(1):=x2;
>an(1):=a;
>bn(1):=b;
>for j from 1 to N do
>fx1(i):=f(x1n(i));
>fx2(i):=f(x2n(i));
>if fx1(i)<fx2(i) then
>an(i+1):=x1n(i);
>bn(i+1):=bn(i);
>x1n(i+1):=x2n(i);
>x2n(i+1):=an(i+1)+.618*(bn(i+1)-an(i+1));
>else
>an(i+1):=an(i);
>bn(i+1):=x2n(i);
>x2n(i+1):=x1n(i);
>x1n(i+1):=an(i+1)+.382*(bn(i+1)-an(i+1));
>fi;
>i:=i+1;
>printf(" % 3.0f % 11.4f % 10.4f % 10.4f % 10.4f [% 6.4f, % 6.4f]\n",i,x1n(i),x2n(i),fx1(i-1),fx2(i-1),an(i),bn(i));
>mdpt:=(an(i)+bn(i))/2;
>if (i=N) then
>if (f(an(i)) > f(bn(i)) or f(an(i)) > f(mdpt)) then fkeep:=f(an(i)); xkeep:=an(i);
>else
>if (f(bn(i)) > f(mdpt)) then
>fkeep:=f(bn(i)); xkeep:=bn(i);
>else
>fkeep:=f(mdpt); xkeep:=mdpt;
>fi;
>fi;
>fi;
>od;
>end:

4.1.6.3 MATLAB Golden Search

We script a program for MATLAB for the algorithm shown in Figure 4.3.

Script Code

f = inline('-(x.^2)+4*x+10');
x = linspace(.5,1,101);
plot(x, f(x))
N = 20;
a0 = 0.0; b0 = 3.0;   % input a and b
r = (sqrt(5)-1)/2;
alist = zeros(N,1);
blist = zeros(N,1);
a = a0; b = b0;
s = a + (1-r)*(b-a);
t = a + r*(b-a);
f1 = f(s); f2 = f(t);
for n = 1:N
    if f1 < f2          % maximum lies to the right of s
        a = s; s = t;
        t = a + r*(b-a);
        f1 = f2; f2 = f(t);
    else                % maximum lies to the left of t
        b = t; t = s;
        s = a + (1-r)*(b-a);
        f2 = f1; f1 = f(s);
    end
    alist(n) = a;
    blist(n) = b;
end
disp('    a        b        f(a)     f(b)     b-a')
disp(' ')
alist = [a0; alist];
blist = [b0; blist];
[alist, blist, f(alist), f(blist), blist-alist]

Highlight the script code and run the selection.
Source: Modified from www.math.umd.edu/~jcooper/working/workingmfiles/codes.html

Example: Maximize −x² + 4x + 10.

Output (MATLAB; columns a, b, and b − a):

     0        3.0000    3.0000
     0        3.0000    3.0000
     1.1459   3.0000    1.8541
     1.1459   2.2918    1.1459
     1.5836   2.2918    0.7082
     1.8541   2.2918    0.4377
     1.8541   2.1246    0.2705
     1.9574   2.1246    0.1672
     1.9574   2.0608    0.1033
     1.9574   2.0213    0.0639
     1.9818   2.0213    0.0395
     1.9818   2.0062    0.0244

>> midpoint=(1.9818+2.0062)/2
midpoint = 1.9940
>> -(midpoint^2)+4*midpoint+10
ans = 14.0000

Another MATLAB script yields the optimal x and f(x).

%---------------- GOLDEN SECTION METHOD ----------------
% modified from mailto: [email protected]
%--------------------------------------------------------
f=inline('x^2-4*x-10');  % minimizing x^2-4x-10 is equivalent to maximizing -x^2+4x+10
figure; hold on;
a=0;                % start of interval
b=3;                % end of interval
epsilon=0.000001;   % accuracy value
iter=50;            % maximum number of iterations
tau=double((sqrt(5)-1)/2);  % golden proportion coefficient, around 0.618
k=0;                % number of iterations
x1=a+(1-tau)*(b-a); % computing x values
x2=a+tau*(b-a);
f_x1=f(x1);         % computing values at x points
f_x2=f(x2);
plot(x1,f_x1,'rx')  % plotting x
plot(x2,f_x2,'rx')
while ((abs(b-a)>epsilon) && (k<iter))

We next apply the Maple GOLD procedure to f(x) = 1 − exp(−x) + 1/(1 + x):

>f := x -> 1-exp(-x)+1/(1+x);
                f := x -> 1 - exp(-x) + 1/(x + 1)

Iteration   x(1)     x(2)     f(x1)    f(x2)    Interval
 2          4.7215   7.6400   1.1153   1.0748   [0.0000, 12.3600]
 3          2.9185   4.7215   1.1659   1.1153   [0.0000, 7.6400]
 4          1.8036   2.9185   1.2012   1.1659   [0.0000, 4.7215]
 5          2.9185   3.6069   1.1920   1.2012   [1.8036, 4.7215]
 6          2.4925   2.9185   1.2012   1.1899   [1.8036, 3.6069]
 7          2.2295   2.4925   1.2036   1.2012   [1.8036, 2.9185]
 8          2.4925   2.6553   1.2021   1.2036   [2.2295, 2.9185]
 9          2.3921   2.4925   1.2036   1.2033   [2.2295, 2.6553]
10          2.4925   2.5548   1.2034   1.2036   [2.3921, 2.6553]
11          2.4543   2.4925   1.2036   1.2036   [2.3921, 2.5548]
12          2.4925   2.5164   1.2036   1.2036   [2.4543, 2.5548]
13          2.5164   2.5310   1.2036   1.2036   [2.4925, 2.5548]
14          2.5072   2.5164   1.2036   1.2036   [2.4925, 2.5310]
15          2.5164   2.5219   1.2036   1.2036   [2.5072, 2.5310]
16          2.5128   2.5164   1.2036   1.2036   [2.5072, 2.5219]
17          2.5107   2.5128   1.2036   1.2036   [2.5072, 2.5164]
18          2.5128   2.5142   1.2036   1.2036   [2.5107, 2.5164]
19          2.5120   2.5128   1.2036   1.2036   [2.5107, 2.5142]
20          2.5128   2.5134   1.2036   1.2036   [2.5120, 2.5142]
21          2.5125   2.5128   1.2036   1.2036   [2.5120, 2.5134]
22          2.5128   2.5131   1.2036   1.2036   [2.5125, 2.5134]

>GOLD(f, 0, 20, .001);

The interval [a, b] is [0.00, 20.00], and user-specified tolerance level is 0.00100. The first two experimental endpoints are x1 = 7.640 and x2 = 12.360. The midpoint of the final interval is 2.512961 and f(midpoint) = 1.204. The maximum of the function is 1.204 and the x value = 2.512705. Again, assuming that we desire a specific numerical value as the solution, our solution is x = 2.512705 with f(2.512705) = 1.204.

4.1.8 Fibonacci's Search

Fibonacci's search is a search procedure that utilizes ratios of Fibonacci numbers to set up experimental points in a sequence. The Fibonacci numbers follow the rule

F0 = 1, F1 = 1, Fi = Fi−1 + Fi−2,

generating the sequence {1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …}. The limiting value of the ratio of consecutive Fibonacci numbers is the golden ratio, 0.618. It is noted here that the Golden section search method could be used in lieu of the Fibonacci method; however, the Fibonacci search converges faster than the Golden section method. In order to use the Fibonacci search procedure, we must ensure that certain assumptions hold. These key assumptions are as follows:

(1) the function must be unimodal over a specified interval, (2) the function must have an optimal solution over a known interval of uncertainty, and (3) we must accept an interval solution since the exact optimal cannot be found by this method.

Only an interval solution, known as the final interval of uncertainty, can be found using this technique. The length of this final interval is controllable by the user and can be made arbitrarily small by the selection of a tolerance value. Line search procedures use an initial interval of uncertainty to iterate to the final interval of uncertainty.

4.1.8.1 Finding the Maximum of a Function over an Interval with the Fibonacci Method

This search procedure to find a maximum is iterative, requiring evaluations of f(x) at experimental points x1 and x2, where x1 = a + (Fn−2/Fn)(b − a) and x2 = a + (Fn−1/Fn)(b − a). These experimental points lie within the original interval [a, b] and are used to help determine the new interval of search. If f(x1) < f(x2), then the new interval is [x1, b], and if f(x1) > f(x2), then the new interval is [a, x2]. The iterations continue in this manner until the final interval length is less

than our imposed tolerance. Our final interval contains the optimum solution. It is the size of this final interval that determines our accuracy in finding the approximate optimum solution. The number of iterations required to achieve this accepted interval length is determined by the smallest Fibonacci number in the sequence that satisfies the inequality

Fk > (b − a)/Tolerance.

Often we are required to provide a point solution instead of the interval solution. When this occurs, the method of selecting a point is to evaluate the function, f(x), at the endpoints of the final interval and at the midpoint of this final interval. For maximization problems, we select the value of x that yields the largest f(x) solution. For minimization problems, we select the value of x that yields the smallest f(x) solution. The algorithm used is shown in Figure 4.4.

To find a maximum solution given a function, f(x), on the interval [a, b] where the function, f(x), is unimodal.
INPUT: endpoints a, b; tolerance t; Fibonacci sequence
OUTPUT: final interval [ai, bi], f(midpoint)
Step 1. Initialize the tolerance, t > 0.
Step 2. Set Fn > (b − a)/t with Fn the smallest such Fibonacci number, and define the test points:
  x1 = a + (Fn−2/Fn)(b − a)
  x2 = a + (Fn−1/Fn)(b − a)
Step 3. Calculate f(x1) and f(x2).
Step 4. Compare f(x1) and f(x2):
  a. If f(x1) < f(x2), then the new interval is [x1, b]: a becomes the previous x1; b does not change; x1 becomes the previous x2; set n = n − 1 and find the new x2 using the formula in Step 2.
  b. If f(x1) > f(x2), then the new interval is [a, x2]: a does not change; b becomes the previous x2; x2 becomes the previous x1; set n = n − 1 and find the new x1 using the formula in Step 2.
Step 5. If the length of the new interval from Step 4 is less than the tolerance specified, then stop. Otherwise go back to Step 3.
Step 6. Estimate x* as the midpoint of the final interval and compute f(x*), the estimated maximum of the function.
STOP

Figure 4.4 Fibonacci's algorithm.
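Figure 4.4 can be sketched as follows (Python, illustrative; names are ours). As in the examples that follow, the final interval produced by the Fn > (b − a)/t rule can be slightly longer than t.

```python
def fibonacci_search_max(f, a, b, tol):
    """Fibonacci search for the maximum of a unimodal f on [a, b],
    following Figure 4.4: choose the smallest F_n with F_n > (b - a)/tol,
    then shrink the interval, reusing one test point per iteration."""
    fib = [1, 1]
    while fib[-1] <= (b - a) / tol:
        fib.append(fib[-1] + fib[-2])
    n = len(fib) - 1
    x1 = a + fib[n - 2] / fib[n] * (b - a)
    x2 = a + fib[n - 1] / fib[n] * (b - a)
    f1, f2 = f(x1), f(x2)
    while n > 2:
        n -= 1
        if f1 < f2:                  # maximum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + fib[n - 1] / fib[n] * (b - a)
            f2 = f(x2)
        else:                        # maximum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = a + fib[n - 2] / fib[n] * (b - a)
            f1 = f(x1)
    mid = (a + b) / 2
    return a, b, mid, f(mid)

# Example 4.5: maximize -|2-x| - |5-4x| - |8-9x| on [0, 3], tolerance 0.1
g = lambda x: -abs(2 - x) - abs(5 - 4 * x) - abs(8 - 9 * x)
a, b, mid, fmid = fibonacci_search_max(g, 0, 3, 0.1)
```

Run on Example 4.5, this returns the final interval [0.7941, 0.9706] with midpoint 0.882353 and f(midpoint) approximately −2.647.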

Although the Fibonacci method can be used with any unimodal function to find the maximum (or minimum) over a specified interval, its main advantage comes when normal calculus procedures fail. Consider the following example: maximize f(x) = −|2 − x| − |5 − 4x| − |8 − 9x| over the interval 0 < x < 3. In calculus, absolute-value functions are not differentiable at their corner points, so taking the first derivative and setting it equal to zero is not an option. Another method needs to be used to find the solution. We used the Fibonacci method to solve this problem and other examples.

Example 4.5: Maximize f(x) = −|2 − x| − |5 − 4x| − |8 − 9x| over the Interval 0 < x < 3

The interval [a, b] is [0.00, 3.00], and user-specified tolerance level is 0.10000. The first two experimental endpoints are x1 = 1.147 and x2 = 1.853.

Iteration   x(1)     x(2)     f(x1)     f(x2)      Interval
1           1.1471   1.8529   −3.5882   −11.2353   [0.0000, 1.8529]
2           0.7059   1.1471   −5.1176   −3.5882    [0.7059, 1.8529]
3           1.1471   1.4118   −3.5882   −5.9412    [0.7059, 1.4118]
4           0.9706   1.1471   −2.8824   −3.5882    [0.7059, 1.1471]
5           0.8824   0.9706   −2.6471   −2.8824    [0.7059, 0.9706]
6           0.7941   0.8824   −3.8824   −2.6471    [0.7941, 0.9706]

The midpoint of the final interval is 0.882353 and f(midpoint) = −2.647. The maximum of the function is −2.882 and the x value = 0.970588.

In this example, we want a specific point as our solution. The midpoint yields the maximum value of f(x). Thus, we will use x = 0.882353 with f(0.882353) = −2.647 as our solution.

Example 4.6: Maximize the Function f(x) = 1 − exp(−x) + 1/(1 + x) over the Interval [0, 20]

The interval [a, b] is [0.00, 20.00], and user-specified tolerance level is 0.10000. The first two experimental endpoints are x1 = 7.639 and x2 = 12.361.

Iteration   x(1)     x(2)      f(x1)    f(x2)    Interval
 1          7.6395   12.3605   1.1153   1.0748   [0.0000, 12.3605]
 2          4.7210    7.6395   1.1659   1.1153   [0.0000, 7.6395]
 3          2.9185    4.7210   1.2012   1.1659   [0.0000, 4.7210]
 4          1.8026    2.9185   1.1919   1.2012   [1.8026, 4.7210]
 5          2.9185    3.6052   1.2012   1.1900   [1.8026, 3.6052]
 6          2.4893    2.9185   1.2036   1.2012   [1.8026, 2.9185]
 7          2.2318    2.4893   1.2021   1.2036   [2.2318, 2.9185]
 8          2.4893    2.6609   1.2036   1.2033   [2.2318, 2.6609]
 9          2.4034    2.4893   1.2034   1.2036   [2.4034, 2.6609]
10          2.4893    2.5751   1.2036   1.2036   [2.4034, 2.5751]

The midpoint of the final interval is 2.489270 and f(midpoint) = 1.204. The maximum of the function is 1.204 and the x value = 2.403433.

Again, assuming that we desire a specific numerical value as the solution, our solution is x = 2.489270 with f(2.489270) = 1.204.

4.2 Interpolation with Derivatives: Newton's Method

4.2.1 Finding the Critical Points (Roots) of a Function

Newton's method has been adapted to solve nonlinear optimization problems. For a function of a single variable, the adaptation is straightforward. Newton's method is applied to the derivative of the function we wish to optimize, for the function's critical points occur where the derivative's roots are found. When finding the critical points of the function, Newton's method is based on the derivative of the quadratic approximation of the function f(x) at the point xk:

q(x) = f(xk) + f′(xk)(x − xk) + (1/2) f″(xk)(x − xk)²

The result, q′(x), is a linear approximation of f′(x) at the point xk. Setting q′(x) = 0 and solving for x yields the formula

xk+1 = xk − f′(xk)/f″(xk)

where xk+1 ≡ x. Newton's method can be terminated when |xk+1 − xk| < ε, where ε is a prespecified scalar tolerance, or when |f′(x)| < ε. In order to use Newton's method to find the critical points of a function, the function's first and second derivatives must exist in the neighborhood of interest. Also note that when the second derivative at xk is zero, the point xk+1 cannot be computed. It is important to first master the computations required in the algorithm. It is also noted that Newton's method finds only the approximate critical value, but


it does not know whether it is finding a maximum or a minimum. The sign of the second derivative may be used to determine if we have a maximum or a minimum.
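The iteration can be sketched as follows (Python, illustrative; the two stopping tests are exactly those given above).

```python
def newton_critical_point(df, ddf, x0, eps=0.01, max_iter=100):
    """Newton's method applied to f': iterate x_{k+1} = x_k - f'(x_k)/f''(x_k)
    until |x_{k+1} - x_k| < eps or |f'(x_{k+1})| < eps."""
    x = x0
    for _ in range(max_iter):
        if ddf(x) == 0:
            raise ZeroDivisionError("f'' is zero; the next iterate is undefined")
        x_new = x - df(x) / ddf(x)
        if abs(x_new - x) < eps or abs(df(x_new)) < eps:
            return x_new
        x = x_new
    raise RuntimeError("did not converge")

# Example 4.8 below: f(x) = -2x^3 + 10x - 10, f'(x) = -6x^2 + 10, f''(x) = -12x
x_star = newton_critical_point(lambda x: -6 * x**2 + 10, lambda x: -12 * x, 1.0)
```

Starting at x = 1, this converges to x close to 1.290994 in three updates, and f″(x) < 0 there confirms a maximum.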

4.2.2 The Basic Application

Consider any simple polynomial, such as f(x) = 5x − x², whose critical point can easily be found by calculus: taking the first derivative and setting it equal to zero, we find that the critical point x = 2.5 yields a maximum of the function. We apply Newton's method to find the critical points. This requires first finding f′(x) and f″(x) and second using a computation device to perform the iterations. Newton's method uses

xk+1 = xk − f′(xk)/f″(xk), with f′(x) = 5 − 2x and f″(x) = −2,

so that

xk+1 = xk − (5 − 2xk)/(−2).

Starting at x0 = 1 yields:

k    xk     f′(xk)   f″(xk)   xk+1   |xk − xk+1|
0    1      3        −2       2.5    1.5
1    2.5    0        −2       2.5    0

Starting at other values also yields x = 2.5. Since this simple quadratic function has a derivative that is a linear function, the linear approximation of the derivative will be exact regardless of the starting point, and the answer will be confirmed at the second iteration. Newton's method produces the critical values of f′(x) without regard to the point xk being a maximum or a minimum. We know we have found a maximum by looking at the entries in the table for f″(x). Since f″(x) at x = 2.5 is −2, which is less than zero, we know we found the maximum of the function.

Example 4.7: Minimize f(x) = x² + 2x

We will start our guess with x = 4 and use a stopping criterion of ε = 0.01.

f(x) = x² + 2x
f′(x) = 2x + 2
f″(x) = 2

x2 = x1 − f′(x1)/f″(x1) = 4 − (10/2) = 4 − 5 = −1
x3 = x2 − f′(x2)/f″(x2) = −1 − (0/2) = −1

Stop: |f′(x3)| = 0 < 0.01, and |x3 − x2| = 0 < 0.01.



Example 4.8: Maximize f(x) = −2x³ + 10x − 10

We will begin at x = 1 and use a stopping criterion of ε = 0.01.

k    xk          f′(xk)     f″(xk)     xk+1
1    1           4          −12        1.333333
2    1.333333    −0.66667   −16        1.291667
3    1.291667    −0.01042   −15.5      1.290995
4    1.290995    −2.7E−06   −15.4919   1.290994

f′(x) = −6x² + 10 and f″(x) = −12x. At x = 1, f′(1) = 4 and f″(1) = −12, so

x2 = 1 − (4/(−12)) = 1.33333.

Neither |f′(x)| nor |xk+1 − xk| is less than ε, so we continue; the iterations are summarized in the table above. Since |f′(x4)| = 2.7E−06 < ε, we stop with x ≈ 1.290994. In Maple:

>g := -2*x^3+10*x-10;
                g := -2 x^3 + 10 x - 10
>g1 := diff(g, x);
                g1 := -6 x^2 + 10
>NewtonsMethod(g1, x = 1, output = sequence);
        1, 1.333333333, 1.291666667, 1.290994624, 1.290994449
>NewtonsMethod(g1, x = 1, view = [-2 .. 2, DEFAULT], output = plot);




>g2 := diff(g1, x);
                g2 := -12 x
>subs(x = 1.290994449, g2);
                -15.49193339

Since f″ < 0, x = 1.290994449 yields a maximum for f(x). The value of f(x) is found as:

>subs(x = 1.290994449, g);
                -1.393370341

We obtain a plot to see this visually.

4.2.6 Newton's Method for Critical Points with MATLAB

We enter the following code in script:

% Modified Newton-Raphson Method for critical points
clear all
close all
clc
% Change here for different functions
f=@(x) -2*(x)^3+10*(x)-10
% this is the derivative of the above function
df=@(x) -6*(x^2)+10
dff=@(x) -12*x


% Change lower limit 'a' and upper limit 'b'
a=1; b=5;
x=a;
for i=1:1:100
    x1=x-(df(x)/dff(x));
    x=x1;
end
sol=x;
test2=dff(x);
fprintf('Approximate Root is %10.15f',sol)
fprintf(' 2nd deriv test is %10.15f \n',test2)
a=1; b=5;
x=a;
er(5)=0;
for i=1:1:5
    x1=x-(df(x)/dff(x));
    x=x1;
    er(i)=x1-sol;
end
plot(er)
xlabel('Number of iterations')
ylabel('Error')
title('Error Vs. Number of iterations')

Note that we imbed the function as well as the first and second derivative in the script before we run the script. Solution Let f ′ be a function that has opposite sign values at each end of some specified interval. Then, by the Intermediate-Value Property (IVP) of continuous functions (mentioned in many basic college algebra texts), we are guaranteed to find a root between the endpoints of the given interval. Specifically, the IVP states that given two points (x1, y1) and (x 2, y2) with y1 ≠ y2 on a graph of the continuous function f, the function f takes on every value between y1 and y2. Thus, with values having opposite signs, there must be a value for which f ′(x) = 0. @(x)-6*(x^2)+10 dff = function_handle with value: @(x)-12*x


Step 1. Find two values a and b where f′(a) and f′(b) have opposite signs.
Step 2. Fix a tolerance for the final interval of the solution [af, bf] so that |bf − af| < tolerance.
Step 3. Find the midpoint, mi = (ai + bi)/2.
Step 4. Compute f′(ai), f′(bi), and f′(mi).
Step 5. Determine whether f′(ai)·f′(mi) < 0. If true, then ai = ai and bi = mi; otherwise, ai = mi and bi = bi.
Step 6. If f′(mi) ≠ 0 and the new |bi − ai| > tolerance, go back to Step 3 using the new interval [ai, bi] and repeat the process. Otherwise, STOP.

Figure 4.6  The bisection with derivatives algorithm.

Approximate root is 1.290994448735806, and the second derivative test is –15.491933384829668
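For readers without MATLAB, the same iteration can be sketched in Python (an illustrative translation of the script above, not from the text):

```python
# Newton's method applied to f'(x) = 0 to locate a critical point of
# f(x) = -2x^3 + 10x - 10, mirroring the MATLAB script above.
def newton_critical_point(df, dff, x0, iters=100):
    """Iterate x <- x - f'(x)/f''(x) to drive f'(x) toward zero."""
    x = x0
    for _ in range(iters):
        x = x - df(x) / dff(x)
    return x

df = lambda x: -6 * x**2 + 10     # f'(x)
dff = lambda x: -12 * x           # f''(x)

root = newton_critical_point(df, dff, 1.0)
second_deriv = dff(root)          # negative => the critical point is a maximum
```

Starting from x = 1, the iterates reproduce the Maple sequence 1, 1.333…, 1.2917…, 1.290994449.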

4.2.7 The Bisection Method with Derivatives

To utilize the bisection method with derivatives correctly, certain properties must hold for the given function: f′ must be continuous on the interval, and f′ must take values of opposite signs at the two endpoints, so that the Intermediate-Value Property guarantees a root of f′ between them. The algorithm is shown in Figure 4.6.

Example 4.9

Minimize f(x) = x² + 2x, −3 < x < 6. Let ε = 0.2.

The number of required iterations is found by solving (0.5)ⁿ ≤ ε/(b − a) = 0.2/9. We find that n = 6 iterations are required.

f′(x) = 2x + 2

1. xmp = (−3 + 6)/2 = 1.5; f′(1.5) = 5 > 0, so a = −3, b = 1.5, and xmp = −0.75.
2. f′(−0.75) = 0.5 > 0, so a = −3, b = −0.75, and xmp = −1.875.
3. f′(−1.875) = −1.75 < 0, so a = −1.875, b = −0.75, and xmp = −1.3125.
4. f′(−1.3125) = −0.625 < 0, so a = −1.3125, b = −0.75, and xmp = −1.03125.
5. f′(−1.03125) = −0.0625 < 0, so a = −1.03125, b = −0.75, and xmp = −0.890625.
6. f′(−0.890625) = 0.21875 > 0, so a = −1.03125 and b = −0.890625.

The final interval has length 0.140625 < 0.2, so we stop. Our solution lies in [−1.03125, −0.890625]. Just as previously discussed, if we need a single value we can evaluate the derivative f′ at a, at b, and at the midpoint, selecting from those points the value that gives f′(x) closest to 0:

f′(−1.03125) = −0.0625
f′(−0.890625) = 0.21875
f′(−0.9609375) = 0.078125

x = −1.03125 with f′(x) = −0.0625 is best. The exact value, via single-variable calculus, for the minimization of f is the solution x = −1.
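The algorithm of Figure 4.6 is straightforward to script. Here is a Python sketch (not from the text) that reproduces the final interval of Example 4.9:

```python
def bisection_derivative(df, a, b, tol):
    """Bisection on f' to bracket a critical point of f within tol."""
    while abs(b - a) > tol:
        m = (a + b) / 2.0
        if df(a) * df(m) < 0:
            b = m          # the sign change of f' lies in [a, m]
        else:
            a = m          # the sign change of f' lies in [m, b]
    return a, b

df = lambda x: 2 * x + 2   # derivative of f(x) = x^2 + 2x
a, b = bisection_derivative(df, -3.0, 6.0, 0.2)
# final interval: [-1.03125, -0.890625]
```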

Exercises
Use the Golden section method, Fibonacci's method, and Newton's method to perform at least two iterations in the solution process for Exercises 1–4.
1. Maximize f(x) = −x² − 2x on the closed interval [−2, 1] using a tolerance for the final interval of 0.6. (Hint: start Newton's method at x = −0.5.)
2. Maximize f(x) = −x² − 3x on the closed interval [−3, 1] using a tolerance for the final interval of 0.6. (Hint: start Newton's method at x = 1.)
3. Minimize f(x) = x² + 2x on the closed interval [−3, 1] using a tolerance for the final interval of 0.5. (Hint: start Newton's method at x = −3.)
4. Minimize f(x) = −x + e^x over the interval [−1, 3] using a tolerance of 0.1. (Hint: start Newton's method at x = −1.)
5. List at least two assumptions required by both the Golden section and Fibonacci's search methods.
6. Consider minimizing f(x) = −x + e^x over the interval [−1, 3]. Assume your final interval yielded a solution within the tolerance of [−0.80, 0.25]. Report a single best value of x to minimize f(x) over the interval.

Projects
1. Write a computer program in Maple that uses a one-dimensional search algorithm, say Golden section search, instead of calculus to perform the iterations of gradient search. Use your code to find the maximum of f(x, y) = xy − x² − y² − 2x − 2y + 4.
2. Write a computer program in Maple, Excel, or MATLAB that uses a one-dimensional search algorithm, say Fibonacci's search, instead of calculus to perform the iterations of gradient search. Use your code to find the maximum of f(x, y) = xy − x² − y² − 2x − 2y + 4.


References and Suggested Further Readings

Bazarra, M., C. Shetty, and H.D. Sherali. 1993. Nonlinear Programming: Theory and Applications. New York: Wiley.
Fox, W.P. 1992. "Teaching nonlinear programming with Minitab". COED Journal, Vol. II(1), pages 80–84.
Fox, W.P. 1993. "Using microcomputers in undergraduate nonlinear optimization". Collegiate Microcomputer, Vol. XI(3), pages 214–218.
Fox, W.P. and J. Appleget. 2000. "Some fun with Newton's Method". COED Journal, Vol. X(4), pages 38–43.
Fox, W.P. and W. Richardson. 2000. "Mathematical modeling with least squares using MAPLE". Maple Application Center, Nonlinear Mathematics, October 2000.
Fox, W.P. and M. Witherspoon. 2001. "Single variable optimization when calculus fails: Golden section search methods in nonlinear optimization using MAPLE". COED, Vol. XI(2), pages 50–56.
Fox, W.P., F. Giordano, S. Maddox, and M. Weir. 1987. Mathematical Modeling with Minitab. Monterey, CA: Brooks/Cole.
Fox, W.P., F. Giordano, and M. Weir. 1997. A First Course in Mathematical Modeling, 2nd Edition. Monterey, CA: Brooks/Cole.
Fox, W.P., F. Giordano, and M. Weir. 2003. A First Course in Mathematical Modeling, 3rd Edition. Monterey, CA: Brooks/Cole.
Meerschaert, M. 1993. Mathematical Modeling. San Diego, CA: Academic Press.
Phillips, D.T., A. Ravindran, and J. Solberg. 1976. Operations Research. New York: John Wiley & Sons.
Press, W.H., B. Flannery, S. Teukolsky, and W. Vetterling. 1987. Numerical Recipes. New York: Cambridge University Press, pages 269–271.
Rao, S.S. 1979. Optimization: Theory and Applications. New Delhi, India: Wiley Eastern Limited.
Winston, W. 2002. Introduction to Mathematical Programming: Applications and Algorithms, 4th Edition. Belmont, CA: Duxbury Press, ITP.

Chapter 5

Review of Multivariable Differential Calculus

5.1 Introduction: Basic Theory and Partial Differentiation

Functions of several variables are usually visualized through their 3D graph. Another method is through the use of level curves of the function, using equations such as f(x1, x2) = k. Consider the function

f(x, y) = 9 − x² − y²

We might plot the function as shown in Figure 5.1.

The next topic that we should look at is that of level curves, or contour curves. The level curves of the function z = f(x, y) are the two-dimensional curves we get by setting z = k, where k is any number. So the equation of the level curves is f(x, y) = k. We note that sometimes the equation will be in the form f(x, y, z) = 0, and in these cases the equation of the level curves is f(x, y, k) = 0.

You've probably seen level curves (or contour curves, whatever you want to call them) before. If you've ever seen the elevation map for a piece of land, this is nothing more than the contour curves for the function that gives the elevation of the land in that area. Of course, we probably don't have the function that gives the elevation, but we can at least graph the contour curves. Let's do a quick example of this. We continue with our function, f(x, y) = 9 − x² − y². We show a contour plot of f(x, y) in Figure 5.2.


Figure 5.1  Three-dimensional plot of f(x, y).

Figure 5.2  Contour plot of f(x, y).


Figure 5.3  3D plot of f(x, y) with contours.

From the 3D plot and the contour plot, we obtain a fair approximation of the locations of maximums, minimums, and even saddle points. However, obtaining the two plots overlaid gives us a better perspective. In Maple, we can obtain a 3D plot with contours as shown in Figure 5.3. We created Figure 5.4 in Maple using the following procedure:

coolplot := proc(k, y, x, a, b)
  local tf, p, q;
  p := plot3d(k, x = a .. b, y = a .. b);
  q := plots:-contourplot(k, x = a .. b, y = a .. b, contours = 50, axes = BOXED, color = BLACK);
  tf := plottools:-transform((x, y) -> [x, y, -2.5]);
  plots:-display({tf(q), p})
end proc;

Sometimes the plots alone do not tell the whole story. We might need to take derivatives and see where the function is increasing and/or decreasing. We consider a function f of n > 1 variables, using the notation f(x1, x2, …, xn) to denote such a function. We define a partial derivative as follows.


Figure 5.4 Overlaid plot of 3D function and contour plot. (Source: Maple COOLPLOT.)

Definition 5.1
The partial derivative of f(x1, x2, …, xn) with respect to the variable xi is written as

∂f/∂xi = lim (Δxi → 0) [f(x1, …, xi + Δxi, …, xn) − f(x1, x2, …, xn)] / Δxi

We do not use the formal limit notation in order to compute partial derivatives. We need to think about a partial derivative as an instantaneous rate of change of f as xi varies while the other variables are held fixed; geometrically, it is the slope of the surface at a point in the xi coordinate direction.

Notations for partial derivatives: if z = f(x, y), we write

fx(x, y) = fx = ∂f/∂x = (∂/∂x) f(x, y) = ∂z/∂x
fy(x, y) = fy = ∂f/∂y = (∂/∂y) f(x, y) = ∂z/∂y

Rules for finding partial derivatives of z = f(x, y):
1. To find fx, regard the variable y as a constant and differentiate with respect to x.
2. To find fy, regard the variable x as a constant and differentiate with respect to y.


Example 5.1
Find the partial derivatives of z = x³ + x²y − 3y².

Solution
∂z/∂x = 3x² + 2xy
∂z/∂y = x² − 6y

Example 5.2
Find the partial derivatives of z = f(x, y) = −x² − 3y².

Solution
fx = −2x
fy = −6y

The geometric interpretation of fx is the slope, in the x-direction, of the line tangent to the surface at a point (x, y). To obtain a plot of the tangent plane at (0, 0), we obtain Figure 5.5.

Figure 5.5 Surface and tangent plane.
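The limit in Definition 5.1 can be approximated numerically. This Python sketch (not from the text) checks the partials of Example 5.1 by central differences:

```python
def partial(f, point, i, h=1e-6):
    """Central-difference approximation to the partial derivative wrt x_i."""
    up, dn = list(point), list(point)
    up[i] += h
    dn[i] -= h
    return (f(*up) - f(*dn)) / (2 * h)

f = lambda x, y: x**3 + x**2 * y - 3 * y**2   # the function of Example 5.1
fx = partial(f, (2.0, 1.0), 0)   # analytic: 3x^2 + 2xy = 16 at (2, 1)
fy = partial(f, (2.0, 1.0), 1)   # analytic: x^2 - 6y = -2 at (2, 1)
```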



HIGHER PARTIAL DERIVATIVES

If f is a function of two or more variables, then its first partial derivatives, fx and fy, are also functions of two or more variables. Therefore, we can consider their partial derivatives. We define the second partial derivatives as follows:

fxx = ∂²f/∂x²
fyy = ∂²f/∂y²
fxy = ∂²f/∂x∂y
fyx = ∂²f/∂y∂x

We point out that fxy = fyx, a result attributed to Clairaut's theorem (for more information, see https://en.wikipedia.org/wiki/Clairaut%27s_theorem). We will use the second partials as part of the sufficient conditions in multivariable optimization, both unconstrained (Chapter 6) and constrained (Chapters 7 and 8).

Example 5.3
Find all the first and second partial derivatives of the function

z = f(x, y) = −x³ + 2y² − 2x²y³

Solution
First, we find the first partial derivatives:

fx = −3x² − 4xy³
fy = 4y − 6x²y²

Then, we take the second partial derivatives:

fxx = −6x − 4y³
fyy = 4 − 12x²y
fxy = fyx = −12xy²


5.2 Directional Derivatives and the Gradient

In this section, we will introduce a type of derivative, called a directional derivative, which enables us to find the rate of change of a function of two or more variables in any direction.

Suppose that we wish to find the rate of change of z at the point (x0, y0) in the direction of an arbitrary unit vector u = [a, b]. To do this, we consider the surface S with equation z = f(x, y), and we let z0 = f(x0, y0), so that the point P(x0, y0, z0) lies on S. The vertical plane that passes through P in the direction of u intersects S in a curve C. The slope of the tangent line T to C at P is the rate of change of z in the direction of u. We formally define the directional derivative.

Definition 5.2
The directional derivative of f at the point (x0, y0) in the direction of a unit vector u = [a, b] is

Du f(x0, y0) = lim (h → 0) [f(x0 + ha, y0 + hb) − f(x0, y0)] / h,

if the limit exists.

Theorem 5.1
If f is a differentiable function of both x and y, then f has a directional derivative in the direction of any unit vector u = [a, b], and

Du f(x, y) = fx(x, y)a + fy(x, y)b

Let's begin with the function f(x, y) = x² + 2y² at the point (2, 3) in the direction of [3, 4]. The unit vector is [3/5, 4/5], so

Du f(2, 3) = 4(3/5) + 12(4/5) = 12

To visualize what we are doing, we look at the surface together with a plane containing the direction vector, as displayed in Figure 5.6. We also look at the tangent and the surface, as displayed in Figure 5.7.

Example 5.4
Find the directional derivative of z = f(x, y) = x³ − 3xy + 4y² in the direction of u = [√3/2, 1/2] at the point (1, 2).


Figure 5.6 Plot of surface, directional derivative, and unit vector.

Figure 5.7 Repeat of Figure 5.6 with added tangent plane.

Solution
Du f(x, y) = fx(x, y)a + fy(x, y)b

fx = 3x² − 3y, fy = −3x + 8y

So fx(1, 2) = −3 and fy(1, 2) = 13.


Thus,

Du f(1, 2) = fx(1, 2)a + fy(1, 2)b = −3(√3/2) + 13(1/2) = (13 − 3√3)/2

The directional derivative represents the rate of change of z in the direction of u. This is the slope of the tangent line to the curve of intersection of the surface z and the vertical plane through the point (1, 2) in the direction of u = [√3/2, 1/2]. Notice from linear algebra that the directional derivative is just the dot product of the vectors, ∇f · u. Let's define the gradient vector.

Definition 5.3
If f is a function of two or more variables, f(x1, x2, …, xn), the gradient of f is the vector of first partial derivatives. The notation that we adopt for the gradient of f is

∇f = [fx1, fx2, …, fxn]

Example 5.5
Find the gradient of z = f(x, y) = x³ − 3xy + 4y².

Solution
fx = 3x² − 3y
fy = −3x + 8y

∇f = [3x² − 3y, −3x + 8y]
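Since the directional derivative is the dot product ∇f · u, Example 5.4's value can be verified in a few lines of Python (a sketch, not from the text):

```python
import math

# f(x, y) = x^3 - 3xy + 4y^2, as in Examples 5.4 and 5.5
fx = lambda x, y: 3 * x**2 - 3 * y    # partial with respect to x
fy = lambda x, y: -3 * x + 8 * y      # partial with respect to y

# Example 5.4: unit vector u = [sqrt(3)/2, 1/2] at the point (1, 2)
a, b = math.sqrt(3) / 2, 1 / 2
Du = fx(1, 2) * a + fy(1, 2) * b      # gradient dotted with the direction
```

The result agrees with the hand computation, (13 − 3√3)/2.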

Example 5.6 Create a unit vector for the gradient in Example 5.5 at the point (−2, 1).


Figure 5.8 Plot of f(x1, x2) = 55x1 − 4x1² + 135x2 − 15x2² − 100.

Solution
∇f(−2, 1) = [9, 14]

‖∇f(−2, 1)‖ = √(9² + 14²) = √277

The unit vector is [9/√277, 14/√277].

An important concept of the gradient is that the gradient vector points uphill. In other words, it is pointing its way toward the maxima of f. Negative ∇f points down the hill toward the minima. The key word is toward. It points in the direction of maximum gain from the point. Think of this as scaling a mountain. We move from point to point, always moving uphill and always moving closer to the summit.

Example 5.7
Let's consider the function f(x1, x2) = 55x1 − 4x1² + 135x2 − 15x2² − 100. If we stood at the point (1, 2), there are infinitely many directions in which we could walk. Each direction has a certain steepness associated with it. Here we want to know which direction is the steepest. The gradient vector gives us


the direction in which z is increasing fastest, and its magnitude gives the rate of change in that direction. This is displayed in Figure 5.8. The contour plot is shown in Figure 5.9, with the point (1, 2) and the gradient vector from (1, 2) in the direction of greatest increase from that point. Notice that it does not point to the top of the hill; the vector points in the direction of greatest increase. We move in that direction until we find another direction of greatest increase. We will discuss this more when we examine the Method of Steepest Ascent in Chapter 7.

Figure 5.9 Contour plot of f(x1, x2) = 55x1 − 4x1² + 135x2 − 15x2² − 100.
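The "always move uphill" idea can be sketched as a simple fixed-step gradient ascent in Python (an illustration, not the text's method; the step size 0.01 is an assumed value). Starting from (1, 2), it climbs toward the maximizer of f(x1, x2) = 55x1 − 4x1² + 135x2 − 15x2² − 100:

```python
# Fixed-step gradient ascent: repeatedly step in the direction of the gradient.
grad = lambda x1, x2: (55 - 8 * x1, 135 - 30 * x2)  # gradient of f

x1, x2 = 1.0, 2.0      # starting point from Example 5.7
step = 0.01            # small fixed step size (an assumption)
for _ in range(2000):
    g1, g2 = grad(x1, x2)
    x1, x2 = x1 + step * g1, x2 + step * g2
# (x1, x2) now sits essentially at the stationary point (55/8, 135/30)
```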

Exercises
1. Find the directional derivative of f(x, y) = 1 + 2x√y at the point (3, 4) in the direction of v = ⟨4, −3⟩.
2. Near a buoy, the depth of a lake at the point with coordinates (x, y) is z = 200 + 0.02x² − 0.01y³, where x, y, and z are measured in meters. A fisherman in a small boat starts at the point (80, 60) and moves toward the buoy, which is located at (0, 0). Is the water under the boat getting deeper or shallower when he departs? Explain.
3. Suppose you are climbing a hill whose shape is given by the equation z = 1,000 − 0.01x² − 0.02y² and you are standing at a point with coordinates


(60, 100, 764). In which direction should you proceed initially in order to reach the top of the hill fastest? What is the slope in this direction?
4. Your student is preparing for his upcoming fitness test. As a mentor, you are helping him with his physical training by taking him out for a run to the Stony Lonesome gate. As he huffs and puffs up the hill, you are considering how you can maximize his workout by running the steepest slopes. The shape of the hill is defined by the equation z = x²y − ln(x) on the domain D = {(x, y) ∈ ℝ² | 1 ≤ x ≤ 10}. You are currently at the point P0 = (1, 4), running in the direction of the vector v = ⟨3, 4⟩.
a. Find the rate of change in elevation of the hill at your current position and in your current direction of travel.
b. Are you running in the direction of the steepest slope possible at this point? Justify your answer.

References and Suggested Reading

Fox, W. 2013. Mathematical Modeling with Maple. Boston, MA: Cengage Publishers.
Giordano, F., W. Fox, and S. Horton. 2014. A First Course in Mathematical Modeling, 5th ed. Boston, MA: Cengage Publishers.
Stewart, J. 2016. Calculus, 8th ed. Boston, MA: Cengage Publishers.
Winston, W. 1995. Introduction to Mathematical Programming: Applications and Algorithms, 2nd ed. Belmont, CA: Duxbury Press.

Chapter 6

Models Using Unconstrained Optimization: Maximization and Minimization with Several Variables

6.1 Introduction

Consider a small company that is planning to install a central computer with cable links to five new departments. According to their floor plan, the peripheral computers for the five departments will be situated as shown by the dark circles in Figure 6.1. The company wishes to locate the central computer so that the minimal amount of cable will be used to link the five peripheral computers. Assuming that cable may be strung over the ceiling panels in a straight line from a point above any peripheral to a point above the central computer, the distance formula may be used to determine the length of cable needed to connect any peripheral to the central computer. Ignore all lengths of cable from the computer itself to a point above the ceiling panel immediately over that computer. That is, work only with lengths of cable strung over the ceiling panels (see Fox et al., 2004).


Figure 6.1 The grid for the five departments.

The coordinates of the locations of the five peripheral computers are listed in Table 6.1. Assume the central computer must be positioned at coordinates (m, n), where m and n are integers in the grid representing the office space. Determine the coordinates (m, n) for the placement of the central computer that minimize the total amount of cable needed. Report the total number of feet of cable needed for this placement along with the coordinates (m, n). To model and solve problems like this, we need to learn about multivariable unconstrained optimization.

Table 6.1 Grid Coordinates of Five Departments

X  | Y
15 | 60
25 | 90
60 | 75
75 | 60
80 | 25
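Because the grid locations are integers, one simple (if inelegant) attack on the cable problem is brute force: evaluate the total cable length at every integer grid point and keep the best. The Python sketch below is not from the text, and the 0-to-100 grid extent is an assumption:

```python
import math

# Peripheral computer locations from Table 6.1
points = [(15, 60), (25, 90), (60, 75), (75, 60), (80, 25)]

def total_cable(m, n):
    """Total straight-line cable length from (m, n) to all five peripherals."""
    return sum(math.dist((m, n), p) for p in points)

# Exhaustively search every integer (m, n) on an assumed 0..100 grid.
best = min(((m, n) for m in range(101) for n in range(101)),
           key=lambda q: total_cable(*q))
best_length = total_cable(*best)
```

Later sections develop calculus-based methods that avoid this exhaustive search.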


We will discuss how to find an optimal solution (if one exists) for the following unconstrained nonlinear optimization problem:

Maximize (or minimize) f(x1, x2, …, xn) over Rⁿ

We assume that the first and second partial derivatives of f(x1, x2, …, xn) exist and are continuous at all points in the domain of f. Let ∂f(x1, x2, …, xn)/∂xi be the partial derivative of f(x1, x2, …, xn) with respect to xi. Candidate critical points (stationary points) are found where

∂f(x1, x2, …, xn)/∂xi = 0, for i = 1, 2, …, n.

This sets up a system of equations that, when solved, yields the critical points (if one or more are found) satisfying all the partial derivative equations.

Theorem 6.1
If x is a local extremum, then x satisfies ∂f(x1, x2, …, xn)/∂xi = 0 for i = 1, 2, …, n.

We have previously defined all points that satisfy these equations as critical points (stationary points). Not all critical points (stationary points) are local extrema. If a stationary point is not a local extremum (a maximum or a minimum), then it is called a saddle point.

6.2 The Hessian Matrix

How do we determine the convexity of functions of more than one variable?

Definition 6.1
A function f(x) is convex if

f(x(1) + λ(x(2) − x(1))) ≤ f(x(1)) + λ[f(x(2)) − f(x(1))]

for every x(1) and x(2) in its domain and every λ ∈ [0, 1]. Similarly, f(x) is concave if

f(x(1) + λ(x(2) − x(1))) ≥ f(x(1)) + λ[f(x(2)) − f(x(1))]

for every x(1) and x(2) in its domain and every λ ∈ [0, 1].

We introduce the Hessian matrix, which allows us to determine the convexity of multivariable functions. As we will see, the Hessian matrix provides us with additional information about the critical points as well.

Definition 6.2
The Hessian matrix is an n × n matrix of the second partial derivatives of a multivariable function f(x1, x2, …, xn), where the ij-th entry is ∂²f/∂xi∂xj:

H = [ ∂²f/∂x1²     ∂²f/∂x1∂x2   …   ∂²f/∂x1∂xn ]
    [ ∂²f/∂x2∂x1   ∂²f/∂x2²     …   ∂²f/∂x2∂xn ]
    [      ⋮            ⋮        ⋱        ⋮     ]
    [ ∂²f/∂xn∂x1   ∂²f/∂xn∂x2   …   ∂²f/∂xn²   ]

In the 2 × 2 case,

H = [ ∂²f/∂x1²     ∂²f/∂x1∂x2 ]
    [ ∂²f/∂x2∂x1   ∂²f/∂x2²   ]

We note that the mixed partials are always equal, ∂²f/∂x1∂x2 = ∂²f/∂x2∂x1, as you will see in our examples.


Example 6.1
If f(x1, x2) = x1² + 3x2², find the Hessian matrix.

∂f/∂x1 = 2x1, ∂f/∂x2 = 6x2

∂²f/∂x1² = 2
∂²f/∂x2² = 6
∂²f/∂x1∂x2 = ∂²f/∂x2∂x1 = 0

H = [ 2  0 ]
    [ 0  6 ]

Example 6.2
If f(x1, x2) = −x1² − 3x2² + 3x1·x2, find the Hessian matrix.

∂f/∂x1 = −2x1 + 3x2, ∂f/∂x2 = −6x2 + 3x1

∂²f/∂x1² = −2
∂²f/∂x2² = −6
∂²f/∂x1∂x2 = ∂²f/∂x2∂x1 = 3

H = [ −2   3 ]
    [  3  −6 ]


Definition 6.3
The ith leading principal minor of an n × n matrix is the determinant of any i × i matrix obtained by deleting n − i rows and the corresponding n − i columns of the matrix.

Example 6.3
Given a 3 × 3 Hessian matrix,

H = [ 2  0  4 ]
    [ 0  1  5 ]
    [ 4  5  3 ]

1. There are three first leading principal minors (i = 1, so eliminate 3 − 1 = 2 rows and 2 columns):
a. Eliminating rows 2, 3 and columns 2, 3 yields the matrix [2], with Det[2] = 2.
b. Eliminating rows 1, 3 and columns 1, 3 yields the matrix [1], with Det[1] = 1.
c. Eliminating rows 1, 2 and columns 1, 2 yields the matrix [3], with Det[3] = 3.
Note: these first leading principal minors are the entries of the main diagonal.
2. There are three second leading principal minors (i = 2, so we eliminate 3 − 2 = 1 row and 1 column):
a. Eliminating row 3 and column 3 yields

[ 2  0 ]
[ 0  1 ],  Det = 2

b. Eliminating row 2 and column 2 yields

[ 2  4 ]
[ 4  3 ],  Det = 6 − 16 = −10


c. Eliminating row 1 and column 1 yields

[ 1  5 ]
[ 5  3 ],  Det = 3 − 25 = −22

3. There is only one third leading principal minor (i = 3, so eliminate 3 − 3 = 0 rows and columns):

Det [ 2  0  4 ]
    [ 0  1  5 ]
    [ 4  5  3 ] = −60

Definition 6.4
The kth leading principal minor of an n × n matrix is the determinant of the k × k matrix obtained by deleting the last n − k rows and n − k columns of the matrix.

Example 6.4
Given a 3 × 3 Hessian matrix,

H = [ 2  0  4 ]
    [ 0  1  5 ]
    [ 4  5  3 ]

1. The first leading principal minor is the determinant of the 1 × 1 matrix obtained by deleting the last 3 − 1 = 2 rows and columns, yielding the matrix [2], with Det[2] = 2.
2. The second leading principal minor is the determinant of the 2 × 2 matrix obtained by deleting the last 3 − 2 = 1 rows and columns:

Det [ 2  0 ]
    [ 0  1 ] = 2

3. The third leading principal minor is the determinant of the 3 × 3 matrix obtained by deleting the last 3 − 3 = 0 rows and columns:

Det [ 2  0  4 ]
    [ 0  1  5 ]
    [ 4  5  3 ] = −60

Notice that if you examine the matrix H, the leading principal minors are just the determinants of the square submatrices along the main diagonal. So, how do we use these determinants, the principal minors and leading principal minors of the Hessian matrix, to determine the convexity of the multivariate function?

Theorem 6.2a
Let f(x1, x2, …, xn) be a function with continuous second-order partial derivatives for every point in the domain of f. Then f(x1, x2, …, xn) is a convex function if all the leading principal minors of the Hessian matrix are non-negative.

Theorem 6.2b
Let f(x1, x2, …, xn) be a function with continuous second-order partial derivatives for every point in the domain of f. f(x1, x2, …, xn) is a concave function if all the non-zero leading principal minors of the Hessian matrix follow the sign of (−1)^k, where k represents the order of the principal minors (k = 1, 2, 3, …, n).

Theorem 6.2c
If the leading principal minors do not follow either Theorem 6.2a or 6.2b, then f(x1, x2, …, xn) is neither a convex function nor a concave function.

Example 6.5
Determine the convexity of f(x1, x2) = x1² + 3x2² using the Hessian matrix.

H = [ 2  0 ]
    [ 0  6 ]

The first leading principal minors are Det[2] = 2 > 0 and Det[6] = 6 > 0. The second leading principal minor is

Det [ 2  0 ]
    [ 0  6 ] = 12 > 0

Since all principal minors are non-negative, we can classify f(x1, x2) = x1² + 3x2² as a convex function by Theorem 6.2a. The graph is shown in Figure 6.2.


Figure 6.2 Graph of f(x1, x2) = x1² + 3x2².

Example 6.6
Determine the convexity of the function using the Hessian matrix: f(x1, x2) = −x1² − 3x2² + x1·x2.

H = [ −2   1 ]
    [  1  −6 ]

The first leading principal minors are Det[−2] = −2 < 0 and Det[−6] = −6 < 0. The second leading principal minor is

Det [ −2   1 ]
    [  1  −6 ] = 12 − 1 = 11 > 0

The non-zero leading principal minors follow the sign of (−1)^k, so f is a concave function by Theorem 6.2b.

In Maple, compare the function z2 = x1² − 3x2² + x1x2:

> z2 := x1^2 - 3*x2^2 + x1*x2;
        z2 := x1^2 - 3 x2^2 + x1 x2
> h := Hessian(z2, [x1, x2]);
        [ 2   1 ]
        [ 1  -6 ]

This Hessian matrix is indefinite because fx1x1 = 2 > 0 and fx2x2 = −6 < 0.

In general, the definiteness of a Hessian is determined from its leading principal minors:

Positive definite: all leading principal minors are greater than 0.
Positive semi-definite: all leading principal minors are non-negative (some are zero).
Negative definite: the leading principal minors follow the signs of (−1)^k, where k represents the order of the leading principal minor. For example, with k = 1, 2, 3 the signs of the leading principal minors are minus, positive, minus.
Negative semi-definite: all non-zero valued leading principal minors follow the signs of (−1)^k, and some leading principal minors have value zero.
Indefinite: the leading principal minors do not follow any of the rules for positive definite, positive semi-definite, negative definite, or negative semi-definite above.

The Hessian matrix is not always a matrix of all constants. If the Hessian is a function of the independent variables, its definiteness might vary from one value of x to another. To test the definiteness of the Hessian at a point x*, it is necessary to evaluate the Hessian at the point x*. For example, consider the following Hessian:

H(x) = [ 2x1  x2 ]
       [  x2   4 ]

The values of x1 and x2 determine whether the matrix is positive definite, positive semi-definite, negative definite, negative semi-definite, or indefinite.

There exists a relationship between the Hessian matrix definiteness and the classification of stationary points (extrema) as maxima, minima, saddle points, or inconclusive. Table 6.2 summarizes these results. In this table, k indicates the order of the leading principal minors of the Hessian. The ith PM is found by eliminating n − i rows and the corresponding columns of the matrix. The first leading PMs are always the main diagonal of the original Hessian matrix.


Table 6.2 Summary of Hessian Results

Determinants: k, Leading Principal Minors (PM) | Results | Conclusions about Stationary Points
Hk > 0 | Positive definite, f convex | Minima
Hk ≥ 0 | Positive semi-definite, f convex | Local minima
Hk follows the signs of (−1)^k | Negative definite, f concave | Maxima
Hk either 0 (not all 0) or follows (−1)^k | Negative semi-definite, f concave | Local maxima
Hk not all 0 and none of the above | Indefinite, f neither | Saddle point
Hk all 0's | Indefinite | Inconclusive

Example 6.8
Suppose the Hessian of the function is given by:

H(x) = [ −2  −1 ]
       [ −1  −4 ]

The first leading PMs are −2 and −4, and the second leading PM is the determinant of the 2 × 2 matrix, (−2)(−4) − (−1)(−1) = 7. Since the first leading PMs are negative and the second leading PM is positive, they follow the rule that the leading PMs follow the sign of (−1)^k, where k represents the order of the leading principal minor. We would conclude that the function is concave and that any corresponding stationary point found is a maximum. The Hessian matrix is negative definite (ND).

Example 6.9

H(x) = [ 3  2  1 ]
       [ 6  5  4 ]
       [ 9  8  7 ]

The first leading PMs are 3, 5, and 7. Note that they are the entries of the main diagonal. The second leading PMs are the determinants found by eliminating the row and column containing each first PM, one at a time. This yields three 2 × 2 submatrices.


Eliminating the row and column with 3 yields the 2 × 2 matrix

A = [ 5  4 ]
    [ 8  7 ]

with a determinant value of 35 − 32, or 3. Eliminating the row and column with 5 yields the 2 × 2 matrix

B = [ 3  1 ]
    [ 9  7 ]

with a determinant value of 21 − 9, or 12. Eliminating the row and column with 7 yields the 2 × 2 matrix

C = [ 3  2 ]
    [ 6  5 ]

with a determinant value of 15 − 12, or 3. The third leading PM is the determinant of the original 3 × 3 matrix. This value is zero, 0.

The Hessian follows the positive semi-definite form, where all leading principal minors are greater than or equal to zero. The function would be convex, and any corresponding stationary points would be local minima.

Example 6.10

H(x) = [ −2  −4 ]
       [ −4  −3 ]

has first leading PMs of −2 and −3, while the second leading PM is (−2)(−3) − (−4)(−4) = −10. This is an indefinite Hessian, and any stationary point corresponding to this Hessian would be a saddle point.
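The classification rules of Table 6.2 can be coded directly from the leading principal minors. This Python sketch (not from the text) handles the 2 × 2 case and reproduces the conclusions of Examples 6.8 and 6.10:

```python
def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def classify_2x2(H):
    """Classify a symmetric 2 x 2 Hessian by its leading principal minors."""
    m1, m2 = H[0][0], det2(H)     # first and second leading principal minors
    if m1 > 0 and m2 > 0:
        return "positive definite"     # stationary point is a minimum
    if m1 < 0 and m2 > 0:
        return "negative definite"     # stationary point is a maximum
    if m2 < 0:
        return "indefinite"            # stationary point is a saddle
    return "semi-definite or inconclusive"

ex68 = classify_2x2([[-2, -1], [-1, -4]])   # Example 6.8
ex610 = classify_2x2([[-2, -4], [-4, -3]])  # Example 6.10
```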

6.3 Unconstrained Optimization

We put the theorems and definitions from the previous sections to work in this section.

Example 6.11
Find and classify all the stationary points of

f(x, y) = 55x − 4x² + 135y − 15y² − 100


∂f/∂x = 55 − 8x = 0
∂f/∂y = 135 − 30y = 0

These solve as x = 55/8 and y = 135/30. There is only one stationary point.

H(55/8, 135/30) = [ −8    0 ]
                  [  0  −30 ]

The first PMs are −8 and −30, which follow (−1)¹. The second PM is 240 > 0, which follows (−1)². The function f is concave at (55/8, 135/30); therefore (55/8, 135/30) represents the maximum of f, and

f(55/8, 135/30) = 392.81
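Example 6.11 can be double-checked numerically. This Python sketch (not from the text) confirms that the gradient vanishes at (55/8, 135/30) and evaluates the maximum, 392.8125:

```python
# f from Example 6.11 and its stationary point
f = lambda x, y: 55 * x - 4 * x**2 + 135 * y - 15 * y**2 - 100

x_star, y_star = 55 / 8, 135 / 30    # solution of 55 - 8x = 0, 135 - 30y = 0
gx = 55 - 8 * x_star                 # both partials vanish here
gy = 135 - 30 * y_star
fmax = f(x_star, y_star)             # maximum value of f, 392.8125
```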

Example 6.12
Find all the local maxima, local minima, and saddle points for

f(x1, x2) = x1x2² + x1³x2 − x1x2

We find the partial derivatives and set them equal to zero to find the stationary points:

∂f/∂x1 = x2² + 3x1²x2 − x2 = 0
∂f/∂x2 = 2x1x2 + x1³ − x1 = 0

Factoring,

x2² + 3x1²x2 − x2 = 0  or  x2(x2 + 3x1² − 1) = 0
2x1x2 + x1³ − x1 = 0  or  x1(2x2 + x1² − 1) = 0

The candidate points satisfy

x2 = 0  or  x1² = (1 − x2)/3     (6.1)


1− x12 2

(6.2)

Thus, the following points will work for equations (6.1) and (6.2) alternating letting x 2 = 0 and x1 = 0: (0, 0) (1, 0) (0, 1) (−1, 0) Also we need to find solutions when x1 =

1− x 2 1− x12 and x 2 = . 3 2

2

⎛ 1− x2 ⎞ ⎛ 1 − x2 ⎞ 1− ⎜ 1− ⎜ 3 ⎟⎠ ⎝ ⎝ 2 ⎟⎠ . By substitution, x 2 = = 2 2 2x2 = 1 −

1 x2 + 3 3

5 2 x2 = 3 3

x2 =

2 If x 2 = , then x1 = 5

2 5

1 5 =± 5 5

⎛ 5 2⎞ ⎛ 5 2⎞ , − , (0),(1,0),(0,1),(−1,0) ⎝⎜ 5 5 ⎠⎟ ⎝⎜ 5 5 ⎠⎟ We have six stationary points that need to be tested with the Hessian to see if they are local maxima, local minima, or saddle points. We need the Hessian matrix. ∂f = x 22 + 3 x12 x 2 − x 2 = 0 ∂x1 ∂f = 2 x1x 2 + x13 − x1 = 0 ∂ x1


∂²f/∂x₁² = 6x₁x₂
∂²f/∂x₂² = 2x₁
∂²f/∂x₁∂x₂ = 2x₂ + 3x₁² − 1

H = [ 6x₁x₂             2x₂ + 3x₁² − 1 ]
    [ 2x₂ + 3x₁² − 1    2x₁            ]

H(0, 0) = [  0  −1 ]
          [ −1   0 ]

H(1, 0) = [ 0  2 ]
          [ 2  2 ]

H(0, 1) = [ 0  1 ]
          [ 1  0 ]

H(−1, 0) = [ 0   2 ]
           [ 2  −2 ]

H(√5/5, 2/5) = [ 12√5/25   2/5   ]
               [ 2/5       2√5/5 ],  determinant: 4/5

H(−√5/5, 2/5) = [ −12√5/25   2/5    ]
                [ 2/5       −2√5/5 ],  determinant: 4/5




Point            First PMs                          Second PM              Classification and Result
(0, 0)           0, 0, both non-negative            −1                     Neither; saddle point
(1, 0)           0, 2, both non-negative            −4                     Neither; saddle point
(0, 1)           0, 0, both non-negative            −1                     Neither; saddle point
(−1, 0)          0, −2                              −4                     Neither; saddle point
(√5/5, 2/5)      12√5/25, 2√5/5, both positive      4/5 > 0                f is convex; local minimum
(−√5/5, 2/5)     −12√5/25, −2√5/5, follows (−1)¹    4/5, follows (−1)²     f is concave; local maximum
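A short Python sketch (an illustration with names of our choosing) reproduces this classification by evaluating the Hessian at each of the six stationary points and applying the 2 × 2 principal-minor test:

```python
def hessian(x1, x2):
    # Second partials of f = x1*x2**2 + x1**3*x2 - x1*x2
    return [[6*x1*x2,            2*x2 + 3*x1**2 - 1],
            [2*x2 + 3*x1**2 - 1, 2*x1]]

def classify2(H):
    d1 = H[0][0]                                  # first leading PM
    d2 = H[0][0]*H[1][1] - H[0][1]*H[1][0]        # second leading PM (det)
    if d2 < 0:
        return "saddle point"
    if d1 > 0 and d2 > 0:
        return "local minimum"
    if d1 < 0 and d2 > 0:
        return "local maximum"
    return "inconclusive"

s5 = 5 ** 0.5
points = [(0, 0), (1, 0), (0, 1), (-1, 0), (s5/5, 2/5), (-s5/5, 2/5)]
for p in points:
    print(p, classify2(hessian(*p)))
```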

The graph of x₁x₂² + x₁³x₂ − x₁x₂ is shown in Figure 6.5. At (√5/5, 2/5) the function is locally convex, giving a local minimum; at (−√5/5, 2/5) it is locally concave, giving a local maximum.

Figure 6.5  Graph of x₁x₂² + x₁³x₂ − x₁x₂.

Example 6.13

Find and classify all the stationary points of the function

f(x, y) = 2xy + 4x + 6y − 2x² − 2y²




f(x, y) = 2xy + 4x + 6y − 2x² − 2y²

∂f/∂x = 2y + 4 − 4x
∂f/∂y = 2x + 6 − 4y

To find where ∂f/∂x = ∂f/∂y = 0, we solve

−4x + 2y = −4
2x − 4y = −6

[ −4   2 | −4 ]   → row echelon →   [ 1  0 | 7/3 ]
[  2  −4 | −6 ]                     [ 0  1 | 8/3 ]

x = 7/3, y = 8/3

H = [ −4   2 ]
    [  2  −4 ]

The first-order leading PMs are −4 and −4, following (−1)^k for k = 1. The second leading PM is 16 − 4 = 12 > 0, following (−1)^k for k = 2. Thus H is negative definite, f is concave, and (x*, y*) = (7/3, 8/3) is a global maximum (see Figure 6.6).

Example 6.14  Least Squares Model

In a least squares model, fitting a line y = a + bx, we want to minimize the sum of squared errors,

f(a, b) = Σ (yᵢ − a − bxᵢ)²,  summing over i = 1, …, n.

∂f/∂a = 2( Σ (yᵢ − a − bxᵢ)(−1) ) = 0


Figure 6.6  Graph of f(x,y) = 2xy + 4x + 6y − 2x² − 2y².

∂f/∂b = 2( Σ (yᵢ − a − bxᵢ)(−xᵢ) ) = 0

∂f/∂a = ∂f/∂b = 0 hold when we find (a, b) such that the following hold:

na + b Σ xᵢ = Σ yᵢ
a Σ xᵢ + b Σ xᵢ² = Σ xᵢyᵢ

These equations are called the normal equations for least squares. The Hessian is

H = [ 2n       2Σ xᵢ  ]
    [ 2Σ xᵢ    2Σ xᵢ² ]

The first-order principal minors are 2n and 2Σ xᵢ², both > 0. The second leading PM is 4nΣ xᵢ² − 4(Σ xᵢ)², which is > 0 whenever the xᵢ are not all equal. H is positive definite, f is strictly convex, so (a, b)* will be a global minimum. Now let's look at this with some data:

X    1    2     3
Y    2    4.8   7
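For this small data set, the normal equations can be solved directly. Here is a Python sketch (an illustration; the book carries out the same computation by hand and with technology), written with a as the slope and b as the intercept to match the worked example below:

```python
xs = [1, 2, 3]
ys = [2, 4.8, 7]

n   = len(xs)
Sx  = sum(xs)
Sy  = sum(ys)
Sxx = sum(x * x for x in xs)
Sxy = sum(x * y for x, y in zip(xs, ys))

# Normal equations for y = a*x + b (a = slope here):
#   Sxx*a + Sx*b = Sxy
#   Sx*a  + n*b  = Sy
det = Sxx * n - Sx * Sx
a = (Sxy * n - Sy * Sx) / det
b = (Sxx * Sy - Sx * Sxy) / det
print(a, b)   # 2.5 and about -0.4
```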


With substitution of the coordinates we have

Minimize S := (2 − a − b)² + (4.8 − 2a − b)² + (7 − 3a − b)²

(here a plays the role of the slope and b the intercept). We take the partial derivatives with respect to a and b and obtain

∂S/∂a = 28a + 12b − 65.2
∂S/∂b = 12a + 6b − 27.6

We set each equation equal to 0 and obtain values for a and b: a = 2.5, b = −0.40. The least squares equation is

y = 2.5x − 0.40

We obtain the Hessian, h, as

h := [ 28  12 ]
     [ 12   6 ]

The Hessian matrix is positive definite (leading PMs 28 > 0 and 28·6 − 12² = 24 > 0), so we have found the minimum.

Example 6.15

Consider the following problem where we are looking to see if an island exists in a harbor. The following function represents the topography of the underwater region of the harbor. We obtain a 3D plot and a contour plot to assist our analysis in Figure 6.7a and b. Our function f(x, y) is:

f(x, y) = −300y³ − 695y² + 7y − 300x³ − 679x² − 235x + 570

Figure 6.7  (a) and (b): 3D plot and contour plot of the function in Example 6.15.


We take the partial derivatives and set the two equations equal to zero,

−900x² − 1,358x − 235 = 0 and −900y² − 1,390y + 7 = 0,

to obtain the set of real critical points:

(−0.199399, 0.005020)
(−1.309490, 0.005020)
(−0.199399, −1.549464)
(−1.309490, −1.549464)

We summarize the results:

Point                       f(x, y)    Hessian Definiteness    Result
(−0.199399, 0.005020)       592.258    Negative definite       Maximum
(−1.309490, 0.005020)       387        Indefinite              Saddle
(−0.199399, −1.549464)      28.81      Indefinite              Saddle
(−1.309490, −1.549464)      −176       Positive definite       Minimum

We find that at x = −0.199399 and y = 0.005020 the surface reaches a maximum height of f(x, y) = 592.2577681 units. The Hessian is negative definite at that point, indicating that we found the maximum. If we assume that z = 0 is sea level, then an island exists in the harbor that rises approximately 592.26 ft above sea level.
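Since each partial derivative here is a quadratic in a single variable, the critical points can be found with the quadratic formula. A Python sketch of Example 6.15 (function names are our own):

```python
import math

def f(x, y):
    return (-300*y**3 - 695*y**2 + 7*y
            - 300*x**3 - 679*x**2 - 235*x + 570)

def quad_roots(A, B, C):
    """Both real roots of A*t^2 + B*t + C = 0 (discriminant assumed >= 0)."""
    d = math.sqrt(B*B - 4*A*C)
    return ((-B + d) / (2*A), (-B - d) / (2*A))

# f_x = -900x^2 - 1358x - 235 = 0,   f_y = -900y^2 - 1390y + 7 = 0
xs = quad_roots(-900, -1358, -235)
ys = quad_roots(-900, -1390, 7)

# The peak is the critical point with the largest function value
best = max(((x, y) for x in xs for y in ys), key=lambda p: f(*p))
print(best, round(f(*best), 4))   # near (-0.1994, 0.0050), height about 592.258
```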

Exercises

1. Indicate both the "definiteness" {positive definite, positive semi-definite, negative definite, negative semi-definite, and indefinite} of the following Hessian matrices and indicate the concavity of the function from which each Hessian matrix was derived. Assume the Hessian, H, is:

a) [ 2  3 ]
   [ 3  5 ]

b) [ 4  3 ]
   [ 3  2 ]

c) [ −2   1 ]
   [  1  −2 ]

d) [ −3   4 ]
   [  4  −5 ]

e) [ 6x   0 ]
   [  0  2x ]

f) [ x²   0 ]
   [  0  2y ]

g) [ 2  2  3 ]
   [ 2  6  4 ]
   [ 3  4  4 ]

h) [ 1  2  2 ]
   [ 2  1  2 ]
   [ 2  2  1 ]


2. Using the Hessian matrix, H, determine the convexity and then find the critical points and classify them for the following:
a) f(x,y) = x² + 3xy − y²
b) f(x,y) = x² + y²
c) f(x,y) = −x² − xy − 2y²
d) f(x,y) = 3x + 5y − 4x² + y² − 5xy
e) f(x,y,z) = 2x + 3y + 3z − xy + xz − yz − x² − 3y² − z²
f) Determine the values of a, b, and c such that ax² + bxy + cy² is convex. Is concave?

3. Find and classify all the extreme points for the following:
a. f(x,y) = x² + 3xy − y²
b. f(x,y) = x² + y²
c. f(x,y) = −x² − xy − 2y²
d. f(x,y) = 3x + 5y − 4x² + y² − 5xy
e. f(x,y,z) = 2x + 3y + 3z − xy + xz − yz − x² − 3y² − z²

4. Find and classify all critical points of f(x,y) = e^(x−y) + x² + y².

5. Find and classify all critical points of f(x,y) = (x² + y²)^1.5 − 4(x² + y²).

6. Consider a small company that is planning to install a central computer with cable links to five departments. According to their floor plan, the peripheral computers for the five departments will be situated as shown by the dark circles in Figure 6.8. The company wishes to locate the central computer so that the minimal amount of cable will be used to link to the five peripheral computers. Assuming that cable may be strung over the ceiling panels in a straight line from a point above any peripheral to a point above the central computer, the distance formula may be used to determine the length of cable needed to connect any peripheral to the central computer. Ignore all lengths of cable from the computer itself to a point above the ceiling panel immediately over that computer. That is, work only with lengths of cable strung over the ceiling panels. The coordinates of the locations of the five peripheral computers are listed in Table 6.3. Assume the central computer will be positioned at coordinates (m, n), where m and n are integers in the grid representing the office space. Determine the coordinates (m, n) for the placement of the central computer that minimize the total amount of cable needed. Report the total number of feet of cable needed for this placement along with the coordinates (m, n).

Figure 6.8  The grid for the five departments.

Table 6.3  Grid Coordinates of Five Departments

X     Y
15    60
25    90
60    75
75    60
80    25

7. Find all the extrema and then classify the extrema for the following functions:
a) f(x,y) = x³ − 3xy² + 4y⁴
b) w(x,y,z) = x² + 2xy − 4z + yz²

8. Three oil fields are located according to a rectangular coordinate system. Each field produces an equal amount of oil. A pipeline is to be laid from each oil field to a centrally located refinery. If the oil wells are located at coordinates (0,0), (12,6), and (10,20), where should the refinery be located to minimize the total squared Euclidean distance:


Σᵢ₌₁³ [ (x − aᵢ)² + (y − bᵢ)² ]

9. Given that the gradient of a function f(x,y) evaluated at the point (0,0) is (−5,1) and that the Hessian matrix, H(x), is as follows:

H(x) = [  6  −1 ]
       [ −1   2 ]

Using your knowledge of partial derivatives and Hessians, determine the values of (x,y) that minimize the function. Provide the value of f(x,y). Show all work.

10. Find and classify all stationary points of k(x,y) = −5 + 3x³ + 7x² + 2x + 7y² − y + 3y³.

11. Shipping port and island. We have a function that represents the water in the region where we want to consider building a port. The function is given by the expression:

f(x,y) = 10(3 + x)³ + 3(6 + x) − 20(2 + y)² − 10(2 + y)⁵/11 − 20(3 + y) − 30cos(10 + (6 + x)³) − 30x − 25x² + 4sin(28 + 3x + 6y + xy)

We want to identify the general location of the entry point that will cause problems for shipping entering into the region.

6.4 Eigenvalues

We define an eigenvalue as a solution λ of the determinant equation det(A − λI) = 0. Eigenvalues may be used to determine the convexity of multivariate functions as well as the nature of the stationary points. The matrix A in the definition is the Hessian matrix, H. Let's collect the eigenvalues in the vector E. If the values of the vector E are:

all > 0, then the function is strictly convex and the stationary point is a minimum;
all ≥ 0, then the function is convex and the stationary point is a minimum;
all < 0, then the function is strictly concave and the stationary point is a maximum;
all ≤ 0, then the function is concave and the stationary point is a maximum.


If the eigenvalues do not follow the patterns above — some are > 0 and some are < 0 — then the function is neither concave nor convex and the stationary point is a saddle point. If all the eigenvalues are equal to 0, then the test for the stationary point is inconclusive.

Example 6.16

Given H = [ −2   0 ]
          [  0  −2 ],

find the eigenvalues and classify the stationary point (1,1). We find the determinant of

det(H − λI) = det [ −2 − λ     0     ] = 0
                  [   0      −2 − λ  ]

so (−2 − λ)² = 0 and λ = −2, −2. Both are < 0, so the function is strictly concave and the stationary point (1,1) is a maximum.

Step 1. Choose a starting point x₀ and a tolerance ε > 0.
Step 2. Set x = x₀ and compute the gradient at that point, ∇f(x₀).
Step 3. Calculate the maximum of the new function f(xᵢ + tᵢ∇f(xᵢ)), where tᵢ > 0, by finding the value of tᵢ.
Step 4. Find the new point xᵢ₊₁ by substituting tᵢ into xᵢ₊₁ = xᵢ + tᵢ∇f(xᵢ).
Step 5. If the length (magnitude) of the change in x, defined by ||x|| = (x₁² + x₂² + … + x_n²)^(1/2), is less than the tolerance specified, or if the magnitude of the gradient is less than the tolerance (derivative approximately zero), continue to Step 6. Otherwise, go back to Step 3.
Step 6. Use x* as the approximate stationary point and compute f(x*), the estimated maximum of the function. STOP.

Figure 7.2  Steepest Ascent Algorithm.

newly calculated gradient as far as we can, so long as it continues to improve f. This continues until we achieve our maximum value within some specified tolerance (or margin of acceptable error). Figure 7.2 displays an algorithm for the Method of Steepest Ascent using the gradient.

Example 7.1

f(x, y) = −(x − 2)² − (y − 3)²

By inspection we know that the solution is the point (2,3) and f(2,3) = 0. So let's see how we approximate that answer with gradient search starting from [0,0] with a tolerance of 0.01. The gradient is ∇f = [−2(x − 2), −2(y − 3)]. If we evaluate at [0,0] we find [4,6]. The magnitude of the vector [4,6] is 7.211, which is not less than 0.01. Our new point is [0 + 4t, 0 + 6t]. We substitute into f(4t, 6t) = −(4t − 2)² − (6t − 3)². We take the first derivative, set it equal to 0, −104t + 52 = 0, and solve for t. We find t = 1/2, and the second derivative is −104, which is less than 0, so we found the value of t that maximizes f along this direction. Our new point is found by substituting t = 1/2 into [0 + 4t, 0 + 6t], giving [2,3]. We evaluate the gradient at [2,3] to find ∇f = [0, 0]. The magnitude of ∇f is 0, which is less than 0.01, so we terminate the search. Our results are [x, y] = [2,3] with f(2,3) = 0.
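The search in Example 7.1 can be sketched in a few lines of Python (the book's own implementations use Excel, Maple, and MATLAB; the backtracking line search here replaces the exact calculus step and is our own simplification):

```python
def f(p):
    x, y = p
    return -(x - 2)**2 - (y - 3)**2

def grad(p):
    x, y = p
    return (-2*(x - 2), -2*(y - 3))

def ascend(p, tol=0.01, max_iter=100):
    """Gradient ascent with a simple backtracking line search."""
    for _ in range(max_iter):
        g = grad(p)
        if (g[0]**2 + g[1]**2) ** 0.5 < tol:    # gradient nearly zero: stop
            break
        t = 1.0
        while f((p[0] + t*g[0], p[1] + t*g[1])) <= f(p):
            t *= 0.5                            # shrink the step until f improves
        p = (p[0] + t*g[0], p[1] + t*g[1])
    return p

print(ascend((0.0, 0.0)))   # approaches (2, 3)
```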


GRADIENT SEARCH METHOD WITH TECHNOLOGY

The example used will be the function also analyzed in Section 7.3.

Example 7.2

Maximize f(x₁, x₂) = 2x₁x₂ + 2x₂ − x₁² − 2x₂²

EXCEL UNCONSTRAINED OPTIMIZATION WITH THE SOLVER

The Excel Solver uses a GRG numerical search method. To solve this problem, we create two decision variables, x₁ and x₂, and initialize them both at 0, say in cells C5 and C6. We then enter the objective function as a function of these cells:

C7 = 2*C5*C6 + 2*C6 − C5^2 − 2*C6^2

We open the Solver, click on Solve, and obtain the solution.

Multivariate Optimization Search Techniques ◾

We also obtain an answer report:

MAPLE UNCONSTRAINED OPTIMIZATION

We program the algorithm for the descent method and then modify f(x₁, x₂) for maximization problems by changing f to −f. We enter the starting point, the tolerance, and the maximum number of iterations that we will allow. The following screenshot includes the proc program and our example.


We note the slight difference in our answers between Excel and Maple.

MATLAB UNCONSTRAINED OPTIMIZATION

First, we modify the script by James Allison (on File Exchange, https://www.mathworks.com/matlabcentral/fileexchange/35535-simplified-gradient-descent-optimization).

function [xopt,fopt,niter,gnorm,dx] = grad_descent(varargin)
% grad_descent.m demonstrates how the gradient descent method can be used
% to solve a simple unconstrained optimization problem. Taking large step
% sizes can lead to algorithm instability. The variable alpha below
% specifies the fixed step size. Increasing alpha above 0.32 results in


% instability of the algorithm. An alternative approach would involve a
% variable step size determined through line search.
%
% This example was used originally for an optimization demonstration in ME
% 149, Engineering System Design Optimization, a graduate course taught at
% Tufts University in the Mechanical Engineering Department.
%
% Author: James T. Allison, Assistant Professor, University of Illinois at
% Urbana-Champaign. Modified by Dr. W. Fox on Sept 2, 2019.
% Date: 3/4/12

if nargin==0
    % define starting point
    x0 = [0 0]';
elseif nargin==1
    % if a single input argument is provided, it is a user-defined starting
    % point.
    x0 = varargin{1};
else
    error('Incorrect number of input arguments.')
end

% termination tolerance
tol = 1e-6;
% maximum number of allowed iterations
maxiter = 100;
% minimum allowed perturbation
dxmin = 1e-6;
% step size (0.33 causes instability, 0.2 quite accurate)
alpha = 0.1;
% initialize gradient norm, optimization vector, iteration counter, perturbation
gnorm = inf; x = x0; niter = 0; dx = inf;

% define the objective function:
f = @(x1,x2) -(-x1.^2 + 2*x1.*x2 - 2*x2.^2 + 2*x2);

% plot objective function contours for visualization:
figure(1); clf; ezcontour(f,[-5 5 -5 5]); axis equal; hold on

% redefine objective function syntax for use with optimization:
f2 = @(x) f(x(1),x(2));

% gradient descent algorithm:
while and(gnorm >= tol, and(niter <= maxiter, dx >= dxmin))
    % calculate gradient:


    g = grad(x);
    gnorm = norm(g);
    % take step:
    xnew = x - alpha*g;
    % check step
    if ~isfinite(xnew)
        display(['Number of iterations: ' num2str(niter)])
        error('x is inf or NaN')
    end
    % plot current point
    plot([x(1) xnew(1)],[x(2) xnew(2)],'ko-')
    refresh
    % update termination metrics
    niter = niter + 1;
    dx = norm(xnew - x);
    x = xnew;
end
xopt = x;
fopt = f2(xopt);
niter = niter - 1;

% define the gradient of the objective function
function g = grad(x)
g = [2*x(1) - 2*x(2); -2 + 4*x(2) - 2*x(1)];

OBTAIN THE SOLUTION

Type

>> [xopt,fopt,niter,gnorm,dx] = grad_descent

We obtain the following output:

xopt =
    0.9996
    0.9998
fopt =
   -1.0000
niter =
   100
gnorm =
   3.7197e-04
dx =
   3.7197e-05


The above figure is a graphical solution to our problem showing the iterations from [0,0] to the terminal point [0.9996, 0.9998].

7.3 Examples of Gradient Search

Although the Steepest Ascent method (gradient method) can be used with any multivariable function to find the maximum (or the minimum, with steepest descent), its main advantage comes when normal calculus procedures fail, as will be shown in Example 7.3. To illustrate the basic concepts, consider the following example that can be solved with calculus:

Example 7.3

Maximize f(x₁, x₂) = 2x₁x₂ + 2x₂ − x₁² − 2x₂²

The gradient of f(x₁, x₂), ∇f, is found using the partial derivatives as shown in the last chapter. The gradient is the vector [2x₂ − 2x₁, 2x₁ + 2 − 4x₂].


∇f(0,0) = [0,2]. From (0,0), we move along (up) the x₂ axis in the direction of [0,2]. How far do we go? We need to maximize the function starting at the point (0,0) using the function f(xᵢ + tᵢ∇f(xᵢ)) = f(0 + 0t, 0 + 2t) = 2(2t) − 2(2t)² = 4t − 8t². This function can be maximized using any of the one-dimensional search techniques that we discussed in Chapter 4, or by simple single-variable calculus:

df/dt = 4 − 16t = 0, t = 0.25

The new point is found by substitution into xᵢ₊₁ = xᵢ + tᵢ∇f(xᵢ). So x₁ = [0 + 0(0.25), 0 + 2(0.25)] = [0, 0.5]. The magnitude of the change is 0.5, which is not less than our tolerance of 0.01 (chosen arbitrarily). Since we are not optimal, we continue, repeating the calculations from the new point [0, 0.5].

ITERATION 2

The gradient vector is [2x₂ − 2x₁, 2x₁ + 2 − 4x₂], so ∇f(0, 0.5) = [1, 0]. From (0, 0.5), we move in the direction of [1,0]. How far do we go? We maximize the function starting at the new point (0, 0.5) using f(xᵢ + tᵢ∇f(xᵢ)) = f(0 + 1t, 0.5 + 0t) = 2(t)(0.5) + 2(0.5) − t² − 2(0.5)² = −t² + t + 0.5. Again by single-variable calculus:

df/dt = −2t + 1 = 0, t = 0.50

The new point is found by substitution into xᵢ₊₁ = xᵢ + tᵢ∇f(xᵢ). So x₂ = [0 + 1(0.5), 0.5 + 0(0.5)] = [0.5, 0.5]. The magnitude of the change is √0.5 ≈ 0.707, which is not less than our tolerance of 0.01. The magnitude of ∇f is 1, which is also not less than the tolerance, so the search continues.

> mg:=convert(sqrt(dotprod(rv,rv)),float);
> printf("%12.4f",mg);
> if (mg < tol or numIter >= max) then
>   goto(label_6);
> else
>   numIter:=numIter+1;
> fi;
> v1:=x1pt+t*rv[1];
> v2:=x2pt+t*rv[2];
> newt:=evalf(subs({x1=v1,x2=v2},f1));
> numfeval:=numfeval+1;
> lam:=fsolve(diff(newt,t)=0,t,maxsols=1);
> nv1:=evalf(subs({t=lam},v1));


> nv2:=evalf(subs({t=lam},v2));
> printf(" (%8.4f,%8.4f)%13.4f\n",x1pt,x2pt,lam);
> x1pt:=nv1;
> x2pt:=nv2;
> goto(label_7);
> label_6;
> printf("\n\n-----------------------------------------");
> printf("---------------------------------------------");
> printf("\n\n Approximate Solution: ");
> printf(" (%8.4f,%8.4f)\n",x1pt,x2pt);
> Fvalue:=evalf(subs(x1=x1pt,x2=x2pt,f));
> printf(" Maximum Functional Value: ");
> printf("%21.4f",Fvalue);
> printf("\n Number gradient evaluations:");
> printf("%22d",numgeval);
> printf("\n Number function evaluations:");
> printf("%22d",numfeval);
> printf("\n\n-----------------------------------------");
> printf("---------------------------------------------");
> end:

7.5.2 Newton's Method for Optimization in Maple

Newton's Method

Steepest:=proc(f,f1::procedure,f2::procedure,n::posint,tol::numeric,x11::numeric,x22::numeric):
> t1:=tol;
> print(tol);
> x:=x11;
> y:=x22;
> label_6:
> dq:=D[1](f1):q:=evalf(dq(x,y)):#print(q);
> dr:=D[2](f1):r:=evalf(dr(x,y)):#print(r);
> ds:=D[1](f2):s:=evalf(ds(x,y)):#print(s);
> dt:=D[2](f2):t:=evalf(dt(x,y)):#print(t);
> printf("\nHessian: [ %8.3f %8.3f ]\n",q,r);
> printf("         [ %8.3f %8.3f ]\n",s,t);
> A := array([[q,r],[s,t]]);
> printf("eigenvalues:%8.3f %8.3f\n", evalf(Eigenvals(A))[1],evalf(Eigenvals(A))[2]);
> A := matrix(2,2, [q,r,s,t]);
> printf("pos def: %s\n",definite(A, 'positive_def'));
> u:=-1*f1(x,y):#print(u);
> v:=-1*f2(x,y):#print(v);
> dv:=q*t-r*s:#print(dv);
> newx:=x+((u*t-v*r)/dv);
> newy:=y+((q*v-s*u)/dv);

> xx:=x;
> yy:=y;
> x:=newx;
> y:=newy;
> printf("new x=%8.3f new y=%8.3f\n",newx,newy);
> if (evalf(sqrt((newx-xx)^2+(newy-yy)^2)) < t1) then
> printf("\n\nfinal new x=%8.3f final new y=%8.3f\n",newx,newy);
> printf("final fvalue is %8.3f",evalf(subs({x1=newx,x2=newy},f)));
> else
> goto(label_6);
> end if;
> #print(newx,newy);
> end:

Newton's Method 1

Newtons:=proc(f,n::posint,tol::numeric,x11::numeric,x22::numeric):
t1:=tol;
#print(tol);
df1:=diff(f,x1);
df2:=diff(f,x2);
f1:=unapply(df1,x1,x2);
f2:=unapply(df2,x1,x2);
x:=x11;
y:=x22;
label_6:
dq:=D[1](f1):q:=evalf(dq(x,y)):print(q);
dr:=D[2](f1):r:=evalf(dr(x,y)):print(r);
ds:=D[1](f2):s:=evalf(ds(x,y)):print(s);
dt:=D[2](f2):t:=evalf(dt(x,y)):print(t);
printf("\nHessian: [ %8.3f %8.3f ]\n",q,r);
printf("         [ %8.3f %8.3f ]\n",s,t);
A := array([[q,r],[s,t]]);
printf("eigenvalues:%8.3f %8.3f\n", evalf(Eigenvals(A))[1],evalf(Eigenvals(A))[2]);
A := Matrix(2,2, [q,r,s,t]);
printf("neg def: %s\n",IsDefinite(A, 'query' = 'negative_semidefinite'));
u:=-1*f1(x,y):print(u);
v:=-1*f2(x,y):print(v);
dv:=evalf(q*t-r*s):#print(dv);
newx:=evalf(x+((u*t-v*r)/dv));
newy:=evalf(y+((q*v-s*u)/dv));
xx:=x:print(xx);
yy:=y;
x:=newx;


y:=newy;
printf("new x=%8.3f new y=%8.3f\n",x,y);
if (evalf(sqrt((newx-xx)^2+(newy-yy)^2)) < t1) then
printf("\n\nfinal new x=%8.3f final new y=%8.3f\n",newx,newy);
printf("final fvalue is %8.3f",evalf(subs({x1=newx,x2=newy},f)));
else
goto(label_6);
end if;
#print(newx,newy);
end:
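The same Newton step the Maple procedure takes can be sketched in Python (an illustration with our own names; the 2 × 2 inverse formula replaces Maple's arithmetic). Applied here to the quadratic f(x₁, x₂) = 2x₁x₂ + 2x₂ − x₁² − 2x₂² from Example 7.2, for which Newton converges in one step:

```python
def grad(x, y):
    # Partials of f(x1, x2) = 2*x1*x2 + 2*x2 - x1**2 - 2*x2**2
    return (2*y - 2*x, 2*x + 2 - 4*y)

def hess(x, y):
    return ((-2.0, 2.0), (2.0, -4.0))   # constant for this quadratic f

def newton(x, y, tol=1e-8, max_iter=50):
    for _ in range(max_iter):
        u, v = grad(x, y)
        (q, r), (s, t) = hess(x, y)
        det = q*t - r*s
        # Newton step: subtract H^{-1} * gradient
        dx = (u*t - v*r) / det
        dy = (q*v - s*u) / det
        x, y = x - dx, y - dy
        if (dx*dx + dy*dy) ** 0.5 < tol:
            break
    return x, y

print(newton(0.0, 0.0))   # converges to the maximizer (1, 1)
```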

Exercises

1. Given: MAX f(x,y) = 2xy + 2y − 2x² − y²
Assume our tolerance for the magnitude of the gradient is 0.10.
a. Start at the point (x,y) = (0,0). Perform 2 complete iterations of gradient search. For each iteration clearly show Xn, Xn+1, ∇f(Xn), and t*. Justify that we will eventually find the approximate maximum.
b. Use Newton's method to find the maximum starting at (x,y) = (1,1). Clearly show Xn, Xn+1, ∇f(Xn), and H⁻¹ for each iteration. Clearly indicate when the stopping criterion is achieved.

2. Given: MAX f(x,y) = 3xy − 4x² − 2y²
Assume our tolerance for the magnitude of the gradient is 0.10.
a. Start at the point (x,y) = (1,1). Perform 2 complete iterations of gradient search. For each iteration clearly show Xn, Xn+1, ∇f(Xn), and t*. Justify that we will eventually find an approximate maximum.
b. Use Newton's method to find the maximum starting at (x,y) = (1,1). Clearly show Xn, Xn+1, ∇f(Xn), and H⁻¹ for each iteration. Clearly indicate when a stopping criterion is achieved.

3. Apply the modified Newton's method (multivariable) to find the following:
a. MAX f(x,y) = −x³ + 3x + 8y − 6y², start at (1,1). Why can't we start at (0,0)?
b. MIN f(x,y) = −4x + 4x² − 3y + y², start at (0,0).
c. Perform 3 iterations to MIN f(x,y) = (x − 2)⁴ + (x − 2y)², start at (0,0). Why is this problem not converging as quickly as problem (b)?

4. Use gradient search to find the approximate minimum of f(x,y) = (x − 2)² + x + y². Start at (2.5, 1.5).


Projects Chapter 7

1. Write a computer program in Maple that uses a one-dimensional search algorithm, say Golden Section search, instead of calculus to perform the iterations of gradient search. Use your code to find the maximum of f(x, y) = xy − x² − y² − 2x − 2y + 4.

2. Write a computer program in Maple that uses a one-dimensional search algorithm, say Fibonacci search, instead of calculus to perform the iterations of gradient search. Use your code to find the maximum of f(x, y) = xy − x² − y² − 2x − 2y + 4.

References and Suggested Reading

Bazarra, M., C. Shetty, and H.D. Scherali, 1993. Nonlinear Programming: Theory and Applications. New York: Wiley.
Fox, W., 1992. Teaching nonlinear programming with Minitab. COED Journal, Vol. II(1), pages 80–84.
Fox, W.P., 1993. Using microcomputers in undergraduate nonlinear optimization. Collegiate Microcomputer, Vol. XI(3), pages 214–218.
Fox, W. and J. Appleget, 2000. Some fun with Newton's Method. COED Journal, Vol. X(4), pages 38–43.
Fox, W.P. and W. Richardson, 2000. Mathematical modeling with least squares using MAPLE. Maple Application Center, Nonlinear Mathematics, October 2000.
Fox, W.P., F. Giordano, S. Maddox, and M. Weir, 1987. Mathematical Modeling with Minitab. Monterey, CA: Brooks/Cole.
Fox, W.P., F. Giordano, and M. Weir, 1997. A First Course in Mathematical Modeling, 2nd Edition. Monterey, CA: Brooks/Cole.
Meerschaert, M., 1993. Mathematical Modeling. San Diego, CA: Academic Press.
Phillips, D.T., A. Ravindran, and J. Solberg, 1976. Operations Research. New York: John Wiley & Sons.
Press, W.H., B. Flannery, S. Teukolsky, and W. Vetterling, 1987. Numerical Recipes. New York: Cambridge University Press, pages 269–271.
Rao, S.S., 1979. Optimization: Theory and Applications. New Delhi, India: Wiley Eastern Limited.
Winston, W., 1995. Introduction to Mathematical Programming: Applications and Algorithms, 2nd Edition. Boston, MA: Duxbury Press, ITP.

Chapter 8

Optimization with Equality Constraints

8.1 Introduction

A company manufactures new E-phones that are supposed to capture the market by storm. The two main input components of the new E-phone are the circuit board and the relay switches that make the phone faster, smarter, and give it more memory. The number of E-phones to be produced is estimated as E = 200a^(1/2)b^(1/4), where E is the number of phones produced while a and b are the number of circuit-board hours and the number of relay hours worked, respectively. Such a function is known to economists as a Cobb-Douglas function. Our laborers are paid by the type of work they do: $5 an hour for circuit boards and $10 an hour for relays. We want to maximize the number of E-phones to be made if we have $150,000 to spend on these components in the short run. Problems such as this can be modeled using constrained optimization. We begin our discussion with equality-constrained optimization and then we discuss inequality-constrained optimization.

8.2 Equality Constraints: Method of Lagrange Multipliers

Lagrange multipliers can be used to solve nonlinear optimization problems (called NLPs) in which all the constraints are equality constraints. We consider the following type of NLP:

Max (Min) z = f(x₁, x₂, x₃, …, x_n)   (8.1)


Subject to:

g₁(x₁, x₂, …, x_n) = b₁
g₂(x₁, x₂, …, x_n) = b₂
⋮
g_m(x₁, x₂, …, x_n) = b_m

In our E-phones example, we find that we can build an equality-constrained model. We want to maximize

E = 200a^(1/2)b^(1/4)

subject to the constraint

5a + 10b = 150,000

8.3 Introduction and Basic Theory

In order to solve NLPs in the form of (8.1), we associate a Lagrange multiplier, λᵢ, with the ith constraint and form the Lagrangian equation. We adopt the bᵢ − gᵢ(X) format:

L(X, λ) = f(X) + Σᵢ₌₁ᵐ λᵢ(bᵢ − gᵢ(X))   (8.2)

The computational procedure for Lagrange multipliers requires that all the partials of this Lagrangian function equal zero. These partials are the necessary conditions of the NLP problem: the conditions required for x = {x₁, x₂, …, x_n} to be a solution to (8.1).

The Necessary Conditions

∂L/∂xⱼ = 0   (j = 1, 2, …, n variables)   (8.3a)
∂L/∂λᵢ = 0   (i = 1, 2, …, m constraints)   (8.3b)

Definition 8.1

x is a regular point if and only if ∇gᵢ(x), i = 1, 2, …, m, are linearly independent.


Theorem 8.1

a. Let (8.1) be a maximization problem. If f is a concave function and each gᵢ(x) is a linear function, then any point satisfying (8.3a,b) will yield an optimal solution.
b. Let (8.1) be a minimization problem. If f is a convex function and each gᵢ(x) is a linear function, then any point satisfying (8.3a,b) will yield an optimal solution.

Recall from Chapter 6 that we used the Hessian matrix to determine whether a function was convex, concave, or neither. We also note that the above theorem limits our constraints to linear functions. What if we have nonlinear constraints? We can use the bordered Hessian in sufficient conditions. Given the bivariate Lagrangian function

L(x₁, x₂, λ) = f(x₁, x₂) + Σᵢ₌₁ᵐ λᵢ(bᵢ − gᵢ(x₁, x₂))

The bordered Hessian is

BdH = [ 0     g₁              g₂          ]
      [ g₁    f₁₁ − λg₁₁      f₁₂ − λg₁₂  ]
      [ g₂    f₂₁ − λg₂₁      f₂₂ − λg₂₂  ]

We find the determinant of this bordered Hessian as Equation (8.4):

|BdH| = g₁g₂(f₂₁ − λg₂₁) + g₂g₁(f₁₂ − λg₁₂) − g₂²(f₁₁ − λg₁₁) − g₁²(f₂₂ − λg₂₂)   (8.4)

The sufficient condition for a maximum, in the bivariate case with one constraint, is that the determinant of the bordered Hessian is positive when evaluated at the critical point. The sufficient condition for a minimum, in the bivariate case with one constraint, is that the determinant of the bordered Hessian is negative when evaluated at the critical point.
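Equation (8.4) is straightforward to evaluate numerically. Below is a Python sketch (the helper name is our own), applied to the function f = −2x² − 2y² + xy + 8x + 3y with the linear constraint 3x + y = 6 previewed later in this chapter; because the constraint is linear, all gᵢⱼ = 0 and λ drops out:

```python
def bordered_hessian_det(g1, g2, f11, f12, f21, f22, lam,
                         g11=0.0, g12=0.0, g21=0.0, g22=0.0):
    """Determinant of [[0, g1, g2],
                       [g1, f11 - lam*g11, f12 - lam*g12],
                       [g2, f21 - lam*g21, f22 - lam*g22]] via Eq. (8.4)."""
    a11, a12 = f11 - lam*g11, f12 - lam*g12
    a21, a22 = f21 - lam*g21, f22 - lam*g22
    return g1*g2*a21 + g2*g1*a12 - g2*g2*a11 - g1*g1*a22

# g1 = 3, g2 = 1, f11 = -4, f12 = f21 = 1, f22 = -4, all second partials of g are 0
d = bordered_hessian_det(3, 1, -4, 1, 1, -4, lam=0.0)
print(d)   # 46 > 0, so the critical point is a maximum
```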


If x is a regular point and gᵢ(x) = 0 (the constraints are satisfied), then M = {y | ∇gᵢ(x) ∙ y = 0} defines a plane tangent to the feasible region at x.

Lemma 8.1

If x is regular, gᵢ(x) = 0, and ∇gᵢ(x) ∙ y = 0, then ∇f(x) ∙ y = 0.

Note that the Lagrange multiplier conditions are exactly the same for a minimization problem as for a maximization problem. This is the reason that these conditions alone are not sufficient. Thus, a given solution can be either a maximum or a minimum. In order to determine whether the point found is a maximum, minimum, or saddle point, we will use the Hessian.

The Lagrange multiplier, λ, has an important modeling interpretation. It is the "shadow price" for scarce resources; λᵢ is the shadow price of the ith constraint. Thus, if the right-hand side of constraint i is increased by a small amount Δ, in a maximization or a minimization problem, then the optimal objective value will change by λᵢΔ. We will illustrate the shadow price both graphically and computationally.

8.4 Graphical Interpretation of Lagrange Multipliers

The method of Lagrange multipliers is based on its geometric interpretation, which involves the gradients of both the function and the constraints. Initially, let's consider only one constraint,

g(x₁, x₂, …, x_n) = b,

so that the Lagrangian condition simplifies to ∇f = λ∇g. The solution is the point x where the gradient vector ∇g(x) is perpendicular to the constraint surface. The gradient vector ∇f always points in the direction in which f increases fastest. At both maxima and minima, this direction must also be perpendicular to the surface S. Thus, since both ∇f and ∇g point along the same perpendicular line, ∇f = λ∇g. In the case of multiple constraints, the geometrical arguments are similar (Figure 8.1).

Let's preview a graphical solution to our first computational example:

Maximize z = −2x² − 2y² + xy + 8x + 3y
s.t. 3x + y = 6


Figure 8.1  One equality constraint. (The arrow shows the direction of increase; the optimal point is where the constraint is tangent to a level curve.)

We obtained a contour plot of z from Maple and overlaid the single constraint onto the contour plot (see Figure 8.2). What information can we obtain from this graphical representation? First, we note that the unconstrained optimum does not lie on the constraint. We can estimate the unconstrained optimum as (x*, y*) = (2.3, 1.3).

Figure 8.2  Contour plot of the function f and the equality constraint g(x) = 3x + y = 6.




Nonlinear Optimization

Figure 8.3 Resource increased: g(x) = 3x + y = 8.45.

The optimal constrained solution lies at the point where the constraint is tangent to a contour of the function f. This point is labeled X* and is estimated as (1.8, 1.0). We see clearly that the constraint line does not pass through the unconstrained maximum, so the resource can be modified (if feasible) until the line passes through the unconstrained solution; at that point, we would no longer add (or subtract) any more resources (see Figure 8.3). We can gain valuable insight about a problem if we are able to plot this information.

8.5 Computational Method of Lagrange Multipliers

Consider the set of equations in (8.3). This gives m + n equations in the m + n unknowns (xj, λi). Generally speaking, this is a difficult system to solve without a computer. Also, since the Lagrange multiplier conditions are necessary only (not sufficient), we may find solutions (xj, λi) that are not optimal for our NLP. We need to be able to classify the points found by solving the necessary conditions. Commonly used methods of justification are as follows:

a. the Hessian matrix
b. the bordered Hessian (the sign of det[HB])

We will illustrate these, where feasible, in the following examples with Maple.




Example 8.1 (revisited)

Maximize z = −2x² − 2y² + xy + 8x + 3y
s.t. 3x + y = 6

We set up the Lagrangian, L:

L(x, y, λ) = −2x² − 2y² + xy + 8x + 3y + λ(6 − 3x − y)

We obtain a 3D plot with contours as shown in Figure 8.4. The necessary conditions are

Lx = −4x + y + 8 − 3λ = 0
Ly = −4y + x + 3 − λ = 0
Lλ = 6 − 3x − y = 0

We solve these three linear equations and obtain λ = 0.76087, x = 1.67391, y = 0.97826, L = 10.44565. The Hessian of L is negative definite (the justification appears below), so we found the maximum. The interpretation of λ is essential: if our constraint is increased by one unit, from 3x + y = 6 to 3x + y = 7, the objective function increases (because λ is positive) from 10.4456 to approximately 10.4456 + 0.7609 = 11.2065.
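The three necessary conditions form a linear system in (x, y, λ), so readers without Maple can check them directly; here is a short sketch in Python with NumPy (Python is our addition, not part of the text's Maple workflow):

```python
import numpy as np

# Necessary conditions for L = -2x^2 - 2y^2 + xy + 8x + 3y + lam*(6 - 3x - y),
# written as a linear system in (x, y, lam):
#   Lx:   -4x +  y - 3*lam = -8
#   Ly:     x - 4y -   lam = -3
#   Llam:  3x +  y         =  6
A = np.array([[-4.0,  1.0, -3.0],
              [ 1.0, -4.0, -1.0],
              [ 3.0,  1.0,  0.0]])
b = np.array([-8.0, -3.0, 6.0])
x, y, lam = np.linalg.solve(A, b)       # x = 77/46, y = 45/46, lam = 35/46

f = -2*x**2 - 2*y**2 + x*y + 8*x + 3*y  # objective value 961/92, about 10.4457
```

The exact fractions agree with the detailed output of Maple's LagrangeMultipliers routine shown later in this section.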

Figure 8.4 3D plot with contours.


Lagrange Method with Technology

In our example, we had three equations and three unknowns to solve. There exist several commands in Maple to obtain solutions, depending on the output we are looking to achieve. We will illustrate various methods, including with(Student[MultivariateCalculus]), with(Optimization) (including its Maximize and Minimize commands), as well as a set of commands using Maple's CAS procedures directly.

Student[MultivariateCalculus][LagrangeMultipliers] - solve types of optimization problems using the method of Lagrange multipliers

Calling Sequence
LagrangeMultipliers(f(x,y,..), [g(x,y,..), h(x,y,..),..], [x,y,..], opts)

Parameters
f(x,y,..) - algebraic expression; objective function
[g(x,y,..), h(x,y,..),..] - algebraic expressions; constraint functions, assumed equal to 0
[x,y,..] - list of names; independent variables
opts - (optional) equation(s) of the form option=value where option is one of constraintoptions, levelcurveoptions, pointoptions, output, showconstraints, showlevelcurves, showpoints, title, or view

The Maple command with(Optimization) is used. Under the optimization options, we find:

Optimization[NLPSolve] - solve a nonlinear program
Calling Sequence
NLPSolve(obj, constr, bd, opts)
NLPSolve(opfobj, ineqcon, eqcon, opfbd, opts)

Optimization[Minimize] - minimize an objective function, possibly subject to constraints
Optimization[Maximize] - maximize an objective function, possibly subject to constraints
Calling Sequence
Minimize(obj, constr, bd, opts)
Maximize(obj, constr, bd, opts)
Minimize(opfobj, ineqcon, eqcon, opfbd, opts)
Maximize(opfobj, ineqcon, eqcon, opfbd, opts)

We might also simply use Maple as a computational tool. We will illustrate each of these methods. We also point out that critical elements of a Lagrange-multiplier solution are the value and interpretation of the multiplier, λ. We can set up a system of equations in Maple and obtain a solution.


Method 1: Set up the equations and use Maple as a computational tool.

> L := -2*x^2-2*y^2+x*y+8*x+3*y+l*(6-3*x-y);
     L := -2 x^2 - 2 y^2 + x y + 8 x + 3 y + l (6 - 3 x - y)
> LDx := diff(L, x);
     LDx := -4 x + y + 8 - 3 l
> LDy := diff(L, y);
     LDy := -4 y + x + 3 - l
> LDl := diff(L, l);
     LDl := 6 - 3 x - y
> fsolve({LDl = 0, LDx = 0, LDy = 0}, {l, x, y});
     {l = 0.7608695652, x = 1.673913043, y = 0.9782608696}
> subs({l = 0.7608695652, x = 1.673913043, y = 0.9782608696}, L);
     10.44565217

We find the solution for all variables and the Lagrange multiplier.

Method 2: Using NLPSolve

> obj1 := -2*x^2+x*y-2*y^2+8*x+3*y;
     obj1 := -2 x^2 + x y - 2 y^2 + 8 x + 3 y
> with(Optimization);
> NLPSolve(obj1, {3*x+y = 6}, maximize, assume = nonnegative);
     [10.4456521739130430, [x = 1.67391304347826, y = 0.978260869565217]]

We obtain the solution, but we do not capture the value of the Lagrange multiplier. Caution: if we try to fix this by placing the Lagrangian function into the solver, we do not get the correct value of λ.

> NLPSolve(L, {3*x+y = 6}, maximize, assume = nonnegative);
     [10.4456521739130430, [l = 1.00000000000000, x = 1.67391304347826, y = 0.978260869565217]]

Method 3: Maximize or Minimize function

> Maximize(obj1, {3*x + y = 6});
     [10.4456521739130, [x = 1.67391304347826, y = 0.978260869565217]]

We obtained the correct solution but without the value of the multiplier, λ. Caution: again, if we try to remedy this, we do not get the correct value of λ.


> Maximize(L, {3*x + y = 6}, assume = nonnegative);
Warning, problem appears to be unbounded
     [10.4456521739130, [l = 0., x = 1.67391304347826, y = 0.978260869565217]]

We recommend setting up the Lagrangian and solving with Maple as a computational tool for these problems, or using the with(Student[MultivariateCalculus]) routine.

Method 4: with(Student[MultivariateCalculus])

> with(Student[MultivariateCalculus]);
> LagrangeMultipliers(obj1, [3*x+y-6], [x, y], output = detailed);
     [x = 77/46, y = 45/46, λ1 = 35/46, obj1 = 961/92]
> evalf(%);
     [x = 1.673913043, y = 0.9782608696, λ1 = 0.7608695652, obj1 = 10.44565217]

We may obtain a plot:

> with(Student[MultivariateCalculus]):
> LagrangeMultipliers(obj1, [3*x+y-6], [x, y], output = plot, showlevelcurves = true);


Thus, our solution is

x* = 1.673913043
y* = 0.9782608696
λ* = 0.7608695652

We evaluate the function to obtain its value, f(x*, y*) = 10.4456. We have a solution, but we need to know whether this solution represents the maximum or the minimum of the Lagrangian function. We use the Hessian matrix in our justification; we could use either the Hessian or the bordered Hessian described below to justify that we have found the correct solution to our problem to maximize L.

1. Hessian
(a) If you have with(linalg), use the following commands:

> h := hessian(L, [x, y]);

h := [ -4   1 ]
     [  1  -4 ]

> det(h);
     15

Since the first diagonal entry is −4 < 0 and the determinant is 15 > 0, the Hessian is negative definite for all values of (x, y), so the regular point, also called the stationary point, (x*, y*) is a maximum.

(b) With with(VectorCalculus):

> with(VectorCalculus):
> h1 := Hessian(L, [x, y]);

h1 := [ -4   1 ]
      [  1  -4 ]

> det(h1);
     15

The determinant is 15, so the Hessian is again negative definite.
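A quick numeric cross-check of this justification (Python/NumPy is our addition; the matrices are the ones displayed in the text):

```python
import numpy as np

# Hessian of L in (x, y), and the bordered Hessian with the gradient of
# g(x, y) = 3x + y - 6 forming the border.
H = np.array([[-4.0,  1.0],
              [ 1.0, -4.0]])
BH = np.array([[-4.0,  1.0, -3.0],
               [ 1.0, -4.0, -1.0],
               [-3.0, -1.0,  0.0]])

eigs = np.linalg.eigvalsh(H)    # both eigenvalues negative -> negative definite
det_bh = np.linalg.det(BH)      # 46 > 0 -> maximum for n = 2 variables, m = 1 constraint
```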


2. Bordered Hessian

> bdh := matrix([[-4, 1, -3], [1, -4, -1], [-3, -1, 0]]);

bdh := [ -4    1   -3 ]
       [  1   -4   -1 ]
       [ -3   -1    0 ]

> det(bdh);

     46

Since the determinant is positive, we have found the maximum at the critical point. Either method, the Hessian or the bordered Hessian, works in this example to determine that we have found the maximum of L. Now, let's interpret the shadow price, λ = 0.76. If the right-hand side of the constraint is increased by a small amount Δ, then the objective function will increase by about 0.76Δ. Since this is a maximization problem, we would add to the resource if possible because it improves the value of the objective function. From the graph, it can be seen that the incremental change must be small, or the linear shadow-price estimate breaks down. If we increase the RHS by one unit so that g(x) = 3x + y = 7, the solution at the new point (x**, y**) should yield a functional value of approximately f(x**, y**) ≈ old f + λ = 10.4456 + 0.7609 = 11.2065. In actuality, changing the constraint yields a solution of 11.04347826; the increase was about 0.60.

Example 8.2: Multiple Constraints

Consider the following problem:

Minimize w = x² + y² + 3z
s.t.

x + y = 3
x + 3y + 2z = 7

We will illustrate only two methods: CAS and with(Student[MultivariateCalculus]).

Method 1: CAS

L(x, y, z, λ1, λ2) = x² + y² + 3z + λ1[x + y − 3] + λ2[x + 3y + 2z − 7]


> f1 := x^2 + y^2 + 3*z;
     f1 := x^2 + y^2 + 3 z
> c1 := x + y - 3;
     c1 := x + y - 3
> c2 := x + 3*y + 2*z - 7;
     c2 := x + 3 y + 2 z - 7
> L := f1 + l1*c1 + l2*c2;
     L := x^2 + y^2 + 3 z + l1 (x + y - 3) + l2 (x + 3 y + 2 z - 7)
> nc := grad(L, [x, y, z, l1, l2]);
     nc := [2 x + l1 + l2, 2 y + l1 + 3 l2, 3 + 2 l2, x + y - 3, x + 3 y + 2 z - 7]
> lgsol := solve({2*x + l1 + l2, 2*y + l1 + 3*l2, 3 + 2*l2, x + y - 3, x + 3*y + 2*z - 7}, {x, y, z, l1, l2});
     lgsol := {x = 0.75, y = 2.25, z = -0.25, l1 = 0, l2 = -1.5}
> subs({x = 0.75, y = 2.25, z = -0.25, l1 = 0, l2 = -1.5}, L);
     4.875

Justification with the Hessian:

> h2 := Hessian(L, [x, y, z]);

h2 := [ 2   0   0 ]
      [ 0   2   0 ]
      [ 0   0   0 ]

The Hessian is always positive semi-definite. The function is convex, and our critical point is a minimum.


Method 2

> with(Student[MultivariateCalculus]):
> LagrangeMultipliers(f1, [x + y - 3, x + 3*y + 2*z - 7], [x, y, z], output = detailed);
     [x = 3/4, y = 9/4, z = -1/4, λ1 = 0, λ2 = -3/2, x^2 + y^2 + 3 z = 39/8]
> evalf(%);
     [x = 0.7500000000, y = 2.250000000, z = -0.2500000000, λ1 = 0, λ2 = -1.500000000, x^2 + y^2 + 3 z = 4.875000000]

Justification with the Hessian:

> h2 := Hessian(L, [x, y, z]);

h2 := [ 2   0   0 ]
      [ 0   2   0 ]
      [ 0   0   0 ]

The Hessian is always positive semi-definite. The function is convex, and our critical point is a minimum. Let us interpret the shadow prices, λ1 and λ2. If we could spend an extra dollar on only one of the two resources, which one would we spend it on? The values of the shadow prices are 0 and −1.5, respectively. Since the shadow price is ∂w/∂b, we would not spend an extra dollar on resource 2, because it would cause the objective function to increase by about $1.50 in this minimization problem.

Lagrange Multipliers with Excel

Steps using the Solver are as follows:

1. Define and initialize the decision variables (set initially at 0).
2. Define the objective function as a function of the decision-variable cells.
3. Define the equality constraints, each as a function of the decision-variable cells. Place the right-hand-side values into separate cells.
4. Highlight the objective function, open the Solver, enter the cells for the variables that change (the decision variables), enter the constraints, turn the non-negativity option on or off as needed, and solve.
5. Save the answer and sensitivity-analysis sheets.

We note from Figures 8.5 and 8.6 that we have found the solution and Lagrange multipliers.
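Because Example 8.2's objective is quadratic and its constraints are linear, all of the necessary conditions form one linear system; here is a sketch of solving it at once in Python with NumPy (our addition, not part of the text's Maple/Excel workflow):

```python
import numpy as np

# Unknowns ordered (x, y, z, lam1, lam2); rows are
#   Lx = 2x + lam1 + lam2 = 0
#   Ly = 2y + lam1 + 3*lam2 = 0
#   Lz = 3 + 2*lam2 = 0
#   x + y = 3
#   x + 3y + 2z = 7
A = np.array([[2.0, 0.0, 0.0, 1.0, 1.0],
              [0.0, 2.0, 0.0, 1.0, 3.0],
              [0.0, 0.0, 0.0, 0.0, 2.0],
              [1.0, 1.0, 0.0, 0.0, 0.0],
              [1.0, 3.0, 2.0, 0.0, 0.0]])
b = np.array([0.0, 0.0, -3.0, 3.0, 7.0])
x, y, z, lam1, lam2 = np.linalg.solve(A, b)

w = x**2 + y**2 + 3*z    # 39/8 = 4.875 at (3/4, 9/4, -1/4)
```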

Figure 8.5 Screenshot of Excel.

Figure 8.6 Sensitivity analysis report from Excel.


8.6 Applications with Lagrange Multipliers

Example 8.3: Cobb–Douglas Function

Recall the problem suggested in the introduction of the chapter. A company manufactures new E-phones that are supposed to take the market by storm. The two main inputs to the E-phone are the circuit board and the relay switches. The number of E-phones produced is estimated by E = 300a^(1/2)b^(1/4), where E is the number of phones produced while a and b are the number of circuit-board hours and relay hours worked, respectively. Such a function is known to economists as a Cobb–Douglas function. Our laborers are paid by the type of work they do: $6 an hour for circuit boards and $12 an hour for relays. We want to maximize the number of E-phones made if we have $160,000 to spend on these components in the short run. We set up the Lagrangian as

L = 300a^0.5 b^0.25 + λ(160,000 − 6a − 12b)

Our partial derivatives are

∂L/∂a = 150 b^0.25 / a^0.5 − 6λ

and

∂L/∂b = 75 a^0.5 / b^0.75 − 12λ

The partial with respect to λ yields the constraint 160,000 − 6a − 12b. The solution to all three partials set equal to 0 is a = 17,777.78, b = 4,444.44, λ = 1.5309, and the value of the function is 326,598.63. Thus, we find that we can make about 326,598 E-phones using 17,777.78 circuit-board hours and 4,444.44 relay hours of labor. We also see that one more dollar of budget nets us an increase in production of about 1.53 E-phones. We set up our bordered Hessian and find that its determinant is positive (determinant = 0.223209), so we found a maximum. The bordered Hessian, h1, is shown as

h1 := [ -0.0002583446215    0.0005166892429    -6 ]
      [  0.0005166892429   -0.003100135456    -12 ]
      [ -6                 -12                  0 ]
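The Cobb–Douglas first-order conditions can also be reduced by hand: dividing the two stationarity equations eliminates λ and gives a = 4b, after which everything follows from the budget. A small Python sketch of that closed form (our addition, a derivation check rather than the text's method):

```python
# From 150*b**0.25/a**0.5 = 6*lam and 75*a**0.5/b**0.75 = 12*lam,
# taking the ratio gives 2b/a = 6/12, i.e. a = 4b; the budget
# 6a + 12b = 160000 then pins down b.
budget = 160000.0
b = budget / 36.0                       # 6*(4b) + 12b = 36b = budget
a = 4.0 * b
E = 300.0 * a**0.5 * b**0.25            # about 326,598.6 E-phones
lam = 150.0 * b**0.25 / (6.0 * a**0.5)  # about 1.5309 phones per extra dollar
```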

Example 8.4: Oil Transfer You are employed as a consultant for a small oil transfer company. The management desires a minimum cost policy due to the restricted tank storage space. Historical records have been studied and a formula has been derived that describes system costs:


f(X) = Σ_{n=1}^{N} [ An·Bn/Xn + Hn·Xn/2 ]

where:
An is the fixed cost for the nth item.
Bn is the withdrawal rate per unit time for the nth item.
Hn is the holding cost per unit time for the nth item.

The tank space constraint is given by:

g(X) = Σ_{n=1}^{N} tn·Xn = T

where:
tn is the space required for the nth item (in correct units)
T is the available tank space (in correct units)

You determine the following information:

Item (n)   An ($)   Bn   Hn ($)   tn (ft³)
1          9.6      3    0.47     1.4
2          4.27     5    0.26     2.62
3          6.42     4    0.61     1.71

You measure the storage tanks and find only 22 ft³ of space available. We want to find the optimal solution as a minimum-cost policy. First, we will solve the unconstrained problem. If we assume that λ = 0, we find an unconstrained optimal solution.

L = (9.6)(3)/x + (0.47)x/2 + (4.27)(5)/y + (0.26)y/2 + (6.42)(4)/z + (0.61)z/2

We take all the partial derivatives and set them equal to zero:

∂L/∂x = −28.8/x² + 0.235 = 0
∂L/∂y = −21.35/y² + 0.130 = 0


∂L/∂z = −25.68/z² + 0.305 = 0

Although this system has many solutions, the one we want is the one where x, y, z > 0.

The solver returns all eight sign combinations of x = ±11.07037450, y = ±12.81525533, z = ±9.175877141.

The only useful solution is the one where x, y, z > 0:

x = 11.07037450, y = 12.81525533, z = 9.175877141

This solution, (x*, y*, z*) = (11.07, 12.82, 9.18), provides an unconstrained benchmark, since those values do not satisfy the constraint 1.4x + 2.62y + 1.71z = 22. We set up the constrained model as follows. Let

x = amount of item 1
y = amount of item 2
z = amount of item 3

L(x, y, z, λ) = (9.6)(3)/x + 0.47x/2 + (4.27)(5)/y + 0.26y/2 + (6.42)(4)/z + 0.61z/2 + λ[1.4x + 2.62y + 1.71z − 22]

The necessary conditions are

Lx = −28.8/x² + 0.235 + 1.4λ = 0
Ly = −21.35/y² + 0.130 + 2.62λ = 0
Lz = −25.68/z² + 0.305 + 1.71λ = 0
Lλ = 1.4x + 2.62y + 1.71z − 22 = 0


We solve these equations to obtain x = 4.761027695, y = 3.213131453, z = 4.044536153, and λ = 0.7396771332. The value of the objective function is L = 21.8136118. Do we have the minimum? The Hessian matrix, H, is as follows:

> h := hessian(L, [x, y, z]);

h := [ 57.6/x^3    0            0         ]
     [ 0           42.70/y^3    0         ]
     [ 0           0            51.36/z^3 ]

> h1 := det(h);
     h1 := 126320.9472/(x^3 y^3 z^3)
> subs({x = 4.761027695, y = 3.213131453, z = 4.044536153}, h1);

     0.533123053

The Hessian is positive definite at our critical point x = 4.761027695, y = 3.213131453, z = 4.044536153; therefore, the solution found is the minimum for this convex function. Should we add storage space? We know from the unconstrained solution that, if possible, we would add storage space to decrease the costs. Additionally, we have found the value of λ, 0.7396771332, which suggests that any small increase Δ in the RHS of the constraint causes the objective function to decrease by about 0.74Δ. The cost of the extra storage tank would have to be less than the savings incurred by adding the tank.
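The same constrained solution can be recovered numerically by exploiting the structure of the stationarity equations: for a trial λ each variable has a closed form, and λ is then adjusted until the storage constraint is met. A bisection sketch in plain Python (our addition, one of several ways to solve this system):

```python
import math

# From Lx = Ly = Lz = 0, each variable is a function of lambda:
#   x = sqrt(28.8 / (0.235 + 1.4*lam)), and similarly for y and z.
# Storage used decreases as lambda grows, so bisect on lambda until
# 1.4x + 2.62y + 1.71z = 22.
def point(lam):
    x = math.sqrt(28.8 / (0.235 + 1.4 * lam))
    y = math.sqrt(21.35 / (0.130 + 2.62 * lam))
    z = math.sqrt(25.68 / (0.305 + 1.71 * lam))
    return x, y, z

lo, hi = 0.0, 10.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    x, y, z = point(mid)
    if 1.4 * x + 2.62 * y + 1.71 * z > 22.0:
        lo = mid            # still over the storage budget: raise lambda
    else:
        hi = mid

lam = 0.5 * (lo + hi)       # lam is about 0.7397
x, y, z = point(lam)
cost = 28.8 / x + 0.235 * x + 21.35 / y + 0.130 * y + 25.68 / z + 0.305 * z
```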

Exercises

1. Solve the following constrained problems:
a) Minimize x² + y² subject to x + 2y = 4
b) Maximize (x − 3)² + (y − 2)² subject to x + 2y = 4

c) Maximize x² + 4xy + y² subject to x² + y² = 1
d) Maximize x² + 4xy + y² subject to x² + y² = 4 and x + 2y = 4

2. Maximize 3x² + y² + 2xy + 6x + 2y subject to 2x − y = 4. Did you find the maximum? Explain.
3. Find and classify the extrema of f(x, y, z) = x² + y² + z² s.t. x² + 2y² − z² = 1.
4. Given two manufacturing processes that both use resource b, we want to Maximize f1(x1) + f2(x2) subject to x1 + x2 = b. If f1(x1) = 50 − (x1 − 2)² and f2(x2) = 50 − (x2 − 2)², analyze this process to
   a. determine the amounts of x1 and x2 to use to maximize the process.
   b. determine the amount of resource, b, to use.
5. Maximize Z = −2x² − y² + xy + 8x + 3y subject to: 3x + y = 10 and x² + y² = 16.
6. Use the method of Lagrange multipliers to find the maximum of f(x, y, w) = xyw subject to 2x + 3y + 4w = 36. Determine how much f(x, y, w) would change if one more unit were added to the constraint.


Projects

1. Suppose a newspaper publisher must purchase three kinds of paper stock. The publisher must meet demand but desires to minimize costs in the process. They decide to use an economic lot size model to assist them in their decisions. Given an Economic Order Quantity (EOQ) model with constraints, the total cost is the sum of the individual quantity costs:

C(Q1, Q2, Q3) = C(Q1) + C(Q2) + C(Q3), where C(Qi) = ai·di/Qi + hi·Qi/2

where:
d is the order rate.
h is the holding cost per unit time (storage).
Q/2 is the average amount on hand.
a is the order cost.

The constraint is the amount of storage area available to the publisher so that the three kinds of paper are on hand for use. The items cannot be stacked, but they can be laid side by side, so they are constrained by the available storage area, S. The following data are collected:

        Type I           Type II   Type III
d       32 rolls/week    24        20
a       $25              $18       $20
h       $1/roll/week     $1.5      $2.0
s       4 sq ft/roll     3         2

You have 200 sq ft of storage space available.

Required:
a. Find the quantity levels that give the unconstrained minimum total cost, and show that these values do not satisfy the constraint. What purpose do these values serve?
b. Find the constrained optimal solution using Lagrange multipliers, assuming we will use all 200 sq ft.
c. Find and interpret the shadow prices.

2. Re-solve the tank storage problem to determine whether it is better to have a cylindrical or a rectangular storage space of 50 cubic units.


3. Suppose you want to use the Cobb–Douglas function P(L, K) = A·L^a·K^b to predict output in thousands, based upon the amounts of labor and capital used. Suppose you know the prices of capital and labor per year are $10,000 and $7,000, respectively. Your company estimates the values A = 1.2, a = 0.3, and b = 0.6. Your total cost is assumed to be T = PL·L + PK·K, where PL and PK are the prices of labor and capital. There are three possible funding levels: $63,940, $55,060, and $71,510. Determine which budget yields the best solution for your company. Interpret the Lagrange multiplier.

References and Suggested Reading

Bazaraa, M., C. Shetty, and H.D. Sherali, 1993. Nonlinear Programming: Theory and Applications. New York: Wiley.
Fox, W.P., F. Giordano, S. Maddox, and M. Weir, 1987. Mathematical Modeling with Minitab. Monterey, CA: Brooks/Cole.
Fox, W.P. and W. Richardson, 2000. "Mathematical Modeling with Least Squares Using MAPLE", Maple Application Center, Nonlinear Mathematics, October 2000.
Fox, W.P., F. Giordano, and M. Weir, 1997. A First Course in Mathematical Modeling, 2nd Edition. Monterey, CA: Brooks/Cole.
Fox, W.P. and W. Bauldry, 2020. Problem Solving with Maple, Volume 2. Boca Raton, FL: Taylor & Francis Publishers.
Meerschaert, M., 1993. Mathematical Modeling. San Diego, CA: Academic Press.
Phillips, D.T., A. Ravindran, and J. Solberg, 1976. Operations Research. New York: John Wiley & Sons.
Rao, S.S., 1979. Optimization: Theory and Applications. New Delhi, India: Wiley Eastern Limited.
Winston, W., 1995. Introduction to Mathematical Programming: Applications and Algorithms, 2nd Edition. Boston, MA: Duxbury Press, ITP.

Chapter 9

Inequality Constraints: Necessary/Sufficient Kuhn–Tucker Conditions (KTC)

9.1 Introduction to KTC

In the previous sections, we investigated procedures to solve problems with equality constraints. The method of Lagrange multipliers provided a methodology to solve NLP problems of the following type:

Max (Min) z = f(x1, x2, x3, ..., xn)
Subject to:
g1(x1, x2, ..., xn) = b1
g2(x1, x2, ..., xn) = b2
...
gm(x1, x2, ..., xn) = bm


However, in most realistic problems, many of the constraints are inequalities. These constraints form the boundaries for the solution. The generic form of the NLP we will study in this chapter is

Max (Min) z = f(x1, x2, x3, ..., xn)
Subject to:
g1(x1, x2, ..., xn) ≤ b1
g2(x1, x2, ..., xn) ≤ b2
...
gm(x1, x2, ..., xn) ≤ bm

(9.1a)

One method to solve NLPs of this type, equation (9.1a), is the Kuhn–Tucker conditions (KTC). In this chapter, we describe the KTC first graphically and then analytically. We discuss the necessary and sufficient conditions for X = {x1, x2, ..., xn} to be an optimal solution to the NLP of equation (9.1a). We illustrate how to use Maple to solve these KTC problems. We then present some applications using the KTC solution methodology.

9.2 Basic Theory of Constrained Optimization

During these sections on KTC, we are concerned with problems of the form:

Max (Min) f(X)
Subject to:
gi(X) {≤, =, ≥} bi,  i = 1, 2, ..., m

(9.1b)

where X is a vector of variables. We allow the constraints to be either “less than or equal to” or “greater than or equal to.” We have previously completed a block on Lagrange multipliers to


solve problems with equality constraints. You recall that in the Lagrange block the optimal solution actually fell on one constraint or at an intersection of several constraints. With inequality constraints, the solution no longer must lie on a constraint or at an intersection point of constraints. This concept poses new problems: we need a method for accounting for the position of the optimal solution relative to each constraint. The KTC procedure involves setting up a Lagrangian function of the decision variables X, the Lagrange multipliers λ, and the slack or surplus variables Ui². The xj are the decision variables (x1, x2, ..., xn), the −λi are the shadow prices for the ith constraint, and the Ui² are either added (slack variables from ≤ constraints) or subtracted (surplus variables from ≥ constraints). Thus, with the sign of Ui², we are able to accommodate both ≤ and ≥ constraints. The use of Ui² was taught to me by Dr. Robert P. Davis in a nonlinear optimization graduate class at Clemson University in 1988. We set up this generic Lagrangian function,

L(X, λ, U²) = f(X) + Σ_{i=1}^{m} λi[gi(X) ± Ui² − bi]   (9.2)

Note: the sign of ±Ui² depends on the type of inequality constraint, as we will explain in more detail.

9.2.1 Necessary and Sufficient Conditions

The computational procedure for the KTC requires that all the partials of this Lagrangian function equal zero. These partials are the necessary conditions of the NLP problem: the conditions required for x = {x1, x2, ..., xn} to be a solution to (9.1a).

The Necessary Conditions

∂L/∂xj = 0   (j = 1, 2, ..., n)   (9.3a)
∂L/∂λi = 0   (i = 1, 2, ..., m)   (9.3b)
∂L/∂Ui = 0, i.e., 2Uiλi = 0   (i = 1, 2, ..., m)   (9.3c)

The following two theorems give the sufficient conditions for x* = {x1, x2, ..., xn} to be an optimal solution to the NLP given in equation (9.1a).


The Sufficient Conditions

Minimum: If f(x) is a convex function and each of the gi(x) is a convex function, then any point that satisfies the necessary conditions is an optimal solution: a point that minimizes the function subject to the constraints. In this case, λi is greater than or equal to zero for all i.

Maximum: If f(x) is a concave function and each of the gi(x) is a convex function, then any point that satisfies the necessary conditions is an optimal solution: a point that maximizes the function subject to the constraints. In this case, λi is less than or equal to zero for all i.

If the above conditions are not completely satisfied, then we may use another method to check the nature of a potential stationary (regular) point, such as the bordered Hessian.

Bordered Hessian: The bordered Hessian is a symmetric matrix of the second partials of the Lagrangian.

HB = [∂²L/∂(Xj, λk)²],  j = 1, 2, ..., n;  k = 1, 2, ..., m

We can determine, if possible, the nature of the stationary point by classifying the bordered Hessian. This method is valid only for identifying max or min points; if the bordered Hessian is indefinite, then another method should be used.

Complementary Slackness

The KTC computational solution process uses these necessary conditions and solves 2^m possible cases, where m is the number of constraints. The base 2 comes from the possible conditions placed on λi: either it equals zero or it does not. There is actually more to this process, since it involves the complementary slackness condition embedded in the necessary condition 2Uiλi = 0. Thus, either Ui equals zero and λi does not, or Ui is greater than zero and λi equals zero. This ensures that the complementary slackness conditions are satisfied while solving the other necessary conditions from equations (9.3a) and (9.3b). We focus our computational and geometric interpretation on these complementary slackness necessary conditions, equation (9.3c), which lead to the solution process. We have defined Ui² as a slack or surplus variable. Therefore, if Ui² equals zero, then our point is on the ith constraint; if Ui² is greater than zero, then the point does not lie on the ith constraint. Furthermore, if Ui² is negative (so that Ui is undefined), then the point of concern is infeasible. Figures 9.1–9.3 illustrate these conditions.
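The 2^m case enumeration can be listed mechanically; the following small Python fragment (our addition, with illustrative labels) just spells out the cases implied by 2Uiλi = 0:

```python
from itertools import product

# For each constraint i, the condition 2*U_i*lam_i = 0 forces either
# lam_i = 0 (the constraint is slack) or U_i = 0 (the constraint binds).
def slackness_cases(m):
    return list(product(("lam_i = 0 (slack)", "U_i = 0 (binding)"), repeat=m))

for case in slackness_cases(2):
    print(case)    # the four cases used later in this chapter
```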


Figure 9.1 U² = 0: Point C is on the constraint.

Figure 9.2 Point C is inside the feasible region, U² > 0.






Figure 9.3 Point C is not in the feasible region, U² < 0.

9.3 Geometric Interpretation of KTC

We begin with a geometric illustration using a basic two-variable linear problem. The purpose of this geometric interpretation is to lay a foundation for both further geometric interpretation and computational results.

9.3.1 Spanning Cones (Optional)

Let's consider the following linear example.

Example 9.1

Maximize Z = x1 + x2
Subject to:
x1 + 3x2 ≤ 9
2x1 + x2 ≤ 8
x1, x2 ≥ 0




The gradient of the objective function (we will call this gradient vector C) is the vector of partial derivatives of Z with respect to x1 and x2. Since these are linear functions, the entries are the coefficients of x1 and x2; in our example, C = [1, 1]. The vector C points in the direction of greatest local increase of the objective function. Each corner of the feasible region is found as the intersection of the "binding constraints": constraints that are satisfied at equality, without slack or surplus. The cone spanned by a set of vectors is the set of all non-negative linear combinations of those vectors; the coefficients of the linear combination are called the multipliers of the cone. In the plane, a cone has the appearance its name suggests (see Figure 9.4). Thus, the binding constraints are used to find the gradient vectors at the feasible region's boundary or corner points. There are four constraints (counting non-negativity), and we will refer to their gradients as A1, A2, A3, and A4. Note that all constraints need to be in the form ≤ RHS, including the non-negativity constraints.

A1 = [1, 3]
A2 = [2, 1]
A3 = [−1, 0]
A4 = [0, −1]

Note that we have corner points P1, P2, P3, and P4. The optimal solution is located at the corner where the gradient C is contained within the cone spanned by the gradients of the binding constraints; see Figures 9.4–9.8, where this occurs at the corner labeled P4. Each spanning cone at points P1–P4 is drawn separately for clarity.
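The cone-membership test at a corner is just a small linear solve; a Python/NumPy sketch (our addition) for the corner where A1 and A2 both bind:

```python
import numpy as np

# Is C = [1, 1] a non-negative combination mu1*A1 + mu2*A2 of the
# binding-constraint gradients A1 = [1, 3] and A2 = [2, 1]?
A = np.column_stack(([1.0, 3.0], [2.0, 1.0]))   # columns are A1, A2
mu = np.linalg.solve(A, np.array([1.0, 1.0]))   # cone multipliers
in_cone = bool(np.all(mu >= 0))                 # True: this corner is optimal
```

Here mu = (0.2, 0.4), so C lies inside the spanning cone at that corner.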

Figure 9.4 Cone spanned by A1 and A2 (axes x1, x2).


Figure 9.5 Spanning cone illustration at P1 (Opt vector not in span).

Figure 9.6 Spanning cones at P3 (Opt vector not in span).


Figure 9.7 Spanning cones at P2 (Opt vector not in span).

Figure 9.8 Spanning cone at P4 (Opt vector is in span of binding constraints).


We use this method to solve the following problem:

Maximize z = x1 + x2
Subject to:
x1 + 3x2 ≤ 9
2x1 + x2 ≤ 8
x1, x2 ≥ 0

We recognize this as a linear programming problem, but we solve it using the KTC method.

L = x1 + x2 + λ1(x1 + 3x2 + u1² − 9) + λ2(2x1 + x2 + u2² − 8)

Lx1 = 1 + λ1 + 2λ2 = 0
Lx2 = 1 + 3λ1 + λ2 = 0
Lλ1 = x1 + 3x2 + u1² − 9 = 0
Lλ2 = 2x1 + x2 + u2² − 8 = 0
Lu1 = 2u1λ1 = 0
Lu2 = 2u2λ2 = 0

We take all the partial derivatives with respect to each variable and set them equal to 0 to solve for the variables. We find x1 = 3, x2 = 2, λ1 = −1/5, λ2 = −2/5, u1 = u2 = 0. The optimal value is z = 5.
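With both constraints binding (u1 = u2 = 0), the remaining necessary conditions split into two small linear systems; a Python/NumPy sketch (our addition):

```python
import numpy as np

# Binding constraints give the corner point; stationarity,
#   1 + lam1 + 2*lam2 = 0 and 1 + 3*lam1 + lam2 = 0,
# gives the multipliers (note that A transposed appears in stationarity).
A = np.array([[1.0, 3.0],     # gradient of g1 = x1 + 3*x2 - 9
              [2.0, 1.0]])    # gradient of g2 = 2*x1 + x2 - 8
x1, x2 = np.linalg.solve(A, np.array([9.0, 8.0]))
lam1, lam2 = np.linalg.solve(A.T, np.array([-1.0, -1.0]))
z = x1 + x2                   # optimal value 5 at (3, 2)
```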

9.4 Computational KTC with Maple

We start by solving a linear problem using the procedure.

Example 9.2

Maximize 3x + 2y
Subject to:
2x + y ≤ 100
x + y ≤ 80
x, y ≥ 0

We set up the Lagrangian function, L:

L = 3x + 2y + λ1(2x + y + U1² − 100) + λ2(x + y + U2² − 80)




We take the partial derivatives with respect to x, y, λ1, λ2, U1, and U2:

Lx = 3 + 2λ1 + λ2 = 0   (9.4)
Ly = 2 + λ1 + λ2 = 0   (9.5)
Lλ1 = 2x + y + U1² − 100 = 0   (9.6)
Lλ2 = x + y + U2² − 80 = 0   (9.7)
LU1 = 2λ1U1 = 0   (9.8)
LU2 = 2λ2U2 = 0   (9.9)

Equations (9.8) and (9.9) are our complementary slackness conditions, which tell us how to go back and solve equations (9.4)–(9.7). We know from (9.8) and (9.9) that either Ui = 0 or λi = 0, giving four cases: U1 = U2 = 0; U1 = 0 and λ2 = 0; λ1 = 0 and U2 = 0; or λ1 = λ2 = 0. If λ1 = λ2 = 0, then we see that equations (9.4) and (9.5) cannot hold. We move to another easy substitution, U1 = U2 = 0. From equations (9.6) and (9.7), we find x = 20 and y = 60, and solving (9.4) and (9.5), we find λ1 = −1, λ2 = −1. Since the objective function and all constraints are linear, we have met our sufficient conditions for a maximum. Again, we will illustrate two methods that help us obtain all the parameters and variable values: Method 1 with the CAS, and Method 2 with the with(Student[MultivariateCalculus]) routine.

Example 9.3: Two-Variable, Two-Constraint Linear Problem

Consider the problem:

Maximize 3x + 2y
Subject to:
2x + y ≤ 100
x + y ≤ 80

(9.10)

Putting this problem into our generalized Lagrangian form, equation (9.1a), we obtain the following:

L(x, y, λ, U) = 3x + 2y + λ1(2x + y + U1² − 100) + λ2(x + y + U2² − 80)

(9.11)

The six necessary conditions are as follows: 3 + 2λ1 + λ2 = 0

(9.12)

2 + λ1 + λ2 = 0

(9.13)




2x + y + U1² − 100 = 0  (9.14)

x + y + U2² − 80 = 0  (9.15)

2U1λ1 = 0  (9.16)

2U2λ2 = 0  (9.17)

You should recognize, since there are two constraints, that there are four (2²) cases required to solve for the optimal solution. These four cases stem from the necessary conditions listed in equations (9.16) and (9.17).

Case    Condition Imposed       Condition Inferred
I       λ1 = λ2 = 0             U1² ≠ 0, U2² ≠ 0
II      λ1 = 0, λ2 ≠ 0          U2² = 0, U1² ≠ 0
III     λ2 = 0, λ1 ≠ 0          U1² = 0, U2² ≠ 0
IV      λ1 ≠ 0, λ2 ≠ 0          U1² = 0, U2² = 0

For simplicity, we have arbitrarily required both x ≥ 0 and y ≥ 0 for this maximization problem. We provide a graphical representation in Figure 9.9.

Figure 9.9  Region for KTC example: Max 3x + 2y, subject to 2x + y ≤ 100, x + y ≤ 80, x, y ≥ 0.

Returning to Cases I–IV, we observe the following:

(1) Case I considers slack in both the first and second constraints. Therefore, we do not fall exactly on either of the constraints. This corresponds to the intersection point labeled P1 at (0, 0), since only intersection points can lead to linear optimization solutions. This point is feasible, but it is clearly not optimal, as we need to move out from (0, 0) to be optimal. This case will not yield an optimal solution.

(2) Case II places the possible solution point on the second constraint, but not on the first constraint. There exist two possible solutions, points 3 and 5, as shown in Figure 9.9. Point 5 is infeasible, and point 3 is feasible but not an optimal solution. This case will not yield an optimal solution.

(3) Case III places the possible solution on the first constraint, but not on the second constraint. There exist two possible solutions, points 2 and 4, as shown in Figure 9.9. Point 4 is infeasible, and point 2 is feasible but not the optimal solution. Again, this case does not yield an optimal solution.

(4) Case IV places the possible solution on both constraints 1 and 2 simultaneously. This corresponds to point 6 in Figure 9.9. Point 6 is the optimal solution. It is the point in the feasible region tangent to the contours of the objective function in the direction of increased value for the objective function. This case will computationally yield the optimal solution to the problem.

Sensitivity analysis is also enhanced by the geometric interpretation. It is clear from Figure 9.10 that if you increase the right-hand side of either or both constraints, the feasible region will be extended and the value of the objective function will increase. We should find this through the computational process and the solution for λ. This computational sensitivity analysis will be shown through the value of the shadow price (−λi).

Figure 9.10  Geometric sensitivity analysis.


Since both our objective function and constraints are linear functions (hence both convex and concave), the sufficient conditions are also satisfied. Solving these cases shows that Case IV yields the optimal solution, confirming the graphical solution.

Computational results:

Case I: λ1 = λ2 = 0. This case violates equations (9.18) and (9.19) because 2 ≠ 0 and 3 ≠ 0. (Notice that Case I, λ1 = λ2 = 0, also implies that U1² ≠ 0 and U2² ≠ 0.)

Case II: λ1 = 0, λ2 ≠ 0, implying that U2² = 0 and U1² ≠ 0. This case violates both (9.18) and (9.19) because λ2 cannot equal both −2 and −3 at the same time.

Case III: λ2 = 0, λ1 ≠ 0, implying that U1² = 0 and U2² ≠ 0. This case also violates (9.18) and (9.19) because λ1 cannot be equal to both −3/2 and −2 at the same time.

Case IV: λ1 ≠ 0, λ2 ≠ 0, implying that U1² = 0 and U2² = 0. Imposing Case IV yields the following set of equations for (9.18)–(9.21):

3 + 2λ1 + λ2 = 0

(9.18)

2 + λ1 + λ2 = 0, and

(9.19)

2x + y − 100 = 0

(9.20)

x + y − 80 = 0

(9.21)

Solving these pairs simultaneously yields the optimal solution: x* = 20, y* = 60, f(x*, y*) = 180, λ1 = −1, λ2 = −1, U1² = U2² = 0. The shadow prices indicate that for a small change ∆ in the right-hand-side value of either constraint 1 or 2, the objective value will increase by approximately 1 × ∆. The geometric interpretation reinforces the computational results and gives them meaning. It fully shows the effect of binding constraints (where Ui² = 0) on the solution.

Method 1: In Maple

> obj3 := 3*x + 2*y;
obj3 := 3 x + 2 y
> L := obj3 + l1*(2*x + y + u1^2 - 100) + l2*(x + y + u2^2 - 80);
L := 3 x + 2 y + l1 (2 x + y + u1² − 100) + l2 (x + y + u2² − 80)




> lx1 := diff(L, x);
lx1 := 3 + 2 l1 + l2
> lx2 := diff(L, y);
lx2 := 2 + l1 + l2
> ll1 := diff(L, l1);
ll1 := 2 x + y + u1² − 100
> ll2 := diff(L, l2);
ll2 := x + y + u2² − 80
> lu1 := diff(L, u1);
lu1 := 2 l1 u1
> lu2 := diff(L, u2);
lu2 := 2 l2 u2

> solve({lx1 = 0, lx2 = 0, ll1 = 0, ll2 = 0, lu1 = 0, lu2 = 0}, {x, y, l1, l2, u1, u2});
{l2 = −1, u2 = 0, l1 = −1, y = 60, x = 20, u1 = 0}

The optimal solution is f(20, 60) = 180. How do we know we have found a maximum? Recall the rules for finding the maximum or minimum.

Minimum: If f(x) is a convex function and each of the gi(x) is a convex function, then any point that satisfies the necessary conditions is an optimal solution. An optimal point is a point that minimizes the function subject to the constraints. λi is greater than or equal to zero for all i.

Maximum: If f(x) is a concave function and each of the gi(x) is a convex function, then any point that satisfies the necessary conditions is an optimal solution. An optimal point is a point that maximizes the function subject to the constraints. λi is less than or equal to zero for all i.

The objective function is linear and thus both convex and concave, as are the constraints. Since the values of λi are negative, we have found the maximum.
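The same case logic can be mirrored outside a CAS. Here is a Python sketch (ours, with our own helper names) that works the four complementary-slackness cases of this linear example the way the text does, then checks the shadow-price interpretation by relaxing the first right-hand side by one unit.

```python
# Sketch: the four KTC cases of
#   Maximize 3x + 2y  s.t.  2x + y <= 100,  x + y <= 80.
# Stationarity: 3 + 2*lam1 + lam2 = 0 and 2 + lam1 + lam2 = 0.

def solve2(a11, a12, a21, a22, r1, r2):
    det = a11 * a22 - a12 * a21
    return (r1 * a22 - a12 * r2) / det, (a11 * r2 - r1 * a21) / det

# Case I   (lam1 = lam2 = 0): stationarity needs 3 = 0, 2 = 0 -> impossible.
# Case II  (lam1 = 0): 3 + lam2 = 0 and 2 + lam2 = 0 -> inconsistent.
# Case III (lam2 = 0): 3 + 2*lam1 = 0 and 2 + lam1 = 0 -> inconsistent.
# Case IV  (U1 = U2 = 0): both constraints bind.
x, y = solve2(2, 1, 1, 1, 100, 80)        # 2x + y = 100, x + y = 80
lam1, lam2 = solve2(2, 1, 1, 1, -3, -2)   # stationarity solved for lam1, lam2
z = 3 * x + 2 * y
print(x, y, lam1, lam2, z)                # 20.0 60.0 -1.0 -1.0 180.0

# Shadow price: relax the first right-hand side 100 -> 101; the
# objective should rise by about -lam1 = 1.
xp, yp = solve2(2, 1, 1, 1, 101, 80)
print(3 * xp + 2 * yp - z)                # 1.0
```

The one-unit gain confirms the shadow-price reading of λ1 = −1 for this maximization problem.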


Example 9.4: Two Variable-Three Constraint Linear Problem

In this example, we merely add one constraint to the previous problem, x ≤ 40. The addition of one constraint causes the number of solution cases to consider to grow from 2² = 4 to 2³ = 8 cases. The problem with all three constraints is shown in Figure 9.11. Again, for simplicity, we arbitrarily force both x and y ≥ 0. A summary of the graphical interpretation is displayed in Table 9.1. The optimal solution is found using Case IV. Again, the computational solution merely looks for the point where either all the necessary conditions are met or violated. The geometric interpretation reinforces why the other cases do not yield an optimal solution. The optimal solution will be found only in Case IV, which geometrically shows that the solution is binding on constraints 1 and 2 and not binding on constraint 3 (slack still exists). The optimal solution found computationally using Case IV (similar to the previous example) is f(x*, y*) = f(20, 60) = 180, λ1 = −1, λ2 = −1, λ3 = 0, U1² = 0, U2² = 0, and U3² = 20 (slack in constraint 3). The geometric interpretation takes the mystery out of the case-wise solutions. You can visually see why, in each specific case, we can or cannot achieve the optimality conditions. If possible, make and obtain a quick graph and analyze it to eliminate as many cases as possible prior to doing the computational solution procedures. Let us apply this procedure to another example.

Figure 9.11  Contours of 3x + 2y with three constraints.


Table 9.1  Example 2 Summary

Case    Condition Imposed                                            Point #   Feasible   Optimal
I       λ1 = λ2 = λ3 = 0 (all constraints have slack)                1         Yes        No
II      λ1 = 0, λ2 ≠ 0, λ3 ≠ 0 (on constraints 2 and 3, not on 1)    5         No         No
III     λ2 = 0, λ1 ≠ 0, λ3 ≠ 0 (on constraints 1 and 3, not on 2)    6         Yes        No
IV      λ3 = 0, λ2 ≠ 0, λ1 ≠ 0 (not on 3, on 1 and 2)                4         Yes        Yes
V       λ1 = 0, λ2 = 0, λ3 ≠ 0 (on constraint 3, not on 1 or 2)      7         Yes        No
VI      λ1 = λ3 = 0, λ2 ≠ 0 (on 2, not on 1 or 3)                    2         Yes        No
                                                                     9         No         No
VII     λ2 = λ3 = 0, λ1 ≠ 0 (on 1, not on 2 or 3)                    3         No         No
VIII    λ1 ≠ 0, λ2 ≠ 0, λ3 ≠ 0 (on all three constraints)            8         No         No
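Since the objective and constraints in Example 9.4 are linear, the optimum must lie at an intersection of constraint boundaries, so the case summary can be double-checked by brute force. A Python sketch (ours, not the book's) that enumerates the pairwise intersections, keeps the feasible ones, and reports the best vertex:

```python
from itertools import combinations

# Boundary lines a*x + b*y = c for: 2x+y<=100, x+y<=80, x<=40, x>=0, y>=0
lines = [(2, 1, 100), (1, 1, 80), (1, 0, 40), (1, 0, 0), (0, 1, 0)]

def feasible(x, y, tol=1e-9):
    return (2*x + y <= 100 + tol and x + y <= 80 + tol
            and x <= 40 + tol and x >= -tol and y >= -tol)

best = None
for (a1, b1, c1), (a2, b2, c2) in combinations(lines, 2):
    det = a1 * b2 - b1 * a2
    if abs(det) < 1e-12:
        continue                       # parallel boundaries, no vertex
    x = (c1 * b2 - b1 * c2) / det
    y = (a1 * c2 - c1 * a2) / det
    if feasible(x, y):
        z = 3 * x + 2 * y
        if best is None or z > best[0]:
            best = (z, x, y)

print(best)    # (180.0, 20.0, 60.0)
```

The winning vertex sits on constraints 1 and 2 with slack in constraint 3, matching Case IV.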

Example 9.5: Geometric Three-Variable Nonlinear Constrained Problem

Minimize z = (x − 14)² + (y − 11)²

Subject to:
(x − 11)² + (y − 13)² ≤ 49
x + y ≤ 19      (9.22)

We use Maple to generate the plots, contour plots, and constraints to obtain the geometric interpretation shown in Figure 9.12. The optimal solution, as visually shown, is the point where a level curve of the objective function is tangent to the constraint x + y = 19 in the direction of increase for the contours. The solution merely satisfies the other constraint, (x − 11)² + (y − 13)² ≤ 49. This corresponds to the case where constraint 2 (x + y ≤ 19) is binding and constraint 1, (x − 11)² + (y − 13)² ≤ 49, is not binding. This is written as λ2 ≠ 0 (U2² = 0) and λ1 = 0 (U1² ≠ 0). We can eyeball an estimate for the solution from the plot. We use the fact that constraint 2 is binding and constraint 1 is merely satisfied to directly solve this case and find the optimal solution. Graphically, we can obtain a good approximation, but we cannot obtain the shadow prices, which are invaluable in sensitivity analysis.
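The binding case just described solves in closed form. A Python sketch (ours), following the text's equations: with x + y = 19 active and the circle slack (λ1 = 0), stationarity gives 2x − 28 + λ2 = 0 and 2y − 22 + λ2 = 0, so x − y = 3.

```python
# Sketch: binding case of Minimize (x-14)^2 + (y-11)^2
# s.t. (x-11)^2 + (y-13)^2 <= 49 (slack), x + y <= 19 (binding).

x = (19 + 3) / 2                          # from x - y = 3 and x + y = 19
y = 19 - x
lam2 = 28 - 2 * x                         # from 2x - 28 + lam2 = 0
u1_sq = 49 - (x - 11)**2 - (y - 13)**2    # slack in the circle constraint
f = (x - 14)**2 + (y - 11)**2

print(x, y, lam2, u1_sq, f)   # 11.0 8.0 6.0 24.0 18.0
```

These values match the case solution derived in the text: x* = 11, y* = 8, λ2 = 6, U1² = 24, f = 18.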


Figure 9.12  Contour plot of (x − 14)² + (y − 11)² with constraints.

Case: λ2 ≠ 0 (U2² = 0) and λ1 = 0 (U1² ≠ 0). The necessary conditions with this case applied are as follows:

2x − 28 + λ2 = 0
2y − 22 + λ2 = 0
(x − 11)² + (y − 13)² + U1² − 49 = 0
x + y = 19

> f := (x-14)^2 + (y-11)^2;
f := (x − 14)² + (y − 11)²
> g1 := (x-11)^2 + (y-13)^2 - 49;
g1 := (x − 11)² + (y − 13)² − 49
> g2 := x + y - 19;
g2 := x + y − 19


> L := f + l1*(-g1 - u1^2) + l2*(-g2 - u2^2);
L := (x − 14)² + (y − 11)² + l1 (−(x − 11)² − (y − 13)² + 49 − u1²) + l2 (−x − y + 19 − u2²)

> Lx := diff(L, x);
Lx := 2 x − 28 + l1 (−2 x + 22) − l2
> Ly := diff(L, y);
Ly := 2 y − 22 + l1 (−2 y + 26) − l2
> Ll1 := diff(L, l1);
Ll1 := −(x − 11)² − (y − 13)² + 49 − u1²
> Ll2 := diff(L, l2);
Ll2 := −x − y + 19 − u2²
> Lu1 := u1*l1;
Lu1 := u1 l1
> Lu2 := u2*l2;
Lu2 := u2 l2
> solve({Lx=0, Ly=0, Ll1=0, Ll2=0, Lu1=0, Lu2=0}, {x, y, l1, l2, u1, u2});

{y = RootOf(_Z² − 21 _Z + 92), x = −RootOf(_Z² − 21 _Z + 92) + 19,
l1 = −32/73 + (10/73) RootOf(_Z² − 21 _Z + 92),
l2 = (50/73) RootOf(_Z² − 21 _Z + 92) − 598/73, u1 = 0, u2 = 0},

{l2 = 0, u2 = RootOf(276 + 130 _Z² + 13 _Z⁴),
y = 23 + 2 RootOf(276 + 130 _Z² + 13 _Z⁴)²,
l1 = 114/49 + (13/49) RootOf(276 + 130 _Z² + 13 _Z⁴)²,
x = −4 − 3 RootOf(276 + 130 _Z² + 13 _Z⁴)², u1 = 0},

{l2 = −6, u1 = 2 RootOf(−6 + _Z², label = _L5), l1 = 0, y = 8, u2 = 0, x = 11},
{l2 = 0, u2 = RootOf(−6 + _Z², label = _L4), l1 = 0, u1 = 6, y = 11, x = 14},
{l2 = 0, u2 = RootOf(−6 + _Z², label = _L4), l1 = 0, u1 = −6, y = 11, x = 14}

> fsolve({Lx=0, Ly=0, Ll1=0, Ll2=0, Lu1=0, Lu2=0}, {x, y, l1, l2, u1, u2});
{l1 = 0., x = 11.00000000, u2 = 0., u1 = −4.898979486, y = 8.000000000, l2 = −6.000000000}

This is not the solution since the value of u1 is negative. We find that we should return, in Maple, to the results from solve and obtain all the solutions to the roots of the equations:

> solve(276 + 130*z1^2 + 13*z1^4, z1);



−(1/13)√(−845 + 91√13),  (1/13)√(−845 + 91√13),  −(1/13)√(−845 − 91√13),  (1/13)√(−845 − 91√13)

> fsolve(Z^2-21*Z+92,Z);

6.227998127, 14.77200187

We determine that the solution that satisfies the conditions is x* = 11, y* = 8, λ1 = 0, λ2 = 6, U2² = 0, and U1² = 24. The value of the objective function is f(x*, y*) = 18. This is the optimal solution to equation (9.22). The interpretation of the shadow prices shows that if we have more resource for constraint 2, our objective function will decrease. If we add ∆ to the right-hand side of constraint 2, the objective function value will decrease by approximately 6∆. If we change the right-hand side of the constraint from 19 to 20, the optimal solution becomes x* = 11.5, y* = 8.5, f(x*, y*) = 12.5, a decrease of 5.5 units (the work is left to the student in the exercise set).

Method 2:

> obj4 := (x-14)^2 + (y-11)^2;
obj4 := (x − 14)² + (y − 11)²
> cons14 := (x-11)^2 + (y-13)^2 + U1^2 - 49;
cons14 := (x − 11)² + (y − 13)² + U1² − 49
> cons24 := x + y + U2^2 - 19;
cons24 := x + y + U2² − 19


> with(Student[MultivariateCalculus]):
> LagrangeMultipliers(obj4, [cons14, cons24], [x, y, U1, U2], output = detailed);

[x = 11, y = 8, U1 = 2 RootOf(−6 + _Z², label = _L32), U2 = 0, λ1 = 0, λ2 = −6, (x − 14)² + (y − 11)² = 18],

[x = 14, y = 11, U1 = 6, U2 = RootOf(_Z² + 6, label = _L31), λ1 = 0, λ2 = 0, (x − 14)² + (y − 11)² = 0],

[x = 14, y = 11, U1 = −6, U2 = RootOf(_Z² + 6, label = _L31), λ1 = 0, λ2 = 0, (x − 14)² + (y − 11)² = 0],

[x = −RootOf(_Z² + 92 − 21 _Z) + 19, y = RootOf(_Z² + 92 − 21 _Z), U1 = 0, U2 = 0,
λ1 = −32/73 + (10/73) RootOf(_Z² + 92 − 21 _Z),
λ2 = (50/73) RootOf(_Z² + 92 − 21 _Z) − 598/73,
(x − 14)² + (y − 11)² = (−RootOf(_Z² + 92 − 21 _Z) + 5)² + (RootOf(_Z² + 92 − 21 _Z) − 11)²],

[x = −3 RootOf(13 _Z⁴ + 130 _Z² + 276)² − 4, y = 2 RootOf(13 _Z⁴ + 130 _Z² + 276)² + 23,
U1 = 0, U2 = RootOf(13 _Z⁴ + 130 _Z² + 276),
λ1 = (13/49) RootOf(13 _Z⁴ + 130 _Z² + 276)² + 114/49, λ2 = 0,
(x − 14)² + (y − 11)² = (−3 RootOf(13 _Z⁴ + 130 _Z² + 276)² − 18)² + (2 RootOf(13 _Z⁴ + 130 _Z² + 276)² + 12)²]


Now we must go through each possible solution and solve the roots of the equations for any solution that has a possibility of working. We can exclude all possible solutions for which any Ui is negative. We determine that the solution that satisfies the conditions is x* = 11, y* = 8, λ1 = 0, λ2 = 6, U2² = 0, and U1² = 24. The value of the objective function is f(x*, y*) = 18, which is a minimum because f(x) is convex, the gi(x) are convex, and the λi are non-negative.

We return to examining the necessary and sufficient conditions for computational KTC. We have shown how the use of visual interpretation can reduce the amount of work required to solve the problem. By seeing the plot, you can interpret the conditions involved at the optimal point and then solve directly for that point. However, sometimes we cannot obtain the graphical interpretation, so we must rely on the computational method alone. When this occurs, we must solve all the cases and interpret the results. Let's illustrate with a few examples.

Example 9.6

Let us revisit the previous NLP, equation (9.22). The Lagrangian form of the NLP is as follows:

L(x, y, λ, U) = (x − 14)² + (y − 11)² + λ1[(x − 11)² + (y − 13)² + U1² − 49] + λ2[x + y + U2² − 19]

(9.23)

The necessary conditions are as follows:

2(x − 14) + 2λ1(x − 11) + λ2 = 0  (9.24)

2(y − 11) + 2λ1(y − 13) + λ2 = 0  (9.25)

(x − 11)² + (y − 13)² + U1² − 49 = 0  (9.26)

x + y + U2² − 19 = 0  (9.27)

2λ1U1 = 0  (9.28)

2λ2U2 = 0  (9.29)

Case I: λ1 = λ2 = 0. This case finds x = 14 and y = 11. These values violate equation (9.27) because U2² would equal −6, indicating this solution is infeasible.

Case II: λ1 = 0, λ2 ≠ 0, implying that U2² = 0 and U1² ≠ 0. This case yields the optimal solution as shown below:



2(x − 14) + λ2 = 0  (9.30)


2(y − 11) + λ2 = 0  (9.31)



(x − 11)² + (y − 13)² + U1² − 49 = 0  (9.32)

x + y − 19 = 0  (9.33)



Since y = 19 − x from (9.33), we can substitute for y into (9.31) and then solve equations (9.30) and (9.31) simultaneously for x and λ2. All the necessary conditions are satisfied at the point x* = 11, y* = 8, λ1 = 0, λ2 = 6, U2² = 0, and U1² = 24. The value of the function at this point is 18. It is left to the student to show that this point is an optimal point.

Case III: λ2 = 0, λ1 ≠ 0, implying that U1² = 0 and U2² ≠ 0. The points on the circle that satisfy the stationarity conditions in this case lie beyond the line x + y = 19, forcing U2² < 0, so this case is infeasible.

Case IV: λ1 ≠ 0, λ2 ≠ 0, implying that U1² = 0 and U2² = 0. Imposing Case IV yields the following set of equations:

2(x − 14) + 2λ1(x − 11) + λ2 = 0  (9.34)

2(y − 11) + 2λ1(y − 13) + λ2 = 0  (9.35)

(x − 11)² + (y − 13)² − 49 = 0  (9.36)

x + y − 19 = 0  (9.37)



Solving (9.36) and (9.37) for x and y yields two results. We use these results to solve equations (9.34) and (9.35). These results are as follows: x = 12.772, y = 6.228, λ1 = −0.415, λ2 = 3.926, and x = 4.228, y = 14.772, λ1 = −1.585, λ2 = −1.926.
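The two intersection points of Case IV come from a single quadratic. A Python sketch (ours): substituting y = 19 − x into the binding circle constraint (9.36) gives x² − 17x + 54 = 0, and the objective is evaluated at each root.

```python
import math

# (x-11)^2 + (19-x-13)^2 = 49  ->  x^2 - 17x + 54 = 0
disc = 17 ** 2 - 4 * 54          # discriminant = 73
xs = [(17 - math.sqrt(disc)) / 2, (17 + math.sqrt(disc)) / 2]
for x in xs:
    y = 19 - x
    fval = (x - 14) ** 2 + (y - 11) ** 2
    print(round(x, 3), round(y, 3), round(fval, 3))
# roughly: 4.228 14.772 109.72   then   12.772 6.228 24.28
```

Neither point improves on the Case II value of 18, consistent with the case analysis.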



The functional values are f(12.772, 6.228) = 24.279 and f(4.23, 14.77) = 109.665. These are not optimal values because they do not satisfy the sufficient conditions on the λi for the case of a relative minimum. You should show that f(4.23, 14.77) is a relative maximum. Why?

Example 9.7

Minimize 2x² − 8x − 6y + y²

Subject to:

x + y ≤ ...

Case: λ1 > 0, λ2 = 0. Then λ1 = 4/3 from (9.38), (9.39), and (9.40); y = 2.667 from back substitution into (9.39); x = 1.667 from back substitution into (9.38); and U2² = −0.667, which violates the requirement that U2² ≥ 0 for a real solution in (9.41).

Case IV: λ1 > 0, λ2 > 0, so U1² = 0 and U2² = 0. Case IV yields the same solution as Case II.

9.5 Modeling and Application with KTC

Example 8: Maximizing Profit from Perfume Manufacturing

A company manufactures perfumes and can purchase up to 1,925 oz of the main chemical ingredient for $10 per oz. At a cost of $3 per oz, the chemical can be manufactured into an ounce of perfume #1, and at a cost of $5 per oz, the chemical can be manufactured into an ounce of the higher-priced perfume #2. An advertising firm estimates that if x ounces of perfume #1 are manufactured,


it will sell for $30 − 0.01x per ounce. If y ounces of perfume #2 are produced, it can sell for $50 − 0.02y per ounce. The company wants to maximize its profits.

Formulation:
x = ounces of perfume #1 produced
y = ounces of perfume #2 produced
z = ounces of main chemical purchased

Max f(x, y, z) = x(30 − 0.01x) + y(50 − 0.02y) − 3x − 5y − 10z
Subject to: x + y ≤ z
z ≤ 1925

> L := x*(30 - 0.01*x) + y*(50 - 0.02*y) - 3*x - 5*y - 10*z + l1*(x + y - z + U1^2) + l2*(z + U2^2 - 1925);
L := x (30 − 0.01 x) + y (50 − 0.02 y) − 3 x − 5 y − 10 z + l1 (x + y − z + U1²) + l2 (z + U2² − 1925)


> Lx := diff(L, x);
Lx := 27. − 0.02 x + l1
> Ly := diff(L, y);
Ly := 45. − 0.04 y + l1
> Lz := diff(L, z);
Lz := −10 − l1 + l2
> Ll1 := diff(L, l1);
Ll1 := x + y − z + U1²
> Ll2 := diff(L, l2);
Ll2 := z + U2² − 1925
> LU1 := diff(L, U1);
LU1 := 2 l1 U1
> LU2 := diff(L, U2);
LU2 := 2 l2 U2
> solve({Lx = 0, Ly = 0, Lz = 0, Ll1 = 0, Ll2 = 0, LU1 = 0, LU2 = 0}, {x, y, z, l1, l2, U1, U2});

{l2 = 2.666666667, x = 983.3333333, l1 = −7.333333333, y = 941.6666667, U1 = 0., z = 1925., U2 = 0.},
{l1 = 0., l2 = 10., x = 1350., y = 1125., U1 = 23.45207880 I, z = 1925., U2 = 0.},
{l1 = 0., l2 = 10., x = 1350., y = 1125., U1 = −23.45207880 I, z = 1925., U2 = 0.},
{U2 = 14.14213562, U1 = 0., l2 = 0., z = 1725., l1 = −10., x = 850., y = 875.},
{U1 = 0., U2 = −14.14213562, l2 = 0., z = 1725., l1 = −10., x = 850., y = 875.}


We disregard all solutions where any Ui is negative or imaginary, because U1 and U2 must be real, non-negative values. We use the four cases as before and put the results in the following table.

Case   λ1      λ2      x        y        z       U1         U2      Remarks
I      0       10      1,350    1,125    1,925   −23.45 I   0       U1 not real
II     −10     0       850      875      1,725   0          14.14   Candidate solution (U2² = 200)
III    0       10      1,350    1,125    1,925   23.45 I    0       U1 not real
IV     −7.33   2.667   983.33   941.67   1,925   0          0       (9.42) & (9.43) lead to two different signs of λi → violation

We have a concave objective function, linear constraints, and λi negative for each binding constraint (λ1 = −10). Thus, we have met the sufficient conditions for the point (850, 875) to be the optimal solution. The optimal manufacturing strategy is to purchase 1,725 ounces of the chemical and produce 850 ounces of perfume #1 and 875 ounces of perfume #2.
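The candidate solution can be verified numerically. A Python sketch (ours, not the book's code), using the profit function and the stationarity equations from the Maple derivatives:

```python
def profit(x, y, z):
    # f(x, y, z) = x(30 - 0.01x) + y(50 - 0.02y) - 3x - 5y - 10z
    return x * (30 - 0.01 * x) + y * (50 - 0.02 * y) - 3 * x - 5 * y - 10 * z

best = profit(850, 875, 1725)
print(best)                                   # about 22537.50

# Stationarity with l1 = -10, l2 = 0 (the candidate case):
assert abs(27 - 0.02 * 850 + (-10)) < 1e-9    # Lx = 0  ->  x = 850
assert abs(45 - 0.04 * 875 + (-10)) < 1e-9    # Ly = 0  ->  y = 875
assert -10 - (-10) + 0 == 0                   # Lz = 0  ->  l2 = 0

# Nearby feasible production plans earn less:
print(profit(851, 875, 1726) < best)          # True
print(profit(849, 873, 1722) < best)          # True
```

The local comparisons are consistent with (850, 875, 1725) being the maximizer found by the case analysis.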

> subs({U1 = 0., U2 = -14.14213562, l2 = 0., z = 1725., l1 = -10., x = 850., y = 875.}, L);
22537.50

This yields a profit of $22,537.50. Consider the significance of the shadow price for λ1. How do we interpret the shadow price in terms of this scenario? If we could obtain an extra ounce (∆ = 1) of chemical at no cost, it would improve the profit by about $10, to roughly $22,547.50.

Example 9: Minimum Variance of Expected Investment Returns

A new company has $5,000 to invest, and the company needs to earn about 12% interest. A stock expert has suggested three mutual funds {A, B, and C} in which the company could invest. Based upon the previous year's returns, these funds appear relatively stable. The expected return, variance of the return, and covariance between funds are shown below:

Fund   Expected Value   Variance
A      0.14             0.2
B      0.11             0.08
C      0.10             0.18

Pair   Covariance
AB     0.05
AC     0.02
BC     0.03


Formulation: We use the laws of expected value, variance, and covariance in our model. Let xj be the number of dollars invested in fund j (j = 1, 2, 3).

Minimize V = Var(Ax1 + Bx2 + Cx3)
= x1² Var(A) + x2² Var(B) + x3² Var(C) + 2x1x2 Cov(A,B) + 2x1x3 Cov(A,C) + 2x2x3 Cov(B,C)
= 0.2x1² + 0.08x2² + 0.18x3² + 0.10x1x2 + 0.04x1x3 + 0.06x2x3

Our constraints include the following:

1. The expectation to achieve at least the expected return of 12% from the sum of all the expected returns:
0.14x1 + 0.11x2 + 0.10x3 ≥ (0.12 × 5,000), or 0.14x1 + 0.11x2 + 0.10x3 ≥ 600

2. The sum of all investments must not exceed the $5,000 capital:
x1 + x2 + x3 ≤ 5,000

Solution

We set up the Lagrangian function, L:

Min L = 0.2x1² + 0.08x2² + 0.18x3² + 0.10x1x2 + 0.04x1x3 + 0.06x2x3 + λ1[0.14x1 + 0.11x2 + 0.10x3 − U1² − 600] + λ2[x1 + x2 + x3 + U2² − 5,000]

Necessary conditions are as follows:

Lx1 = 0.4x1 + 0.10x2 + 0.04x3 + 0.14λ1 + λ2 = 0  (9.47)

Lx2 = 0.16x2 + 0.10x1 + 0.06x3 + 0.11λ1 + λ2 = 0  (9.48)

Lx3 = 0.36x3 + 0.04x1 + 0.06x2 + 0.10λ1 + λ2 = 0  (9.49)

Lλ1 = 0.14x1 + 0.11x2 + 0.10x3 − U1² − 600 = 0  (9.50)

Lλ2 = x1 + x2 + x3 + U2² − 5,000 = 0  (9.51)

There are only two constraints, so we need to consider four cases.


The solution is found in the case where λ1 and λ2 are both nonzero. This is the case where the solution lies at the intersection of the constraints; both constraints are binding.

> obj7 := 0.2*x1^2 + 0.08*x2^2 + 0.18*x3^2 + 0.1*x1*x2 + 0.04*x1*x3 + 0.06*x2*x3;
obj7 := 0.2 x1² + 0.08 x2² + 0.18 x3² + 0.1 x1 x2 + 0.04 x1 x3 + 0.06 x2 x3
> cons17 := 0.14*x1 + 0.11*x2 + 0.1*x3 - U1^2 - 600;
cons17 := 0.14 x1 + 0.11 x2 + 0.1 x3 − U1² − 600
> cons27 := x1 + x2 + x3 - U2^2 - 5000;
cons27 := x1 + x2 + x3 − U2² − 5000
> with(Student[MultivariateCalculus]):

> LagrangeMultipliers(obj7, [cons17, cons27], [x1, x2, x3, U1, U2], output = detailed);

[x1 = 1904.761905, x2 = 2380.952381, x3 = 714.2857143, U1 = 0., U2 = 0., λ1 = 13809.52381, λ2 = −904.7619048, 0.2 x1² + 0.08 x2² + 0.18 x3² + 0.1 x1 x2 + 0.04 x1 x3 + 0.06 x2 x3 = 1.880952381 × 10⁶],
[x1 = 702.2471910, x2 = 3117.977528, x3 = 1179.775281, U1 = 6.382032363 I, U2 = 0., λ1 = 0., λ2 = 639.8876404, objective = 1.599719101 × 10⁶],
[x1 = 702.2471910, x2 = 3117.977528, x3 = 1179.775281, U1 = −6.382032363 I, U2 = 0., λ1 = 0., λ2 = 639.8876404, objective = 1.599719101 × 10⁶],


[x1 = 1250.108970, x2 = 2929.125621, x3 = 1027.809258, U1 = 0., U2 = 14.38901837 I, λ1 = 5957.632290, λ2 = 0., objective = 1.787289686 × 10⁶],
[x1 = 1250.108970, x2 = 2929.125621, x3 = 1027.809258, U1 = 0., U2 = −14.38901837 I, λ1 = 5957.632290, λ2 = 0., objective = 1.787289686 × 10⁶],
[x1 = 0., x2 = 0., x3 = 0., U1 = ±24.49489743, U2 = ±70.71067812, λ1 = 0., λ2 = 0., objective = 0.]  (four sign combinations)

> h := Hessian(obj7, [x1, x2, x3]);

      [ 0.4    0.1    0.04 ]
h :=  [ 0.1    0.16   0.06 ]
      [ 0.04   0.06   0.36 ]


We solve the following system of equations, with coefficient matrix

[ 0.4    0.10   0.04   0.14   1 ]
[ 0.1    0.16   0.06   0.11   1 ]
[ 0.04   0.06   0.36   0.10   1 ]
[ 0.14   0.11   0.10   0      0 ]
[ 1      1      1      0      0 ]

acting on the unknowns [x1, x2, x3, λ1, λ2], set

equal to the right-hand-side vector [0, 0, 0, 600, 5,000]. The solution is x1 = 1904.80, x2 = 2381.00, x3 = 714.20,

λ1 = −13,809.50, λ2 = 904.80, and z = $1,880,942.29, or a standard deviation of $1,371.50. The expected return is 12%, found by [0.14(1,904.8) + 0.11(2,381) + 0.1(714.2)]/5,000. This solution is optimal. The Hessian matrix, H, has all positive leading principal minors. Therefore, since H is positive definite everywhere, our solution is the optimal minimum:

      [ 0.4    0.1    0.04 ]
H =   [ 0.1    0.16   0.06 ]
      [ 0.04   0.06   0.36 ]
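The 5 × 5 system can be solved without a CAS. A Python sketch (ours) with a small, self-contained Gaussian-elimination helper:

```python
# Sketch: solve the 5x5 KTC system of the investment example.

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

A = [[0.4,  0.10, 0.04, 0.14, 1],
     [0.1,  0.16, 0.06, 0.11, 1],
     [0.04, 0.06, 0.36, 0.10, 1],
     [0.14, 0.11, 0.10, 0,    0],
     [1,    1,    1,    0,    0]]
b = [0, 0, 0, 600, 5000]

x1, x2, x3, lam1, lam2 = gauss_solve(A, b)
print(round(x1, 1), round(x2, 1), round(x3, 1))   # 1904.8 2381.0 714.3
print(round(lam1, 1), round(lam2, 1))             # -13809.5 904.8
```

The investments sum to exactly $5,000 and deliver the required $600 expected return, matching the text's solution.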

Exercises

1. Use the Kuhn–Tucker conditions to find the optimal solution to the following nonlinear problem:

Maximize f(x, y) = −x² − y² + xy + 7x + 4y
Subject to: 2x + 3y ≤ 24
−5x + 12y ≥ 20


2. Use the Kuhn–Tucker conditions to find the optimal solution to the following nonlinear problem:

Maximize f(x, y) = −x² − y² + xy + 7x + 4y
Subject to: 2x + 3y ≥ 16
−5x + 12y ≤ 20

3. Use the Kuhn–Tucker conditions to find the optimal solution to the following nonlinear problem:

Minimize f(x, y) = 2x + xy + 3y
Subject to: x² + y ≥ 3
2.5 − 0.5x − y ≤ 0

4. Use the Kuhn–Tucker conditions to solve:

Minimize f(x, y) = 2x + xy + 3y
Subject to: x² + y ≥ 3
x + 0.5 ≥ 0
y ≥ 0

5. Solve the following:

Maximize −(x − 0.5)² − (y − 5)²
Subject to: −x + 2y ≤ 4
x² + y² ≤ 14
x ≥ 0, y ≥ 0

6. Minimize x² + y²
Subject to: 2x + y ≤ 100
x + y ≤ 80

7. Maximize −(x − 4)² + xy − (y − 4)²
Subject to: 2x + 3y ≤ 18
2x + y ≤ 8




Project: Manufacturing

The manufacturer of a new plant is planning the introduction of two new products: a 19-inch stereo color set with a manufacturer's suggested retail price (MSRP) of $339 and a 21-inch stereo color set with an MSRP of $399. The cost to the company is $195 per 19-inch set and $225 per 21-inch set, plus an additional $400,000 in fixed costs for initial parts, initial labor, and machinery. In the competitive market in which they desire to sell the sets, the number of sales per year will affect the average selling price. It is estimated that, for each type of set, the average selling price drops by one cent for each additional unit sold. Furthermore, sales of 19-inch sets will affect the sales of 21-inch sets and vice versa. It is estimated that the average selling price for the 19-inch set will be reduced by an additional 0.3 cents for each 21-inch set sold, and the price for the 21-inch set will decrease by 0.4 cents for each 19-inch set sold. We desire to provide the optimal number of units of each type of set to produce and to determine the expected profits. Recall that profit is revenue minus cost, P = R − C.

Part I

1. Formulate the model to maximize profits. Ensure that you have accounted for all revenues and costs. Define all your variables.
2. Solve for the optimal levels of 21-inch and 19-inch sets to be manufactured using:
a. Classical optimization (calculus)
b. A combination of a 3D surface plot and a contour plot (an estimated answer is all right here)
c. The Newton–Raphson method. Briefly explain why you can use this technique.
3. Using Maple, obtain a contour plot of the function. Color, or in some other manner identify, the optimal point. Illustrate the gradient search technique, starting from the initial point (0, 0). Only perform two or three iterations, showing the gradient, the distance traveled, and the new point. There is no requirement to obtain the optimal solution using this method.
Part II

Above, we assumed that the company has the potential to produce any number of TV sets per year. Now we realize that there is a limit on production capacity. Consideration of these two products came about because the company plans to discontinue manufacturing of its black-and-white sets,


thus providing excess capacity at its assembly plants. This excess capacity could be used to increase production of other existing product lines, but the company feels that these new products will be more profitable. It is estimated that the available production capacity will be sufficient to produce 10,000 sets per year (about 200 per week). The company has an ample supply of 19-inch and 21-inch color tubes, chassis, and other standard components; however, circuit assemblies are in short supply. Also, the 19-inch TV requires different circuit assemblies than the 21-inch TV. The supplier can deliver 8,000 boards per year for the 21-inch model and 5,000 boards per year for the 19-inch model. Taking this new information into account, what should the company now do?

Required
1. Solve the above as a Lagrange multiplier problem with equality (=) constraints. Consider only the 10,000 sets per year as the constraint. Interpret the shadow prices.
2. Solve assuming the above constraints are each inequality (≤) constraints. ...

... If ∇f(x0) · d > 0 and we move a small distance away from x0 in the direction of d, then f(x) will increase. We choose to move away from x0 in the direction of d0 − x0, where d0 is the optimal solution to the linear program:

Max z = ∇f(x0) · d
s.t. Ad ≤ b
d ≥ 0

Specialized Nonlinear Optimization Methods




For a more detailed explanation, see Winston (1995). Next, we choose our new point, x1, to be x1 = x0 + t0(d0 − x0), where t0 solves the following:

Max f(x0 + t0(d0 − x0))
0 ≤ t0 ≤ 1

We now choose to move away from x1 in the direction d1 − x1. We find d1 by solving the following LP:

Max z = ∇f(x1) · d
s.t. Ad ≤ b
d ≥ 0

Then we choose our new point, x2, to be x2 = x1 + t1(d1 − x1), where t1 solves the following:

Max f(x1 + t1(d1 − x1))
0 ≤ t1 ≤ 1

We terminate when the change is close to zero or less than a given tolerance. The version that we described was developed by Frank and Wolfe; more information can be found in Bazaraa and Shetty (1993).

Example 10.1

Solve the following by the Method of Feasible Directions:

Max z = f(x, y) = 2xy + 4x + 6y − 2x² − 2y²
s.t. x + y ≤ 2
x, y ≥ 0

We begin by obtaining the following plot (Figure 10.2). The solution is approximately (0.90, 1.25) from the plot. For our illustration of the Method of Feasible Directions, we will begin at (0, 0). We show a few steps by hand.

∇f(x, y) = [2y − 4x + 4, 6 + 2x − 4y]

At our starting point (0, 0), ∇f(0, 0) = [4, 6]. We find our direction to move from [0, 0] by solving the LP

236 ◾

Nonlinear Optimization

Figure 10.2 Plot for Example 10.1. The solution is approximately at the marked point.

Max z = 4d1 + 6d2
s.t. d1 + d2 ≤ 2
     d1, d2 ≥ 0

The optimal solution is d1 = 0 and d2 = 2. Thus, we now compute x1 = [0, 0] + t0([0, 2] − [0, 0]), where t0 solves the optimization problem

Max f(0, 2t) = 12t − 8t², 0 ≤ t ≤ 1

We take the derivative of 12t − 8t², which is 12 − 16t, and set it equal to 0 to obtain t = 0.75. Since the second derivative is −16 < 0, we know we have found the maximum. Hence, x1 = [0, 1.5]. The functional value at [0, 1.5] is z = 4.5. Now, ∇f(0, 1.5) = [7, 0]. We repeat the steps to find x2 and continue until we terminate the procedure. The final solution to this problem is z = 8.17, with x = 0.83 and y = 1.17. Table 10.2 summarizes these steps for a few iterations.

Since we have a graphical view, we most likely would not start at (0, 0); perhaps (1, 1) would be a better starting guess. You will be asked to solve a few steps from this starting point in the exercise set. A possible methodology was presented by Fox and Combs (1995). The following assumptions are critical to this starting point algorithm:

1. ∇f is perpendicular to the level curves of f.
2. The tangent vector, t, can be found using ∇f.

Table 10.2 Summary of Feasible Directions for f(x, y) from (0, 0)

Point          Direction   t                  New x          z
(0, 0)         [0, 2]      0.75               [0, 1.5]       4.5
(0, 1.5)       [7, 0]      14/37 = 0.378378   [0.76, 0.93]   7.15
(0.76, 0.93)   …           …                  (0.83, 1.17)   8.17

3. For many problems, the solution is found where a constraint is tangent to the level curves of f.
4. If there is an optimal solution, there will always be a point where one of the constraints is tangent to the same level curves as t.

Starting Point Algorithm

1. Find ∇f and ∇g, the gradient of one of the constraints.
2. Compute the tangent vector, t.
3. Solve the dot product ∇g · t = 0.
4. Use the remaining equations to set up a system of equations.
5. Solve the system of equations for a new starting point, x0.
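The basic feasible-directions iteration (before any starting-point refinement) is easy to automate. Below is a minimal sketch for Example 10.1, not from the text: the direction-finding LP is solved by enumerating the vertices of the feasible region, since an LP optimum occurs at a vertex, and the stepsize is found by a ternary search, which is valid here because f is concave.

```python
def f(x, y):
    # Objective of Example 10.1
    return 2*x*y + 4*x + 6*y - 2*x**2 - 2*y**2

def grad(x, y):
    return (2*y - 4*x + 4, 2*x - 4*y + 6)

# Vertices of the feasible region {x + y <= 2, x >= 0, y >= 0}
VERTICES = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]

def direction_lp(g):
    # Max grad . d over the polytope: the LP optimum occurs at a vertex
    return max(VERTICES, key=lambda d: g[0]*d[0] + g[1]*d[1])

def line_search(x, d):
    # Ternary search for t in [0, 1] maximizing f(x + t*(d - x)); f is concave
    phi = lambda t: f(x[0] + t*(d[0]-x[0]), x[1] + t*(d[1]-x[1]))
    lo, hi = 0.0, 1.0
    for _ in range(80):
        m1, m2 = lo + (hi-lo)/3, hi - (hi-lo)/3
        if phi(m1) < phi(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

x = (0.0, 0.0)                       # same starting point as in the text
for _ in range(2000):
    d = direction_lp(grad(*x))
    t = line_search(x, d)
    x = (x[0] + t*(d[0]-x[0]), x[1] + t*(d[1]-x[1]))

print(round(x[0], 2), round(x[1], 2), round(f(*x), 2))
# approaches x = 0.83, y = 1.17, z = 8.17 as in Table 10.2
```

The first pass reproduces the hand computation: d = [0, 2] and t = 0.75. The many iterations are needed because the method zigzags as it approaches a solution on a face of the region.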

Example 10.2

Max z = 3xy + 40x + 30y − 4x² − x⁴ − 3y² − y⁴
s.t. 4x + 3y ≤ 12
     x + 2y ≤ 4
     x, y ≥ 0

Solution

∇f = [3y + 40 − 8x − 4x³, 3x + 30 − 6y − 4y³]

∇g = [4, 3]

The tangent to the level curve is t = [−(3x + 30 − 6y − 4y³), 3y + 40 − 8x − 4x³], so

∇g · t = −4(3x + 30 − 6y − 4y³) + 3(3y + 40 − 8x − 4x³)

We solve ∇g · t = 0 together with x + 2y = 4. We obtain our starting point as (1.36753, 1.31623). Using this starting point, we summarize the feasible directions results.

Iteration   Point                ∇z                   d            t              New Point
0           (1.36753, 1.31623)   [22.7785, 17.0839]   [2.4, 0.8]   0.305627       (1.68308, 1.15845)
1           (1.68308, 1.15845)   [10.94075, 21.870]   [2.4, 0.8]   1.25933×10⁻⁷   (1.68308, 1.1585)

We are done since there was no change in our point. We find the actual solution via Maple:

with(Optimization);
f := -x^4 - y^4 - 4*x^2 + 3*x*y - 3*y^2 + 40*x + 30*y;
g1con := {x >= 0, y >= 0, x + 2*y <= 4, 4*x + 3*y <= 12};

For a minimization problem, the iteration is analogous. At the current point xk, let yk solve the linear program of minimizing ∇f(xk)ᵀy over the feasible region. Then ∇f(xk)ᵀ(yk − xk) < 0; that is, the vector dk = yk − xk is a descent direction. It remains only to decide how to choose the stepsize. This is done by choosing αk such that f(xk + αk dk) < f(xk). Here, however, we must restrict the stepsize to be at most 1, since the points that determine dk are extreme points and hence, for α > 1, the points on the line are infeasible. Hence the stepsize is chosen by a line search:

min over α ∈ [0, 1] of f(xk + α dk)

Remark: Notice that, since at each step xk is fixed, minimizing ∇f(xk)ᵀ(y − xk) with respect to y is the same as the problem of minimizing ∇f(xk)ᵀy. We use


this latter formulation in the example below. As a simple example, let us consider the problem:

min z = x⁴ − 32x + y² − 8y
s.t. x − y ≤ 1
     3x + y ≤ 7
     x ≥ 0, y ≥ 0

It is easy to check that the global minimum of f occurs at (x, y) = (2, 4), which gives a value of −64. However, we wish to find the constrained minimum. We note, before beginning, that the extreme (corner) points of the polygonal region are (0, 0), (1, 0), (2, 1), and (0, 7). We begin by choosing the feasible point (0, 0). Since the partial derivatives of f are given by fx(x, y) = 4x³ − 32 and fy(x, y) = 2y − 8, the gradient of f at (0, 0) is ∇f(0, 0) = (−32, −8)ᵀ, so that we need to minimize z0(x, y) := (−32, −8) · (x, y) = −32x − 8y. This is a linear form and its minimum will occur at one of the extreme points. Here z0(0, 0) = 0, z0(0, 7) = −56, z0(1, 0) = −32, and z0(2, 1) = −72. Hence, the point (2, 1) is the minimizing solution for the linear program. Now we look at the line segment joining (0, 0) with (2, 1). We parameterize this line segment by (1 − t)(0, 0) + t(2, 1) = (2t, t), 0 ≤ t ≤ 1. The values of f on the line are given by (2t)⁴ − 32(2t) + t² − 8t = 16t⁴ + t² − 72t. The minimum value occurs at t = 1, and hence, the new trial solution is (2, 1). We now compute the gradient of f at this new point, which yields ∇f(2, 1) = (0, −6)ᵀ, and now look at the linear programming problem of minimizing z1(x, y) = −6y. Clearly, the minimum value occurs at the extreme point (0, 7) and z1(0, 7) = −42. We continue until we find an acceptable solution. We illustrate with the following problem.

Example 10.3

Min z = −x1 − x2 + 0.5x1² + x2² − x1x2
s.t. x1 + x2 ≤ 3
     −2x1 − 3x2 ≤ −6
     x1, x2 ≥ 0

The objective function may be shown to be convex and the constraints are all linear (convex). We start by setting up the Kuhn–Tucker conditions from Chapter 9.


x1 − 1 − x2 + λ1 − 2λ2 − e1 = 0
2x2 − 1 − x1 + λ1 − 3λ2 − e2 = 0
x1 + x2 + s1 = 3
2x1 + 3x2 − e2′ = 6

All variables are non-negative, and we know from complementary slackness that

λ1s1 = 0
λ2e2′ = 0
e1x1 = 0
e2x2 = 0

Wolfe's method simply applies a modified version of Phase 1 of the simplex procedure of the two-phase method (see Winston, 1995). We must first use artificial variables because of the constraints, and in Phase 1 we try to extricate the artificial variables from the basis. The modifications from Wolfe are:

a. Never perform a pivot that would make both ei and xi from the ith constraint basic variables.
b. Never perform a pivot that would make both si and λi from the ith constraint basic variables.

Therefore, to use Wolfe's method we must solve the following LP:

Min w = a1 + a2 + a2′
s.t. x1 − x2 + λ1 − 2λ2 − e1 + a1 = 1
     2x2 − x1 + λ1 − 3λ2 − e2 + a2 = 1
     x1 + x2 + s1 = 3
     2x1 + 3x2 − e2′ + a2′ = 6

All variables are non-negative. We put these in a spreadsheet to assist us in our pivots.

        z   x1   x2   λ1   λ2   s1   e1   e2   e2′  a1   a2   a2′  RHS
z       1    2    4    2   −5    0   −1   −1   −1    0    0    0     8
a1      0    1   −1    1   −2    0   −1    0    0    1    0    0     1
a2      0   −1    2    1   −3    0    0   −1    0    0    1    0     1
s1      0    1    1    0    0    1    0    0    0    0    0    0     3
a2′     0    2    3    0    0    0    0    0   −1    0    0    1     6

The pivots proceed from this tableau, always honoring Wolfe's restricted-entry rules (a) and (b), until all of the artificial variables have been driven from the basis. (The intermediate tableaus are not reproduced here.)


We have ended Phase 1 with w = 0, and now only need to substitute the values of our variables back into the objective function to obtain z = −2.1 when x1 = 1.8, x2 = 1.2 (Figure 10.3).
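As a sanity check on the Wolfe's-method answer, the QPP of Example 10.3 is small enough to verify by brute force over a grid; a sketch (the grid resolution and float tolerance are arbitrary choices, not from the text):

```python
def f(x1, x2):
    # Objective of Example 10.3
    return -x1 - x2 + 0.5*x1**2 + x2**2 - x1*x2

best = (float("inf"), None, None)
n = 600                                    # grid step 3/600 = 0.005
for i in range(n + 1):
    for j in range(n + 1):
        x1, x2 = 3.0*i/n, 3.0*j/n
        # Feasibility: x1 + x2 <= 3 and 2x1 + 3x2 >= 6 (small float slack)
        if x1 + x2 <= 3 + 1e-9 and 2*x1 + 3*x2 >= 6 - 1e-9:
            v = f(x1, x2)
            if v < best[0]:
                best = (v, x1, x2)

print(best)    # about (-2.1, 1.8, 1.2), agreeing with Wolfe's method
```

Because the objective is convex and the constraints are linear, the KKT point found by Wolfe's method is the global minimum, and the grid search confirms it.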

Exercises Use Wolfe’s method to solve the following QPP. 1. Min z = 2 x12 − x 2 s.t.  2x1 − x 2 ≤ 1 x1 + x 2 ≤ 1 x1 , x 2 ≥ 0 2. Min z = x1 + 2x 22 s.t. x1 + x 2 ≤ 2 2 x1 + x 2 ≤ 3 x1 , x 2 ≥ 0

Figure 10.3 QPP example. The solution is approximately at the marked point.

3. Max z = −(x − 6)² − (y − 8)²
   s.t. −x + 2y ≤ 4
        2x + 3y ≤ 12
        x, y ≥ 0

10.4 Separable Programming

We have previously shown that many NLPs are of the form

Max (or Min) z = Σ fj(xj), summed over j = 1, 2, …, n
s.t. Σ gij(xj) ≤ bi, summed over j = 1, 2, …, n, for i = 1, 2, …, m

Since each decision variable appears in separate terms of both the objective function and the constraints, this NLP is called a separable programming problem. By approximating each fj(xj) and gij(xj) with piecewise linear functions, we can describe the separable methodology. Separable programming problems contain no interaction terms. Separable programming is important because it allows a convex nonlinear program to be approximated with arbitrary accuracy by a linear programming model. The idea is to replace each nonlinear separable function with a piecewise linear approximation over breakpoints pj1, pj2, …, pjk. The problem can be reformulated as a linear approximating problem (see Bazaraa et al., 1993; Winston, 1995). The new generic problem is as follows:

Max (or Min) z = Σj [δj1 fj(pj1) + δj2 fj(pj2) + ⋯ + δjk fj(pjk)]
s.t. Σj [δj1 gij(pj1) + δj2 gij(pj2) + ⋯ + δjk gij(pjk)] ≤ bi  (i = 1, 2, …, m)
     δj1 + δj2 + ⋯ + δjk = 1  (j = 1, 2, …, n)
     δjr ≥ 0  (j = 1, 2, …, n; r = 1, 2, …, k)


10.4.1 Adjacency Assumptions

In order to ensure some accuracy of our linear approximation in the new objective function and constraints, we must ensure for each j (j = 1, …, n) that at most two δjk's are positive. Assume that for a given j, two δjk's are positive. If δjk is positive, then the other positive decision variable must be either δj,k−1 or δj,k+1. We say that one is adjacent to the other. Thus, the adjacency assumption says that if two δjk's are positive, then they must be adjacent. The solution process is both iterative and interactive since the solution is an approximation. The approximation improves with each iteration. The adjacency assumption is controlled interactively.

There are two situations in which the ordinary simplex method can be employed to find a solution to the approximating problem (Bazaraa et al., 1993; Winston, 1995). They are: (1) the separable problem is a maximization problem, each separable function fj(xj) is concave, and each gij(xj) is convex; or (2) the separable problem is a minimization problem, each separable function fj(xj) is convex, and each gij(xj) is convex. Under these conditions, the adjacency assumption is automatically satisfied by the simplex method.
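The adjacency assumption is mechanical to check for any candidate solution; a small helper (illustrative, with an arbitrary tolerance):

```python
def adjacency_ok(deltas, tol=1e-9):
    """True if at most two of the delta_j,k weights are positive and,
    when two are positive, their indices k are adjacent."""
    pos = [k for k, d in enumerate(deltas) if d > tol]
    if len(pos) > 2:
        return False
    return len(pos) < 2 or pos[1] - pos[0] == 1

print(adjacency_ok([0.0, 1.0, 0.0, 0.0]))     # True: one positive weight
print(adjacency_ok([0.0, 0.3, 0.7, 0.0]))     # True: two adjacent weights
print(adjacency_ok([0.5, 0.0, 0.5, 0.0]))     # False: positive, not adjacent
```

A check of this kind would be applied to each variable's block of weights after every simplex iteration when a restricted-entry rule is in force.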

10.4.2 Linearization Property

Let's consider a nonlinear function, f(x), as shown in Figure 10.4. To form a piecewise linear approximation using, say, r line segments, we must first select r + 1 values of the scalar x within its range 0 ≤ x ≤ u (call them x0, x1, …, xr) and let fk = f(xk) for k = 0, 1, …, r. At the boundaries we have x0 = 0 and xr = u. Notice that the values of xk do not have to be evenly spaced. As an example of the linearization property, let us now consider the NLP function in Example 10.4.

Example 10.4

Consider the following NLP:

f(x1, x2) = x1(30 − x1) + x2(35 − x2) − x1² − 2x2²
subject to x1 + x2 ≤ 20, x1² + 2x2² ≤ 250, x1, x2 ≥ 0.

If we look at the bounds in the constraint x1 + x2 ≤ 20, we find x1 and x2 are both bounded by 0 ≤ xi ≤ 20, which we will use in our example. We will break the range into linear segments using the five breakpoints {0, 5, 10, 15, 20}. Naturally, as we increase r, the accuracy of the approximation gets better. This also makes the resulting linear program much larger. Therefore, we might

Figure 10.4 Piecewise linear approximation of a nonlinear function.

solve sub-problems, starting with a wider interval and then decreasing the interval from previous solutions. Thus, in this example, we first solve for the solution between [0, 20], and after obtaining that solution we narrow the interval.

We require that if two αk are greater than zero, their indices must differ by exactly 1. In other words, if αs is greater than zero, then only one of either αs+1 or αs−1 can be greater than zero. If this last condition, known as the adjacency criterion, is not satisfied, the approximation to f(x) will not lie on f̂(x). We show the set-up and application in Example 10.5.

The reason that this is an "almost" linear programming problem is that the adjacency criterion must be imposed on the new decision variables αkj when any of the functions are nonconvex. This can be accomplished with a restricted basis entry rule. When all the functions are convex, the adjacency criterion will automatically be satisfied, so no modifications of the simplex algorithm are necessary. Note that the approximate problem has m + n constraints and Σj rj + n variables.

From a practical point of view, one might start off with a rather large grid and find the optimum to the corresponding approximate problem. This should be easy to do, but the results may not be very accurate. To improve on the solution, we could then introduce a smaller grid in the neighborhood of the optimum and solve the new problem.

CONVEX PROGRAMMING PROBLEMS

These observations stem directly from the fact that the separable programming method guarantees an approximate global optimum to the original problem only when one is minimizing a convex function (maximizing a concave function) over a convex set. When these conditions hold, the accuracy of the approach is limited only by the coarseness of the piecewise linear approximations that are used. Furthermore, when solving a convex programming problem, we may solve the approximate problem using ordinary linear programming methods without enforcing the adjacency restrictions.

NONCONVEX PROGRAMMING

If the conditions that define a convex program are not present, several outcomes may occur. An approximate global optimum may be found (as in the minimization example above). An approximate local optimum may be found that is not the global optimum. The solution to the approximate problem may also be infeasible with respect to the original problem, or be nowhere near a corresponding local or global optimum. These outcomes are due to an insufficient number of line segments being chosen for the approximation. In many cases, however, infeasible solutions will be only slightly infeasible, and thus present no practical difficulty.

Notwithstanding the possibility of obtaining erroneous results, separable programming methods have proven to be very useful in a variety of practical applications. In addition, it is possible to modify the basic transformations by introducing integer variables and related constraints so that approximate global optima are always obtained, regardless of the convexity of the original problem. An experimental code called MOGG was developed by Falk and Soland (1969) along these lines. Unfortunately, the modified formulation often yields problems that are extremely difficult to solve within acceptable time limits.

Example 10.5

Consider the following NLP:

f(x1, x2) = x1(30 − x1) + x2(35 − x2) − x1² − 2x2²
subject to x1 + x2 ≤ 20, x1² + 2x2² ≤ 250, x1, x2 ≥ 0.

We note that we cannot use feasible directions or QPP because the constraints are nonlinear.

250



Nonlinear Optimization

when one is minimizing a convex function (maximizing a concave function) over a convex set. When these conditions hold, the accuracy of the approach is limited only by the coarseness of the piecewise linear approximations that are used. Furthermore, when solving a convex programming problem, we may solve the approximate problem using ordinary linear programming methods without enforcing the adjacency restrictions. NONCONVEX PROGRAMMING If the conditions that define a convex program are not present, several outcomes may occur. An approximate global optimum is found (as in the minimization example above). An approximate local optimum is found that is not the global optimum. The solution to the approximate problem may be infeasible with respect to the original problem or be nowhere near a corresponding local or global optimum. These outcomes are due to the fact that an insufficient number of lines segments were chosen for the approximation. In many cases, however, infeasible solutions will be only slightly infeasible, and thus present no practical difficulty. Notwithstanding the possibility of obtaining erroneous results, separable programming methods have proven to be very useful in a variety of practical applications. In addition, it is possible to modify the basic transformations by introducing integer variables and related constraints so that approximate global optima are always obtained, regardless of the convexity of the original problem. An experimental code called MOGG was developed by Falk and Soland (1969) along these lines. Unfortunately, the modified formulation often yields problems that are extremely difficult to solve within acceptable time limits. Example 10.5 Consider the following NLP, f ( x1 , x 2 ) = x1 ( 30 − x1 ) + x 2 ( 35 − x 2 ) − x12 − 2x 22 subject to x1 + x 2 ≤ 20,  x12 + 2 x 22 ≤ 250 x1 , x 2 ≥ 0. We note that we cannot use feasible directions nor QPP because the constraints are nonlinear. 
We define our separable functions as follows:

f1(x1) = 30x1 − 2x1²
f2(x2) = 35x2 − 3x2²
g11(x1) = x1²
g12(x2) = 2x2²
g21(x1) = x1
g22(x2) = x2

Figure 10.5 Plot for separable program example.

Since it appears as though our linear constraint bounds the region (see Figure 10.5), we decide to use the five breakpoints {0, 5, 10, 15, 20}. We used constraint (2), x1 + x2 ≤ 20, to obtain these intervals. We could have used constraint (1) and found intervals of 0 to 15.8 for x1 and 0 to 11.18 for x2. First, we want to evaluate our separable functions at these five points and substitute to obtain the following LP:

Max z = 0d11 + 100d12 + 100d13 + 0d14 − 200d15 + 0d21 + 100d22 + 50d23 − 150d24 + 500d25
s.t. d11 + d12 + d13 + d14 + d15 = 1
     d21 + d22 + d23 + d24 + d25 = 1
     25d12 + 100d13 + 225d14 + 400d15 + 50d22 + 200d23 + 450d24 + 800d25 ≤ 250
     5d12 + 10d13 + 15d14 + 20d15 + 5d22 + 10d23 + 15d24 + 20d25 ≤ 20
     all djk ≥ 0

We have our initial tableau as:

       d11  d12  d13  d14  d15  d21  d22  d23  d24  d25   s3  s4  RHS
z        0 −100 −100    0  200    0 −100  −50  150 −500    0   0    0
s3       0   25  100  225  400    0   50  200  450  800    1   0  250
s4       0    5   10   15   20    0    5   10   15   20    0   1   20
d11      1    1    1    1    1    0    0    0    0    0    0   0    1
d21      0    0    0    0    0    1    1    1    1    1    0   0    1

We have as basic variables d11, d21, s3, and s4. So even though in the normal simplex d25 wants to enter, it cannot, due to the adjacency rules. Pivoting d12 in for d11 and then d22 in for d21 gives our final tableau:

       d11  d12  d13  d14  d15  d21  d22  d23  d24  d25   s3  s4  RHS
z      100    0    0  100  300  100    0   50  250 −400    0   0  200
s3     −25    0   75  200  375  −50    0  150  400  750    1   0  175
s4      −5    0    5   10   15   −5    0    5   10   15    0   1   10
d12      1    1    1    1    1    0    0    0    0    0    0   0    1
d22      0    0    0    0    0    1    1    1    1    1    0   0    1

The approximate solution is d12 = 1, d22 = 1, which implies x1 = 5, x2 = 5, z = 200. The actual solution is x1 = 7.5, x2 = 5.83, and z = 214.58.

Example 10.6

Consider the following separable programming problem:

Maximize f(x1, x2) = 5x1 − x1² + 3x2 − x2²
s.t. 2x1⁴ + x2 ≤ 32
     x1 + 2x2² ≤ 32
     x1, x2 ≥ 0

First, we assign our linear points x1 = {0, 1, 2} and x2 = {0, 1, 2, 3, 4}. The linear model can be written as follows:

Maximize z = 0w11 + 4w12 + 6w13 + 0w21 + 2w22 + 2w23 + 0w24 − 4w25
s.t. 0w11 + 2w12 + 32w13 + 0w21 + w22 + 2w23 + 3w24 + 4w25 + s1 = 32
     0w11 + w12 + 2w13 + 0w21 + 2w22 + 8w23 + 18w24 + 32w25 + s2 = 32
     w11 + w12 + w13 = 1
     w21 + w22 + w23 + w24 + w25 = 1
     wjk ≥ 0

Recall that in the solution process, for any j, at most two wjk can be positive, and if two are positive, then they must be adjacent. We use Excel to perform our updated tableaus.

Tableau 1

w11

w12

w13

w21

w22

w23

w24

w25

s1

s2

RHS

z

1

0

−4

−6

0

−2

−2

0

4

0

0

0

s1

0

0

2

32

0

1

2

3

4

1

0

32

s2

0

0

1

2

0

2

8

18

32

0

1

32

w11

0

1

1

1

0

0

0

0

0

0

0

1

w21

0

0

0

0

1

1

1

1

1

0

0

1

w13 enters and w11 leaves.

Tableau 2

      z  w11  w12  w13  w21  w22  w23  w24  w25  s1  s2  RHS
z     1    6    2    0    0   −2   −2    0    4   0   0    6
s1    0  −32  −30    0    0    1    2    3    4   1   0    0
s2    0   −2   −1    0    0    2    8   18   32   0   1   30
w13   0    1    1    1    0    0    0    0    0   0   0    1
w21   0    0    0    0    1    1    1    1    1   0   0    1

Now, w22 enters and s1 leaves.

Tableau 3

      z  w11  w12  w13  w21  w22  w23  w24  w25  s1  s2  RHS
z     1  −58  −58    0    0    0    2    6   12   2   0    6
w22   0  −32  −30    0    0    1    2    3    4   1   0    0
s2    0   62   59    0    0    0    4   12   24  −2   1   30
w13   0    1    1    1    0    0    0    0    0   0   0    1
w21   0   32   30    0    1    0   −1   −2   −3  −1   0    1

Now, w12 enters and w21 leaves.

Tableau 4

      z      w11  w12  w13      w21  w22      w23       w24       w25       s1  s2      RHS
z     1   3.8667    0    0   1.9333    0   0.0667    2.1333    6.2000   0.0667   0   7.9333
w22   0        0    0    0        1    1        1         1         1        0   0        1
s2    0  −0.9333    0    0  −1.9667    0   5.9667   15.9333   29.9000  −0.0333   1  28.0333
w13   0  −0.0667    0    1  −0.0333    0   0.0333    0.0667    0.1000   0.0333   0   0.9667
w12   0   1.0667    1    0   0.0333    0  −0.0333   −0.0667   −0.1000  −0.0333   0   0.0333


We are now optimal. The solution to our approximate problem is x1 = 1.9667, x2 = 1, and z = 7.96556 (evaluating f at this point; the LP itself attains 7.9333). The actual solution is z = 8.2264 when x1 = 1.976276 and x2 = 1.49151856 (found with NLPSolve using Maple).
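The restricted-basis solution can be cross-checked by brute force: the adjacency assumption says any candidate solution uses one adjacent breakpoint pair per variable, so we can simply sweep the weight on each pair over a grid. A sketch (the grid resolution and float tolerance are arbitrary; the coefficient lists are the LP data for Example 10.6):

```python
x1_pts, x2_pts = [0, 1, 2], [0, 1, 2, 3, 4]
f1 = [0, 4, 6]                     # objective coefficients of w11..w13
f2 = [0, 2, 2, 0, -4]              # objective coefficients of w21..w25
g11 = [2*p**4 for p in x1_pts]     # first-constraint column for x1: [0, 2, 32]
g12 = list(x2_pts)                 # first-constraint column for x2
g21 = list(x1_pts)                 # second-constraint column for x1
g22 = [2*p**2 for p in x2_pts]     # second-constraint column for x2

best = float("-inf")
steps, eps = 300, 1e-9
for i in range(len(x1_pts) - 1):           # adjacent pair for x1
    for j in range(len(x2_pts) - 1):       # adjacent pair for x2
        for a in range(steps + 1):
            for b in range(steps + 1):
                w, v = a/steps, b/steps    # weights on the right-hand breakpoints
                c1 = (1-w)*g11[i] + w*g11[i+1] + (1-v)*g12[j] + v*g12[j+1]
                c2 = (1-w)*g21[i] + w*g21[i+1] + (1-v)*g22[j] + v*g22[j+1]
                if c1 <= 32 + eps and c2 <= 32 + eps:
                    z = (1-w)*f1[i] + w*f1[i+1] + (1-v)*f2[j] + v*f2[j+1]
                    best = max(best, z)

print(round(best, 4))   # close to the LP optimum 7.9333
```

The sweep confirms the tableau result: the best adjacency-feasible combination puts x1 between its second and third breakpoints and x2 at 1.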

Example 10.7

Consider the following:

Maximize f(x) = 20x1 + 16x2 − 2x1² − x2² − x3²
subject to: x1 + x2 ≤ 5, x1 + x2 − x3 = 0, x1 ≥ 0, x2 ≥ 0, x3 ≥ 0.

The objective function is now written as f(x) = f1(x1) + f2(x2) + f3(x3), where f1(x1) = 20x1 − 2x1², f2(x2) = 16x2 − x2², and f3(x3) = −x3². Thus, it is separable. Clearly, the linear constraints are also separable. We note that the constraints imply that x1 ≤ 5, x2 ≤ 5, and x3 ≤ 5, so that we need not extend the approximation beyond these bounds on the variables. Using the breakpoints {0, 1, 3, 5} for x1, {0, 3, 5} for x2, and {0, 2, 5} for x3, the formulation is as follows:

Maximize z = 0λ10 + 18λ11 + 42λ12 + 50λ13 + 0λ20 + 39λ21 + 55λ22 − 0λ30 − 4λ31 − 25λ32
subject to:
0λ10 + 1λ11 + 3λ12 + 5λ13 + 0λ20 + 3λ21 + 5λ22 ≤ 5
0λ10 + 1λ11 + 3λ12 + 5λ13 + 0λ20 + 3λ21 + 5λ22 − 0λ30 − 2λ31 − 5λ32 = 0
λ10 + λ11 + λ12 + λ13 = 1
λ20 + λ21 + λ22 = 1
λ30 + λ31 + λ32 = 1
λij ≥ 0, for all i and j.

Since each of the functions f1(x1), f2(x2), and f3(x3) is concave, the adjacency condition can be ignored and the problem can be solved as a linear program. We use Excel, and solving by the simplex method gives an optimal objective value of 44 with λ11 = λ12 = 0.5, λ21 = 1, and λ32 = 1 as the positive variables in the optimal solution. The corresponding values for the original problem variables are x1 = (0.5)(1) + (0.5)(3) = 2, x2 = 3, x3 = 5, and z = 46 by direct substitution. This solution should be contrasted with the true solution x1 = 7/3, x2 = 8/3, x3 = 5, and f(x1, x2, x3) = 46.33333.

Once the approximation problem has been solved, we can obtain a better solution by introducing more breakpoints. Usually more breakpoints will be added near the optimal solution given by the original approximation. Adding a single new breakpoint at x1 = 2 leads to an improved approximation for this problem, with a linear-programming objective value of 46 and x1 = 2, x2 = 3, and x3 = 5. In this way, an approximate solution can be found and iterated as close as desired to the actual solution.
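The true solution quoted above can be confirmed exactly. Assuming the constraint x1 + x2 ≤ 5 is active (so x3 = x1 + x2 = 5, consistent with the reported solution), substituting x2 = 5 − x1 into f gives h(x1) = −3x1² + 14x1 + 30, and h′(x1) = −6x1 + 14 = 0 gives x1 = 7/3. A check in exact arithmetic:

```python
from fractions import Fraction

x1 = Fraction(7, 3)
x2 = 5 - x1                          # 8/3
x3 = x1 + x2                         # 5, from the equality constraint

f = 20*x1 + 16*x2 - 2*x1**2 - x2**2 - x3**2
print(x1, x2, x3, f)                 # 7/3 8/3 5 139/3
print(float(f))                      # 46.333..., beating the grid answer of 46
```

Exact fractions avoid any rounding question in the comparison with the approximation's value of 46.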


Exercises

1. Max −(x − 6)² − (y − 2)²
   subject to: −x + 2y ≤ 4
               x² + y² ≤ 14
               0 ≤ x, 0 ≤ y

2. Max −(x − 6)² − (y − 8)²
   subject to: −x + 2y ≤ 4
               x² + y² ≤ 14
               0 ≤ x, 0 ≤ y

References and Suggested Reading

Alvarez-Vasquez, F., Gonzales-Alcon, C., & Torres, N. (2000). Metabolism of Citric Acid Production by Aspergillus niger: Model Definition, Steady-State Analysis and Constrained Optimization of Citric Acid Production Rate. New York: John Wiley & Sons.
Bazaraa, M.S., Sherali, H., & Shetty, C. (1993). Nonlinear Programming: Theory and Algorithms. New York: John Wiley & Sons.
Bellman, R. (1952). On the theory of dynamic programming. Proceedings of the National Academy of Sciences, Rand Corporation, Santa Monica, CA.
Bellman, R. (1957). Dynamic Programming. Princeton, NJ: Princeton University Press.
Bonyadi, M.R., & Michalewicz, Z. (2016). Particle swarm optimization for single objective continuous space problems: A review. Evolutionary Computation, pp. 1–54. doi: 10.1162/EVCO_r_00180. PMID: 26953883.
Chapra, S., & Canale, R. (2010). Numerical Methods for Engineers. New York: McGraw-Hill.
Ecker, J., & Kupferschmid, M. (1988). Introduction to Operations Research. New York: John Wiley & Sons.
Fox, W.P. (1990). Application of Pseudo Boolean Models to Weapon System Design (Dissertation). Clemson University.
Fox, W.P. (1992a). Quadratic nonlinear programming with Minitab. Computers in Education (COED) Journal, II(1), pp. 80–84.
Fox, W.P. (1992b). Separable nonlinear programming with Minitab. COED Journal, II(4), pp. 30–36.


Fox, W.P. (1993). Using microcomputers in undergraduate nonlinear optimization. Collegiate Microcomputers, XI(3), pp. 214–219.
Fox, W.P. (2012). Mathematical Modeling with Maple. Boston, MA: Cengage Publishers.
Fox, W.P., & Combs, T. (1995). Finding a better starting point for the method of feasible directions. Computers in Education Journal, V(2), pp. 27–32.
Giordano, F., Fox, W., & Horton, S. (2012). A First Course in Mathematical Modeling, 5th Ed. Boston, MA: Cengage Publishers.
Hillier, F., & Lieberman, G. (1995). Introduction to Operations Research. New York: McGraw-Hill.
Homaifar, A., Lai, S., & Qi, X. (1994). Constrained optimization via genetic algorithms. Simulation, 62(4), pp. 242–254. doi: 10.1177/003754979406200405.
Kennedy, J. (1997). The particle swarm: Social adaptation of knowledge. Proceedings of IEEE International Conference on Evolutionary Computation, pp. 303–308. doi: 10.1109/ICEC.1997.592326.
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. Proceedings of IEEE International Conference on Neural Networks, pp. 1942–1948. Retrieved from http://www.engr.iupui.edu/~shi/Conference/psopap4.html.
Kirkpatrick, S., Gelatt, C.D., & Vecchi, M. (1983). Optimization by simulated annealing. Science, 220(4598), pp. 671–680.
Nelder, J., & Mead, R. (1965). A simplex method for function minimization: Errata. Computer Journal, 8, p. 27.
Shim, J. (1983). A survey of quadratic programming in business and economics. International Journal of Systems Science, 14(1), pp. 105–115.
Stefanov, S. (2013). Separable Programming: Theory and Models. New York: Springer Science and Business Media.
Winston, W. (1995). Introduction to Mathematical Programming, 2nd Ed. Belmont, CA: Duxbury Press.
Zoutendijk, G. (1960). Methods of Feasible Directions. Amsterdam: Elsevier; Princeton, NJ: D. Van Nostrand.

Chapter 11

Dynamic Programming

11.1 Introduction: Basic Concepts and Theory

In many real-world problems, decisions have to be made sequentially at different points in time or at different levels of components, subsystems, or major end items. The army's acquisition process is an example of a real-world sequential decision process. Since these decisions are made at a number of stages, they are often referred to as multistage decision problems. Dynamic programming (DP) is a mathematical technique that is well suited for this class of problems. This technique was developed by Richard Bellman in the early 1950s. Thus, this methodology is relatively new in the world of mathematics.

DP can be used to solve both discrete and continuous nonlinear, multistage problems. Most texts present only the discrete DP algorithms. We desire an exposure to the continuous DP model, so we will begin with continuous DP. The definitions and concepts that we define are valid in both the discrete and continuous problems. We will also examine some discrete problems.

There does not exist a standard mathematical formulation for any DP problem. DP is a general approach to solving problems that need to be solved in stages. Therefore, a certain amount of ingenuity and creativity is required to formulate and solve a DP problem. Exposure to examples increases one's ability to formulate and solve DP problems.

Let's begin with a more general framework and an easier example. If we are going to consider DP as a multistage decision process, then let's closely examine one of the stages. Figure 11.1 illustrates a single-stage decision problem. A decision process of this type is characterized by Inputs, Decision Variables, Outputs (for the next stage), and the Return function, which measures the effectiveness of the current decisions. Multistage problems are a series of these types of stages that are linked together. The Inputs to subsequent stages are the Outputs of the previous stage.

Figure 11.1 Typical stage in a DP problem.

There exist three general types of DP problems: the initial value, the final value, and the boundary value problems. In the initial value problems, the initial state variable is prescribed. In the final value problems, we know the value of the final state variable. In the boundary value problems, we know both the initial and the final state variable values.

DP is a recursive algorithm. In many cases, it is best to work systematically backwards to solve the problem. This makes use of the concept of suboptimization and the principle of optimality. Bellman (1957) defined his principle of optimality as follows: An optimal policy (or set of decisions) has the property that whatever the initial state and initial decisions are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. This means that for every possible decision that could have been made from the first stage, all subsequent decisions are optimal with regard to the previous-stage decision. This will become clear as we do an example.

Before we begin an example, we should examine the recurrence relationships. Assume the desired objective is to maximize an n-stage objective function, f, which is given as the sum of all individual stage returns:

Maximize f = Rn(xn, sn+1) + Rn−1(xn−1, sn) + ⋯ + R1(x1, s2)

where the state and decision variables are related by

si = ti(si+1, xi), i = 1, 2, …, n

By the definition of the principle of optimality and the recursive property, we start at the final stage with its input as specified and find x1 to optimize the return R1. Irrespective of what happens in the other stages, x1 must be found such that R1 is optimized for the given input. Let's re-examine the definition and characteristics of a DP problem.


DP is a technique that can be used to solve many optimization problems. In most applications, DP obtains solutions by working backwards from the end of the problem toward the beginning. Thus, it breaks up a long problem into a series of smaller, tractable problems.

11.1.1 Characteristics of Dynamic Programming
The following characteristics are provided:
1. The problem must be divisible into stages with a decision at each stage.
2. Each stage has a number of states associated with it. By state, we mean the information that is needed at any stage to make an optimal decision. Often the optimal solution will be in a variable form (temporarily).
3. The decision chosen at any stage describes how the state at the current stage is transformed into the state at the next stage.
4. Principle of Optimality (Bellman, 1957): An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions constitute an optimal policy with respect to the state resulting from the first decision.
5. If the states have been classified into one of T stages, there must be a recursion that relates the cost or reward earned from stages t, t + 1, t + 2, …, T to the cost and reward earned from stages t + 1, t + 2, …, T. In essence, this is the backward-working procedure.
6. If large, DP can create computational difficulties.

11.1.2 Working Backwards
1. The stage is the mechanism by which we build up the problem.
2. The state at any stage gives the information needed to make the correct decision at the current stage.
3. In most cases, we must determine how the reward (cost) during the current stage depends upon the stage decision, the states, and the value of t.
4. We must determine how the stage t + 1 state depends upon the stage t decision.
5. If we define (for a minimization problem) ft(i) as the minimum cost incurred in stages t, t + 1, t + 2, …, T, given that the stage t state is i, then (in many cases) we may write ft(i) = MIN {(cost during stage t) + ft+1(new state at stage t + 1)}, where the minimization is over all possible decisions at that stage.
6. We begin by determining all the fT(.)'s, then all the fT−1(.)'s, back to f1(initial).
7. We then determine the optimal Stage 1 decision. This leads to Stage 2, where we find the optimal Stage 2 decision. We continue until the Stage T decision is found.


11.2 Continuous DP
First, let's apply these concepts in a continuous DP example.

Example 11.1

Min: x1² + x2² + x3²
s.t. x1 + x2 + x3 ≥ k
x1, x2, x3 ≥ 0 (all positive or zero)

We draw a stage diagram. [Diagram: three stages in series; Stage 3 (decision x3, return x3²) feeds Stage 2 (decision x2, return x2²), which feeds Stage 1 (decision x1, return x1²).]

The transitions must be found across the stages.
At Stage 3: x1 + x2 + x3 ≥ k, so the state entering Stage 3 is S3 = k. Now suppose x3 is found; then
At Stage 2: x2 + x1 ≥ S3 − x3 = S2
At Stage 1: x1 ≥ S2 − x2 = S1
These are the transformations from stage to stage. Now, we work backwards starting at Stage 1.
f1(x1) = Min x1² s.t. x1 ≥ 0 and x1 ≥ S1
f2(x2) = Min [x2² + f1(S1)] s.t. x2 ≥ 0
f3(x3) = Min [x3² + f2(S2)] s.t. x3 ≥ 0



COMPUTATIONS

Stage 1:
f1(x1) = Min x1² s.t. x1 ≥ 0, x1 ≥ S1
By inspection, the smallest feasible value is x1* = S1, so f1(S1) = S1². Optimal.

Stage 2:
f2(x2) = Min [x2² + f1(S1)] s.t. x2 ≥ 0
= Min [x2² + f1(S2 − x2)]  (by substitution of S1 = S2 − x2)
= Min [x2² + (S2 − x2)²]  (because f1*(S1) = S1²)

Take the derivative with respect to x2:
2x2 − 2(S2 − x2) = 0, so x2* = S2/2.
The second derivative is positive, so we found a minimum. We need to find f2(S2):
f2(S2) = (S2/2)² + (S2 − S2/2)² = S2²/2

We will use this to find f3(x3).

Stage 3:
f3(x3) = Min [x3² + f2*(S2)] s.t. x3 ≥ 0


By substitution, we find
Min [x3² + f2(S3 − x3)] s.t. x3 ≥ 0
= Min [x3² + (S3 − x3)²/2]  (since f2*(S2) = S2²/2)

Now, we take the derivative with respect to x3:
2x3 − (S3 − x3) = 0, so 3x3 = S3 and x3* = S3/3.
We are done except for going backwards using the transition relationships that we previously found. Let S3 = k, so
x3* = k/3
S2 = S3 − x3* = k − k/3 = 2k/3
x2* = S2/2 = k/3
S1 = S2 − x2* = 2k/3 − k/3 = k/3
x1* = S1 = k/3
We have now solved this continuous problem. The solution is x1 = x2 = x3 = k/3, with S3 = k, S2 = 2k/3, S1 = k/3.
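The backward recursion can be sanity-checked numerically. This short Python sketch (ours, not the book's) brute-forces the problem for a sample k = 6 and confirms that a grid search cannot beat the DP answer x1 = x2 = x3 = k/3, whose objective value is k²/3 = 12:

```python
# Brute-force check of Example 11.1 for k = 6 (assumed sample value).
def objective(x1, x2, x3):
    return x1**2 + x2**2 + x3**2

def grid_min(k, step=0.1):
    """Search x1, x2 on a grid; spend only what is needed on x3 (x1+x2+x3 >= k)."""
    n = int(round(k / step))
    best = float("inf")
    for i in range(n + 1):
        for j in range(n + 1 - i):
            x1, x2 = i * step, j * step
            x3 = max(k - x1 - x2, 0.0)
            best = min(best, objective(x1, x2, x3))
    return best

k = 6.0
print(objective(k / 3, k / 3, k / 3), grid_min(k))   # both are (approximately) 12
```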

11.3 Modeling and Applications of Continuous DP
A missile manufacturing company has accepted a contract to supply 80 lots of Patriot missiles at the end of the first month and 120 lots of missiles at the end of two months. The cost of manufacturing missiles is given by 50x + 0.2x² dollars, where x is the number of lots produced in that month. If the company produces more than the required amount in the first month, then there is a carrying charge of $8 per lot carried over to the end of the second month. Find the optimal strategy for the number of lots to be produced each month in order to minimize the total cost of manufacturing and storage. Assume that the company has enough facilities to produce 200 missile lots per month and that there is no initial on-hand inventory.

Why can this problem be solved as a DP problem? There are decisions that have to be made at different times, and these decisions are based upon what happened in the previous stage. We know both the initial inventory and the final inventory (both 0), so this is a boundary value problem. We need to produce missiles for periods 1 and 2 that both meet the demand and minimize the overall cost. The DP solution process includes both forward and backward recursion, as we show in our examples. Let

R1 = 50x1 + 0.2x1²
R2 = 50x2 + 0.2x2² + 8(x1 − 80)

We know that x1 + x2 = 200 units to produce, and we ship 80 in month 1 and 120 in month 2. There is an $8 carrying charge for items produced in month 1 and used in month 2. We assume that there is no inventory at the beginning or end of the period. Our stage diagram is shown below. [Diagram: S3 = 0 enters Stage 2 (decision x2, return R2), whose output enters Stage 1 (decision x1, return R1).]

Stage 1: Minimize 50x1 + 0.2x1² s.t. x1 ≥ 0, S1 = 0 (no initial inventory)
We produce an optimal amount x1* = 80 + S2, where S2 = x1 − 80.

Stage 2: Minimize 50x2 + 0.2x2² + 8S2 + R1(80 + S2)


We substitute x1* = 80 + S2 to obtain

Minimize 50x2 + 0.2x2² + 8S2 + 50(80 + S2) + 0.2(80 + S2)²

We know that S2 = 120 − x2, so we again substitute (note that 80 + S2 = 200 − x2):

Minimize 50x2 + 0.2x2² + 8(120 − x2) + 50(200 − x2) + 0.2(200 − x2)²

This function simplifies to
f2(x2) = 18,960 − 88x2 + 0.4x2²
df2(x2)/dx2 = −88 + 0.8x2 = 0
We find x2 = 110, and the second-derivative test shows that we have found a minimum (f″ = 0.8 > 0). Thus, x2 = 110, S2 = 120 − x2 = 10, and x1 = 80 + S2 = 80 + 10 = 90. The minimum cost is f(90, 110) = $14,120.
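A quick enumeration confirms the calculus. This Python sketch (our own variable names; the cost data is from the problem statement) searches all integer production splits:

```python
def total_cost(x1, x2):
    # manufacturing cost 50x + 0.2x^2 each month plus $8 per lot carried over
    return 50*x1 + 0.2*x1**2 + 50*x2 + 0.2*x2**2 + 8*(x1 - 80)

# x1 + x2 = 200 and x1 >= 80 (month-1 demand), so x2 ranges over 0..120
best_cost, best_x2 = min((total_cost(200 - x2, x2), x2) for x2 in range(0, 121))
print(best_x2, 200 - best_x2, round(best_cost))   # x2 = 110, x1 = 90, cost $14,120
```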

Exercises
Use continuous DP to solve:
1. Min z = y1² + y2² + y3² s.t. y1 + y2 + y3 = 10
2. Min z = y1² + y2² + y3² s.t. y1 + y2 + y3 ≥ 15

3. A missile manufacturing company has accepted a contract to supply 90 lots of Patriot missiles at the end of the first month and 120 lots of missiles at the end of two months. The cost of manufacturing missiles is given by 50x + 0.25x² dollars, where x is the number of lots produced in that month. If the company produces more than the required amount in the first month, then there is a carrying charge of $8 per lot carried over to the end of the second month. Find the optimal strategy for the number of lots to be produced each month in order to minimize the total cost of manufacturing and storage. Assume that the company has enough facilities to produce 250 missile lots per month and that there is no initial on-hand inventory.


11.4 Models of Discrete Dynamic Programming
Let us illustrate discrete DP through two discrete (integer) resource allocation problems.

Example 11.2 A Knapsack Problem

Consider a knapsack that can hold 8 pounds. Item 1 weighs 4 lbs, item 2 weighs 3 lbs, and item 3 weighs 2 lbs. The returns (gains) for placing these items in the knapsack are 5, 4, and 2, respectively. How many of each item should we place in the knapsack to maximize the return?

Max w = 5x + 4y + 2z
s.t. 4x + 3y + 2z ≤ 8
x, y, z ≥ 0 and all integer

THE THREE-STAGE DIAGRAM
We draw a stage diagram. [Diagram: Stage 3 (decision z, return 2z) feeds Stage 2 (decision y, return 4y), which feeds Stage 1 (decision x, return 5x).]

The Transitions:
Stage 3: S3 = 8
Stage 2: S2 = S3 − 2z
Stage 1: S1 = S2 − 3y

Solution
Stage 3: Maximize R3 = 2z

S3   z   R3
 8   4    8
 7   3    6
 6   3    6
 5   2    4
 4   2    4
 3   1    2
 2   1    2
 1   0    0
 0   0    0

Stage 2: Maximize R2 + f3(S2 − 3y); * marks the best entry for each S2.

S2   y   R2   S2 − 3y   f3(S2 − 3y)   R2 + f3(S2 − 3y)   Best Return
 8   2    8      2           2             10*               10
 8   1    4      5           4              8
 8   0    0      8           8              8
 7   2    8      1           0              8*                8
 7   1    4      4           4              8*
 7   0    0      7           6              6
 6   2    8      0           0              8*                8
 6   1    4      3           2              6
 6   0    0      6           6              6
 5   1    4      2           2              6*                6
 5   0    0      5           4              4
 4   1    4      1           0              4*                4
 4   0    0      4           4              4*
 3   1    4      0           0              4*                4
 3   0    0      3           2              2
 2   0    0      2           2              2*                2
 1   0    0      1           0              0                 0
 0   0    0      0           0              0                 0


Stage 1:

S1   x   R1   Best f2(S1 − 4x)   f1 = R1 + Best f2
 8   2   10          0                 10*
 8   1    5          4                  9
 8   0    0         10                 10*

The optimal objective function value is 10. This is achieved by multiple optimal solutions, as evidenced by the value 10 turning up as the best result in several final columns. Let's backtrack to find all the multiple optima. When x = 2, nothing is based on the remaining stages, so y = z = 0. When x = 0, all 8 lbs remain for y and z to share. In Stage 2, with 8 lbs to use, the best solution was 10 with y = 2 and z = 1.

x   y   z   w
0   2   1   10
2   0   0   10
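The staged tables can be reproduced in a few lines of code. This Python sketch (our own, using a generic unbounded-knapsack recursion rather than the book's three fixed stages) recovers both the optimal value and the tied solutions:

```python
capacity = 8
items = [(4, 5), (3, 4), (2, 2)]      # (weight, return) for items 1, 2, 3

# f[s] = best return using s pounds, via f[s] = max over items of r + f[s - w]
f = [0] * (capacity + 1)
for s in range(1, capacity + 1):
    f[s] = max([f[s - w] + r for (w, r) in items if w <= s], default=0)

# enumerate every feasible (x, y, z) that attains the optimum
best = f[capacity]
optima = [(x, y, z) for x in range(3) for y in range(3) for z in range(5)
          if 4*x + 3*y + 2*z <= 8 and 5*x + 4*y + 2*z == best]
print(best, optima)   # 10 and the two solutions from the table above
```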

Example 11.3
The following table gives the return functional values, Ri(Dj), for a given input Dj:

Dj   R1(Dj)   R2(Dj)   R3(Dj)
0       0        2        0
1       1        4        3
2       5        4        4
3       6        5        5

A three-stage diagram is needed to solve this problem.

Stage 1. f1(S1) = Max R1(D1), 0 ≤ D1 ≤ S1

S1   D1   R1(D1)   Max = f1(S1)
0    0      0          0
1    0      0
     1      1          1
2    0      0
     1      1
     2      5          5
3    0      0
     1      1
     2      5
     3      6          6


Stage 2. f2(S2) = Max [R2(D2) + f1(S2 − D2)], 0 ≤ D2 ≤ S2
(A = R2(D2), B = f1(S2 − D2); * marks the best entry.)

S2   D2   R2(D2)   S2 − D2   f1(S2 − D2)   A + B   f2 = max
0    0      2         0          0           2*        2
1    0      2         1          1           3
     1      4         0          0           4*        4
2    0      2         2          5           7*        7
     1      4         1          1           5
     2      4         0          0           4
3    0      2         3          6           8
     1      4         2          5           9*        9
     2      4         1          1           5
     3      5         0          0           5

Stage 3. f3(S3) = Max [R3(D3) + f2(S3 − D3)], 0 ≤ D3 ≤ S3

S3   D3   R3(D3)   S3 − D3   f2(S3 − D3)   A + B   f3 = max
3    0      0         3          9            9
     1      3         2          7           10*       10
     2      4         1          4            8
     3      5         0          2            7

The optimal function value is 10 = f 3(3). By backtracking, we find that S3 = 3, D3 = 1, S2 = 2, D2 = 0, S1 = 2, and D1 = 2. (Check the original return table to verify that the return of these entries gives 10.)
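Example 11.3 can be reproduced directly from the return table. A Python sketch (variable names are ours, not the book's):

```python
R = {1: [0, 1, 5, 6], 2: [2, 4, 4, 5], 3: [0, 3, 4, 5]}   # R[stage][D]

f1 = [max(R[1][d] for d in range(s + 1)) for s in range(4)]              # Stage 1
f2 = [max(R[2][d] + f1[s - d] for d in range(s + 1)) for s in range(4)]  # Stage 2
f3 = max(R[3][d] + f2[3 - d] for d in range(4))                          # Stage 3
print(f1, f2, f3)   # [0, 1, 5, 6] [2, 4, 7, 9] 10
```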

11.5 Modeling and Applications of Discrete DP The number of drug-related incidents in each of a city’s three high schools depends upon the number of security patrolpersons assigned to each school. The city has five security patrolpersons available for assignment to the three schools. Historical records show the following number of incidents due to suspected drugs:


No. of Patrolpersons Assigned to Schools

            0    1    2    3    4    5
School 1   14   10    7    4    1    0
School 2   25   19   16   14   12   11
School 3   20   14   11    8    6    5

Determine how to assign the patrolpersons to minimize the number of drug-related incidents.

We find that our optimal minimum solution is 37 incidents. There are alternate solutions that can achieve this result. The tied assignments, each totaling 37 incidents with all five patrolpersons used:

            Tied assignments
School 1   3   2   2   1   1
School 2   1   2   1   2   1
School 3   1   1   2   2   3

School 3 stage (solved first, working backwards). For each number of patrolpersons available to School 3, each possible assignment and its result:

School 3
Available   Used   Result
0           0      20
1           1      14
            0      20
2           2      11
            1      14
            0      20
3           3       8
            2      11
            1      14
            0      20
4           4       6
            3       8
            2      11
            1      14
            0      20
5           5       5
            4       6
            3       8
            2      11
            1      14
            0      20

School 2 stage. For each availability (patrolpersons left for Schools 2 and 3), each choice of Used at School 2 with its Result, the number Passed back to School 3 with its Result, and the Total; * marks the best total:

School 2
Available   Used   Result   Passed   Result   Total
0           0      25       0        20       45*
1           1      19       0        20       39*
            0      25       1        14       39*
2           2      16       0        20       36
            1      19       1        14       33*
            0      25       2        11       36
3           3      14       0        20       34
            2      16       1        14       30*
            1      19       2        11       30*
            0      25       3         8       33
4           4      12       0        20       32
            3      14       1        14       28
            2      16       2        11       27*
            1      19       3         8       27*
            0      25       4         6       31
5           5      11       0        20       31
            4      12       1        14       26
            3      14       2        11       25
            2      16       3         8       24*
            1      19       4         6       25
            0      25       5         5       30

School 1 stage (5 available):

Used   Result   Passed   Result   Total
5      0        0        45       45
4      1        1        39       40
3      4        2        33       37*
2      7        3        30       37*
1      10       4        27       37*
0      14       5        24       38

The best total is 37, attained with 3, 2, or 1 patrolpersons at School 1; backtracking through the School 2 and School 3 tables produces the tied assignments listed above.
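With only five patrolpersons, the DP tables above can be cross-checked by complete enumeration (a Python sketch of ours; the incident table is from the text; a DP would scale better for larger problems):

```python
incidents = {1: [14, 10, 7, 4, 1, 0],
             2: [25, 19, 16, 14, 12, 11],
             3: [20, 14, 11, 8, 6, 5]}

# every way to split 5 patrolpersons among the three schools
plans = [(a, b, 5 - a - b) for a in range(6) for b in range(6 - a)]

def cost(p):
    return incidents[1][p[0]] + incidents[2][p[1]] + incidents[3][p[2]]

best = min(cost(p) for p in plans)
optima = sorted(p for p in plans if cost(p) == best)
print(best, optima)   # 37 and the five tied assignments
```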

Example 11.4
We have $4 million to invest in four oil wells. The amount of revenue earned at each of the four sites depends on the amount of investment in each site. This information is provided in the table below and was derived from revenue forecasting formulas. Assuming that the amount invested in each site must be a multiple of $1 million, use DP to determine the optimal investment policy to maximize revenues.

Revenue ($Millions)
Amount Invested ($Millions)   SITE 1   SITE 2   SITE 3   SITE 4
0                                4        3        3        2
1                                7        6        7        4
2                                8       10        8        9
3                                9       12       13       13
4                               11       14       15       14

Solution Process:

(I) SITE 1
S1   D1   R1
4    4    11
3    3     9
2    2     8
1    1     7
0    0     4


(II) SITE 2
S2   D2   R2   S1   f*(S1)   Sum   Best
4    4    14   0      4       18
     3    12   1      7       19     19
     2    10   2      8       18
     1     6   3      9       15
     0     3   4     11       14
3    3    12   0      4       16
     2    10   1      7       17     17
     1     6   2      8       14
     0     3   3      9       12
2    2    10   0      4       14     14
     1     6   1      7       13
     0     3   2      8       11
1    1     6   0      4       10     10
     0     3   1      7       10
0    0     3   0      4        7      7

(III) SITE 3
S3   D3   R3   S2   f*(S2)   Sum   Best
4    4    15   0      7       22
     3    13   1     10       23
     2     8   2     14       22
     1     7   3     17       24     24
     0     3   4     19       22
3    3    13   0      7       20
     2     8   1     10       18
     1     7   2     14       21     21
     0     3   3     17       20
2    2     8   0      7       15
     1     7   1     10       17     17
     0     3   2     14       17
1    1     7   0      7       14     14
     0     3   1     10       13
0    0     3   0      7       10     10

(IV) SITE 4
S4   D4   R4   S3   f*(S3)   Sum
4    4    14   0     10       24
     3    13   1     14       27*
     2     9   2     17       26
     1     4   3     21       25
     0     2   4     24       26

Optimal solution is $27 million in revenues when we invest as follows:
SITE 4: $3 million
SITE 3: $1 million
SITE 2: $0
SITE 1: $0
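The four-stage table work condenses into a single backward fold over the revenue table. A Python sketch (names are ours):

```python
rev = [[4, 7, 8, 9, 11],      # Site 1
       [3, 6, 10, 12, 14],    # Site 2
       [3, 7, 8, 13, 15],     # Site 3
       [2, 4, 9, 13, 14]]     # Site 4

f = list(rev[0])              # Stage I: revenue is increasing, so f(S1) = R1(S1)
for site in range(1, 4):      # fold in Sites 2, 3, 4
    f = [max(rev[site][d] + f[s - d] for d in range(s + 1)) for s in range(5)]
print(f[4])   # 27, matching the optimal investment policy above
```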

Exercises
1. Maximize 2x1 + 10x2 + (x3 − 4)²
   s.t. x1 + x2 + x3 ≤ 7
   x1, x2, x3 ≥ 0 and integer
2. Maximize 2x1 + 10x2 + (x3 − 4)²
   s.t. x1 + x2 + x3 ≤ 7
   x1, x2, x3 ≥ 0



3. We have three projects competing for budget allocation. Let Dj equal the allocation of the budget to project j (j = 1, 2, 3). Let rj(Dj) equal the return from project j for the given input Dj. In general, we desire to
   Max Σ rj(Dj)
   s.t. Σ Dj ≤ K (Budget)
   Dj ≥ 0
   The following table shows the return for each project type for the finite inputs Dj.

   Dj   r1(D1)   r2(D2)   r3(D3)
   0      0        2        0
   1      4        6        5
   2      7        8        9
   3      9       10       11
   4     12       11       10

Our budget, K, is 4 units. Find the optimal Dj (for j = 1, 2, 3) to allocate from the budget using DP. What is the projected return for these projects? Show all work.
4. Use DP to solve:
   Minimize x² + y² + z²
   s.t. x + 2y + z ≥ 9
   x, y, z ≥ 0 and all integer
5. Use DP to solve:
   Minimize x² + y² + w² + z²
   subject to: x + 2y + 2w + z ≥ 9
   x, y, w, z ≥ 0, integer


References and Suggested Readings
Bellman, R. Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.
Bradley, S., A. Hax, and T. Magnanti. Applied Mathematical Programming. Reading, MA: Addison-Wesley, 1977.
Hillier, F. and G. Lieberman. Introduction to Mathematical Programming. New York: McGraw-Hill, 1990.
Luenberger, D. Linear and Nonlinear Programming. Reading, MA: Addison-Wesley, 1984.
Phillips, D., A. Ravindran, and J. Solberg. Operations Research. New York: John Wiley & Sons, 1976.
Rao, S. Optimization Theory and Applications. New Delhi: Wiley Eastern Ltd., 1979.
Winston, W. Introduction to Mathematical Programming: Applications and Algorithms. Boston, MA: PWS-Kent, 1991.

Chapter 12

Data Analysis with Regression Models, Advanced Regression Models, and Machine Learning through Optimization

12.1 Introduction and Machine Learning
We have data for supply transactions on a monthly basis for two years. We would like to predict or forecast for the next year based on this data. Can we construct such a model? We will revisit this scenario later in the chapter as sine regression. In this chapter, we will briefly discuss the optimization behind curve fitting, build the models for some regression techniques as background information, and point out keys to finding adequate models. We do not try to cover all the regression topics that exist, but we do illustrate some real-world examples and the techniques used to gain insights, predict, explain, and answer scenario-related questions. We confine ourselves to simple linear regression, multiple regression, nonlinear regression (exponential and sine), binary logistic regression, and a simple Poisson regression.


12.1.1 Machine Learning
Machine learning stems from the modeling approach, which is an iterative approach as we move to refine models. Therefore, we might start with ordinary least squares regression and "move" models to or from time series. We might transition to the more formal machine learning methods. We have found that machine learning has many definitions. They range from any use of a computer (machine) performing calculations and analysis to the application of artificial intelligence (AI) in the computer's calculations and analysis. Here we adapt the definition from Huddleston and Brown:

The defining characteristic of machine learning is the focus on using algorithmic methods to improve descriptive, predictive, and prescriptive performance in real-world contexts. An older, but perhaps more accurate, synonym for this approach from the statistical literature is algorithmic modeling (Breiman, 2001). This algorithmic approach to problem solving often entails sacrificing the interpretability of the resulting models. Therefore, machine learning is best applied when this trade-off makes business sense, but is not appropriate for situations such as public policy decision-making, where the requirement to explain how one is making decisions about public resources is often essential. (Huddleston et al., 2017)

In this chapter, we concentrate on the supervised learning aspect only, and in particular those regression models that we can apply. According to Huddleston et al., machine learning is an emerging field, with new algorithms regularly developed and fielded. Specific machine learning techniques are usually designed to address one of the three types of machine learning problems introduced here, but in practice, several methods are often combined for real-world application.
As a rule, unsupervised learning methods are designed to be descriptive, supervised learning methods are designed to be predictive, and reinforcement learning methods are designed to be prescriptive. This difference in modeling focus yields the following list of guiding principles for algorithmic modeling, some of which are based on Breiman's (2001) article:
◾ There are likely to be many models with demonstrated predictive power.
◾ Analysts should investigate and compete as many models as possible.


◾ Analysts should measure the performance of models on out-of-sample test datasets using a procedure that mimics the real-world situation in which the model will be applied.
◾ Predictive accuracy on out-of-sample test data, not goodness of fit on training data, is the primary criterion for how good a model is; however…
◾ Predictive power is not the only criterion upon which model selection is made; we routinely also consider model interpretability, speed, deployability, and parsimony.

This chapter concentrates on predictive methods. The following steps describe a standard workflow for developing an algorithmic solution for prediction in the context of a supervised learning problem:
◾ Data Acquisition and Cleaning;
◾ Feature Engineering and Scaling;
◾ Model Fitting (Training) and Feature Selection;
◾ Model Selection;
◾ Model Performance Assessment; and
◾ Model Implementation

Technology is a key aspect in all these algorithms. Where appropriate and applicable, we present examples using Maple, MATLAB, and Excel to compute our output. Each has some strengths and weaknesses in doing forecasting that are learned by building and experimenting with models using your technology of choice. How can technology assist us? Below are the areas where technology might assist us, with a bit of discussion of each.

12.1.1.1 Data Cleaning and Breakdown Real-world data is often messy, with missing values, varied formatting, duplicated records, etc. Analysts can expect to spend considerable time reformatting and standardizing any data acquired and will often require domain expert advice for dealing with data problems. For most supervised learning problems, data acquisition and cleaning will end with the data stored in a flat table where the rows represent observations and the columns represent the features of those observations. For supervised learning problems, at least one column must represent the response variable or class (the item that will be predicted). This response variable is often referred to as the dependent variable in statistical regression and as the target variable in AI. Data, if possible, needs to be broken into groups: training or modeling data, testing data to check the model, and validation data (when available) to determine “how well” the model has performed.


In our examples, the data is basically clean. We only need to analyze the breaking down into categories, if necessary as train, test, and validate.

12.1.1.2 Engineering This deals with clustering and scaling issues. This is not an issue with the data examples used in this chapter, but it could be for data collected or provided in realworld analysis.

12.1.1.3 Model Fitting In general, we suggest using the following steps in any regression analysis: Step 1: Obtain the data (x, y). Then obtain a scatterplot of the data and note the trends (linear, curved, etc.). Step 2: If necessary, transform the data into “y” and “x” components. Step 3: Build or compute the regression equation. Obtain all the output. Interpret the ANOVA output for R 2, F-test, and P-values for coefficients. Step 4: Plot the regression function and the data to obtain a visual fit. Step 5: Compute the predictions, the residuals, percent relative error as described later. Step 6: Ensure the predictive results passes the common sense test. Step 7: Plot the residual versus model predictions to determine the model adequacy. In the next sections, we will discuss regression from the optimization process.

12.2 The Different Curve Fitting Criterion
We will briefly describe three curve fitting criteria that could be used to fit a model to a set of data: least squares, Chebyshev's criterion or minimizing the largest error, and minimizing the sum of the absolute error. We illustrate each method with the following available dataset for length, l in inches, and weight, w in ounces.

l   12   13   13   14   15   16   17   18
w   17   16   17   23   26   27   43   49

12.2.1 Fitting Criterion 1: Least Squares The method of least-squares curve fitting, also known as ordinary least squares or simple linear regression (SLR), is simply the solution to a model that minimizes the sum of the squares of the deviations between the observations and predictions.


Least squares will find the parameters of the function, f(x), that will minimize the sum of squared differences between the real data and the proposed model, as shown in equation (12.1).

Minimize S = Σ (j = 1 to m) [yj − f(xj)]²   (12.1)

Given our data, and assuming we are looking for a simple quadratic model such as y = kx², our model looks like:

Minimize S = (17 − 144k)² + (16 − 169k)² + (17 − 169k)² + (23 − 196k)² + (26 − 225k)² + (27 − 256k)² + (43 − 289k)² + (49 − 324k)²

We have discussed unconstrained optimization methods in one and multiple variables previously; we provide a brief review discussion in the next sections. We might expand and collect terms and then differentiate S to find the value of k that makes S′(k) = 0. It might be easier to use a numerical search method like golden section, which we presented in Chapter 4. We illustrate the result of golden section (Figure 12.1). The value of the slope, using the midpoint of the final interval, is k = 0.12679 with an SSE of 193.397. The simple model is w = 0.12679 l².

Figure 12.1 Screenshot of golden section for the least squares model.
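The golden section computation in Figure 12.1 can be replicated in a few lines. This Python sketch is ours, and the bracketing interval [0, 1] is an assumption; a long run converges to k ≈ 0.1273 with SSE ≈ 193.3, in line with the spreadsheet's coarser-interval values of 0.12679 and 193.397:

```python
import math

l = [12, 13, 13, 14, 15, 16, 17, 18]
w = [17, 16, 17, 23, 26, 27, 43, 49]

def sse(k):
    return sum((wi - k * li**2) ** 2 for li, wi in zip(l, w))

a, b = 0.0, 1.0                      # assumed search bracket for k
r = (math.sqrt(5) - 1) / 2           # golden ratio conjugate, ~0.618
for _ in range(60):                  # shrink the bracket around the minimizer
    c, d = b - r * (b - a), a + r * (b - a)
    if sse(c) < sse(d):
        b = d
    else:
        a = c
k = (a + b) / 2
print(round(k, 5), round(sse(k), 3))
```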


12.2.2 Fitting Criterion 2: Minimize the Sum of the Absolute Deviations
The model is to find the values of the parameters in f(x) that

Minimize Σ (i = 1 to n) | yi − f(xi) |   (12.2)

Assume that we have the following data and we want to use the criterion in equation (12.2) to fit the model we are interested in finding. This formulation would require a purely numerical method of optimization such as golden section or gradient search methods, depending on the number of parameters in the model. Back to our dataset, we now want to minimize the sum of the absolute deviations using W = kL². We formulate the model as follows:

Minimize S = |17 − 144k| + |16 − 169k| + … + |49 − 324k|

The method that we will eventually use to solve this problem for the value of k that minimizes this sum is a numerical search technique called golden section search. We learned about implementing golden section search in Chapter 4 as a single-variable optimization method. In this case, we are looking for the value of k (Figure 12.2).

Figure 12.2 Screenshot of golden section for the minimize-the-sum-of-absolute-deviations model. The model found is W = 0.122005 L², as shown in Figure 12.2.
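The same search applies here, with the objective swapped for the sum of absolute deviations. This Python sketch is ours, with an assumed bracket of [0, 1]; note that a fully converged run settles a bit lower, near k ≈ 0.1173, than the value reported from the shorter spreadsheet run in Figure 12.2:

```python
import math

l = [12, 13, 13, 14, 15, 16, 17, 18]
w = [17, 16, 17, 23, 26, 27, 43, 49]

def sad(k):                          # sum of absolute deviations for w = k*l^2
    return sum(abs(wi - k * li**2) for li, wi in zip(l, w))

a, b = 0.0, 1.0                      # assumed search bracket for k
r = (math.sqrt(5) - 1) / 2
for _ in range(80):
    c, d = b - r * (b - a), a + r * (b - a)
    if sad(c) < sad(d):
        b = d
    else:
        a = c
k = (a + b) / 2
print(round(k, 5), round(sad(k), 3))
```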


12.2.3 Fitting Criterion 3: Chebyshev's Criterion or Minimize the Largest Error
We define the largest error as R = Max |wi − k li²|. Our goal is to minimize this largest error. We use the following generalized formulation:

Minimize R
Subject to:
R ± ri ≥ 0 for each residual ri   (12.3)

This formulation, equation (12.3), with our data, would look like:

Minimize R
Subject to:
R + (17 − 144k) ≥ 0
R − (17 − 144k) ≥ 0
R + (16 − 169k) ≥ 0
R − (16 − 169k) ≥ 0
⋮
R + (49 − 324k) ≥ 0
R − (49 − 324k) ≥ 0
R ≥ 0
k ≥ 0

This is a linear programming problem. The result finds the value of k that minimizes the largest error R. We mentioned linear programming in Chapter 1 as a review topic for other techniques used in Chapter 10. We leave this solution as an exercise.
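Equation (12.3) is a linear program, but because the largest-error objective is convex in k, it can also be checked numerically. This Python sketch (ours, with an assumed bracket) uses a ternary search and is one way to verify a hand or Solver LP solution:

```python
l = [12, 13, 13, 14, 15, 16, 17, 18]
w = [17, 16, 17, 23, 26, 27, 43, 49]

def largest_error(k):
    return max(abs(wi - k * li**2) for li, wi in zip(l, w))

a, b = 0.0, 1.0                      # assumed bracket for k
for _ in range(100):                 # ternary search on the convex function
    m1, m2 = a + (b - a) / 3, b - (b - a) / 3
    if largest_error(m1) < largest_error(m2):
        b = m2
    else:
        a = m1
k = (a + b) / 2
print(round(k, 5), round(largest_error(k), 4))
```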

Exercises
Formulate each dataset to be solved by each of the criteria.

1.
   x   1   2     3    4     5
   y   4   7.8   11   13.9  19

   a. y = b + ax
   b. y = ax²


2. Stretch of a spring data:

   x (× 10⁻³)   5   10   20   30   40   50   60   70   80   90   100
   y (× 10⁵)    0   19   57   94  134  173  216  256  297  343   390

   a. y = ax
   b. y = b + ax
   c. y = ax²

3. Data for the ponderosa pine:

   x   17   19   20   22   23   25   28   31   32   33   36   37   39   42
   y   19   25   32   51   57   71  113  140  153  187  192  205  250  260

   a. y = ax + b
   b. y = ax²
   c. y = ax³
   d. y = ax³ + bx² + c

4. Given Kepler's data:

   Body      Period (s)    Distance from Sun (m)
   Mercury   7.60 × 10⁶    5.79 × 10¹⁰
   Venus     1.94 × 10⁷    1.08 × 10¹¹
   Earth     3.16 × 10⁷    1.5 × 10¹¹
   Mars      5.94 × 10⁷    2.28 × 10¹¹
   Jupiter   3.74 × 10⁸    7.79 × 10¹¹
   Saturn    9.35 × 10⁸    1.43 × 10¹²
   Uranus    2.64 × 10⁹    2.87 × 10¹²
   Neptune   5.22 × 10⁹    4.5 × 10¹²

   Fit the model y = ax^(3/2).

5. Set up and solve the linear program for Criterion 3 in the last example.


12.3 Introduction to Simple Linear and Polynomial Regression
In simple linear regression, we want to build a model to minimize the sum of squared error. Equation (12.4) illustrates this.

Minimize SSE = Σ (i = 1 to n) (yi − f(xi))²   (12.4)

Most of the technologies can do this for the user. Excel, SPSS, SAS, MINITAB, R, MATLAB, and JMP are among the most commonly used. We will illustrate Excel, Maple, and MATLAB. We also illustrate from the standpoint of optimization. In optimization methods, we take the function we want to optimize, in this case minimize, and take the derivative of the function with respect to each parameter. This creates the normal equations, which we solve for the critical points that simultaneously make all the normal equations equal to zero. This is straightforward with polynomial regression models. Let's start with a quadratic regression model to illustrate:

y = f(x) = b2 x² + b1 x + b0

The model is to minimize S = Σ (i = 1 to n) (yi − (b2 xi² + b1 xi + b0))²

We take the partial derivatives with respect to b2, b1, and b0 and set them equal to zero. These are the normal equations:

∂S/∂b2 = 2 Σ (yi − (b2 xi² + b1 xi + b0))(−xi²) = 0
∂S/∂b1 = 2 Σ (yi − (b2 xi² + b1 xi + b0))(−xi) = 0
∂S/∂b0 = 2 Σ (yi − (b2 xi² + b1 xi + b0))(−1) = 0

These can be simplified into a matrix form that we can solve:

⎡ Σx⁴  Σx³  Σx² ⎤ ⎡ b2 ⎤   ⎡ Σx²y ⎤
⎢ Σx³  Σx²  Σx  ⎥ ⎢ b1 ⎥ = ⎢ Σxy  ⎥
⎣ Σx²  Σx   n   ⎦ ⎣ b0 ⎦   ⎣ Σy   ⎦


There is a pattern to these normal equations for polynomial equations that the reader should easily conjecture and show to be true. We can replace the symbols with numbers and solve the normal equations for the critical points. In this case, we have three unknowns {b2, b1, b 0}, so we must use a gradient search method to find the solution in equation (12.4) or we could substitute into the normal equation and solve a system of equations.
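To see the pattern in action, the 3×3 normal-equation system for the length-weight data can be assembled and solved directly. This is a Python sketch; `solve3` is our own tiny Gaussian-elimination helper, not a library routine:

```python
l = [12, 13, 13, 14, 15, 16, 17, 18]
w = [17, 16, 17, 23, 26, 27, 43, 49]

def solve3(A, v):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    M = [row[:] + [vi] for row, vi in zip(A, v)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, 3):
            m = M[r][i] / M[i][i]
            M[r] = [a - m * b for a, b in zip(M[r], M[i])]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):       # back substitution
        x[i] = (M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))) / M[i][i]
    return x

S = lambda p: sum(xi**p for xi in l)                      # sums of powers of x
A = [[S(4), S(3), S(2)], [S(3), S(2), S(1)], [S(2), S(1), len(l)]]
v = [sum(wi * xi**2 for xi, wi in zip(l, w)),
     sum(wi * xi for xi, wi in zip(l, w)),
     sum(w)]
b2, b1, b0 = solve3(A, v)
print(round(b2, 4), round(b1, 4), round(b0, 4))   # about 0.9048, -21.6032, 145.4921
```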

12.3.1 Excel
Several approaches are available in Excel to find solutions to regression problems. We could use the Solver after initializing the parameters and then minimizing the sum-of-squared-error function. We could build the system of normal equations and use matrix operations to solve them: given XB = Y, the solution for the parameters is B = X⁻¹Y. We could also call out the Data Analysis package for regression; it works for linear and polynomial regression as well as multivariable regression (Figure 12.3). We use Excel to obtain the solution using the form B = X⁻¹Y. We find that b0 = 145.4921, b1 = −21.6032, and b2 = 0.904762. The equation is W = f(L) = 145.4921 − 21.6032L + 0.904762L².

Figure 12.3 Screenshot from Excel of the matrix form of least squares.


12.3.2 Regression in Maple In Maple, we can use optimization to minimize the expanded sum of the squared error, B = X−1Y, or the internal Maple Fit command (Figure 12.4).

Figure 12.4 Screenshot of Maple for least squares.


12.3.3 MATLAB
MATLAB has a polynomial fit command:

>> x = [12,13,13,14,15,16,17,18];
>> y = [17,16,17,23,26,27,43,49];
>> p = polyfit(x,y,2)
p =
    0.9048  -21.6032  145.4921

We can obtain a plot of the data and the model through the commands (Figure 12.5):

>> x1 = linspace(10,20);
>> y1 = polyval(p,x1);
>> figure
>> plot(x,y,'o')
>> hold on
>> plot(x1,y1)
>> hold off

The model is y = 0.9048x² − 21.6032x + 145.4921.

Figure 12.5 Screenshot from MATLAB of the data and model.


Exercises
1. Find the normal equations for the following polynomials:
   a. y = b0 + b1x + b2x² + b3x³
   b. y = b0 + b1x + b2x² + b3x³ + b4x⁴
2. Find or modify the normal equations from 1(a) to obtain the normal equations for y = b1x + b2x² + b3x³.
3. Using the technology of your choice, resolve exercises 1−4 from Section 12.2 using the least squares criterion.

12.4 Diagnostics in Regression
In ordinary least squares linear regression, we are most concerned with the mathematical modeling and with using the model to explain or predict the phenomena being analyzed. We provide only a few diagnostic measures at this point to determine the adequacy of the model. These diagnostics include the following:

1. Percent relative error with our rule of thumb.
2. Sum of squared error.
3. Residual plots.
4. Coefficient of determination, R².
5. Hypothesis tests on the parameters with p-values.

We start with percent relative error (Giordano et al., 2013). Percent relative error is calculated by equation (12.5):

%RE = 100(yi − f(xi))/yi    (12.5)

We usually apply a rule of thumb to the magnitudes of the percent relative errors: we want most of them to be less than 20%, and those near where we need to predict to be less than 10%.

The sum of squared error, SSE, is the quantity our fit minimizes, so the result is the smallest SSE achievable for that model form; other model forms might achieve a lower SSE. For example, a quadratic fit will have an SSE no larger than a linear fit of the same data.

The coefficient of determination, R², is the square of the correlation coefficient, ρ. The correlation coefficient is a measure of the linear relationship between variables. R² satisfies 0 ≤ R² ≤ 1, and larger is better: it gives the percent of the variation in the y-data explained by the model. R² can also be calculated as 1 − SSE/SST, where SSE is the sum of squared error and SST is the total sum of squares.

Since we mentioned correlation, we discuss it briefly. Decision makers hold several misconceptions about correlation, many of which are supported by poor definitions. A good definition states that correlation is a measure of the linear relationship between variables; the key term is linear. Some definitions state only that it is a measure of the relationship between two variables and do not even mention linear. This is true in Excel. The following definition from the Excel help menu (www.office.com), we believe, helps fuel the misconception:

Returns the correlation coefficient of the array1 and array2 cell ranges. Use the correlation coefficient to determine the relationship between two properties. For example, you can examine the relationship between a location's average temperature and the use of air conditioners.

It is no wonder decision makers have misconceptions. Like the latter definition, they often lose the term linear and state or think that correlation measures the relationship between variables. As we will show, that can be false thinking.

We now present two rules of thumb for correlation from the literature. First, from Devore (2012), for math, science, and engineering data, we have the following:

0.8 < |ρ| ≤ 1.0: Strong linear relationship
0.5 < |ρ| ≤ 0.8: Moderate linear relationship
|ρ| ≤ 0.5: Weak linear relationship

According to Johnson (2012), for non-math, non-science, and non-engineering data, we find a more liberal interpretation of ρ:

0.5 < |ρ| ≤ 1.0: Strong linear relationship
0.3 < |ρ| ≤ 0.5: Moderate linear relationship
0.1 < |ρ| ≤ 0.3: Weak linear relationship
|ρ| ≤ 0.1: No linear relationship
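The first four diagnostics are simple to compute. A minimal Python sketch, reusing the quadratic model from Section 12.3 as an assumed example:

```python
import numpy as np

# Quadratic model and data from Section 12.3, assumed for illustration.
x = np.array([12, 13, 13, 14, 15, 16, 17, 18], dtype=float)
y = np.array([17, 16, 17, 23, 26, 27, 43, 49], dtype=float)
pred = 145.4921 - 21.6032 * x + 0.904762 * x**2

residual = y - pred
pct_rel_error = 100 * residual / y            # percent relative error, eq. (12.5)
SSE = np.sum(residual**2)                      # sum of squared error
SST = np.sum((y - y.mean())**2)                # total sum of squares
R2 = 1 - SSE / SST                             # coefficient of determination

print(pct_rel_error.round(2), round(SSE, 3), round(R2, 4))
```

A residual plot is then just a scatter of pred against residual, examined for patterns.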

Further, in our modeling efforts, we emphasize the interpretation of |ρ| ≈ 0. This can be interpreted as either no linear relationship or the existence of a nonlinear relationship. Most students fail to pick up on the nonlinear-relationship aspect of the interpretation. We also provide a visualization of the plot of residuals versus the model. We examine the plot for trends or patterns (Giordano et al., 2013; Afifi et al., 1975), as shown in Figure 12.6. If a pattern is seen, we deem the model not adequate, even though we might still have to use it. If there is no visual pattern, then we may conclude the model is adequate.


Figure 12.6 Patterns for residuals: (a) no pattern, (b) curved pattern, (c) fanning pattern, (d) outliers, and (e) a linear trend. (From Giordano et al., 2013.)

Hypothesis tests on the parameters are also very useful. For each coefficient in the model, a hypothesis test is run comparing the coefficient value of zero against a nonzero value; a small p-value (less than 0.05, or 0.01 for a stricter standard) indicates a significant coefficient. We add one additional diagnostic, the common sense test. This diagnostic asks whether the model's answers can actually be used: do they answer the question, and do they provide realistic results?


12.4.1 Example for the Common Sense Test
12.4.1.1 Exponential Decay Example
We desire to build a mathematical model to predict the degree of recovery after discharge for orthopedic surgical patients. We have two variables: time in days in the hospital, t, and a medical prognostic index for recovery, y, where a large value of this index indicates a good prognosis. Our data is taken from Neter and Wasserman (1996, p. 469).

t   2   5   7  10  14  19  26  31  34  38  45  52  53  60  65
y  54  50  45  37  35  25  20  16  18  13   8  11   8   4   6

We provide the scatterplot, Figure 12.7, showing the negative trend. We obtain the correlation coefficient ρ = −0.9410528. Under either rule of thumb, |ρ| = 0.9410528 indicates a strong linear relationship. Having obtained this value (Figure 12.7), we expect an excellent regression model.

Figure 12.7

Scatterplot of the data with negative trend.


We obtain the following linear model, y = 46.4601 − 0.75251t. The sum of squared error is 451.1945. The correlation is, as we stated, −0.94105, and R², the coefficient of determination (the correlation coefficient squared), is 0.88558. These are all indicators of a "good" model. Next, we examine both the percent relative errors and the residual plot. The percent relative errors are:

> Percent_relative_error := [seq(100*residual[i]/Y[i], i = 1 .. 15)];
Percent_relative_error := [16.74888889, 14.60380000, 8.459777778, −5.231351351, −2.644571429, −28.65240000, −34.47800000, −44.58187500, −15.97555556, −37.42769231, −57.47625000, 33.35818182, 17.77375000, 67.23500000, 140.8650000]

Although some are small, others are quite large with 8 of the 15 over 20% in error. The last two are over 67% and over 140%. How much confidence would you have in predicting? The residual plot (Figure 12.8) clearly shows a curved pattern.

Figure 12.8

Residual plot showing a curved pattern.


As seen in Figure 12.8, we observe a trend, which Figure 12.6 tells us indicates an inadequately fitting model. Advanced courses in statistical regression show how to correct this inadequacy. Furthermore, assume we need to predict the index at time 100 days. Using our regression model, we would predict the index as −28.7906. A negative value is clearly unacceptable and makes no common sense, since we expect the index, expressed by the dependent variable y, to always be positive. The model does not pass the common sense test. So, with a strong correlation of −0.94105, what went wrong? The residual plot diagnostic shows a curved pattern. In many regression analysis books, the suggested cure for a curved residual pattern is adding the nonlinear term that is missing from the model.
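The failed common sense test can be reproduced in a few lines; this sketch assumes numpy's polyfit as a stand-in for the normal equations:

```python
import numpy as np

# Prognostic-index data from the example above.
t = np.array([2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65], dtype=float)
y = np.array([54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6], dtype=float)

# Ordinary least squares line y = b0 + b1*t.
b1, b0 = np.polyfit(t, y, 1)

pred_100 = b0 + b1 * 100
print(b0, b1, pred_100)  # slope is about -0.7525; the t = 100 prediction is negative
```

A negative predicted index is the common-sense failure discussed in the text.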

12.4.2 Multiple Linear Regression
Next, we try adding a nonlinear term. We fit a parabolic model, y = b0 + b1x + b2x², by regressing y on x and x² with an intercept. The model is y = 55.82213 − 1.71026x + 0.014806x². The correlation is now 0.990785, and R² is 0.981654. The sum of squared error is now 72.3472. The residual plot appears to have lost the curved pattern, as the cure suggests. If we use the model to predict at x = 100 days, however, we find y = 32.8321. The answer is now positive but again does not pass the common sense test: the quadratic function is now curving upward toward positive infinity (see Figure 12.9), while the recovery index should keep decaying. This is an unexpected and unacceptable outcome. We certainly cannot use this model to predict future outcomes.

Exercises For each problem solved in exercises from Section 12.2, find the appropriate diagnostics and determine the adequacy of the regression models found.

12.5 Nonlinear Regression through Optimization
If we take equation (12.1) and allow f(x) to be other than a polynomial, such as a nonlinear function a·e^(bx), a·b^x, a·sin(bx + c), or a·sin(bx + c) + dx + e, we cannot isolate the parameters because of the nonlinear expression. However, we still employ the concept of minimizing the sum of squared error. Our expression is a little more involved, as we will illustrate, and solving for the parameters requires optimization through numerical algorithms. We have found that the Newton–Raphson method from Chapter 6 works well to find the optimal parameters.


Figure 12.9



Quadratic model, y = 55.82213 − 1.71026x + 0.014806x².

As an aside, recall that a search algorithm requires starting points. The choice of starting points is key both for finding the correct solution and for converging in a timely manner, within N iterations. For exponential regression, we suggest taking the ln of the y-data, performing linear regression on the transformed data, and transforming the results back to the original space to use those coefficients as starting points. For sine regression, plot the data and estimate the key parameters from it. We do not recommend using zeroes as the default. Fox's article on starting points should be reviewed (see the suggested reading for this section, Fox, 2000).
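That transformation trick can be sketched directly; the code below reuses the decay data from Section 12.4.1 purely for illustration:

```python
import numpy as np

# Decay data (Neter and Wasserman) used earlier in the chapter.
t = np.array([2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65], dtype=float)
y = np.array([54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6], dtype=float)

# For y = a*e^(b*t), ln(y) = ln(a) + b*t, so a straight-line fit to
# (t, ln y) gives starting values for the nonlinear search.
b_start, ln_a = np.polyfit(t, np.log(y), 1)
a_start = np.exp(ln_a)
print(a_start, b_start)  # roughly a = 56.67, b = -0.0380
```

These are exactly the starting values used in the Maple session later in this section.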

12.5.1 Exponential Regression
The model form that we address first is y = ae^(bx). The least squares fit is as shown in equation (12.6):

Minimize S = Σ (i = 1 to n) (yi − a·e^(b·xi))²    (12.6)


If we take the first partial derivatives and set them equal to zero, we cannot possibly obtain a closed form solution because one of the parameters, b, is in an exponent. Therefore, the normal equations followed by solving a system of equations will not work. The solutions for the parameters a and b are obtained by using numerical procedures discussed in Chapter 6: gradient search and Newton–Raphson method. To solve these exponential regression problems, we will need technology.
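As an illustration of such a numerical procedure, here is a Gauss–Newton iteration, a standard least-squares simplification of the Newton–Raphson idea, applied to the decay data from Section 12.4.1. The starting values are assumed to come from the log-transform fit suggested above:

```python
import numpy as np

t = np.array([2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65], dtype=float)
y = np.array([54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6], dtype=float)

a, b = 56.665, -0.037974           # starting values from the ln-transform fit
for _ in range(20):
    e = np.exp(b * t)
    r = y - a * e                   # residuals of y = a*e^(b*t)
    # Jacobian of the residuals with respect to (a, b)
    J = np.column_stack([-e, -a * t * e])
    step = np.linalg.solve(J.T @ J, J.T @ r)
    a, b = a - step[0], b - step[1]

print(a, b)  # converges near a = 58.61, b = -0.0396
```

The converged values match the NonlinearFit output shown later in Section 12.5.3.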

12.5.1.1 Newton–Raphson Algorithm
This method uses Newton's root-finding procedure to find a critical point from an initial point. The algorithm uses Cramer's rule and Newton's method (see page 73 of Mathematical Modeling by Mark Meerschaert, Academic Press, 1993).

Variables:
f = the function to be maximized or minimized
x(n) = approximate x-coordinate of the root after n iterations
y(n) = approximate y-coordinate of the root after n iterations
N = the number of iterations allowed
F = second partial of f with respect to x
G = second partial of f with respect to y

INPUTS: x(0), y(0), N
PROCESS: Begin
For n = 1 to N do

> p1 := plot(msupply, x5 = 0..25):
> p2 := pointplot([months, demand], title = 'supply'):
> display({p1, p2});

The fit is not very good, as shown in Figure 12.20. The trend is not captured. We go back to the data and estimate the intercept, slope, amplitude, and phase shift from basic trigonometry. Our estimates for a–e are {16, 0.9, 6, 1.6, 1}.
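The read-the-parameters-off-the-data step can be automated. Because the month/demand values are not listed in the text, the sketch below generates a synthetic series from the estimated parameters {16, 0.9, 6, 1.6, 1} and recovers rough starting values from it:

```python
import numpy as np

# Synthetic data standing in for the months/demand series:
# y = a + b*x + c*sin(d*x + e) with (a, b, c, d, e) = (16, 0.9, 6, 1.6, 1).
x = np.linspace(0, 25, 500)
y = 16 + 0.9 * x + 6 * np.sin(1.6 * x + 1)

# Slope and intercept: a straight-line fit (the sine nearly averages out).
b_est, a_est = np.polyfit(x, y, 1)

# Amplitude: half the peak-to-peak range of the detrended data.
resid = y - (a_est + b_est * x)
c_est = (resid.max() - resid.min()) / 2

# Frequency: count zero crossings of the detrended data.
crossings = np.sum(np.diff(np.sign(resid)) != 0)
d_est = crossings * np.pi / (x.max() - x.min())

print(a_est, b_est, c_est, d_est)
```

The phase can then be read from where the detrended series first crosses zero going upward.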

Figure 12.20

Not a good fit due to choice of poor starting point.



> msupply := NonlinearFit(a + b*x5 + c*sin(x5*d + e), months, demand, x5, initialvalues = [a = 16, b = .9, c = 6, d = 1.6, e = 1]);
msupply := 14.1865330075917 + 0.847951259234693 x5 − 6.68918257505847 sin(1.57350123298938 x5 + 0.0826250652440048)
> p1 := plot(msupply, x5 = 0..25):
> p2 := pointplot([months, demand], title = 'supply'):
> display({p1, p2});

This model, shown in Figure 12.21, does a better job of capturing the trend.

Figure 12.21

Sine model and original data.


msupply := 14.1865330075917 + 0.847951259234693 x5 − 6.68918257505847 sin(1.57350123298938 x5 + 0.0826250652440048)

The sum of squared error is only 11.58, quite a bit smaller than with the linear model. Our nonlinear (oscillating) model is overlaid with the data in Figure 12.21, visually representing a good model. Clearly, the sine regression does a much better job of predicting the trends than a simple linear regression.

MATLAB
We provide the MATLAB commands below for our example.

>> mdl = fitnlm(t,y,modelfun1,beta0)
mdl =
Nonlinear regression model:
    y ~ b1*sin(b2*x + b3) + x*b4 + b5

Estimated Coefficients:
        Estimate    SE          t-Stat     p-Value
  b1    6.6892      0.38351     17.442     2.269e-11
  b2    1.5735      0.010024    156.98     1.5415e-25
  b3    0.08263     0.12415     0.66559    0.51578
  b4    0.84795     0.047473    17.862     1.6131e-11
  b5    14.187      0.56742     25.002     1.2163e-13

Number of observations: 20, Error degrees of freedom: 15
Root Mean Squared Error: 1.21
R-Squared: 0.974, Adjusted R-Squared: 0.967
F-statistic vs. constant model: 142, p-value = 9.91e-12

The model is y = 14.187 + 0.84795x + 6.6892 sin(1.5735x + 0.08263).

EXCEL
Excel is done the same way as we described before with exponential regression, except the model is now the sine model with five parameters. We have a spreadsheet template available from the author upon request. We provide a screenshot from the template that includes the results in Figure 12.


Excel’s model is y = 6.3164 sin(1.5735x + 0.0835805) + 0.875939302x + 13.693. We used three different technologies to estimate the solution to our supply data. The summary is listed in Table 12.1.

Table 12.1 Summary of Sine Regression

Technology  Equation                                                         SSE
Excel       y = 6.3164 sin(1.5735x + 0.0835805) + 0.875939302x + 13.693     11.57768
Maple       msupply1 := 14.1865330075917 + 0.847951259234693 x5
            − 6.68918257505847 sin(1.57350123298938 x5 + 0.0826250652440048)  11.58
MATLAB      y = 14.187 + 0.84795x + 6.6892 sin(1.5735x + 0.08263)           14.5634

12.5.3 Illustrative Examples
12.5.3.1 Nonlinear Regression (Exponential Decay)
Let's try a nonlinear regression model (Neter and Wasserman, 1996; Fox, 2011, 2012). We desire the model y = ae^(bx). Maple has a NonlinearFit command. However, if we fail to use good initial values, or do not specify any initial values at all, we get a terrible model. First, we do not specify any initial values:

> NonlinearFit(a*exp(b*x3), decaytime, popdecay, x3);
5.67900473695457 × 10⁻²⁹ e^(1.02821232007783 x3)


If we plot this function overlaid on the data points, we can see just how poor the model is.

> p1 := pointplot([xpts, ypts], title = "scatter plot x vs y"):
> p2 := plot(5.67900473695457e-29 * exp(1.02821232007783*x), x = 0..30, thickness = 4):
> display(p1, p2);

We suggest using a ln transformation of the y-data and then a linear regression model, so that we get a good approximation of the parameters to use as initial values. For y = ae^(bx), taking the ln of each side yields

ln(y) = ln(a) + b*x

So we use this to get a model:

> lnpopdecay := evalf([seq(ln(popdecay[i]), i = 1..15)]);

lnpopdecay := [3.988984047, 3.912023005, 3.806662490, 3.610917913, 3.555348061, 3.218875824, 2.995732274, 2.772588722, 2.890371758, 2.564949357, 2.079441542, 2.397895373, 2.079441542, 1.386294361, 1.791759469]

> LinearFit([1, x4], decaytime, lnpopdecay, x4);
4.03715886613379 − 0.0379741808112946 x4
> c2 := exp(4.03715886613379);
c2 := 56.66512069

Our initial values for the nonlinear fit model should be a = 56.665 and b = −0.037974.

> NonlinearFit(a*exp(b*x2), decaytime, popdecay, x2, initialvalues = [a = 56.66512069, b = −0.0379741808]);
58.6065660633837 e^(−0.0395864525511808 x2)

We minimize the function S = Σ (i = 1 to n) (yi − a·exp(b·xi))² and obtain the model y = 58.60663e^(−0.03959t). Care must be taken with the selection of the initial values for the unknown parameters (Fox, 2011). We find Maple yields good models based upon "good" input parameters. As a nonlinear model, correlation has no meaning, nor does R² in its usual sense, so we computed the sum of squared error, since that was our objective function. Our SSE was 49.4593, substantially smaller than the 451.1945 obtained with the linear model. The model appears reasonable, and the residual plot showing no pattern can be seen in Figures 12.22 and 12.23.

> predict := map(x2 -> 58.6065660633837*exp(−0.0395864525511808*x2), decaytime);
predict := [54.14544404, 48.08231684, 44.4229892, 39.44795522, 33.67098218, 27.62452985, 20.93869690, 17.17863929, 15.25500052, 13.02097530, 9.869570867, 7.480885790, 7.190529076, 5.450239682, 4.471515205]


Figure 12.22



Exponential model with data.

> residual := [seq(popdecay[i] − predict[i], i = 1..15)];
residual := [−0.14544404, 1.91768316, 0.57770108, −2.44795522, 1.32901782, −2.62452985, −0.93869690, −1.17863929, 2.74499948, −0.02097530, −1.869570867, 3.519114210, 0.809470924, −1.450239682, 1.528484795]

> SSE := add(residual[i]^2, i = 1..15);
SSE := 49.45929988
> popm := Mean(popdecay);
popm := 23.3333333333333


Figure 12.23 Residual plot.

> SST := add((popdecay[i] − popm)^2, i = 1..15);
SST := 3943.33333333333
> Rsquare := 1 − SSE/SST;
Rsquare := 0.987457489446323
> relative := [seq(100*residual[i]/popdecay[i], i = 1..15)];


relative := [−0.2693408148, 3.835366320, 1.283780178, −6.616095189, 3.797193771, −10.49811940, −4.693484500, −7.366495562, 15.24999711, −0.161348462, −23.36963584, 31.99194736, 10.11838655, −36.25599205, 25.47774658]

> pointplot([predict, residual], title = 'Residual Plot');

The percent relative errors are also much improved, with only four greater than 20% and none larger than 36.256%. If we use this model to predict at x = 100, we find, by substitution, that y = 1.118. This new result passes the common sense test, and this model would be our recommended model for this data.

Further, we created a statistical output procedure for the nonlinear regression in Maple, which produced the following output.

> STAToutput := proc(n, a1, b1, SE1, SE2, t1, t2, PV1, PV2, SSE, SST, MSE, rsq)
    print("SSE = ", SSE);
    print("SST = ", SST);
    print("Approximate R-square = ", rsq);
    print("Degrees of freedom and MSE are ", n − 2, MSE);
    printf("Coefficient  Standard Error  T-Statistic  P-value\n\n");
    printf("%12.6f  %10.4f  %10.4f  %10.4f\n", a1, SE1, t1, PV1);
    printf("%12.6f  %10.4f  %10.4f  %10.4f\n", b1, SE2, t2, PV2);
  end proc;

"SSE = ", 49.45929988
"SST = ", 3943.33333333333
"Approximate R-square = ", 0.987457489464074
"Degrees of freedom and MSE are ", 13, 3.804561529

Coefficient    Standard Error    T-Statistic    p-Value
58.606566      1.4845            39.4788        0.0000
−0.039586      0.0017            −22.7525       0.0000

We see the coefficients are both statistically significant (p-values < 0.05).

MAPLE
> with(Statistics);
> Fit(a*x + b, Timedata, Supplydata, x);
0.792481203007520 x + 114.578947368421
> NonlinearFit(a*sin(b*x + c) + d*x + e, Timedata, Supplydata, x);
−0.624025587954723 sin(0.814667559630742 x + 4.46299801294494) + 0.805754951765897 x + 114.442958019731
> p1 := pointplot([Timedata, Supplydata], title = "scatter plot time versus supply");
> p2 := plot(−0.624025587954723*sin(0.814667559630742*x + 4.46299801294494) + 0.805754951765897*x + 114.442958019731, x = 0 .. 20, color = black, thickness = 3);
> display({p1, p2});

The fit is not very good, as shown in Figure 12.25. The trend is not captured. So we go back to the data and estimate the intercept, slope, amplitude, and phase shift from trigonometry. This model, shown in Figure 12.26, does a better job of capturing the trend. The model found is

supplymodel := 0.875949272382401 x + 113.692929340605 + 6.31644497665213 sin(1.57354961436919 x + 122.605612470617)

The sum of squared error is only 11.58, quite a bit smaller than with the linear model. Our nonlinear (oscillating) model is overlaid with the data in Figure 12.26, visually representing a good model. Clearly, the sine regression does a much better job of predicting the trends than either a simple linear regression or the sine model started from the default starting point.


Figure 12.25

Not a good fit due to choice of poor starting point.

Figure 12.26

Sine model and original data.




Exercises
1. Fit an exponential model, y = a*e^(bx), using the following data:

a.
x   1   2   3   4   5
y   1   8  25  50  71

b.
x   1   2   3     4    5
y  19  10   5  0.50  0.1

2. Fit a sine regression model to the following dataset:

Time  Items    Time  Items
  1   120       13   132
  2   115       14   126
  3   110       15   121
  4   118       16   129
  5   124       17   135
  6   118       18   128
  7   113       19   122
  8   121       20   132
  9   128       21   143
 10   122       22   136
 11   119       23   130
 12   125       24   141

12.6 One-Predictor Logistic and One-Predictor Poisson Regression Models Often our dependent variable has special characteristics. Here, we examine two such special cases: (1) in logistic regression, the dependent variable is binary {0, 1}, and (2) in Poisson regression, the dependent variable measures integer counts that follow a Poisson distribution.


12.6.1 Logistic Regression and Poisson Regression with Technology
12.6.1.1 Logistic Regression with Technology
We begin with some background on logistic regression. In data analysis, logistic regression (sometimes called the logistic model or logit model) is a type of regression analysis used for predicting the outcome of a binary dependent variable (a variable that can take only two possible outcomes, e.g., "yes" vs. "no" or "success" vs. "failure") based on one or more predictor variables. Logistic regression attempts to model the probability of a "yes/success" outcome using a linear function of the predictors. Specifically, the log-odds of success (the logit of the probability) is fit to the predictors using linear regression. Logistic regression is one type of discrete choice model, which in general predicts categorical dependent variables, either binary or multi-way.

Like other forms of regression analysis, logistic regression makes use of one or more predictor variables that may be either continuous or categorical. Also, like other linear regression models, the expected value (average value) of the response variable is fit to the predictors; the expected value of a Bernoulli distribution is simply the probability of success. Unlike ordinary linear regression, however, logistic regression is used for predicting binary outcomes (Bernoulli trials) rather than continuous outcomes, and it models a transformation of the expected value as a linear function of the predictors, rather than the expected value itself.

For example, logistic regression might be used to predict whether a patient has a given disease (e.g., diabetes) based on observed characteristics of the patient (age, gender, body mass index, results of blood tests, etc.). Another example might be to predict whether a voter will vote Democratic or Republican, based on age, income, gender, race, state of residence, votes in previous elections, etc. Logistic regression is used extensively in numerous disciplines, such as the medical and social science fields, and in marketing applications such as prediction of a customer's propensity to purchase a product or cease a subscription. Yes, even in the government logistic regression has utility.

The model for just one predictor is

Yi = B0 / (1 + B1·e^(B2·Xi)) + εi

where the error terms are independent and identically distributed (iid) normal random variables with constant variance. For more than one predictor, we use the model

Yi = e^(b0 + Σbi·xi) / (1 + e^(b0 + Σbi·xi)) + εi


What is Logistic Regression?
Logistic regression calculates the probability of an event occurring, such as the purchase of a product. In general, the quantity being predicted in a regression equation is represented by the dependent variable or output variable and is usually labeled as the Y variable in the regression equation. In the case of logistic regression, this "Y" is binary: the output or dependent variable can take only the values 1 or 0. The predicted event either occurs or it doesn't occur; your prospect either will buy or won't buy. Occasionally this type of output variable is also referred to as a dummy dependent variable.

Output Desired
We assume we would like to obtain as much output as possible, but at a minimum we want estimates of the coefficients, their standard errors, t* statistics, p-values, and some analysis of fit between the full model and a not-full model that includes −2 ln likelihood and chi-squared tests. These not only give us the model, but also some essential diagnostic measures.

An Example of Logistic Regression
To simplify the analysis, we create a maximum ln-likelihood function for the logit expression:

ln L(Bi) = ΣYi(B0 + ΣBiXi) − Σ ln(1 + exp(B0 + ΣBiXi))

I know this looks intimidating, but it is not. This is an unconstrained optimization problem: maximize a function of several variables. We can build the function and optimize it using the techniques previously covered. We illustrate through an example.

Example 12.5
We have the following data, where the response, Y, is binomial from Bernoulli trials, like yes or no. In this case, we define a success as a 1 and a failure as a 0.

x   4  2  4  3  9  6  2  11  6  7  3  2
Y   1  1  0  1  0  0  0   0  1  0  1  1

The model we want is

Yi = e^(b0 + b1·xi) / (1 + e^(b0 + b1·xi))

We obtain our objective function by substituting the data into the equation

ln L(Bi) = ΣYi(B0 + ΣBiXi) − Σ ln(1 + exp(B0 + ΣBiXi))

We substitute and simplify, writing x1 for B0 and x2 for B1 as the unknowns:

lnL = 6*x1 + 20*x2 − 2*ln(1 + exp(x1 + 4*x2)) − 3*ln(1 + exp(x1 + 2*x2)) − 2*ln(1 + exp(x1 + 3*x2)) − ln(1 + exp(x1 + 9*x2)) − 2*ln(1 + exp(x1 + 6*x2)) − ln(1 + exp(x1 + 11*x2)) − ln(1 + exp(x1 + 7*x2))

At this point, we have multiple methods to solve this unconstrained maximization problem, such as gradient search, Newton's method, or the internal optimization routines residing in technology. In Excel, we will use the Solver. In Maple, we might use Newton's method, gradient search, or NLPSolve. In MATLAB, we used an internal function, as we will describe later. In Excel, we proceed as follows:

1. Enter the data.
2. Create a heading and initial values for the model's coefficients, b0 and b1. Usually we set them at 0.
3. Create the functions we need (we do this in two parts). Column P1 uses Yi*(b0 + b1*xi) and column P2 uses ln(1 + exp(b0 + b1*xi)).
4. Sum columns P1 and P2.
5. In an unused cell, take the difference of the sums, P1 − P2; this is the objective function.
6. Open the Solver and Maximize the cell containing P1 − P2 by changing the cells holding b0 and b1. Ensure the non-negativity box is unchecked.
7. Solve.
8. Obtain your model and use it as needed.
9. Repeat steps 3–8 for the model with intercept only.

We have the data entered in two columns. Next, we create columns for Y*X'B and ln(1 + exp(X'B)) using initial values for b0 and b1. We sum these two columns separately, and in another cell we take the difference of the sums. This is our objective function, which we maximize by changing the cells for b0 and b1. By doing so, we get the results shown.

EXCEL
We used the formulas and Solver to obtain the optimal parameters.


We initialize b0 and b1 both as 0. We call the Solver.
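The same maximization can be sketched outside the spreadsheet. Below is a Newton-method iteration on ln L, an assumed alternative to the Solver rather than the book's workbook:

```python
import numpy as np

# Data from Example 12.5: x values and binary responses Y.
x = np.array([4, 2, 4, 3, 9, 6, 2, 11, 6, 7, 3, 2], dtype=float)
y = np.array([1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1], dtype=float)

X = np.column_stack([np.ones_like(x), x])    # columns for b0 and b1
b = np.zeros(2)                              # start both coefficients at 0
for _ in range(25):
    p = 1 / (1 + np.exp(-(X @ b)))           # fitted probabilities
    grad = X.T @ (y - p)                     # gradient of ln L
    H = X.T @ (X * (p * (1 - p))[:, None])   # negative Hessian of ln L
    b = b + np.linalg.solve(H, grad)

lnL = np.sum(y * (X @ b) - np.log(1 + np.exp(X @ b)))
print(b, lnL)  # roughly b0 = 2.691, b1 = -0.582, ln L = -6.059
```

Newton's method converges here in a handful of iterations because the log-likelihood is concave.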


Figure 12.27



Plot of the logistic function in this case a 2D CDF.

Our result is the model

y = e^(2.691061879 − 0.582142932x) / (1 + e^(2.691061879 − 0.582142932x))

We plot the function, which looks like an inverted S, as shown in Figure 12.27.

Optional Diagnostics
Before we accept this model, we require a minimum of a few diagnostics. We want to (1) examine the significance of each estimated coefficient {b0, b1} and (2) compare this full model to an intercept-only model to measure the chi-square differences. We start with the estimates of our full model's coefficients, {b0 = 2.691062 and b1 = −0.582143}. We need (a) estimates of the standard errors of these estimates, (b) t*, which equals the estimate divided by its se, and (c) the p-value for t*. We know that the estimates for the variance-covariance matrix come from the inverse of the Hessian matrix evaluated at the estimates of {b0, b1}. In a logistic equation, the number of terms in the regression model affects the Hessian matrix. To obtain all this, we will need the Hessian matrix, H(X), so that we can find the

327

328



Nonlinear Optimization

inverse, H(X)⁻¹, and then −H(X)⁻¹, which is the variance-covariance matrix when evaluated at our final coefficient estimates. The main diagonal of this matrix holds the variance of each coefficient; taking the square roots of these entries gives the standard error, se, of each coefficient. For one predictor, the Hessian of ln L with respect to (b0, b1) is

H = [ −Σ(i=1..n) πi(1 − πi)           −Σ(i=1..n) x(i)·πi(1 − πi)  ]
    [ −Σ(i=1..n) x(i)·πi(1 − πi)     −Σ(i=1..n) x(i)²·πi(1 − πi) ]

where πi = e^(b0 + b1·x(i)) / (1 + e^(b0 + b1·x(i))); each entry collapses because e^u/(1 + e^u) − (e^u)²/(1 + e^u)² = π(1 − π).

We would like to use pattern recognition and a simplification step to better see what is happening here. Let π = exp(b0 + b1x1 + b2x2 + … + bnxn) and let

P = −Σ(i=1..n) (π/(1 + π) − π²/(1 + π)²)

Then, we can more easily write the Hessian

matrix for n terms and its inverse as follows. We take the square roots of the entries on the main diagonal as our estimates of the se for {b0, b1, …, bn}. In our example, we compute H and H⁻¹. We compute H using the sums of the columns in the matrix H. To obtain H⁻¹, we use the =MINVERSE command in Excel.

H = [ 2.0053    8.574  ]
    [ 8.574    44.643  ]

H⁻¹ = [ 2.787753   −0.5354 ]
      [ −0.5354     0.1252 ]

We take the square roots of the main diagonal entries {2.787753, 0.1252} to obtain {1.66965, 0.353836} as our standard error, se, estimates for b0 and b1, respectively. We can now fill in the rest of our analysis of regression table.

Analysis of Regression Coefficients

Coefficient   Estimate from the Solver   se from the Square Root of the V-C Matrix   Z-Statistic = Estimate/se     p-Value from P(Z > |Z-statistic|)
b0            2.69105                    1.66965                                     2.69105/1.66965 = 1.6117      0.107
b1            −0.5821                    0.3538                                      −1.645                        0.09996
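As a quick arithmetic check, the variance-covariance step can be sketched in plain Python (our own illustration, not the book's code; the matrix entries are the H values computed above):

```python
# Sketch: recovering the standard errors from the numeric Hessian reported
# above, using plain Python for the 2x2 matrix inverse.
import math

H = [[2.0053, 8.574],
     [8.574, 44.643]]

# 2x2 inverse: (1/det) * [[d, -b], [-c, a]]
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
H_inv = [[ H[1][1] / det, -H[0][1] / det],
         [-H[1][0] / det,  H[0][0] / det]]

# The standard errors are the square roots of the main-diagonal entries.
se = [math.sqrt(H_inv[0][0]), math.sqrt(H_inv[1][1])]
print(se)  # close to [1.66965, 0.353836]
```

The small differences from the table come only from the rounding of the H entries shown above.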

We see from the results in the table that the coefficients b0 and b1 are not significant at α = 0.05. Let's calculate the deviances for our model. We define the deviances as

dev_i = ±[ −2( Y_i ln(π1_i) + (1 − Y_i) ln(1 − π1_i) ) ]^(1/2)

where the sign is positive when Y_i ≥ π1_i and negative when Y_i < π1_i, and we define π1 as π/(1 + π) with π as defined earlier.

Analysis of Deviations

Model            ln Likelihood    Deviance    df
Full Model       −6.05883         12.11667    2
Constant Model   −8.63177         17.26355    1
Difference       −2.57294          5.14688    1

Chi-square = 5.14688 with p-value = 0.023288.

We find that the difference is significant at α = 0.05, so we choose the full model over the constant model.

Odds Ratios

Interpretation of the B parameters in the logistic model: ln(π/(1 − π)) = B0 + B1x1 + ⋯ + Bnxn, where

B1 = change in the log-odds ln(π/(1 − π)) for every 1-unit increase in x1, holding all other x's fixed.
e^(Bi) − 1 = percentage change in the odds ratio π/(1 − π) for every 1-unit increase in x1, holding all other x's fixed.

So, B1 = −0.5821, e^(B1) = 0.5587, e^(B1) − 1 = −0.4413. For each unit of x1, we estimate the odds of a fixed contract to decrease by about 44.1%.
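The odds-ratio interpretation of the slope is a one-line computation; note that e^(−0.5821) = 0.5587, so the odds decrease by about 44.1% per unit of x1 (a sketch using the B1 estimate from the fitted model above):

```python
# Sketch: the odds-ratio interpretation of a logistic coefficient,
# using the slope estimate B1 = -0.5821 reported above.
import math

B1 = -0.5821
odds_multiplier = math.exp(B1)       # e^{B1}: odds ratio per 1-unit increase
pct_change = odds_multiplier - 1.0   # e^{B1} - 1: percentage change in odds

print(round(odds_multiplier, 4), round(pct_change, 4))
```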


MAPLE In Maple, although we could use our multivariable search methods, we illustrate the NLPSolve command as the most direct approach.

MATLAB In MATLAB, we illustrate the use of the glmfit command.

12.6.1.2 Simple Poisson Regression with Technology Basically, we implement the following steps. Step 1: Determine whether the y data follow a Poisson distribution. This might involve obtaining a histogram of y and even performing a goodness-of-fit test. If you determine that the data follow a Poisson distribution, continue; otherwise, find a different regression model. Step 2: Determine the number of parameters in the Poisson regression model. Assume that there is a constant term plus one parameter per independent variable. For example, if we have only (x, y) data pairs, then the model would be y = e^(b0 + b1x).


Step 3: Using technology, fit the model to find the parameters. Step 4: If readily available or programmable, then examine the diagnostics. Example using (income, credit cards) data. Income = [ 24,27,28,29,30,31,32,33,34,35,38,39,40,41,42,45,48,49,50, 52,59,60,65,68,70,79,80,84,94,120,130] Credit cards = [ 0,0,2,0,1,1,0,0,1,1,1,0,0,0,0,1,0,0,2,0,0,2,6,3,3,0,0,0,0,6,1] Step 1: The credit cards follow a Poisson distribution as validated by both the histogram and Goodness-of-Fit test. Step 2: There are two parameters {b 0, b1}. Step 3: The model is y = e (b 0 +b1x ) .
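The four steps can be sketched in Python. As in the Excel treatment that follows, the parameters of y = e^(b0 + b1x) are found by minimizing the sum of squared errors; the coarse-to-fine grid search below is our own sketch, not the book's code:

```python
# Sketch of Steps 2-4 (our own search, not the book's): fit y = exp(b0 + b1*x)
# to the (income, credit cards) data by minimizing the SSE, as the Excel
# Solver does.
import math

income = [24,27,28,29,30,31,32,33,34,35,38,39,40,41,42,45,48,49,50,
          52,59,60,65,68,70,79,80,84,94,120,130]
cards  = [0,0,2,0,1,1,0,0,1,1,1,0,0,0,0,1,0,0,2,0,0,2,6,3,3,0,0,0,0,6,1]

def sse(b0, b1):
    return sum((y - math.exp(b0 + b1 * x)) ** 2 for x, y in zip(income, cards))

# Coarse-to-fine grid refinement: move to the best neighbor; when no
# neighbor improves, halve the step sizes.
b0, b1, step0, step1 = 0.0, 0.0, 1.0, 0.01
for _ in range(80):
    best = (sse(b0, b1), b0, b1)
    for db0 in (-step0, 0.0, step0):
        for db1 in (-step1, 0.0, step1):
            best = min(best, (sse(b0 + db0, b1 + db1), b0 + db0, b1 + db1))
    if (best[1], best[2]) == (b0, b1):
        step0, step1 = step0 / 2, step1 / 2
    b0, b1 = best[1], best[2]

print(round(b0, 3), round(b1, 4), round(sse(b0, b1), 2))
```

The search should land near the parameter values reported by MATLAB and Excel below.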

12.6.1.2.1 MATLAB

>> mdl = fitnlm(income, cards, modelfun, beta0)
mdl =
Nonlinear regression model:
    y ~ exp(b1 + b2*x)

Estimated Coefficients:
          Estimate     SE           tStat      pValue
    b1    −0.91523     0.6228       −1.4695    0.15246
    b2     0.015548    0.0066844     2.3261    0.027211

Number of observations: 31, Error degrees of freedom: 29
Root Mean Squared Error: 1.53
R-Squared: 0.133, Adjusted R-Squared: 0.103
F-statistic vs. zero model: 8.87, p-value = 0.000987

Our model is y = e^(−0.91523 + 0.015548x).


12.6.1.2.2 MAPLE


Our model is y = e^(−0.91523 + 0.015548x).

12.6.1.2.3 EXCEL Enter the data, model, and error as follows:

We note that b0 and b1 are initially set at 0. We open the Solver and enter the objective function and decision variables. Ensure that the Solver allows both positive and negative values for the decision variables.


We click on Solve.


Our model is y = e^(−0.91523 + 0.015548x) and the SSE is 67.62766.

12.6.2 Logistic Regression Illustrious Examples

We begin with three one-predictor logistic regression model examples in which the dependent variable is binary {0, 1}. The logistic regression model form that we will use is

y = e^(b0 + b1x) / (1 + e^(b0 + b1x)).

Example 12.6: Damages Versus Time

We let damage, y, be a binary variable, where 1 means damage and 0 means no damage, as a function of flight time in hours, x1. Our data is (Figure 12.28):

> y := [1,1,0,1,0,0,0,0,1,0,1,1,0,0,0,0,0,1,0,0,1,0,1,1,0,0,0,1,0,1,0];

Figure 12.28 Logistic regression model for damages and time.

> x1 := [4,2,4,3,9,6,2,11,6,7,3,2,5,3,3,8,10,5,13,7,3,4,2,3,2,5,6,6,3,4,10];

> model := exp(a + b1*t1)/(1 + exp(a + b1*t1));

> modelb := NonlinearFit(model, x1, y, t1, initialvalues = [a = 1.5, b1 = −0.5]);

modelb := e^(1.44319130879736 − 0.391897392001439 t1) / (1 + e^(1.44319130879736 − 0.391897392001439 t1))

> plot(modelb, t1 = 0..10);

It is up to the user to decide over what intervals of x we call the y probability a 1 or a 0. The S curve is shown in Figure 12.28.
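For readers without Maple, the same least-squares fit can be sketched in Python (NonlinearFit minimizes the sum of squared errors of the logistic curve; the grid-refinement search below is our own, not the book's code):

```python
# Sketch (our own search, not the book's code): least-squares fit of
# y = exp(a + b*t)/(1 + exp(a + b*t)) to the damage data of Example 12.6.
import math

y  = [1,1,0,1,0,0,0,0,1,0,1,1,0,0,0,0,0,1,0,0,1,0,1,1,0,0,0,1,0,1,0]
x1 = [4,2,4,3,9,6,2,11,6,7,3,2,5,3,3,8,10,5,13,7,3,4,2,3,2,5,6,6,3,4,10]

def logistic(a, b, t):
    z = math.exp(a + b * t)
    return z / (1.0 + z)

def sse(a, b):
    return sum((yi - logistic(a, b, t)) ** 2 for yi, t in zip(y, x1))

# Coarse-to-fine grid refinement starting from the book's initial values.
a, b, step = 1.5, -0.5, 0.5
for _ in range(60):
    best = (sse(a, b), a, b)
    for da in (-step, 0.0, step):
        for db in (-step, 0.0, step):
            best = min(best, (sse(a + da, b + db), a + da, b + db))
    if (best[1], best[2]) == (a, b):
        step /= 2          # shrink the grid when no neighbor improves
    a, b = best[1], best[2]

print(round(a, 3), round(b, 3), round(sse(a, b), 3))
```

The search should converge near the Maple coefficients (a ≈ 1.443, b ≈ −0.392).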


Example 12.7: Damages Versus Time Differentials

> x2 := [19.2, 24.1, −7.1, 3.9, 4.5, 10.6, −3, 16.2, 72.8, 28.7, 11.5, 56.3, −0.5, −1.3, 12.9, 34.1, 6.6, −2.5, 24.2, 2.3, 36.9, −11.7, 2.1, 10.4, 9.1, 2, 12.6, 18, 1.5, 27.3, −8.4];
> y;
[1,1,0,1,0,0,0,0,1,0,1,1,0,0,0,0,0,1,0,0,1,0,1,1,0,0,0,1,0,1,0]
> model := exp(a + b1*t1)/(1 + exp(a + b1*t1));
> modelb2 := NonlinearFit(model, x2, y, t1, initialvalues = [a = 0.5, b1 = 1]);

modelb2 := e^(−1.19966702762947 + 0.0545525473058010 t1) / (1 + e^(−1.19966702762947 + 0.0545525473058010 t1))

> plot(modelb2, t1 = 0..400);

It is up to the user to decide over what intervals of x we call the y probability a 1 or a 0. Figure 12.29 captures the S curve.

12.6.3 Poisson Regression Discussion and Examples According to Devore (2012), simple linear regression is defined as follows: "There exist parameters β0, β1, and σ² such that for any fixed input value of x, the dependent variable is related to x through the model equation Y = β0 + β1X1 + ε. The quantity ε in the model equation is a random variable assumed to be normally distributed with mean 0 and variance σ²." We expand this definition: when the response variable, yi, is assumed to have a normal distribution with mean μy and standard deviation σ, the mean can be modeled as a function of our multiple predictor variables {X1, X2, …, Xn} using the linear function Y = β0 + β1X1 + β2X2 + … + βkXk. We have used linear models for bivariate data such as y = a + bx or y = a + bx + cx², and models such as y = a0 + a1x1 + a2x2 + … + anxn when we have n different independent predictor variables. The key assumptions that we used for least squares are the linearity of the relationship between dependent and

Figure 12.29 Logistic regression model for damages and time differences.

independent variables, independence and normality of the errors, and homoscedasticity (constant variance) of the errors. If any of these assumptions is violated, the adequacy of the model is diminished. In our first courses, we show how residual plots give students information about the adequacy of the model depending on the patterns seen or not seen. Our check of these assumptions generally involves examining the residual plot, a plot of the errors versus the model values, for patterns or the lack of patterns (Afifi et al., 1979).

12.6.3.1 Normality Assumption Lost According to Neter et al. (1996) and Montgomery et al. (2006), in the case of logistic and Poisson regression, the fact that probability lies between 0 and 1 imposes a constraint. We lose both the normality assumption of multiple linear regression and the assumption of constant variance. Without these assumptions, the F and t tests have no basis for the analysis. When this happens, we must transform the model and the data. The new solution involves using the logistic transformation of the probability p, or logit p, such that

ln( p / (1 − p) ) = β0 + β1X1 + β2X2 + ⋯ + βnXn.
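The logit transformation and its inverse (the logistic function) can be sketched in two lines (our own illustration):

```python
# The logit maps a probability p in (0, 1) to the whole real line;
# its inverse, the logistic function, maps back.
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

p = 0.3
print(inv_logit(logit(p)))  # recovers 0.3 (up to rounding)
```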


They go on to explain that the β coefficients can now be interpreted as increasing or decreasing the log odds of an event, and exp(β) (the odds multiplier) can be used as the odds ratio for a unit increase or decrease in the explanatory variable. When the response variable is in the form of a count, we face yet another constraint: counts are all nonnegative integers and often represent rare events. Thus, the Poisson distribution (rather than the normal distribution) is more appropriate, since the Poisson has a mean greater than 0 and our counts are nonnegative integers. So, the logarithm of the response variable is linked to a linear function of the explanatory variables such that

ln(Y) = β0 + β1x1 + β2x2 + ⋯ + βnxn

and thus,

Y = (e^(β0))(e^(β1x1))(e^(β2x2)) ⋯ (e^(βnxn)).

In other words, the typical Poisson regression model expresses the log outcome rate as a linear function of a set of predictors.

12.6.3.1.1 Assumptions in Poisson Regression There are several key assumptions in Poisson regression that differ from the assumptions of the simple linear regression model. These include that the logarithm of the dependent variable changes linearly with equal incremental increases in the exposure variable. For example, if we measure risk in exposure per unit time, and one group is counted per month while another is counted per year, we can convert all exposures to simple counts. Changes in the rate from combined effects of different exposures or risk factors are multiplicative. For each level of the covariates, the number of cases has variance equal to its mean, as the Poisson distribution requires. Further, we assume the observations are independent. Diagnostic methods to identify violations of these assumptions (for example, to determine whether variances are too large or too small) include plots of residuals versus the mean at different levels of the predictor variable. Recall that in the case of normal linear regression, diagnostics of the model used plots of residuals against fitted values, so some of the same diagnostics can be used in the case of Poisson regression. We will use the residual or deviation plot, deviations versus the model, to look for patterns as our main diagnostic method. In Poisson regression, we start with the basic model shown in equation (12.7),

Y_i = E[Y_i] + ε_i for i = 1, 2, …, n.    (12.7)


The mean response for the ith case is denoted by u_i, where u_i can be one of many defined functions (Neter et al., 1996), but we elect to use only the form shown in equation (12.8),

u_i = u(X_i, B) = exp(X_i′B), where u_i > 0.    (12.8)

We assume that the variables Y_i are independent Poisson random variables with expected value u_i. In order to apply regression techniques, we use the likelihood function. The likelihood function, L, is given in equation (12.9):

L = ∏_{i=1}^{n} f_i(Y_i) = ∏_{i=1}^{n} [ u(X_i, B)^{Y_i} exp(−u(X_i, B)) ] / Y_i!    (12.9)

Most texts explain that maximizing this function directly is quite difficult, so they use the logarithm of the likelihood function shown in equation (12.10):

ln(L) = Σ_{i=1}^{n} Y_i ln(u_i) − Σ_{i=1}^{n} u_i − Σ_{i=1}^{n} ln(Y_i!)    (12.10)

where u_i is the fitted model. We maximize this function to obtain the best estimates for the coefficients of the model. Numerical search techniques are used to obtain these estimates, and we mention here that "good" starting points may be required to obtain convergence. Within the model development, we are concerned with the deviations, or residuals, as previously mentioned. In Poisson regression, the deviance is modeled as shown in equation (12.11):

Dev = 2[ Σ_{i=1}^{n} Y_i ln(Y_i / u_i) − Σ_{i=1}^{n} (Y_i − u_i) ]    (12.11)

where u_i is the fitted model. We note that because of the term ln(Y_i/u_i), if Y_i = 0 we must set Y_i ln(Y_i/u_i) = 0. Inferences for the coefficients are carried out in the same fashion as with logistic regression. To estimate the variance-covariance matrix, we require the Hessian matrix. We define the Hessian, H(X), as the matrix of second partial derivatives of the ln(L) function. The variance-covariance matrix, VC(X, B), is minus the inverse of this Hessian matrix evaluated at the final estimates of the coefficients, B:

VC(X, B) = −H(X)^(−1)

The main diagonal entries of this matrix are the estimates of the variances. Since we need the estimated standard deviations, se_B, we take the square root of each main diagonal entry to obtain this estimate. We may then perform hypothesis tests of the coefficients using the t-test. We use the logarithm of the likelihood function, equation (12.10). The Hessian is defined as the matrix of second partial derivatives. We will illustrate two Hessian modeling examples and then make a useful observation. Assume that our model is y_i = exp(b0 + b1x_i). Putting this model into equation (12.10), we have ln(L) =

Σ_{i=1}^{n} Y_i ln(exp(b_0 + b_1x_i)) − Σ_{i=1}^{n} exp(b_0 + b_1x_i) − Σ_{i=1}^{n} ln(Y_i!)
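The log-likelihood of equation (12.10) and the deviance of equation (12.11) can be sketched for a constant model y = exp(b0) (our own illustration with made-up counts; the minimizing value of exp(b0) is the sample mean):

```python
# Sketch (our own, with hypothetical count data): the Poisson log-likelihood
# of eq. (12.10) and the deviance of eq. (12.11) for a constant model.
import math

Y = [3, 1, 4, 0, 2, 2, 5, 1]         # hypothetical counts

def loglik(u):                        # eq. (12.10) with a constant mean u
    return sum(y * math.log(u) - u - math.log(math.factorial(y)) for y in Y)

def deviance(u):                      # eq. (12.11); Y*ln(Y/u) is set to 0 when Y = 0
    t1 = sum(y * math.log(y / u) for y in Y if y > 0)
    t2 = sum(y - u for y in Y)
    return 2.0 * (t1 - t2)

u_hat = sum(Y) / len(Y)               # the MLE of exp(b0) is the sample mean
print(round(deviance(u_hat), 4))
```

Perturbing u away from the mean in either direction raises the deviance and lowers the log-likelihood, which is what the numerical searches in the examples below exploit.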

We define the second partial derivatives as follows:   g ij =

∂2 ( ln ( L )) for all i and j . ∂bi ∂bij

(12.12)

The estimates for the variance-covariance matrix are defined and are displayed in equation (12.13): s 2 ( b ) = [(− g ij )B =b ]−1

(12.13)

We take these partial derivatives and set up the Hessian matrix, g_ij, as shown below:

g_{ij} = \begin{bmatrix}
-\sum_{i=1}^{n} e^{b_0+b_1x_i} & -\sum_{i=1}^{n} x_i e^{b_0+b_1x_i}\\
-\sum_{i=1}^{n} x_i e^{b_0+b_1x_i} & -\sum_{i=1}^{n} x_i^2 e^{b_0+b_1x_i}
\end{bmatrix}

When our model differs slightly, such as y_i = exp(b_0 + b_1x_{1i} + b_2x_{2i}), we find the Hessian matrix g_ij below. We note the similarities between the last two Hessian matrices.

g_{ij} = \begin{bmatrix}
-\sum_{i=1}^{n} e^{b_0+b_1x_{1i}+b_2x_{2i}} & -\sum_{i=1}^{n} x_{1i} e^{b_0+b_1x_{1i}+b_2x_{2i}} & -\sum_{i=1}^{n} x_{2i} e^{b_0+b_1x_{1i}+b_2x_{2i}}\\
-\sum_{i=1}^{n} x_{1i} e^{b_0+b_1x_{1i}+b_2x_{2i}} & -\sum_{i=1}^{n} x_{1i}^2 e^{b_0+b_1x_{1i}+b_2x_{2i}} & -\sum_{i=1}^{n} x_{1i}x_{2i} e^{b_0+b_1x_{1i}+b_2x_{2i}}\\
-\sum_{i=1}^{n} x_{2i} e^{b_0+b_1x_{1i}+b_2x_{2i}} & -\sum_{i=1}^{n} x_{1i}x_{2i} e^{b_0+b_1x_{1i}+b_2x_{2i}} & -\sum_{i=1}^{n} x_{2i}^2 e^{b_0+b_1x_{1i}+b_2x_{2i}}
\end{bmatrix}




We see the pattern in the matrix of partial derivatives, and we can extend it to easily obtain the Hessian for a model with n independent variables, y_i = exp(b_0 + b_1x_{1i} + b_2x_{2i} + … + b_nx_{ni}). We identify the common term inside each summation as exp(b_0 + b_1x_{1i} + b_2x_{2i} + … + b_nx_{ni}); we call this term P. This gives us a generic Hessian matrix for Poisson regression to use with our choice of model, depending on the number of independent variables:

g_{ij} = -\begin{bmatrix}
\sum P & \sum x_{1i}P & \sum x_{2i}P & \cdots & \sum x_{ni}P\\
\sum x_{1i}P & \sum x_{1i}^2 P & \sum x_{1i}x_{2i}P & \cdots & \sum x_{1i}x_{ni}P\\
\sum x_{2i}P & \sum x_{1i}x_{2i}P & \sum x_{2i}^2 P & \cdots & \sum x_{2i}x_{ni}P\\
\vdots & & & \ddots & \vdots\\
\sum x_{ni}P & \sum x_{1i}x_{ni}P & \sum x_{2i}x_{ni}P & \cdots & \sum x_{ni}^2 P
\end{bmatrix}

This is the generic Hessian matrix, so we need to replace the formulas with numerical values and compute the inverse of the negative of this matrix. Once we replace the variables with their respective values, we should have a nonsingular square matrix whose inverse we can take. The main diagonal entries of this inverse are the estimates of the variances of the coefficient estimates b. The square roots of the main diagonal entries are the estimates of the se of the coefficients, to be used in the hypothesis test for each coefficient, b, as

t* = b_i / se(b_i)

We now have all the equations we need to build tables of outputs for Poisson regression similar to Excel's prepackaged regression outputs.
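The generic pattern above is a matrix product: with X holding a leading column of ones and W = diag(P), the Hessian is −XᵀWX. A sketch with hypothetical data (our own illustration):

```python
# Sketch (our own illustration): the generic Poisson-regression Hessian
# g = -X^T diag(P) X, where P_i = exp(b0 + b1*x1i + ... + bn*xni) and
# X has a leading column of ones. Data and coefficients are hypothetical.
import math

X = [[1.0, 2.0, 0.5],      # rows: [1, x1i, x2i]
     [1.0, 3.0, 1.5],
     [1.0, 5.0, 1.0]]
b = [0.1, 0.2, -0.3]

P = [math.exp(sum(bj * xj for bj, xj in zip(b, row))) for row in X]

n = len(b)
g = [[-sum(P[i] * X[i][r] * X[i][c] for i in range(len(X)))
      for c in range(n)] for r in range(n)]

# Spot-check one entry against the written pattern: g[1][2] = -sum x1i*x2i*P.
check = -sum(x1 * x2 * p for (_, x1, x2), p in zip(X, P))
print(abs(g[1][2] - check) < 1e-9)  # prints True
```

The symmetry of g (g[i][j] = g[j][i]) is exactly the symmetry visible in the written matrix.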

12.6.3.2 Estimates of Regression Coefficients The number of coefficients is one for the constant plus one for every predictor variable in the model being examined. The estimates are the final (converged) values from the numerical search that maximizes the ln(L) equation. The values of se are the square roots of the main diagonal of the inverse of the negative Hessian matrix. The values of t* = (final coefficient estimate)/se, and the p-value is the probability associated with |t*| from P(T > |t*|). In our summary of Poisson regression analysis, let m = the number of variables in the model and let k = the number of data elements of the dependent variable, Y. We present the statistical formulas.

Source       Degrees of Freedom (df)   Deviance                                                                  Mean Deviance, MDev           Ratio
Regression   m                         Dreg = Dt − Dres                                                          MDev(reg) = Dreg/m            |MDev(reg)|
Residual     k − 1 − m                 Dres = result from equation (12.11) using the full model with m predictors   MDev(res) = Dres/(k − 1 − m)
Total        k − 1                     Dt = result from equation (12.11) using only y = exp(b0) as the best model   MDev(t) = Dt/(k − 1)
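As a worked instance of the table (using the deviance values reported for the Philippines example later in this section, with k = 80 observations and m = 1 predictor):

```python
# Sketch: filling in the analysis-of-deviance table, with Dt and Dres as
# reported for Example 12.9 (k = 80 observations, m = 1 predictor).
k, m = 80, 1
Dt   = 2375.33337348
Dres = 309.389073

Dreg = Dt - Dres                  # regression deviance
mdev_reg = Dreg / m               # mean deviance, regression (df = m)
mdev_res = Dres / (k - 1 - m)     # mean deviance, residual (df = k - 1 - m)

print(round(Dreg, 4), round(mdev_res, 6))
```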

12.6.4 Illustrative Poisson Regression Examples The first example will be explained in more detail than the second for illustrative purposes, to show how we use the equations and Maple to perform Poisson regression. We note that a prerequisite for using Poisson regression is that the data for the dependent variable, Y, must be discrete count data in which large counts are rare events. We have chosen two datasets that have published solutions in the literature as our examples.

12.6.4.1 Maple
Step 0: Enter the data for "Y" and "X".
Step 1: Take the "Y" data and (a) obtain a histogram and (b) perform a chi-square goodness-of-fit test for a Poisson distribution. If the distribution of "Y" follows a Poisson distribution, then we continue. (If you have count data, you might want to use Poisson regression regardless.)
Step 2: Assume that we need a constant model, y = exp(b0). Compute the value of b0 that minimizes the deviance shown in equation (12.11).
Step 3: Assume that the model is now y = exp(b0 + b1x). Compute the values of b0 and b1 that minimize equation (12.11).
Step 4: Interpret the output and the odds ratio.
Maple does not have an internal Poisson regression command. We caution that using the NonlinearFit command with just y = exp(b0 + b1x) does not give the correct Poisson regression results.
Data Source: http://www.oxfordjournals.org/our_journals/tropej/online/ma_chap13.pdf.


Example 12.8 Type of surgeries versus total surgeries The data we use have 20 elements for “Y ” and “X.” Step 0: Enter the data. xhc := [seq(hosp1[i, 2], i = 1 .. 20)]; xhc := [3246, 2750, 2507, 2371, 1904, 1501, 1272, 1080, 1027, 970, 739, 679, 502, 236, 357, 309, 192, 138, 100, 95] yhc := [seq(hosp1[i, 1], i = 1 .. 20)]; yhc := [26, 24, 21, 21, 21, 20, 19, 18, 18, 17, 17, 16, 16, 16, 16, 15, 14, 14, 13, 13]

Step 1: Histogram and chi-square goodness-of-fit test, if appropriate, > Histogram(yhc, binbounds = [13, 15, 17, 19, 21, 23, 25, 27]);

The histogram (Figure 12.30) and the goodness-of-fit test confirm the data follows a Poisson distribution. > with(Statistics): > infolevel[Statistics] := 1: Specify the matrices of categorized data values. > Ob := Array([14,7,3,4]): > Ex := Array([11.4,11.4,5.7,2.5]):

Figure 12.30 Histogram of data.

Perform the goodness-of-fit test upon this sample. > ChiSquareGoodnessOfFitTest(Ob, Ex, level=.05); Chi-Square Test for Goodness-of-Fit -----------------------------------

Null Hypothesis: Observed sample does not differ from expected sample
Alt. Hypothesis: Observed sample differs from expected sample
Categories: 4
Distribution: ChiSquare(3)
Computed statistic: 4.47018
Computed p-value: 0.214966
Critical value: 7.81472828803626
Result: [Accepted]

There is no statistical evidence against the null hypothesis.

Step 2: The constant model.

FINDING THE BEST CONSTANT MODEL

> ycc := [seq(ln(yhc[i]/exp(x)), i = 1 .. 20)];
> ycc1 := Vector[column](ycc);
> part1 := evalf(yhc1 . ycc1);
> part2 := add(yhc[n] - exp(x), n = 1 .. 20);
part2 := 355 - 20 e^x
> s := 2*(part1 - part2);
> Minimize(s, initialpoint = {x = 1});
[13.1244003238018649, [x = 2.87638551592036]]

We found the constant model y = e^(2.87638551592036) with Dev = 13.1244.

Step 3: Finding the minimization of equation (12.11) with a fuller model, in this case y = exp(a + b·x), is substantially more difficult. We had to use our Newton-Raphson algorithm and program from Chapter 5 to find the minimum. We built the equation f = 2·Dev in Maple, which for this model expands over the 20 data points to

snew := 2 Σ_{i=1}^{20} yhc_i ln( yhc_i / e^{x1 + x2·xhc_i} ) + 2 Σ_{i=1}^{20} ( e^{x1 + x2·xhc_i} − yhc_i )

and then called Newton's multivariate program to solve. We find the final model:


final new x = 2.65536; final new y = 0.00019; final f value is 0.93807, interpreted as y = e^(2.65536 + 0.00019x).

Step 4: Statistics, Interpretation, and Odds Ratio

> printf("coefficient   SE   T-Statistic   P-Value   Odds Ratio")
> print(constantm, SE1, t1, PV1);
2.65536, 0.06033586751, 44.00964317, 0.000515903083232017
> print(coefficientm, SE2, t2, PV2, OddsRatio);
0.00019, 0.00003687229991, 5.152919685, 0.0356588810716962, 1.000190018

Clearly, we see a good fit in Figure 12.31.
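The printed statistics can be recomputed directly from the estimates and standard errors (a sketch checking the arithmetic of the Maple output above):

```python
# Sketch: recomputing the t-statistics and odds ratio printed above
# from the coefficient estimates and their standard errors.
import math

b0, se0 = 2.65536, 0.06033586751
b1, se1 = 0.00019, 0.00003687229991

t0 = b0 / se0
t1 = b1 / se1
odds_ratio = math.exp(b1)    # e^{b1}: odds multiplier per unit of x

print(round(t0, 4), round(t1, 4), round(odds_ratio, 9))
```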

Figure 12.31 Data and model.

Example 12.9: Violence in the Philippines

Since the numbers of violent acts, SIGACTS, are numerical counts, we should use Poisson regression to model the data. We examine the histogram in Figure 12.32, noticing that it appears to follow a Poisson distribution. A goodness-of-fit test for the Poisson distribution confirms that it follows a Poisson distribution (χ² = 933.11, p = 0.000). Therefore, we may use Poisson regression (Figure 12.33).

> Histogram(yhc, binbounds = [9, 12, 15, 21, 24]);

We find the data does follow a Poisson distribution.

> with(Statistics):
> infolevel[Statistics] := 1:

Specify the matrices of categorized data values. > Ob := Array([7,0,1,3,1,8]): > Ex := Array([2.3,2.9,3.9,3.9,3.1,3.6]):

Perform the goodness-of-fit test upon this sample. > ChiSquareGoodnessOfFitTest(Ob, Ex, level=.05); Chi-Square Test for Goodness-of-Fit -----------------------------------

Figure 12.32 Histogram of violence data.

Figure 12.33 Histogram of SigActs of violence.

Null Hypothesis: Observed sample does not differ from expected sample.
Alt. Hypothesis: Observed sample differs from expected sample.
Categories: 6
Distribution: ChiSquare(5)
Computed statistic: 21.6688
Computed p-value: 0.000605193
Critical value: 11.0704974062099
Result: [Rejected]

There exists statistical evidence against the null hypothesis.




> violencedata := [seq(violence[i, 2], i = 1 .. 80)];
violencedata := [122, 44, 2, 42, 31, 28, 64, 10, 1, 12, 57, 18, 4, 4, 29, 23, 52, 5, 35, 8, 33, 26, 4, 0, 2, 5, 27, 8, 26, 2, 11, 0, 8, 0, 125, 0, 14, 7, 0, 8, 10, 0, 0, 13, 1, 0, 6, 35, 7, 1, 5, 3, 40, 30, 9, 0, 11, 0, 3, 23, 6, 3, 10, 64, 0, 0, 15, 2, 0, 3, 126, 0, 0, 0, 4, 0, 0, 7, 0, 0]
> Histogram(violencedata, binbounds = [20, 40, 60, 80, 100, 120, 140]);

A Poisson regression was run and yields the following output; the model is y = exp(9.5599 − 0.06398x). The NonlinearFit command yields the following model:

> pmodel := exp(b0 + b1·x);
pmodel := e^(b0 + b1 x)
> NonlinearFit(pmodel, literacyratedata, violencedata, x);

y = e^(8.58852448202650 − 0.0526327458474074x)

where y = counts of violent activities and x = literacy level.

> p1 := plot(e^(9.5599 − 0.06398x), x = 75..190, title = "Violence versus Literacy"):
> p2 := pointplot([literacyratedata, violencedata]):
> display({p1, p2});

Figure 12.34 Poisson regression model.

We obtain plots to examine a visual fit and see a pretty good fit.

> p1 := pointplot([xhc, yhc], title = "Plots"):
> p2 := plot(exp(xx2·x + xx1), x = 0..180, thickness = 3, color = Blue):
> display(p1, p2);

We see in Figure 12.34 a pretty good fit of the data with our model. We summarize our statistics:

> STAToutput(nobs, xx1, xx2, SE1, SE2, t1, t2, PV1, PV2, DF1, DF2, RD, ResD, SST);

"Degrees of Freedom and Regression Deviations = ", 1, 2065.9443
"Degrees of Freedom and Residuals Deviations = ", 78, 309.389073
"Total = ", 2375.33337348472469
"Mean Deviation = ", 3.966526577


Coefficient   Standard Error   T-Statistic   P-Value   Odds Ratio
5.599020      0.1653            33.8755      0.0004
−0.023550     0.0014           −16.4345      0.0018    0.9767

Again, we accept that the fit looks pretty good. We interpret the odds ratio for the slope coefficient to help explain our results: an odds ratio of 0.9767 means that for every 1-unit increase in literacy, violence goes down slightly, by about 2.3%. This suggests that improving literacy within the country will decrease violence. Interpretation: Based on the findings of this research, literacy affects violent conflict in the Philippines; the findings support the claim that conflict and aggression are influenced by literacy.

Exercises

For the following data, (a) plot the data and (b) state the type of regression that should be used to model the data.

1. Tire tread

Number   Hours   Tread (cm)
1         2      5.4
2         5      5.0
3         7      4.5
4        10      3.7
5        14      3.5
6        19      2.5
7        26      2.0
8        31      1.6
9        34      1.8
10       38      1.3
11       45      0.8
12       52      1.1
13       53      0.8
14       60      0.4
15       65      0.6


2. Let's assume our suspected nonlinear model form is Z = a·x^b/y^c for the data below. If we use our ln-ln transformation, we obtain ln Z = ln a + b ln x − c ln y. Use regression techniques to estimate the parameters a, b, and c.

ROW    x     y    z
1     101   15     0.788
2      73    3   304.149
3     122    5    98.245
4      56   20     0.051
5     107   20     0.270
6      77    5    30.485
7     140   15     1.653
8      66   16     0.192
9     109    5   159.918
10    103   14     1.109
11     93    3   699.447
12     98    4   281.184
13     76   14     0.476
14     83    5    54.468
15    113   12     2.810
16    167    6   144.923
17     82    5    79.733
18     85    6    21.821
19    103   20     0.223
20     86   11     1.899
21     67    8     5.180
22    104   13     1.334
23    114    5   110.378
24    118   21     0.274
25     94    5    81.304
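The ln-ln estimation the exercise asks for can be sketched on synthetic data (our own example with known parameters a = 2, b = 1.5, c = 0.8, so the recovery can be checked; the same code applies to the table above):

```python
# Sketch (our own, with synthetic data): estimating a, b, c in Z = a*x^b/y^c
# via the ln-ln transformation ln Z = ln a + b ln x - c ln y and ordinary
# least squares (3x3 normal equations solved by Gaussian elimination).
import math

# Synthetic data generated from a = 2.0, b = 1.5, c = 0.8 (no noise).
a_true, b_true, c_true = 2.0, 1.5, 0.8
xs = [3.0, 5.0, 7.0, 11.0, 13.0, 17.0]
ys = [2.0, 4.0, 3.0, 5.0, 2.0, 6.0]
zs = [a_true * x ** b_true / y ** c_true for x, y in zip(xs, ys)]

# Design matrix [1, ln x, -ln y] and response ln z, so beta = [ln a, b, c].
rows = [[1.0, math.log(x), -math.log(y)] for x, y in zip(xs, ys)]
t = [math.log(z) for z in zs]

# Normal equations A^T A beta = A^T t.
ATA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
ATt = [sum(r[i] * ti for r, ti in zip(rows, t)) for i in range(3)]

# Gaussian elimination with partial pivoting, then back substitution.
M = [ATA[i] + [ATt[i]] for i in range(3)]
for col in range(3):
    piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
    M[col], M[piv] = M[piv], M[col]
    for r in range(col + 1, 3):
        f = M[r][col] / M[col][col]
        M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
beta = [0.0, 0.0, 0.0]
for i in (2, 1, 0):
    beta[i] = (M[i][3] - sum(M[i][j] * beta[j] for j in range(i + 1, 3))) / M[i][i]

a_hat, b_hat, c_hat = math.exp(beta[0]), beta[1], beta[2]
print(round(a_hat, 3), round(b_hat, 3), round(c_hat, 3))  # recovers 2.0 1.5 0.8
```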


3. Using the basic linear model, yi = β0 + β1xi, fit the following datasets. Provide the model, the analysis of variance information, the value of R², and a residual plot.

a.
x     y      x     y
100   150    250   400
125   140    250   430
125   180    300   440
150   210    300   390
150   190    350   600
200   320    400   610
200   280    400   670

b.
x     y      x     y
110   198    362   102
115   173    363    95
120   174    500   122
230   149    505   112
235   124    510    98
240   115    515    96
360   130
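For a quick check of part (a), the least-squares slope, intercept, and R² follow from the closed-form formulas. This is a sketch of our own, not the book's solution:

```python
# Dataset (a) from exercise 3, flattened into single x and y lists.
x = [100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400]
y = [150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                 # slope
b0 = ybar - b1 * xbar          # intercept

# Coefficient of determination: R^2 = 1 - SSE/SST.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - ybar) ** 2 for yi in y)
r2 = 1 - sse / sst
print(round(b0, 3), round(b1, 3), round(r2, 3))
```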

Admit   GRE   Topnotch   GPA
1       380   0          3.61
0       660   1          3.67
0       800   1          4
0       640   0          3.19
0       520   0          2.93
0       760   0          3
0       560   0          2.98
1       400   0          3.08
0       540   0          3.39
1       700   1          3.92

b. The data in exercise 3(b) above represent changes in growth, where x = body weight and y = normalized metabolic rate for 13 animals.

Age     NonSmokers   Smokes 1–9   Smokes 10–14   Smokes 15–19   Smokes 20–24   Smokes 25–34   Smokes >35
                     per day      per day        per day        per day        per day        per day
15–20   1 (10366)    0 (3121)     0 (3577)       0 (4319)        0 (5683)       0 (3042)        0 (670)
20–25   0 (8162)     0 (2397)     1 (3286)       0 (4214)        1 (6385)       1 (4050)        0 (1166)
25–30   0 (5969)     0 (2288)     1 (2546)       0 (3185)        1 (5483)       4 (4290)        0 (1482)
30–35   0 (4496)     0 (2015)     2 (2219)       4 (2560)        6 (4687)       9 (4268)        4 (1580)
35–40   0 (3152)     1 (1648)     0 (1826)       0 (1893)        5 (3646)       9 (3529)        6 (1136)
40–45   0 (2201)     2 (1310)     1 (1386)       2 (1334)       12 (2411)      11 (2424)       10 (924)
45–50   0 (1421)     0 (927)      2 (988)        2 (849)         9 (1567)      10 (1409)        7 (556)
50–55   0 (1121)     3 (710)      4 (684)        2 (470)         7 (857)        5 (663)         4 (255)
>55     2 (826)      0 (606)      3 (449)        5 (280)         7 (416)        3 (284)         1 (104)

4. Ten observations of college acceptances to graduate school (data above).
5. Dataset for lung cancer from E. L. Frome (1983), Biometrics 39, 665–674 (data above). The numbers of person-years are in parentheses, broken down by age and daily cigarette consumption.
6. Modeling absences from class (data below), where:
Gender: 1 = female, 2 = male
Ethnicity: six categories


School: school 1 or school 2
Math Test Score: continuous
Language Test Score: continuous
Bilingual Status: four bilingual categories

Gender   Ethnicity   School   Math Score   Lang. Score   Bilingual Status   Days Absent
2        4           1        56.98        42.45         2                   4
2        4           1        37.09        46.82         2                   4
1        4           1        32.37        43.57         2                   2
1        4           1        29.06        43.57         2                   3
1        4           1         6.75        27.25         3                   3
1        4           1        61.65        48.41         0                  13
1        4           1        56.99        40.74         2                  11
2        4           1        10.39        15.36         2                   7
2        4           1        50.52        51.12         2                  10
2        6           1        49.47        42.45         0                   9

Projects

1. Fit the following nonlinear model with the provided data. Model: y = ax^b. Data:

t   7    14   21    28    35    42
y   8    41   133   250   280   297

2. Fit the following model, y = ax^b, with the provided data:

Year       0    1     2     3     4     5     6     7     8      9      10
Quantity   15   150   250   275   270   280   290   650   1200   1550   2750
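One standard way to fit y = ax^b is to linearize with logarithms, ln y = ln a + b ln x, and apply simple least squares. A sketch in Python using the t–y data from project 1 (the text itself works in Maple and Excel):

```python
import math

# Project 1 data: t (independent) and y (dependent), all positive.
t = [7, 14, 21, 28, 35, 42]
y = [8, 41, 133, 250, 280, 297]

# Linearize: ln y = ln a + b ln t, then ordinary least squares on the logs.
u = [math.log(v) for v in t]
w = [math.log(v) for v in y]
n = len(u)
ubar, wbar = sum(u) / n, sum(w) / n
b = sum((ui - ubar) * (wi - wbar) for ui, wi in zip(u, w)) / \
    sum((ui - ubar) ** 2 for ui in u)
a = math.exp(wbar - b * ubar)
print(round(a, 4), round(b, 4))
```

Note that this minimizes squared error in log space, not in the original units; nonlinear least squares (as with Excel's Solver) can give slightly different parameters.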

Recommended steps:
1. Ensure you understand the problem and what answers are required.
2. Get the data that is available. Identify the dependent and independent variables.
3. Plot the dependent variable versus the independent variables and note trends.
4. If the dependent variable is binary {0, 1}, use binary logistic regression. If the dependent variable is a count that follows a Poisson distribution, use Poisson regression. Otherwise, try linear, multiple, or nonlinear regression as needed.
5. Ensure your model produces results that are acceptable.

12.7 Conclusions and Summary

We showed some of the common misconceptions held by decision makers concerning correlation and regression. The purpose of this presentation is to help prepare more competent and confident problem solvers for the 21st century. Data can be drawn from part of a sine curve where the correlation is quite poor, close to zero, yet the decision maker can still describe the pattern: the relationship in the data is periodic, or oscillating. Examples such as these should dispel the idea that a correlation of almost zero implies no relationship. Decision makers need to see and believe the concepts concerning correlation, linear relationships, and nonlinear (or no) relationships.

References and Suggested Reading

Afifi, A. & S. Azen. (1979). Statistical Analysis, 2nd Edition. Academic Press, London, UK, pp. 143–144.
Breiman, L. (2001). Statistical modeling: The two cultures. Statistical Science, 16(3), 199–231.
Burden, R. L. & J. D. Faires. (2005). Numerical Analysis, 8th Edition. Brooks-Cole Publishers, Belmont, CA.
Cheney, E. W. & D. Kincaid. (1984). Numerical Mathematics and Computing. Brooks-Cole Publishers, Monterey, CA.
Devore, J. (2012). Probability and Statistics for Engineering and the Sciences, 8th Edition. Cengage, Belmont, CA, pp. 211–217.
Fox, W. P. (2011). Using the EXCEL solver for nonlinear regression. Computers in Education Journal (COED), 2(4), 77–86.
Fox, W. P. (2012a). Mathematical Modeling with MAPLE. Cengage Publishers, Boston, MA.
Fox, W. P. (2012b). Issues and importance of "good" starting points for nonlinear regression for mathematical modeling with MAPLE: Basic model fitting to make predictions with oscillating data. Journal of Computers in Mathematics and Science Teaching, 31(1), 1–16.
Fox, W. P. & C. Fowler. (1996). Understanding covariance and correlation. PRIMUS, VI(3), 235–244.
Fox, W. P. & J. Hammond. (2019). Advanced regression models: Least squares, nonlinear, Poisson and binary logistics regression using R. In F. García Márquez & B. Lev (eds.), Data Science and Digital Business. Springer, Cham. https://doi.org/10.1007/978-3-319-95651-0_12
Fox, W. & A. Ninh. (2019). Forecasting with machine learning. In F. P. García Márquez (ed.), Handbook of Research on Big Data Clustering and Machine Learning. IGI Global, 2020, pp. 1–478. doi: 10.4018/978-1-7998-0106-1
Giordano, F., W. P. Fox, S. Horton, & M. Weir. (2008). A First Course in Mathematical Modeling. Cengage Publishing, Belmont, CA.
Giordano, F., W. P. Fox, & S. Horton. (2014). A First Course in Mathematical Modeling, 5th Edition. Cengage Publishers, Boston, MA.
Huddleston, S. & G. Brown. (2018). Chapter 7, INFORMS Analytics Body of Knowledge. John Wiley & Sons and Naval Postgraduate School updated notes by Huddleston and Brown, New York.
Johnson, I. (2012). An Introductory Handbook on Probability, Statistics, and Excel. http://records.viu.ca/~johnstoi/maybe/maybe4.htm (accessed July 11, 2012).
Mendenhall, W. & T. Sincich. (1996). A Second Course in Statistics: Regression Analysis, 5th Edition. Prentice Hall, Upper Saddle River, NJ, pp. 476–485.
Montgomery, D., E. Peck, & G. Vining. (2006). Introduction to Linear Regression Analysis, 4th Edition. John Wiley & Sons, Hoboken, NJ, pp. 428–448.
Neter, J., M. Kutner, C. Nachtsheim, & W. Wasserman. (1996). Applied Linear Statistical Models, 4th Edition. Irwin Press, Chicago, IL, pp. 531–547.

Answers to Selected Problems

Exercises for Chapter 1

3. For the cable installation example, assume that we are moving the computers around to the following coordinates and resolve.

X    Y
10   50
35   85
60   77
75   60
80   35

The Model: This is an unconstrained optimization model. We want to minimize the sum of the distances from each department to the placement of the central computer system. The distances represent cable lengths, assuming that a straight line is the shortest distance between two points. Using the distance formula,

d = √((x − X₁)² + (y − Y₁)²)

where d represents the distance (cable length in feet) between the location of the central computer (x, y) and the location of the first peripheral computer (X₁, Y₁). Since we have five departments, we define

distance = Σᵢ₌₁⁵ √((x − Xᵢ)² + (y − Yᵢ)²).

We use the coordinates in the table above.
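This sum-of-distances objective is the classic facility-location (geometric median) problem. One way to solve it numerically, shown here as a sketch of our own (the book solves it with a spreadsheet or calculus-based solver), is Weiszfeld's fixed-point algorithm:

```python
import math

# Department coordinates from the table above.
points = [(10, 50), (35, 85), (60, 77), (75, 60), (80, 35)]

def total_cable(x, y):
    """Sum of straight-line cable lengths from (x, y) to each department."""
    return sum(math.hypot(x - X, y - Y) for X, Y in points)

def weiszfeld(points, iters=200):
    # Start at the centroid and apply the Weiszfeld fixed-point update,
    # a distance-weighted average that monotonically decreases the objective.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iters):
        wsum = xnum = ynum = 0.0
        for X, Y in points:
            d = math.hypot(x - X, y - Y)
            if d == 0:               # placement coincides with a department
                return x, y
            w = 1.0 / d
            wsum += w
            xnum += w * X
            ynum += w * Y
        x, y = xnum / wsum, ynum / wsum
    return x, y

x, y = weiszfeld(points)
print(round(x, 3), round(y, 3), round(total_cable(x, y), 3))
```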


Exercises for Chapter 2

Problems 1–5: Find each limit (if it exists):

1. lim_{x→4} (x³ + 2x − 20) = 64 + 8 − 20 = 52
2. lim_{x→∞} (x² + x)/x = ∞
3. lim_{t→1} (t² − 1)/(t − 1) = 2
4. lim_{y→0} tan(y)/y = 1
5. lim_{x→0} |x|/x DNE (the one-sided limits differ)
6. Differentiate y = sin(x²): y′ = 2x·cos(x²)
7. Differentiate y = sin²(x): y′ = 2·sin(x)·cos(x)

8. fx := diff(exp(x·y²), x);
       fx := y² e^(x·y²)
   fy := diff(exp(x·y²), y);
       fy := 2xy e^(x·y²)
   fxx := diff(fx, x);
       fxx := y⁴ e^(x·y²)
   fyy := diff(fy, y);
       fyy := 2x e^(x·y²) + 4x²y² e^(x·y²)
   fxy := diff(fx, y);
       fxy := 2y e^(x·y²) + 2xy³ e^(x·y²)

Exercises for Chapter 3

1. 2,000 by road
2. f′ = 0 at x = 0; the max is at x = 1, f(1) = 1
3. Extreme points at 0 and 10.6667. The point x = 0 yields a relative maximum and x = 10.6667 yields a relative minimum.
4. Maximum at x = 1.
5. The function is maximized at 0 and (2/3)r₀. This confirms the conjecture.


Exercises for Chapter 4

1. f := x -> −x² − 2x

a. DICHOTOMOUS(f, −2, 1, 0.2, 0.01); The interval [a, b] is [−2.00, 1.00] and the user-specified tolerance level is 0.20000. The first two experimental endpoints are x1 = −0.510 and x2 = −0.490.

Iteration   x(1)      x(2)      f(x1)    f(x2)    Interval
1           −0.5100   −0.4900   0.7599   0.7399   [−2.0000, 1.0000]
2           −1.2550   −1.2350   0.9350   0.9448   [−2.0000, −0.4900]
3           −0.8825   −0.8625   0.9862   0.9811   [−1.2550, −0.4900]
4           −1.0688   −1.0488   0.9953   0.9976   [−1.2550, −0.8625]
5           −0.9756   −0.9556   0.9994   0.9980   [−1.0688, −0.8625]

The midpoint of the final interval is −0.965625 and f(midpoint) = 0.999. The maximum of the function is 0.995 and the x value = −1.068750.

b. FIBSearch(f, −2, 1, 0.6); The interval [a, b] is [−2.00, 1.00], and the user-specified tolerance level is 0.60000. The first two experimental endpoints are x1 = −0.800 and x2 = −0.200.

Iteration   x(1)      x(2)      f(x1)    f(x2)    Interval
2           −1.4000   −0.8000   0.9600   0.3600   [−2.0000, −0.2000]
3           −0.8000   −0.8000   0.8400   0.9600   [−1.4000, −0.2000]

The midpoint of the final interval is −0.800000 and f(midpoint) = 0.960. The maximum of the function is 0.990 and the x value = −1.100000.

c. GOLD(f, −2, 1, 0.6); The interval [a, b] is [−2.00, 1.00], and the user-specified tolerance level is 0.60000. The first two experimental endpoints are x1 = −0.854 and x2 = −0.146.

Iteration   x(1)      x(2)      f(x1)    f(x2)    Interval
2           −1.2918   −0.8540   0.9787   0.2707   [−2.0000, −0.1460]
3           −0.8540   −0.5837   0.9149   0.9787   [−1.2918, −0.1460]
4           −1.0213   −0.8540   0.9787   0.8267   [−1.2918, −0.5837]
5           −1.1245   −1.0213   0.9995   0.9787   [−1.2918, −0.8540]

The midpoint of the final interval is −1.072886 and f(midpoint) = 0.995. The maximum of the function is 0.9995 and the x value = −1.1245.

d. Newton(f, −5, 10);
−5
−1.000000000
−1.000000000

2. f := x -> −x² − 3x

a. DICHOTOMOUS(f, −3, 1, 0.2, 0.01); The interval [a, b] is [−3.00, 1.00] and the user-specified tolerance level is 0.20000. The first two experimental endpoints are x1 = −1.010 and x2 = −0.990.

Iteration   x(1)      x(2)      f(x1)    f(x2)    Interval
1           −1.0100   −0.9900   2.0099   1.9899   [−3.0000, 1.0000]
2           −2.0050   −1.9850   1.9950   2.0148   [−3.0000, −0.9900]
3           −1.5075   −1.4875   2.2499   2.2498   [−2.0050, −0.9900]
4           −1.7562   −1.7362   2.1843   2.1942   [−2.0050, −1.4875]
5           −1.6319   −1.6119   2.2326   2.2375   [−1.7562, −1.4875]
6           −1.5697   −1.5497   2.2451   2.2475   [−1.6319, −1.4875]

The midpoint of the final interval is −1.559688 and f(midpoint) = 2.246. The maximum of the function is 2.250 and the x value = −1.487500.

b. FIBSearch(f, −3, 1, 0.6); The interval [a, b] is [−3.00, 1.00], and the user-specified tolerance level is 0.60000. The first two experimental endpoints are x1 = −1.500 and x2 = −0.500.

Iteration   x(1)      x(2)      f(x1)    f(x2)    Interval
2           −2.0000   −1.5000   2.2500   1.2500   [−3.0000, −0.5000]
3           −1.5000   −1.0000   2.0000   2.2500   [−2.0000, −0.5000]
4           −1.5000   −1.5000   2.2500   2.0000   [−2.0000, −1.0000]

The midpoint of the final interval is −1.500000 and f(midpoint) = 2.250. The maximum of the function is 2.250 and the x value = −1.500000.

c. GOLD(f, −3, 1, 0.6); The interval [a, b] is [−3.00, 1.00], and the user-specified tolerance level is 0.60000. The first two experimental endpoints are x1 = −1.472 and x2 = −0.528.

Iteration   x(1)      x(2)      f(x1)    f(x2)    Interval
2           −2.0557   −1.4720   2.2492   1.3052   [−3.0000, −0.5280]
3           −1.4720   −1.1116   1.9412   2.2492   [−2.0557, −0.5280]
4           −1.6950   −1.4720   2.2492   2.0991   [−2.0557, −1.1116]
5           −1.4720   −1.3345   2.2120   2.2492   [−1.6950, −1.1116]

The midpoint of the final interval is −1.403312 and f(midpoint) = 2.241. The maximum of the function is 2.243 and the x value = −1.583638.

d. Newton(f, 1, 10);
−1
−1.500000000
−1.500000000

3. f := x -> −x² − 2x

a. DICHOTOMOUS(f, −2, 1, 0.2, 0.01); The interval [a, b] is [−2.00, 1.00], and the user-specified tolerance level is 0.20000. The first two experimental endpoints are x1 = −0.510 and x2 = −0.490.

Iteration   x(1)      x(2)      f(x1)    f(x2)    Interval
1           −0.5100   −0.4900   0.7599   0.7399   [−2.0000, 1.0000]
2           −1.2550   −1.2350   0.9350   0.9448   [−2.0000, −0.4900]
3           −0.8825   −0.8625   0.9862   0.9811   [−1.2550, −0.4900]
4           −1.0688   −1.0488   0.9953   0.9976   [−1.2550, −0.8625]
5           −0.9756   −0.9556   0.9994   0.9980   [−1.0688, −0.8625]

The midpoint of the final interval is −0.965625 and f(midpoint) = 0.999. The maximum of the function is 0.995 and the x value = −1.068750.

b. FIBSearch(f, −2, 1, 0.6); The interval [a, b] is [−2.00, 1.00], and the user-specified tolerance level is 0.60000.

Iteration   x(1)      x(2)      f(x1)    f(x2)    Interval
2           −1.4000   −0.8000   0.9600   0.3600   [−2.0000, −0.2000]
3           −0.8000   −0.8000   0.8400   0.9600   [−1.4000, −0.2000]

The first two experimental endpoints are x1 = −0.800 and x2 = −0.200. The midpoint of the final interval is −0.800000 and f(midpoint) = 0.960. The maximum of the function is 0.990 and the x value = −1.100000.

c. GOLD(f, −2, 1, 0.6); The interval [a, b] is [−2.00, 1.00], and the user-specified tolerance level is 0.60000. The first two experimental endpoints are x1 = −0.854 and x2 = −0.146.

Iteration   x(1)      x(2)      f(x1)    f(x2)    Interval
2           −1.2918   −0.8540   0.9787   0.2707   [−2.0000, −0.1460]
3           −0.8540   −0.5837   0.9149   0.9787   [−1.2918, −0.1460]
4           −1.0213   −0.8540   0.9787   0.8267   [−1.2918, −0.5837]
5           −1.1245   −1.0213   0.9995   0.9787   [−1.2918, −0.8540]


The midpoint of the final interval is −1.072886 and f(midpoint) = 0.995. The maximum of the function is 0.915 and the x value = −1.291772.

d. Newton(f, −3, 20);
−3
−1
−1

4. f := x -> x − exp(x)

a. DICHOTOMOUS(f, −1, 3, 0.2, 0.01); The interval [a, b] is [−1.00, 3.00], and the user-specified tolerance level is 0.20000. The first two experimental endpoints are x1 = 0.990 and x2 = 1.010.

Iteration   x(1)      x(2)      f(x1)     f(x2)     Interval
1           0.9900    1.0100    −1.7012   −1.7356   [−1.0000, 3.0000]
2           −0.0050   0.0150    −1.0000   −1.0001   [−1.0000, 1.0100]
3           −0.5025   −0.4825   −1.1075   −1.0997   [−1.0000, 0.0150]
4           −0.2538   −0.2338   −1.0296   −1.0253   [−0.5025, 0.0150]
5           −0.1294   −0.1094   −1.0080   −1.0058   [−0.2538, 0.0150]
6           −0.0672   −0.0472   −1.0022   −1.0011   [−0.1294, 0.0150]

The midpoint of the final interval is −0.057188 and f(midpoint) = −1.002. The maximum of the function is −1.000 and the x value = 0.015000.

b. FIBSearch(f, −1, 3, 0.1); The interval [a, b] is [−1.00, 3.00] and the user-specified tolerance level is 0.10000. The first two experimental endpoints are x1 = 0.527 and x2 = 1.473.

Iteration   x(1)      x(2)      f(x1)     f(x2)     Interval
2           −0.0545   0.5273    −1.1670   −2.8884   [−1.0000, 1.4727]
3           −0.4182   −0.0545   −1.0015   −1.1670   [−1.0000, 0.5273]
4           −0.0545   0.1636    −1.0764   −1.0015   [−0.4182, 0.5273]
5           −0.2000   −0.0545   −1.0015   −1.0141   [−0.4182, 0.1636]
6           −0.0545   0.0182    −1.0187   −1.0015   [−0.2000, 0.1636]
7           0.0182    0.0909    −1.0015   −1.0002   [−0.0545, 0.1636]
8           0.0182    0.0182    −1.0002   −1.0043   [−0.0545, 0.0909]

The midpoint of the final interval is 0.018182 and f(midpoint) = −1.000. The maximum of the function is −1.001 and the x value = −0.054545.

c. GOLD(f, −1, 3, 0.1); The interval [a, b] is [−1.00, 3.00], and the user-specified tolerance level is 0.10000. The first two experimental endpoints are x1 = 0.528 and x2 = 1.472.

Iteration   x(1)      x(2)      f(x1)     f(x2)     Interval
2           −0.0557   0.5280    −1.1675   −2.8859   [−1.0000, 1.4720]
3           −0.4163   −0.0557   −1.0015   −1.1675   [−1.0000, 0.5280]
4           −0.0557   0.1673    −1.0758   −1.0015   [−0.4163, 0.5280]
5           −0.1934   −0.0557   −1.0015   −1.0148   [−0.4163, 0.1673]
6           −0.0557   0.0295    −1.0175   −1.0015   [−0.1934, 0.1673]
7           0.0295    0.0821    −1.0015   −1.0004   [−0.0557, 0.1673]
8           −0.0031   0.0295    −1.0004   −1.0035   [−0.0557, 0.0821]
9           −0.0231   −0.0031   −1.0000   −1.0004   [−0.0557, 0.0295]

The midpoint of the final interval is −0.013095 and f(midpoint) = −1.000. The maximum of the function is −1.002 and the x value = −0.055696.

d. Newton(f, −1, 20);
−1
0.718281828
0.2058711269
0.0198090911
0.00019491102
1.90103 × 10⁻⁸
1.030036 × 10⁻¹¹
1.030036 × 10⁻¹¹

The value of x is essentially 0.
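The searches tabulated above can be reproduced with short routines. Here is a sketch in Python (our own implementation, mirroring but not identical to the book's Maple procedures GOLD and Newton):

```python
import math

def golden_section_max(f, a, b, tol):
    """Golden-section search for the maximum of a unimodal f on [a, b]."""
    r = (math.sqrt(5) - 1) / 2          # golden ratio conjugate, ~0.618
    x1, x2 = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if f(x1) > f(x2):               # maximum lies in [a, x2]
            b, x2 = x2, x1
            x1 = b - r * (b - a)
        else:                           # maximum lies in [x1, b]
            a, x1 = x1, x2
            x2 = a + r * (b - a)
    return (a + b) / 2

def newton_max(fp, fpp, x, iters=10):
    """Newton's method on f'(x) = 0 to locate a critical point."""
    for _ in range(iters):
        x = x - fp(x) / fpp(x)
    return x

# Problem 1: f(x) = -x^2 - 2x has its maximum at x = -1.
f = lambda x: -x**2 - 2*x
print(round(golden_section_max(f, -2, 1, 1e-5), 4))   # ~ -1.0

# Problem 4: f(x) = x - e^x, f'(x) = 1 - e^x, f''(x) = -e^x; max at x = 0.
fp = lambda x: 1 - math.exp(x)
fpp = lambda x: -math.exp(x)
print(round(newton_max(fp, fpp, -1.0, 20), 6))        # ~ 0.0
```

Starting Newton at x = −1 reproduces the iterates printed above (0.718281828, 0.2058711269, ...).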

Exercises for Chapter 5

1. Find the directional derivative of f(x, y) = 1 + 2x√y at the point (3, 4) in the direction of v = ⟨4, −3⟩: the answer is 2.3.
2. The directional derivative is 62.24 > 0. Hence, the slope is positive and the water is getting shallower.
3. Direction of the gradient is , and the slope is 17.44.
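For exercise 1, the answer can be verified numerically: ∇f = (2√y, x/√y), evaluated at (3, 4) and dotted with the unit vector along v = ⟨4, −3⟩. A sketch of our own:

```python
import math

# f(x, y) = 1 + 2x*sqrt(y); its gradient is (2*sqrt(y), x/sqrt(y)).
def grad_f(x, y):
    return (2 * math.sqrt(y), x / math.sqrt(y))

x0, y0 = 3, 4
v = (4, -3)
vlen = math.hypot(*v)
u = (v[0] / vlen, v[1] / vlen)          # unit direction vector

gx, gy = grad_f(x0, y0)
directional = gx * u[0] + gy * u[1]     # D_u f = grad(f) . u
print(round(directional, 6))            # 2.3
```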

Exercises for Chapter 6

1. a. PD, convex; b. Indefinite, neither; c. ND, concave; d. Indefinite, neither; e. Indefinite, neither; f. Indefinite, neither; g. Indefinite, neither; h. Indefinite, neither

2. a. Saddle at (0, 0); b. Global min at (0, 0); c. Global max at (0, 0);
d. > f := 3·x + 5·y − 4·x² + y² − 5·x·y;
          f := 3x + 5y − 4x² + y² − 5xy
   > Hessian(f, [x, y]);
          [ −8   −5 ]
          [ −5    2 ]
   > dfx := diff(f, x);   dfx := 3 − 8x − 5y
   > dfy := diff(f, y);   dfy := 5 + 2y − 5x
   > solve({dfx = 0, dfy = 0}, {x, y});
          { x = 31/41, y = −25/41 }

   > f := a·x² + b·x·y + c·y²;
   > Hessian(f, [x, y]);
          [ 2a    b ]
          [  b   2c ]
   Convex: a > 0, c > 0, 4ac > b². Concave: a < 0, c < 0, 4ac > b².

3a. f := x² + 3·x·y − y²;
    dfx := 2x + 3y;   dfy := 3x − 2y
    solve({dfx = 0, dfy = 0}, {x, y});  →  { x = 0, y = 0 }

3c. f := −x² − x·y − 2·y²;
    dfx := −2x − y;   dfy := −x − 4y
    solve({dfx = 0, dfy = 0}, {x, y});  →  { x = 0, y = 0 }

3e. f := 2·x + 3·y + 3·z − x·y + x·z − y·z − x² − 3·y² − z²;
    A := Hessian(f, [x, y, z]);
          [ −2   −1    1 ]
          [ −1   −6   −1 ]
          [  1   −1   −2 ]
    IsDefinite(A, 'query' = 'negative_definite');  →  true
    dfx := 2 − y + z − 2x;   dfy := 3 − x − z − 6y;   dfz := 3 + x − y − 2z
    solve({dfx = 0, dfy = 0, dfz = 0}, {x, y, z});
          { x = 17/6, y = −1/2, z = 19/6 }


4. f := exp(x − y) + x² + y²;
   Hessian(f, [x, y]);
          [ e^(x−y) + 2     −e^(x−y)     ]
          [ −e^(x−y)         e^(x−y) + 2 ]
   dfx := e^(x−y) + 2x;   dfy := −e^(x−y) + 2y
   fsolve({dfx = 0, dfy = 0}, {x, y});
          { x = −0.2835716452, y = 0.2835716452 }
   Substituting into the Hessian gives
          A := [ 2.567143290      −0.5671432904 ]
               [ −0.5671432904     2.567143290  ]
   which is positive definite, so the point is a minimum.

4. Find and classify all critical points of f(x, y) = (x² + y²)^1.5 − 4(x² + y²).
   {x = 0, y = 0}: the Hessian there is −8I, negative definite, so a local maximum.
   The remaining critical points are all local minimums, as the Hessian is positive definite:
   {x = 2.666666667, y = 0}, {x = −2.666666667, y = 0}, {x = 0, y = 2.666666667}, {x = 0, y = −2.666666667}


7. Find all the extrema and then classify the extrema for the following functions:
a. The Hessian is
       [ 6x         −6y      ]
       [ −6y   48y² − 6x     ]
   {x = 0, y = 0}: H is indefinite, so a saddle point. {x = 3/8, y = 3/8}: the Hessian is PD, so a local minimum. {x = 3/8, y = −3/8}: the Hessian is PD, so a local minimum.
b. w(x, y, z) = x² + 2xy − 4z + yz²
   The critical points involve the complex cube roots 2^(2/3) and −(1/2)·2^(2/3) ± (√3/2)·2^(2/3)·i. Taking only the real root gives x = y = z = 2^(2/3). The Hessian there is indefinite, so the critical point is a possible saddle.

Exercises for Chapter 7

1. GIVEN: MAX f(x, y) = 2xy + 2y − 2x² − y²

2. GIVEN: MAX f(x, y) = 3xy − 4x² − 2y². Assume our tolerance for the magnitude of the gradient is 0.10. Start at the point (x, y) = (1, 1).


3. MAX f(x, y) = −x³ + 3x + 8y − 6y². Start at (1, 1).
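These answers stop at the problem setup (the iteration tables are omitted here). A minimal steepest-ascent sketch of our own for problem 2, using a fixed step size (the chapter's method uses an optimal line search, which this does not):

```python
import math

# Problem 2: maximize f(x, y) = 3xy - 4x^2 - 2y^2, starting at (1, 1),
# stopping when the gradient magnitude drops below the tolerance 0.10.
def grad(x, y):
    return (3 * y - 8 * x, 3 * x - 4 * y)

x, y = 1.0, 1.0
step = 0.1                                # fixed step size (our choice)
for k in range(200):
    gx, gy = grad(x, y)
    if math.hypot(gx, gy) < 0.10:
        break
    x, y = x + step * gx, y + step * gy   # move uphill along the gradient

print(k, round(x, 4), round(y, 4))        # converges toward the max at (0, 0)
```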


Exercises for Chapter 8

1a. [x = 4/5, y = 8/5, λ1 = 8/5, x² + y² = 16/5]
    [x = 0.8000000000, y = 1.600000000, λ1 = 1.600000000, x² + y² = 3.200000000]
b.  [x = 12/5, y = 4/5, λ1 = 6/5, (x − 3)² + (y − 2)² = 9/5]
    [x = 2.400000000, y = 0.8000000000, λ1 = −1.200000000, (x − 3.)² + (y − 2.)² = 1.800000000]
c.  [x = −0.7071067810, y = 0.7071067810, λ1 = −1., x² + y² + 4xy = −0.9999999994]
    [x = 0.7071067810, y = 0.7071067810, λ1 = 3., x² + y² + 4xy = 2.999999998]
d.  [11.68, [x = 1.6, y = 1.2]]

2.  [x = 8/5, y = 6/5, λ1 = 9/5, λ2 = 56/25, x² + y² + 4xy = 292/25],
    [x = 0, y = 2, λ1 = −3, λ2 = 8, x² + y² + 4xy = 4]
    [x = 1.600000000, y = 1.200000000, λ1 = 1.800000000, λ2 = 2.240000000, x² + y² + 4xy = 11.68000000],
    [x = 0, y = 2., λ1 = −3., λ2 = 8., x² + y² + 4xy = 4.]

3.  [x = 0., y = RootOf(−1 + 2_Z² − z², label = _L3), λ1 = 0.5000000000,
     x² + y² + z² = RootOf(−1 + 2_Z² − z², label = _L3)² + z²],
    [x = RootOf(_Z² − 1 − z², label = _L4), y = 0., λ1 = 1.,
     x² + y² + z² = RootOf(_Z² − 1 − z², label = _L4)² + z²]

4.  z = 14.2189 at x = 2.2354, y = 3.3279; λ1 = 1.44794, λ2 = −0.43175

6.  x = 6, y = 4, w = 3, Z = 72, λ = 6.

Exercises for Chapter 9

1. x = 5.447368421, y = 4.368421053, λ1 = −0.236, λ2 = 0, Z = 30.64473684
2. x = 3.973684211, y = 2.684210526, λ1 = −0.868, λ2 = 0, Z = 26.22368421
3. x = −3, y = −2, λ1 = 0, λ2 = 0, Z = −6
4. x = −0.5, y = 2.75, Z = 6.375, λ1 = 2.50, λ2 = 6.25
5. x = 2.139387691, y = 3.069693846, λ1 = 2.4936, λ2 = −0.1835, Z = −6.4136
6. (0, 0)
7. x = 2.25, y = 3.42847, λ1 = 3.42857, z = 4.57142

Exercises for Chapter 10

Exercises for Section 10.2
1. Max: 5x − x² + 8y − 2y²
   s.t. 3x + 2y ≤ 6, x, y ≥ 0. Start at (0, 0).
2. Max: −(x − 6)² − (y − 8)²
   s.t. −x + 2y ≤ 4, 2x + 3y ≤ 12, x, y ≥ 0. Start at (1, 1).
3. Min: z = x² + y² − 4x − 4y + 8
   s.t. x + 2y ≤ 4, x, y ≥ 0. Start at (0, 0), perform ten iterations. Do you have a solution after ten iterations?
4. Resolve problem three using the new starting point algorithm. Does it converge to a solution any faster?

Exercises for Section 10.3
Use Wolfe's method to solve the following QPP.
5. Min: z = 2x1² − x2
   s.t. 2x1 − x2 ≤ 1, x1 + x2 ≤ 1, x1, x2 ≥ 0
6. Min: z = x1 + 2x2²
   s.t. x1 + x2 ≤ 2, 2x1 + x2 ≤ 3, x1, x2 ≥ 0

Exercises for Section 10.4
1. Max: −(x − 6)² − (y − 2)²
   s.t. −x + 2y ≤ 4, x² + y² ≤ 14, 0 ≤ x, 0 ≤ y
2. Max: −(x − 6)² − (y − 8)²
   s.t. −x + 2y ≤ 4, x² + y² ≤ 14, 0 ≤ x, 0 ≤ y

with(Optimization);
f101 := -x^2 - 2*y^2 + 5*x + 8*y;
        f101 := −x² − 2y² + 5x + 8y
c10 := {x >= 0, y >= 0, 3*x + 2*y <= 6}