Continuing Financial Modelling
Dr. Liam Bastick
Copyright © 2020 SumProduct Pty Limited. Published in 2020 by SumProduct Pty Limited, Level 9, 440 Collins Street, Melbourne, Vic 3000, Australia. Simultaneously published in the USA by Holy Macro! Books, PO Box 541731, Merritt Island, FL 32954. All rights reserved.
Author: Dr. Liam Bastick
Indexer: Nellie Jay
Compositor: Joseph Kirubakaran
Cover Design: Shannon Travise
Distributed by Independent Publishers Group, Chicago, IL
ISBN: 978-1-61547-068-6 (print), 978-1-61547-154-6 (digital)
Library of Congress Control Number: 2020999999
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, without either the prior written permission of the publisher, or authorisation through payment of the appropriate photocopy fee to the Copyright Clearance Center. Requests for permission should be addressed to the publisher, SumProduct Pty Limited, Level 9, 440 Collins Street, Melbourne, Vic 3000, Australia, tel: +61 3 9020 2071, email: contact@sumproduct.com.
This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the understanding that the publisher is not engaged in rendering public services. If professional advice or other expert assistance is required, the services of a competent professional person should be sought. SumProduct (www.sumproduct.com) is one such organisation, undertaking both tailored training and consulting services.
Neither the author nor the publisher is liable for any actions prompted or caused by the information presented in this book. Any views expressed or implied herein are those of the author and do not represent the views of the organisations he works for.
Microsoft and Excel are registered trademarks of the Microsoft Corporation.
About the Author

Dr. Liam Bastick FCA FCMA CGMA MVP

Starting off as a university lecturer, Liam has over 30 years’ experience in financial model development / auditing, valuations, mergers and acquisitions, project finance, public private partnerships, strategy, training and consultancy. Indeed, he has been appointed as an independent expert for the courts of Victoria and New South Wales, in Australia.

He has considerable experience in many different sectors (e.g. banking, energy, media, mining, oil and gas, private equity, retail, transport and utilities) and has worked in many countries (including Australia, Belgium, Denmark, France, Germany, Hong Kong, Indonesia, Malaysia, Netherlands, New Zealand, Philippines, Singapore, Switzerland, United Kingdom, United States and Vietnam). He has worked with many internationally recognised clients, constructing and reviewing strategic, operational, planning and valuation models for many high-profile Initial Public Offerings (IPOs), Leveraged Buy-Outs (LBOs) and strategic assignments.

With over 1,000 articles written for the accounting profession, he is a regular contributor to the American Institute of Certified Public Accountants (AICPA), Chartered Accountants Australia and New Zealand (CAANZ), Certified Practising Accountants Australia (CPAA), the Chartered Institute of Management Accountants (CIMA), the Institute of Chartered Accountants in England and Wales (ICAEW), Finance 3.0, Microsoft’s Excel Blog and various LinkedIn specialist discussion groups.

Liam is a Fellow of the Institute of Chartered Accountants (FCA), a Fellow of the Chartered Institute of Management Accountants (FCMA), a Chartered Global Management Accountant (CGMA), and is also a professional mathematician, specialising in probability and number theory. A frequent public speaker, Liam attends Excel and Power BI conferences around the globe and has been a central organiser for the Excel Summit South and Excel Virtually Global. He has also authored and edited several books, including the sister volume An Introduction to Financial Modelling, as well as the Power BI MVP Book and Excel Insights.

Since 2012, he has been recognised by Crimewatch and Microsoft, the latter as a Most Valuable Professional (MVP) in Excel, one of 66 such awardees worldwide (as at the time of writing). He still follows Derby County and the England cricket team.
Preface

It’s just like waiting for a bus: you wait forever for one and then several come at once. Well, it took me nigh on 30 years to put keyboard to virtual paper for the first tome, An Introduction to Financial Modelling, and then I wrote three more in less than two years!

This book is not so much the sequel to my first sortie into the world of financial modelling as the continuing reconnaissance mission. If I don’t say so myself, the first book contained a lot of useful tips, tricks and ideas about how to build a financial model, but there was so much that was left unsaid. This book is a companion volume to what I wanted to be a simple base. It attempts to address some of the gaps, such as modelling inventory, considering what-if? analysis and extending the model to develop a valuation. But there’s much more I wanted to address, and I have tried my best to include most of those topics within these pages.

I have been lucky enough to remain recognised by Microsoft as a Most Valuable Professional (MVP) for services to Excel – one of 66 so-recognised “experts” as at the time of writing. Now I know an “ex” is a has-been and a “spurt” is a drip under pressure, but I hope you’ll see me as a farmer – someone out(-)standing in their field!

It takes thousands of mistakes to get good at something. I hope you take all the tips out of this volume and avoid the aforementioned errors.

I’d like to thank those who helped contribute to this book. There are quite a few: Tim Heng for technically editing it, putting up with the usual raft of bad jokes and arguing points ad nauseam; Jonathan Liau, Hanh Tran and Greg Liu for firming up the examples; Bill Jelen for assisting in getting it out into the wild; and most importantly my immediate family, Nancy and little Layla, who continue to support me.

I let my daughter, Layla Bastick, have it the first time around, so I thought it best to let her have the final word once more (she always does anyway):

“Surprisingly, he’s made another book! I haven’t actually read the first one yet because there was too much writing and information. However, I applaud my daddy for being able to write it. I love you daddy.”

Liam Bastick, May 2020
www.sumproduct.com
Editor’s notes

I’m not quite sure how I got roped into editing yet another of Liam’s books. It’s almost as if the jokes in the first weren’t funny enough, so he’s felt the need to double down this time. Or that the content in the first book wasn’t complex enough, so he’s tripled the challenge here.

Of course, that’s not an entirely fair argument. We deliberately went into the first book trying to keep it simple and hands-on practical. The goal was never to make it comprehensive and solve every modelling problem under the sun. We knew there was always going to be a second and third book in the series, which would take particular concepts and ideas and drill into them in far more detail.

Even now, this book isn’t going to cover every unique circumstance. There will be ideas here that you might think should be calculated differently. There are certainly sections here that I would do differently. But it’s important to note that nothing in modelling will ever be transferred perfectly from one person’s experience to another’s, and the examples in this book are here to give you inspiration and a solid basis for adapting the concept to your own particular problems.

So if you find that you see something in the book that doesn’t quite work for your circumstances, feel free to get in touch and we can have a discussion about it and help you work it out (for a fee, of course). If you find that you disagree with something in the book and feel that you simply must write in to vent and complain, then at the risk of stealing one of Liam’s lines that I edited out, please let us know at [email protected] – we can promise you that no one will read your email.

Enjoy your reading, and we’ll meet again in a book’s time!

Tim Heng, Microsoft Excel MVP
www.sumproduct.com
Downloadable Resources

A picture may show a thousand words, but it’s incredibly hard to convey exactly how a formula or a set of calculations works without seeing the overall context of the model and being able to drill into the formula itself. With that in mind, we have set up a page on our website for further reference, where we will (for as long as our website and the internet as we know it still exist) provide files containing examples that we have used throughout this book, as well as any supplementary materials that we think may be useful.

Head to our website at https://www.sumproduct.com/book-2-resources to download any of the files referred to in this book.
Contents

About the Author
Preface
Editor’s notes
Downloadable Resources
Chapter 0: Not an Introduction, But a Continuation
Chapter 1: Recap
  Chapter 1.1: Best Practice Concept
  Chapter 1.2: Time Series Analysis
  Chapter 1.3: Financial Statement Theory
  Chapter 1.4: Control Accounts
Chapter 2: What-If? Analysis
  Chapter 2.1: Conditional Formulae
  Chapter 2.2: OFFSET
  Chapter 2.3: Scenario Analysis
  Chapter 2.4: Data Tables
  Chapter 2.5: INDEX and MATCH
  Chapter 2.6: Using Tornado Charts for Sensitivity Analysis
  Chapter 2.7: Simulations Analysis
  Chapter 2.8: Variance Analysis
  Chapter 2.9: Breakeven Analysis
Chapter 3: Forecasting Considerations
  Chapter 3.1: Seasonal / Cyclical Forecasting
  Chapter 3.2: Revising Forecasts
  Chapter 3.3: Pro-Rating Forecasts Over Time
  Chapter 3.4: Changing Periodicities
    Aggregating Time Periods
  Chapter 3.5: Modelling Historical, Actual and Forecast Data
  Chapter 3.6: Rolling Budgets and Charts
  Chapter 3.7: Forecasting Maximum Cash Required
Chapter 4: Modelling Inventory
Chapter 5: Capital Expenditure
Chapter 6: Debt
  Chapter 6.1: Debt Sculpting
  Chapter 6.2: Calculating Interest Rates Correctly
  Chapter 6.3: Useful Repayment Functions and Formulae
Chapter 7: Valuation Considerations
  Chapter 7.1: Deriving the Correct Cash Flow for DCF
  Chapter 7.2: Considering Discount Factors
  Chapter 7.3: Common DCF Modelling Errors
  Chapter 7.4: Using the Dividend Discount Model
  Chapter 7.5: Irrelevant IRR
  Chapter 7.6: Modified Internal Rate of Return (MIRR)
  Chapter 7.7: Smoothing Capital Expenditure
  Chapter 7.8: Calculating Economic Lives
Chapter 8: Linking Models
Chapter 9: Life’s Too Short
  Chapter 9.1: Section Numbering
  Chapter 9.2: US vs. European Dates
  Chapter 9.3: Sheet Referencing
  Chapter 9.4: Reducing File Size
  Chapter 9.5: Taking It to the Limit
  Chapter 9.6: Order of Operations
  Chapter 9.7: Keeping Styles Under Control
  Chapter 9.8: Wearing Protection
Chapter 10: Look to the Future
  Chapter 10.1: Dynamic Arrays
  Chapter 10.2: XLOOKUP and XMATCH
  Chapter 10.3: LET It Be
  Chapter 10.4: New Data Types
  Chapter 10.5: STOCKHISTORY
Resources
Index
Chapter 0: Not an Introduction, But a Continuation

They say I always have to undertake things an even number of times: the first time to do it and the second time to apologise. Well, just like that embarrassing condition you can only tell your doctor about, I’m back.

This sister volume to An Introduction to Financial Modelling starts where that book ended. The first tome addressed my frustration that there’s never been a really practical book that helps novices and the experienced alike to build better financial models. It began with the key things you needed to know and use in Excel, discussed what “Best Practice” really is and then explained a bullet-proof method so that you could build three-way integrated relationships (sounds kinky, I know) between your financial statements, namely the Income Statement, the Balance Sheet and the Cash Flow Statement. Heck, it was all about getting your financial models to work and that blessed Balance Sheet to balance first time, every time. But that’s where that book finished.

That’s not the end of the story. If you did read that book, hopefully you now feel like a more competent modeller and understand why you should model in a certain order. However, to keep the page count down, it deliberately ended with providing both a process and a justification for building a model in a particular way. This book addresses some of the complications deliberately excluded from that riveting read. There were various reasons for this (one already mentioned), but perhaps the main factor was that these issues do not occur in every model. But they do happen. That’s what this book focuses on: the common problems in building a model and how best to solve them.

Since these issues are unrelated and occur irregularly, this book differs in structure from its predecessor. That book took a premiss, described it and justified it, cradle to grave. It needed to be read linearly – that’s not the case here. Most chapters are standalone and may act more as a reference guide to tackle the common issues that occur. Therefore, dear reader, feel free to dip in and out of this book as you see fit. There is no over-arching, massive case study this time – just the usual Excel examples, proliferated throughout.

The plan is therefore as follows:
• Recap: It may have been a while since you read the first book, or it may be you did not read it at all (cheapskate). I recognise we need to start somewhere with a common ground, and this chapter recapitulates the salient points from the introductory book.
• What-if? analysis: Since the first book came out, if there’s one question I have been asked above all else, it’s “where’s the sensitivity and scenario analysis?”. That subject matter didn’t sit right with me in the first book, with the primary objective to show you how to build three-way integrated financial statements – but it follows on. Therefore, I’ve made it Chapter 2.
• Forecasting: At the other end of the spectrum, the first book could have had a preamble on how to get your inputs in the first place. Chapter 3 takes a look at ways to model your forecasts and how to update forecasts with actuals as they eventuate.
• Modelling inventory: I was in two minds about including this in the first book. Inventory is often an important part of a financial model, but it was deliberately excluded last time out. There was a good reason for this: unlike other areas, modelling inventory is a little more complex and requires more than one control account. When writing the first book, I felt that might have detracted from the emphasis I was placing on control account modelling. I put this omission right in Chapter 4.
• Capital expenditure: Another key aspect of any financial model, this book looks at key modelling issues associated with non-current asset expenditure. I discussed depreciation in some detail in the first book, but here I turn my attention to errors made in modelling opening Net Book Value depreciation, as well as smoothing capital expenditure and confirming economic lives. The first topic is included in Chapter 5, but the remaining subjects are tackled later in Chapter 7, where I consider a myriad of different valuation considerations.
• Debt: I have over 30 years’ experience in debt (very few professionally). It’s another important area to consider in the financial modelling arena, and Chapter 6 looks at getting the interest right and shows you how to incorporate debt sculpting without a macro, as well as some other useful formulae and functions.
• Valuations: I have spent many years building, teaching and reviewing valuation models and have seen many errors made. Chapter 7 looks at some of the common mistakes made – and how you can avoid them. It’s best to use first principles – and keep it very, very simple.
• Linking models: There comes a time in every modeller’s professional life when you realise you can’t have everything in one Excel workbook. But are you disciplined enough to structure model relationships appropriately? Chapter 8 provides some simple tips on how to make life easier in a multi-model environment.
• Miscellaneous: This ad hoc Chapter 9 looks at common issues encountered whilst modelling. Perhaps more of a reference section than other parts of the book, this chapter highlights limits and calculation orders, whilst also providing useful tricks and tips for file protection, keeping file size lower and ensuring the number of workbook styles does not spiral out of control.
• The future: Excel is moving on; are you? The final chapter addresses three key topics which are going to revolutionise the way we both use Excel and model in the very near future. This chapter looks at dynamic arrays, the new functions XLOOKUP and XMATCH (goodbye VLOOKUP and INDEX MATCH!) and rich data types. Know what’s coming (some of which has arrived) now.

Remember, above all, this book is primarily a practical book. Make use of the extensive Excel models, grouped by chapter / section, to visualise the important concepts discussed here. Get that laptop out (it’s not just for private browsing). There are examples aplenty and the best way to understand is to do. Enjoy!
Liam, New Year’s Eve, 2019.
Chapter 1: Recap

The previous book, An Introduction to Financial Modelling, discussed the key Excel functions and features for financial modelling, how they should be implemented in a “Best Practice” way, and then went on to discuss financial statement modelling methodology using control accounts. That was the best part of 400 pages. If you want the full warts-and-all summary, please refer to that book or go and watch the movie (although to be honest, it’s not Tarantino’s finest). Here, I will provide a brief summary.

If there is only one thing you ever take away from any of my work, it should be that financial modelling has one basic rule:

KEEP IT SIMPLE STUPID

That’s it. A colleague of mine once talked about the Rule of Thumb: Excel formulae in your formula bar should be no longer than your thumb:
I have not changed my mind (or the graphic) since the first book. Keeping things short and simple forces you to structure your workbooks in such a way that other people – unsophisticated end users – may follow your thinking and calculations without ever having to refer to the formula bar and Excel Help. That has to be a “Best Practice” concept, surely…?
CHAPTER 1.1: BEST PRACTICE CONCEPT

Spreadsheeting is often seen as a core skill for accountants, many of whom are reasonably conversant with Excel. However, many modellers frequently forget that the key end users of a spreadsheet model (i.e. the decision makers) are not necessarily sophisticated Excel users and often only see the final output on a printed page, e.g. as an appendix to a Word document or as part of a set of PowerPoint slides. With this borne in mind, it becomes easier to understand why there have been numerous high-profile examples of material spreadsheet errors. I am not saying that well-structured models will ensure no mistakes, but in theory they should reduce both the number and the magnitude of these errors.

Modellers should strive to build “Best Practice” models. Here, I want to avoid the semantics of what constitutes “best” in “Best Practice”. “B” and “P” are in capitals deliberately, as I see this as a proper noun insofar as no method is truly “best” for all eventualities. I would rather we consider the term as a proper noun to reflect the idea that a good model has four key attributes:
• Consistency;
• Robustness;
• Flexibility; and
• Transparency.

Our company calls this CRaFT. We try to keep it simple (how many times have I said that so far?). Looking at these four attributes in turn can help model developers decide how best to design financial models.
Consistency

Models constructed consistently are easier to understand as users become familiar with both their purpose and content. This will in turn give users more comfort about model integrity and make it easier to add / remove business units, categories, numbers of periods, scenarios etc. Consistent formatting and use of styles cannot be over-emphasised. Humans take in much information on a non-verbal basis. Consider the following old ‘Print’ dialog box from c. Excel 2003:
True, this interface has long since been replaced, but even if you have never seen this dialog box before, you just know where you need to input data. We may not realise it, but we have all been indoctrinated by Microsoft. When modelling, we should exploit this mindset: in the worksheets of my workbooks, all objects or cells that may be modified by the user are readily identifiable without reading any instructions:
There are other key elements of a workbook that should be consistent. These include:
• Formulae should be copied uniformly across ranges, to make it easy to add / remove periods or categories as necessary
• Sheet titles and hyperlinks should be consistently positioned to aid navigation and provide details about the content and purpose of the particular worksheet
• For forecast spreadsheets incorporating dates, the dates should be consistently positioned (i.e. the first period should always be in one particular column), the number of periods should be consistent where possible and the periodicity should be uniform (the model should endeavour to show all sheets monthly or quarterly, etc.). If periodicities must change, they should be in clearly delineated sections of the model. If you do have a model where you want the first 12 months (say) to be monthly, then annually thereafter, always model everything at the lowest level of granularity (here, monthly) and then use SUMIF to aggregate months into years on output sheets later (see the sketch after this list) – it makes formulae so much easier to create and manipulate. But more on that later…
This should reduce referencing errors, increase model integrity and enhance workbook structure.
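By way of illustration only (the sheet name, cell references and layout here are hypothetical, not taken from the book’s example files), the monthly-to-annual aggregation might look like this. Suppose row 4 of a monthly calculation sheet, Calcs, contains the financial year each month belongs to, and row 12 contains monthly revenue; an annual output sheet with years across the columns (the year label in row 3) can then pick up each year’s total with a single consistent formula:

=SUMIF(Calcs!$E$4:$BB$4, G$3, Calcs!$E$12:$BB$12)

Here, Calcs!$E$4:$BB$4 is the row of financial year labels, G$3 is the year required for the current output column, and Calcs!$E$12:$BB$12 is the monthly data being aggregated. Copied uniformly across the output range, this one formula handles every year without modification.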
Robustness

Models should be materially free from error, mathematically accurate and readily auditable. Key output sheets should ensure that error messages such as #DIV/0!, #VALUE!, #REF! etc. cannot occur (ideally, these error messages should not occur anywhere). My old boss used to promote the idea of “cockroach theory”: once you saw one of these errors in a model, you would believe the model was infested and never trust it after that. Removing these prima facie errors is straightforward, and their presence often highlights that the modeller has not undertaken a basic review of his or her work after completing the task (see later).

When building, it is often worth keeping in mind hidden assumptions in formulae. For example, a simple gross margin calculation may divide profit by sales. However, if sales are non-existent or missing, this calculation would give #DIV/0!. The modeller therefore has two options (sketches of both appear after the list of check categories below):
• Use an IF statement to check that sales are not zero (proactive test); or
• Construct an error check to flag if sales are zero (reactive test, not recommended in this instance).

However, checks are useful in many situations, and essentially each will fit into one of three categories:
1. Error checks – the model contains flawed logic or prima facie errors, e.g. the Balance Sheet does not balance, cash in the Cash Flow Statement does not reconcile with the Balance Sheet, or the model contains #DIV/0! errors, etc.;
2. Sensitivity checks – the model’s outputs are being derived from inputs that are not deemed to be part of the base case. This can prevent erroneous decisions being made using the “Best Case”; and
3. Alert checks – everything else! These flag points of interest to users and / or developers: issues that may need to be reviewed, e.g. revenues are negative, debt covenants have been breached, etc.
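As a minimal sketch (assuming hypothetical range names Profit and Sales, which are not from the book’s files), the proactive test builds the guard into the calculation itself:

=IF(Sales=0, 0, Profit/Sales)

whereas the reactive equivalent leaves the margin calculation alone and adds a separate flag elsewhere, e.g.

=IF(Sales=0, 1, 0)

Such flags may then be rolled up – for instance, summed and compared with zero – on a dedicated checks worksheet, as described next.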
Incorporating dedicated worksheets into the model that summarise these checks will enhance robustness and give users more confidence that the model is working / calculating as intended.
The above is a sanitised screenshot from a real financial model. It is an extract from a worksheet with no fewer than 800 checks with the overall total included at the bottom (this links to the overall check at the top of the sheet, displayed in all worksheets throughout the model). Each check may be switched off if necessary and each check hyperlinks back to where the check is in the model. If you were the recipient of such a model, assuming the checks have been calculated correctly (!), would you feel more comfortable with this model compared to the usual fare received?
Flexibility

One benefit of modelling in a spreadsheet package such as Excel is being able to change various assumptions and see how these adjustments affect various outputs. Therefore, when building a model, the user should consider which inputs should be variable and how they should be able to vary. This may force the model builder to consider how assumptions should be entered.

The most common method of data entry in practice is simply typing data into worksheet cells, but this may allow a model’s inputs to vary outside of scoped parameters. For example, if I have a cell seeking ‘Volumes’, without using data validation I could enter ‘3’, ‘-22.8’ or ‘dog’ in that cell. Negative volumes are nonsensical, and being able to enter text may cause formula errors throughout the model. Therefore, the user may wish to consider other methods of entry, including using drop-down boxes, option buttons, check boxes and so on (a sketch follows below).

I strongly recommend that all inputs are entered as positive numbers wherever possible; just change the descriptions accordingly. If I were to tell you that last year costs were $10,000 but they have increased 10% this year, you would understand me. But what would you make of me telling you costs were minus $10,000 and had increased by -10%!? The aim is to have a model provide sufficient flexibility without going overboard.
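As one hedged illustration (the cell reference is hypothetical): to constrain a ‘Volumes’ input in cell G5, a custom Data Validation rule (Data -> Data Validation -> Allow: Custom) could use

=AND(ISNUMBER(G5), G5>=0)

so that text entries and negative numbers are rejected at the point of entry, rather than propagating errors through the model.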
Transparency

As stated above, many modellers often forget that key decision makers base their choices on printed materials: consequently, models must be clear, concise and fit for the purpose intended. I always say if you can follow it on a piece of paper (i.e. no formula bar), it’s transparent.

Most Excel users are familiar with keeping inputs / assumptions away from calculations, away from outputs. However, this concept can be extended: it can make sense to keep different areas of a model separate, e.g. revenue assumptions on a different worksheet from cost(s) of goods sold assumptions, and capital expenditure assumptions on a third sheet, and so on. This makes it easier to re-use worksheets and ringfence data. Keeping base case data away from sensitivity data is also important, as many modelling mistakes have been made by users changing the wrong, yet similar, inputs.

Aside from trying to keep formulae as simple as possible, it makes sense to consider the logical flow of a model at the outset too. Indeed, including a simple flowchart within an Excel workbook can be invaluable: as the saying goes, a picture is worth a thousand words, and it can actually help to plan the structure and order of the spreadsheet build.
Again, this graphic comes from a genuine model, albeit modified. It should be noted that not only does this graphic show how the model flows, but each box within the graphic is actually a hyperlink that takes you to the relevant section of the model, complete with documentation.
Similarly, a Table of Contents constructed with hyperlinks helps users and developers alike navigate through larger Excel models:
In summary, it’s all about design and scoping. The problem is, we are all time-poor in today’s business environment, with perpetual pressure to produce results more and more quickly. Consequently, we dust off old templates, fit square pegs in round holes and produce mistake-laden spreadsheets time and time again, resulting in costly management decisions. The whole process is simply a false economy. Time spent on better scoping out the model and designing the layout will lead to fewer mistakes and greater efficiencies in the long term.
CHAPTER 1.2: TIME SERIES ANALYSIS

Most forecast models project key outputs over multiple periods. Typically, these periods are not headed “Time 1”, “Time 2”, etc., but display end dates to assist users to understand payback periods, seasonality, trends and so on. An example time series could contain some or all of the following:
The above is an example of time series data in a real-life financial model. My question is this, though: are all of these rows absolutely necessary? I would suggest not. Essentially, only three lines are needed when modelling (the rest may be derived as necessary):
• Start date: This will allow for models where the first period is not a “full” period (often called a ‘stub’ period), e.g. a business may wish to project its profits from now until the end of the calendar year for the first year.
• End date: This will define the end of the period and will often coincide with reporting dates, e.g. end of financial year or quarter ends. By having both the start date and end date defined, a modeller can determine the number of days / weeks / months in the period, which financial year the period pertains to and so forth.
• Counter: Start and end dates are insufficient. Constructing calculations based on consideration of a date is fraught with potential issues in Excel. This is because dates are really serial numbers in Excel, which may differ depending upon the underlying operating system (e.g. Day 1 for Microsoft Excel for Windows is 1 January 1900, whilst Day 1 is 1 January 1904 for older versions of Microsoft Excel for the Macintosh). Further, if you are building a monthly model, you may wish to divide an annual figure evenly instead of based on the number of days. A counter is also the easiest way to identify the first and last periods in a robust manner.

So, bearing this in mind, how do you build up the necessary formulae for these three line items allowing for the more common eventualities? Well, to begin with, there’s only really one troublesome formula. This is because (see the sketch after this list):
• The Counter is simply the last period’s number plus one. I tend to use the formula =N(Previous_Cell)+1, where the N() function takes the numerical value in the previous cell and, more importantly, ignores text, so that #VALUE! errors will not arise.
• The Start Date is simply the Model Start Date for the first period and the day following the last period’s end date otherwise. This can simply be written as =IF(Counter=1, Model_Start_Date, Previous_Period_End_Date+1).
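To see why the N() trick matters, consider a hypothetical layout (these cell references are illustrative only): the Counter in row 5, Start Dates in row 6 and End Dates in row 7, with the first period in column F and the row labels in column E. The Counter in F5 can be

=N(E5)+1

which returns 1, because N() of the text label in E5 is zero; copied right, =N(F5)+1 gives 2, 3, 4, … with no special first-period formula required. Similarly, the Start Date in F6 may be written as

=IF(F5=1, Model_Start_Date, E7+1)

and copied right, so that each subsequent period starts the day after the previous period’s end date in row 7.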
Therefore, we need only consider the formula for the Period End Date. Consider the following simple example:
I have selected an arbitrary start date (Model_Start_Date) of 17 January 2020, and assumed that the number of months in a full period (Periodicity) is 3. The third line item is a little more subtle: this specifies which periods are period ends by specifying one month that will be a period end month. For example, tax may be paid quarterly in the months of January, April, July and October. By entering a Periodicity of 3 and specifying an Example_Reporting_Month of any of 1, 4, 7 or 10, this will provide sufficient information to work out the quarter ends, i.e. 31-Jan, 30-Apr, 31-Jul and 31-Oct. The Reporting_Month_Factor is simply the minimum of these acceptable alternative values and is calculated automatically here. The approach I will use here requires that the periodicity is a divisor of the number of months in a year (Months_in_Year) – which is why my example only allows the Periodicity to be 1, 2, 3, 4, 6 or 12. In the example above, we are building a quarterly model where December is one of the quarter ends. Therefore, the possible quarter end months are:
This example table allows for up to 12 month ends (i.e. for a monthly model). So how do we derive the necessary formula? I will give you some insight into my simplistic view of the world. First, I would construct the following table:
This simple table considers all 12 months of the year for the Model_Start_Date (first column). The middle column displays which month would be the first quarter end, given the assumption regarding the month of the Model_Start_Date. Therefore, a start date in January, February or March will give rise to a March quarter end, etc. I can use an array formula to calculate this month number dynamically. This is not necessary, and the values could just be typed in – remember, this table is simply a tool to ascertain how to construct the formula required.

The final column is then the difference between the end date month and the Model_Start_Date month. It is slightly more complicated than this, as we need to consider what happens if the Model_Start_Date month exceeds the final end date month. For example, in my tax example above, tax arising in November (month 11) is after the final payment period of the year (month 10). This would be paid in month 1 of the following year instead. The point is, the final column highlights the pattern of how many months after the Model_Start_Date the first reporting period will occur.

We can now use two functions in tandem to derive this first period end date:
• EOMONTH(Date, Months) returns the last day of the month so many months from the date specified. For example, =EOMONTH(11-Dec-20, 14) would be 28-Feb-22, i.e. the end date 14 months after the end of December 2020
• MOD(Number, Divisor) returns the remainder when Number is divided by the Divisor. For example, =MOD(17, 6) is 5, since 17 / 6 = 2 remainder 5.

With trial and error, the number of months we need to add on can be calculated as follows:

=MOD(Periodicity + Reporting_Month_Factor - MONTH(Model_Start_Date), Periodicity)

and therefore, if we call this equation our Additive_Factor, the reporting end date will be:

=EOMONTH(Model_Start_Date, Additive_Factor)

In the example file, I have checked my workings, viz.
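For example, with the inputs above (a Model_Start_Date of 17 January 2020, so MONTH(Model_Start_Date) is 1, a Periodicity of 3 and a Reporting_Month_Factor of 3 for Mar / Jun / Sep / Dec quarter ends):

=MOD(3 + 3 - 1, 3) = MOD(5, 3) = 2

so that =EOMONTH(17-Jan-20, 2) returns 31-Mar-20, the correct first quarter end. As a second illustrative check, a start date in November would give =MOD(3 + 3 - 11, 3) = MOD(-5, 3) = 1, and EOMONTH would duly return 31 December, since Excel’s MOD always returns a result with the same sign as the divisor.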
Furthermore, a robust yet flexible time series can be constructed:
Even allowing for flexible start dates and “Reporting Month Factors”, the above will not work in all circumstances. Other periodicities may be sought, whilst some businesses require weekly reporting or 5-4-4-week period regimes. Nonetheless, the above approach can be modified and extrapolated to consider most complications.
CHAPTER 1.3: FINANCIAL STATEMENT THEORY

You can’t build a financial model without understanding financial statement theory. Yes, I realise many of you will work in finance, be accountants or experienced analysts, but I need to ensure we are all on the same page, understanding the layout and purpose of each financial statement.

Like its predecessor, this book is not an accounting book. I am not going to talk about Accounting Standards (such as the International Financial Reporting Standards, IFRS, or the US Generally Accepted Accounting Principles, US GAAP), but rather just explain generic principles. I appreciate things work differently in different parts of the world, so I intend to be as general as I can be. Nevertheless, most accounting regimes recognise the same three primary financial statements, so let’s start by reviewing all of them.
Income Statement

Also known as the profit and loss account, revenue statement, statement of financial performance, earnings statement, operating statement or the statement of operations, this is the financial statement that shows the net operating profit of an entity for a given period of time. It works on an accruals basis, which, whether you are an accountant or not, is probably how you think. Allow me to explain.

The Income Statement is essentially the Net Operating Profit (accrued income less accrued expenditure) for a period of time after tax.
Income is recognised when products are delivered or services are provided, not when payment is received. Similarly, costs are attributed to the period they are incurred, not when they are necessarily paid. If our company sells one million widgets within the financial year at $1 each and incurs direct cost of 75c per widget, we would expect a gross profit of $250,000, viz.
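That is, as a simple check of the arithmetic: 1,000,000 × ($1.00 − $0.75) = $250,000 of gross profit for the period, regardless of how much of that revenue has actually been collected in cash.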
The cash position could be radically different. We may have had to pay all of the costs and not yet received any monies. However, this is not how we think. We all attribute on an accruals basis, i.e. what pertains to the period in question. And there’s more. If I asked what you would model first, then second, then third, who here was thinking revenue, costs of goods sold and then operating expenditure? Like it or not, we are walking, talking income statements:
Do you note the order? Costs of Goods Sold (COGS) is the first expense after revenue and is defined as the costs directly attributable to the sale. Direct costs may include raw material and some labour costs. Indirect costs, on the other hand, are legitimate costs of the business too, just not directly attributable to the sale. Typical examples include rent and utilities. Some could be argued either way (e.g. freight); the aim is to be consistent. Therefore, it makes sense that the direct costs are attributed first so that the gross profit (often referred to as contribution) can be assessed before the allocation of other costs, which in some instances may be rather arbitrary.
Clearly, Tax Expense is the final expense, as it encapsulates all of the other incomes and expenses, but why does Depreciation Expense come before Interest Expense? This is perhaps not so clear cut, and I offer two reasons, both of which can be argued with:
1. Funding: If you consider the income statement to contain the period’s “fair share” of income and expenditure, one significant cost is Capital Expenditure. This is essentially the purchase of significant business assets to generate future profits for more than one year. It would be wrong to apportion all of this expenditure to one period if the benefit will extend over several / many years, and depreciation is the allocation of that cost to the period in question. If financing, you often want to compare the funding – and its associated costs – against this allocated amount. Therefore, Depreciation Expense should be stated ahead of Interest Expense in the Profit and Loss account.
2. Valuation: It is hard to believe that Discounted Cash Flow (DCF) valuations and Net Present Value (NPV) analysis are still fairly recent valuation tools. Prior to the popularity of these approaches, the main method of choice was based on earnings multiples. For example, if companies A and B were similar in operation and profit margins, but B were twice the size of A, shouldn’t B have a valuation approximately double A’s? It’s difficult to argue with this logic. Capital structures (i.e. the mix of debt to equity) are irrelevant because, if a company is purchased, the chances are the capital structure will be revised by the purchaser. Since depreciation is a relevant expense in the earnings multiple calculation, it should appear ahead of the debt expense (i.e. interest) so that the earnings figure to be used is readily visible.

Why have I made such a big deal of the order? Go and ask a non-accountant the order they would build a model in: the vast majority would build the income calculations first, the direct costs second and so on. The reason the Income Statement is always so popular is that this is how people think. I am a great believer in “if it ain’t broke, don’t fix it” and this ideology very much applies here. We should seriously consider building a model in the P&L order, as this is most intuitive to modellers and users alike.
Balance Sheet

The Balance Sheet used to be known as the Net Worth statement; while the Income Statement reports on financial performance, the Balance Sheet summarises the corresponding financial position. For a particular moment in time (the date stated), it displays what a business is worth. It is a cumulative statement, aggregating all of the factors that contribute to that value:
There are various ways of presenting the Balance Sheet. I am quite fond of this approach, as it places Current Assets (items worth something to the business that will be held for less than or equal to one year) next to Current Liabilities (amounts owed that need to be paid within the next year). The ratio of Current Assets divided by Current Liabilities assesses a business’s ability to pay its bills, and Current Assets less Current Liabilities shows the working capital of a business on the face of a primary financial statement.

The line items down to Net Assets are often colloquially known as the “top half” of the Balance Sheet (even if it is nearer 90% of all line items!) and the Total Equity section is known as the “bottom half”. What is good about this presentation is that the top half summarises what is controlled by a business, whereas the bottom half communicates ownership.

This is why debt is not in the bottom half of the Balance Sheet. In the past, the Equity section was often headed “Financed by” and that always confused me, as I did not understand why debt was not there. The reason is that the shareholders of the business (the Equity) control the debt repayments and servicing. Yes, banks and other financial institutions may put contracts in place, but that is so they may go to court to force companies to pay if payment is not made voluntarily. Consequently, debt is “controlled”, it is usually for a period of greater than one year and hence it is not in the bottom half but rather in the Long Term Liabilities section instead.

Modellers tend to create Balance Sheets as a bit of an afterthought, but they are more important than that. Values may be stated in the dollars of the day when purchased (this is known as historical cost accounting), which makes any summary meaningless as you are comparing apples with pears, but it is still better than nothing. In fact, most modellers hate Balance Sheets. They never seem to balance, reconcile or be understood. But this can all be circumvented with control accounts (see below). Balance Sheets are essential.

There is one other issue. Balance Sheets by their very nature are cumulative. They are stated at a point in time. They have to balance. So what if the Balance Sheet did not balance at the model start date? As a modeller, there is nothing you can do about this: this is an opening assumption (the Opening Balance Sheet). If this were to happen to you, reject the Opening Balance Sheet and wait until someone who knows what they are doing gives you a proper one. All a modeller may ever be held accountable for is that the change in Net Assets equals the change in Total Equity.
Cash Flow Statement

The Cash Flow Statement never used to be one of the so-called primary financial statements. Originally, it was the reconciliatory note that demonstrated how the cash stated on the Balance Sheet had been derived. Changes in Balance Sheet and Income Statement numbers are often known as a “revision of accounting policies” and, as long as the auditors and shareholders agree the changes, everyone nods sagely and does not even bat an eyelid. If you start messing around with the Cash Flow Statement, that’s called fraud and you can go to jail.

There are three sections to the Cash Flow Statement:
• Operating
• Investing
• Financing.

Can you think of an example of an Operating Cash Flow? Who said Revenue? That isn’t one. Cash Receipts is the cash flow equivalent of Revenue. Be careful: it’s making mistakes like this in modelling that cause Balance Sheets not to balance. Similarly, an example of Investing Activities might be Purchases of Non-Current Assets rather than “Capital Expenditure”. Debt drawdowns and repayments would be two examples of Financing Activities.

Now, what about the other way around? Where does Interest Paid go? (You didn’t realise you were taking a multiple choice exam?) The “proper” answer is Operating Activities, although I imagine you might be thinking Financing Activities. Three points:
1. As a company expands, sometimes its working capital (essentially cash and cash equivalents readily available after setting aside current bills) is insufficient to facilitate the growth required. Business owners may put more cash into their business (equity) or alternatively take out financing (debt). Should they choose the latter option, the mandatory servicing of that debt is an operational cost of the business – so Interest Paid should go in Operating Activities.
2. If all you read above was “blah blah blah… Operating Activities”, I think you might not be alone. A simpler – although not quite correct – explanation is as follows. As detailed above, the Income Statement is essentially the Net Operating Profit after tax. Interest Expense is clearly an expense in the Profit and Loss Account. Therefore, it makes sense that the cash equivalent of Interest Expense – Interest Paid – is in the cash proxy for the P&L, namely the Cash Flows from Operating Activities section.
3. Some companies do indeed place Interest Paid in the Financing section of the Cash Flow Statement. This has been due to past practices (consistency) and case law precedent. So it can reside here, although it probably makes more sense to be in Operating Activities, as explained above.

Hopefully, that makes sense. So what about Interest Received? Where should that be placed? Operating? Investing? Financing? Well, as Meat Loaf said, two out of three ain’t bad:
• Banks and other financial institutions may place Interest Received in Operating Activities. This is because Interest Received may be their main source of income, e.g. from mortgages and unsecured loans.
• Other companies may place Interest Received in Investing Activities. In this instance, interest has been earned and received from surplus cash on deposit.
Do you see that, in either scenario, it would be incorrect to net off this amount against Interest Paid? The only time the two line items are in the same section (Operating Activities) is when Interest Received is essentially Cash Receipts – and why would you want to combine this with debt servicing? This clearly highlights the accounting point of no net off. It is better if line items are shown gross so that end users may better understand their financials / forecasts.

One last one: let’s consider Dividends Paid. This line item goes in Financing Activities. Some people do not understand why Interest Paid and Dividends Paid are (usually) placed in different sections of the Cash Flow Statement. Interest Paid is the mandatory servicing of debt; Dividends Paid is the voluntary servicing of equity. Hence the latter is a financing decision and is therefore placed in the Financing Activities section.

I have concentrated on these elements as this is where many modellers make mistakes. Hopefully, this discussion makes things clearer and will prevent you from falling for some of the same traps your peers have repeated time and time again. This all leads us nicely into the example Cash Flow Statement:
The above is an example of what is known as a direct Cash Flow Statement. This is one of two forms of the Cash Flow Statement, the other being indirect. In many accounting jurisdictions it is stipulated that one variant must be displayed in the financial statements and the other should be the reconciliatory note to said accounts. It usually does not matter which way round this is done, as long as it is consistent from one period to the next. Both variants affect the Net Operating Cash Flow section only of the Cash Flow Statement. They are defined as follows:
• Direct: This can reconcile Operating Cash Flows back to a large proportion of the bank statements. It is a summary of Cash Receipts, Cash Paid, Interest Paid and Tax Paid.
• Indirect: This starts with an element of the Income Statement, adds back non-cash items (deducting their cash equivalents) and adjusts for working capital movements. A typical indirect Cash Flow Statement may compare to the direct version as follows:
As explained above, the indirect version is calculated as follows (a schematic of this reconciliation appears at the end of this discussion):
• Start with a line item from the Income Statement (here, Net Profit After Tax)
• Add back non-cash items (Depreciation Expense, Interest Expense and Tax Expense)
• Adjust for working capital movements (increases and decreases in Current Assets and Current Liabilities)
• Deduct the cash equivalents of the non-cash items added back:
  o Instead of Interest Expense, deduct Interest Paid
  o Instead of Tax Expense, deduct Tax Paid
  o Instead of Depreciation Expense, don’t do anything.

So why is Depreciation Expense excluded altogether? Hands up those who said, “It’s not a cash item”. Well, Interest Expense and Tax Expense are not cash items either, so that is an insufficient reason. What is the cash equivalent of Depreciation Expense? It’s the Purchase of Non-Current Assets – and that is found in Investing Activities. Depreciation Expense is excluded for two reasons: (1) yes, it is a non-cash item, but (2) including its cash equivalent here would be a double count.

So which one should you model? Most modellers prefer the indirect method, as they will have already built the Income Statement and can simply make adjustments for the Cash Flow Statement, but this is a fallacious approach. This is often what causes problems in Balance Sheet reconciliations. The direct method facilitates the incorporation of control accounts, and control accounts are a financial modeller’s best friends. I will explain further shortly. Use the direct method and calculate the indirect variant later, if required.

This statement is important, especially in times of cash flow problems, where modelling shorter periods may lead to better decision making. Model Cash Flow Statements on a direct basis and use the periodicity required by the business to make effective business decisions.
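As a hedged schematic of that reconciliation (illustrative line items only, not a prescription for any particular model), the indirect Net Operating Cash Flow may be written as:

Net Operating Cash Flow = NPAT + Depreciation Expense + Interest Expense + Tax Expense – Increase in Current Assets + Increase in Current Liabilities – Interest Paid – Tax Paid

which arrives at the same number the direct method reaches via Cash Receipts – Cash Paid – Interest Paid – Tax Paid; the two presentations differ only in how the operating section is built up.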
Linking Financial Statements

You may have heard of the phrase “three-way integrated”. With regard to financial statements, “three-way” simply means incorporating all three financial statements, and “integrated” means that if inputs in financial data were to change, the financials would update accordingly so that the Balance Sheet would still balance. This requires linking up the financial statements. So how many links do you need? One? Eight? 50? The correct answer, believe it or not, is two:
As long as the Net Profit After Tax links into the Retained Earnings section of the Balance Sheet and the Net Increase / Decrease in Cash Held from the Cash Flow Statement links into the Current Assets section of the Balance Sheet, you have all of the links you require to put a financial model together.
Appropriate Order of the Financial Statements

Right, it’s time to invoke the Goldilocks Analogy. You are familiar with the tale of Baby Bear, Mummy Bear and Daddy Bear? Well, each of these is akin to one of our financial statements.
• Income Statement: This statement is “Baby Bear”. I am neither talking about the magnitude of the numbers nor the number of line items within the financial statements. I am considering how small or large the Income Statement is conceptually compared with the other statements.
The Income Statement considers the Net Operating Profit after tax. The Cash Flow Statement considers Operating Cash Flows, but it also considers Investing and Financing ones too. The Balance Sheet incorporates the summary (NPAT) of the Income Statement, so must conceptually be at least as large. Therefore, the Income Statement is the baby of the bunch.
• Cash Flow Statement: At the risk of sounding sexist (before you all send me hate mail, I am talking about size only, not importance), this one is “Mummy Bear”. As discussed above, it considers more factors than the Income Statement (albeit from a different perspective), but since it is also summarised in the Balance Sheet (Cash), it is the ‘middle’ statement.
• Balance Sheet: So, by a process of elimination, the Balance Sheet is “Daddy Bear”. Not only does it summarise the other two financial statements, but it also details financials not captured elsewhere, e.g. movements between Non-Current and Current, and transfers in Reserves.

Now, while you are all wondering what my drug of choice is, I had better explain why this is important. Earlier, I stated that when we start to build a model, in general we start to work our way down the Income Statement. That made sense and is commensurate with the magnitude of the concept of the financial statement. It also suggests that the Cash Flow Statement should be built second, which again makes sense, given the Balance Sheet includes a summary of the other two statements. This gives us our conceptual order of constructing three-way integrated financial statements in a model:
To summarise, we should:
• Develop the three financial statements, building up by line item and total
• Link the Income Statement and Cash Flow Statement into the Balance Sheet
• Add error checks to ensure, for example, that there are no errors, that the Balance Sheet balances and that the business remains solvent.
Then, incorporate control accounts…
CHAPTER 1.4: CONTROL ACCOUNTS
Control accounts are easy to construct and even easier to understand. Consider the reconciliation of the line item Accounts Receivable (or Debtors):
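In outline, using the figures from the example discussed below, such a control account looks something like this:
Accounts Receivable ($)
Opening Receivables b/f 120,000
Add: Sales 64,700
Less: Cash Receipts (82,750)
Closing Receivables c/f 101,950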
Chances are you have probably seen something similar to this before, maybe from accounting / finance studies. This reconciliation is known as a control account: it is a reconciliation of a Balance Sheet item from one period to the next (“b/f” means brought forward, or last period, and “c/f” means carried forward, or current period). Typically (although not always), the line items between the opening and closing balances come from the Income Statement and Cash Flow Statement. This is consistent with the idea that the Balance Sheet is stated at a point in time whereas the other two statements are for periods of time.
In the example above, if the opening balance of Accounts Receivable is $120,000 and we make further sales in the period of $64,700, assuming there are no bad debts (more on that later) and the cash received is $82,750, then the closing balance for Accounts Receivable has to be $101,950. In other words, assuming the opening balance was $120,000, entering:
• Sales of $64,700 in the Income Statement;
• Cash Receipts of $82,750 (as a positive number) in the Cash Flow Statement; and
• Closing Accounts Receivable of $101,950 in the Balance Sheet
means that the three-way integrated financial statements must balance. The end. Modelling financial statements really is that simple.
Control accounts tell you three key things:
1. The number of calculations that need to be entered into the financial statements so that they balance: This is always one less than the number of rows in the control account. The reason it is one less is because the opening balance is simply the closing balance calculated from the period before.
2. The order in which to build the calculations into the financial statements: This is always row 2 first, then row 3, then row 4 and so on. Think of it this way: assuming no opening balance (which there would not be in the beginning), if there were no sales, there could be no payments received. If there are no sales and no receipts, the difference between them (the amount owed, the Accounts Receivable) would also be zero. It is a logical order.
3. The key driver: Often you want to undertake sensitivity and scenario analysis in your models, but sometimes you may be unsure which variables should be included in the analysis. Line 2 of the control account is always the key driver. As above, if there were no sales, there could be no payments received. If there are no sales and no receipts, the difference between them (the amount owed, the Accounts Receivable) would also be zero. To make a point, I have repeated myself deliberately. To make a point, I have repeated myself deliberately. To make a point, I have repeated myself deliberately…
Therefore, in our example, we can conclude:
• In order to make the Balance Sheet balance, we need to construct three calculations which need to be incorporated into the financial statements: Sales, Cash Receipts and closing Accounts Receivable.
• The order in which to calculate them should be Sales, Cash Receipts and finally closing Accounts Receivable.
• The key driver of Accounts Receivable is Sales.
The whole concept of double entry is you do one thing, then do another and voila! everything balances. Everything is performed in pairs. But I am telling you that you need to create three calculations. Does that go against everything you believe? Have I discussed debits and credits anywhere? (The answer is yes, in that last sentence.) Often in accounting, we talk about “reversing journals”: this is code for “given we are forced into a double entry system, this is incorporated as a fiddle factor to make it work”. In fact, in the example case study used in our first book, not one control account contained an even number of calculation entries. So much for double entry.
We are now in a position to formulate an action plan regarding the order in which to construct a financial model:
Building a Financial Model
This approach remains deliberately silent on the order of calculation construction. This is how to build a hassle-free, three-way integrated financial model:
1. Create the forecast chart of accounts from either previous models, existing financials, ledgers, journals, trial balances, etc.
2. Add in the subtotals for each chart of account so that all totals flow through to their respective financial statements
3. Add error and other checks to these outputs (e.g. balance checks, cash in cash flow equals cash movement on Balance Sheet) as necessary, updating the Error Checks worksheet as necessary
4. Create the Opening Balance Sheet, ensuring it uses the same format as the forecast Balance Sheet
5. Ensure the Opening Balance Sheet balances, else reject
6. Add checks as necessary
7. Link the financial statements together, adding any checks as necessary
8. Zero the Opening Balance Sheet
9. Ensure all checks are “OK”
10. Begin with the Income Statement: take the first line item in this statement (e.g. Revenue)
11. Create calculations if not already computed
12. Construct control account
13. Add checks if necessary
14. Link control account to financial statements, ensuring checks are all OK (correct if necessary)
15. Move to the next line item in the financial statement not yet calculated
16. Return to Point 11
17. Once the Income Statement is completed, consider the first line item on the Cash Flow Statement not yet linked
18. Once the Cash Flow Statement is also completed, consider the first line item on the Balance Sheet not yet linked
19. Once the Balance Sheet has been completed, return to the Opening Balance Sheet and add back the original data
20. Correct any opening balance errors if necessary.
It seems a lot, but it really isn’t as bad as it sounds. If you want to learn more and see an example model build, check out An Introduction to Financial Modelling. In the meantime, let’s take things forward…
Chapter 2: What-If? Analysis
This topic was touched upon in the last book, but was not fully fleshed out. Consequently, the most popular request has been to provide more details on how to prepare conditional formulae and model what-if? analysis. There are four common what-if? methods of analysis:
1. Scenario analysis
2. Sensitivity analysis
3. Simulation analysis
4. Breakeven analysis.
Before I go through each of these in turn, I first need to explain conditional formulae and revisit several key ideas explored in the introductory volume, namely the OFFSET, INDEX and MATCH functions, more detail on scenario analysis and Data Tables. Are you sitting comfortably? Whether you are or not, let’s begin.
CHAPTER 2.1: CONDITIONAL FORMULAE
In Excel, you frequently require a formula to calculate differently in different situations. This type of formula is known as a conditional formula. That is, the formula will only calculate in a given manner for a certain situation or set of scenarios.
For example, imagine you walk into a shop to buy socks. Once you get over the shock of reading about socks in a financial modelling book, you set your mind on buying 20 pairs. The socks come in three colours: red, white and blue. You buy the pairs randomly but ensure you have bought 20 pairs. Assuming your favourite colour is red, you want to know exactly how many pairs of red socks you bought – but only if you bought more red socks than the other colours; otherwise, you will take some of the blue and / or white socks back and swap them for more red pairs. You could write a formula such as:
=IF you buy more red socks than other colours THEN count the number of red pairs purchased ELSE return some blue / white pairs and swap them for red pairs.
This IF-THEN-ELSE construct creates a conditional formula, that is, a result for a particular situation or scenario, and this syntax forms an important platform in many Excel spreadsheets. In this section, we will be looking at how you can analyse data for entries that meet certain criteria.
Meet Your Basic IF Function
So what’s the most Important Function in Excel? Did you realise that’s what IF is an abbreviation for? Not surprising, as I just made it up. However, there is some truth in the jest. The syntax for IF demonstrates just how useful this function is for financial modelling:
=IF(logical_test, [value_if_true], [value_if_false])
This function has three arguments:
• logical_test: this is the “decider”, that is, a test that results in a value of either TRUE or FALSE. Strictly speaking, the logical_test tests whether something is TRUE; if not, it is FALSE
• value_if_true: what to do if the logical_test is TRUE. Note that you do not put square brackets around this argument when typing the formula. This is just the Excel syntax for saying the argument is sometimes optional. If this argument is left blank (e.g. =IF(logical_test, , value_if_false)), a value of zero is returned when the logical_test is TRUE
• value_if_false: what to do if the logical_test is FALSE (strictly speaking, not TRUE). If this argument is omitted altogether, the logical value FALSE is returned; if it is left blank after a comma, zero is returned instead.
This function is actually more efficient than it may look at first glance. Whilst the logical_test is always evaluated, only one of the remaining two arguments is computed, depending upon whether the logical_test is TRUE or FALSE.
Care should be taken with logical tests as these are the source of many, many errors in spreadsheets. Logical tests assess the criterion / criteria stipulated, no more, no less. They assume a binary universe: X and NOT(X). This isn’t always how our minds think, as I will explain with an exaggerated example.
Intrepid explorer Ivor Challenge is lost in the jungle and needs to find shelter for the night as a rainstorm beckons. Immediately ahead is a clearing with two caves. He writes a formula to determine which cave to sleep in:
=IF(Cave 1 has a bear, sleep in Cave 2, sleep in Cave 1).
The logical_test is to check whether Cave 1 contains a bear. As it turns out, it doesn’t, so he sleeps in there and is mauled to death by the lioness who was in there. Next day, his wife, Cher Challenge, goes searching for him, gets tired, comes across the same caves and uses the same formula to determine which cave to sleep in:
=IF(Cave 1 has a bear, sleep in Cave 2, sleep in Cave 1).
The logical_test is to check whether Cave 1 contains a bear. As it turns out, this time there is one (together with some human bones) and so she sleeps in Cave 2 and is eaten by the other bear.
When using IF formulae, you need to train yourself to think logically like a computer. Common sense does not apply. Consider the logic function NOT(expression), which is everything that is not equivalent to the expression. The opposite of a boy is “not a boy”: “girl” is incorrect. Take care with inequalities in particular. The opposite of “x is greater than y” is “x is less than or equal to y” – that is, NOT(x is greater than y) – not “x is less than y”. This is a common error and it has caused embarrassing mistakes time and time again in business.
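To see this on a worksheet, suppose cells A1 and B1 (hypothetical references) both contain 5:
=A1>B1 returns FALSE
=NOT(A1>B1) returns TRUE (because 5 is less than or equal to 5)
=A1<B1 returns FALSE
so NOT(A1>B1) is equivalent to A1<=B1, not A1<B1.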
Returning to the IF function, let’s consider an example:
In this example, the intention is to evaluate the quotient Numerator / Denominator. However, if the Denominator is either blank or zero, this will result in a #DIV/0! error. Excel has several errors that it cannot evaluate, such as #REF!, #NULL!, #N/A, #Brown, #Pipe. OK, so one or two of these I may have made up, but prima facie errors should be avoided in Excel as they detract from the key results and cause the user to doubt the overall model integrity. Worse, in some instances these errors may contribute to Excel crashing and / or corrupting.
This is where IF comes in. In my example above,
=IF(Denominator=0, , Numerator / Denominator)
tests whether the Denominator is zero. This is the conditional formula. If so, the value is unspecified (blank) and will consequently return a value of zero in Excel; otherwise, the quotient is calculated as intended.
This type of conditional formula is known as creating an error trap. Errors are “trapped” and the ‘harmless’ value of zero is returned instead. You could put “n.a.” or “This is an error” as the value_if_true, but you get the picture. It is my preference not to put a zero in for the value_if_true: personally, I think a formula looks clearer this way, but inexperienced end users may not understand the formula and you should consider your audience when deciding whether to put what may appear to be an unnecessary zero in a formula. The aim is to keep it simple for the end user.
An IF statement is often used to make a decision in the model:
=IF(Decision_Criterion=TRUE, Do_it, Don’t_Do_It)
This automates a model and aids management in decision making and what-if analysis. IF is clearly a very powerful tool when used correctly. IF should never be used to look up data: there are plenty of functions out there to help with that problem, but we will discuss these later.
Mixing Up Your IFS
As a model developer and reviewer, I must confess I remain unconvinced about this one. If you have ever used a formula with nested IF statements beginning with =IF(IF(IF… then maybe this next function is for you – however, if you have ever written Excel formulae like this, then maybe Excel isn’t for you! There are usually better ways of writing the formula using other functions.
Most variants of Excel have the relatively new function IFS. The syntax for IFS is as follows:
IFS(logical_test1, value_if_true1, [logical_test2, value_if_true2], [logical_test3, value_if_true3], …)
where:
• logical_test1 is a logical condition that evaluates to TRUE or FALSE
• value_if_true1 is the result to be returned if logical_test1 evaluates to TRUE. This may be empty
• logical_test2 (and onwards) are further conditions that evaluate to TRUE or FALSE also
• value_if_true2 (and onwards) are the respective results to be returned if the corresponding logical_test evaluates to TRUE. Any or all may be empty.
Since functions are limited to 255 arguments (sometimes known as parameters), the IFS function can contain up to 127 pairs of conditions and results.
One thing to note is that IFS is not quite the same as IF. With the IF statement, the third argument corresponds to what to do if the logical_test is not TRUE (that is, it is an ELSE condition). IFS does not have an inherent ELSE condition, but one can be easily generated. All you have to do is make the final logical_test equal to a condition which is always true, such as TRUE or 1=1 (say).
Other issues to consider:
• Whilst a value_if_true may be empty, it must not be omitted. Having an odd number of arguments in an IFS statement would give rise to the “You’ve entered too few arguments for this function” error message
• If a logical_test is not actually a logical test (for example, it evaluates to something other than TRUE or FALSE), the function returns a #VALUE! error. Numbers still appear to work though: any number other than zero evaluates as TRUE and zero is considered to be FALSE
• If no TRUE conditions are found, this function returns the #N/A error.
To show how it works, consider the following example.
Here, would-be gurus are graded based on evaluation criteria in the table, applied in a particular order:
=IFS(H13="Yes", I13, H14="Yes", I14, H15="Yes", I15, H16="Yes", I16, TRUE, "Not a Guru")
I think it’s safe to say that although it is reasonably straightforward to follow, it’s not the prettiest, most elegant formula ever put to Excel paper. In particular, do pay heed to the final logical_test: TRUE. This ensures we have an ELSE condition, as discussed above. To be fair, one similar solution using legacy Excel functions isn’t any better:
=IF(H13="Yes", I13, IF(H14="Yes", I14, IF(H15="Yes", I15, IF(H16="Yes", I16, "Not a Guru"))))
You may note I am not supplying multiple examples of IFS formulae. This is because, wherever possible, you should try and replace the logic with a simpler, more accessible, logic for end users. For instance, sometimes the logic of an elongated IF or IFS formula may be condensed to
=IF(Multiple Conditions = TRUE, Do Something, Do Something Else)
In this situation, there is a function in Excel that can help. My old English teacher said you should never start or finish a sentence with the word “and”. AND is one of several Excel logic functions (others include NOT [already mentioned earlier, which takes the logical opposite of an expression] and OR). It returns TRUE if all of its arguments evaluate to TRUE; it returns FALSE if one or more arguments evaluate to FALSE.
One common use for the AND function is to expand the usefulness of other functions that perform logical tests. For example, the IF function performs a logical test and then returns one value if the test evaluates to TRUE and another value if the test evaluates to FALSE. By using the AND function as the logical_test argument of the IF function, you can test many different conditions instead of just one.
For example, imagine you are in New York on a Monday. Consider the expression
=AND(condition1, condition2, condition3)
where:
• condition1 is the condition, “today is Monday”
• condition2 is the condition, “you are in New York” and
• condition3 is the condition, “this author is the best looking guy you have ever seen”.
This would clearly be FALSE, as not everywhere in the world would it be Monday (that is, condition1 would be breached)…
As alluded to above, the syntax for AND is as follows:
AND(logical1, [logical2], …)
where:
• logical1: the first condition that you want to test that can evaluate to either TRUE or FALSE
• logical2: additional conditions that you want to test that can evaluate to either TRUE or FALSE, up to a maximum of 255 conditions. logical2 onwards are optional, as denoted by the square brackets.
It should be noted that:
• The arguments must evaluate to logical values, such as TRUE or FALSE, or the arguments must be arrays or references that contain logical values
• If an array or reference argument contains text or empty cells, those values are ignored
• If the specified range contains no logical values, the AND function returns the #VALUE! error value.
To highlight how AND works:
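By way of a minimal sketch (any cell references are hypothetical):
=AND(TRUE, TRUE) returns TRUE
=AND(TRUE, FALSE) returns FALSE
=AND(1=1, 2>1, 3<>4) returns TRUE, since every condition holds
=AND(A1>0, A1<10) returns TRUE only when the value in cell A1 lies strictly between 0 and 10.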
For a more practical example, consider the following summary data table:
Here, we have a list of staff in column A, with identification of those who work in Sales (that is, eligible for a bonus) in column B. Details of the sales made, the threshold for getting a bonus and what rate it is paid are detailed in columns C, D and E respectively. The formula in cell F2:
=IF(AND(B2="yes", C2-D2>=0), C2*E2,)
denotes the Bonus Paid and is conditional on the employee working in Sales (B2="yes") and the sales made being at or above the required threshold (C2-D2>=0). If both conditions are TRUE, then a bonus (C2*E2) is paid accordingly (putting nothing after the final comma is the equivalent of saying “else zero”). This is a prime example of IF and AND working together – and more often than not, these formulae are much easier to read than their IF(IF(IF… or IFS counterparts.
The other logic function not yet mentioned, OR, is similar to AND, but only requires one condition to be TRUE. Similar to AND, the OR function may be used to expand the usefulness of other functions that perform logical tests. For example, the IF function performs a logical test and then returns one value if the test evaluates to TRUE and another value if the test evaluates to FALSE. By using the OR function as the logical_test argument of the IF function, you can test many different conditions instead of just one.
For example, imagine you are in London on a Tuesday. Consider the expression
=OR(condition1, condition2, condition3)
where:
• condition1 is the condition, “today is Tuesday”
• condition2 is the condition, “you are in London” and
• condition3 is the condition, “the Earth is flat”.
This would clearly be TRUE as you are definitely in London (that is, condition2 holds). The syntax for OR is as follows:
OR(logical1, [logical2], …)
where:
• logical1: the first condition that you want to test that can evaluate to either TRUE or FALSE
• logical2: additional conditions that you want to test that can evaluate to either TRUE or FALSE, up to a maximum of 255 conditions. logical2 onwards are optional, as denoted by the square brackets.
It should be noted that:
• The arguments must evaluate to logical values, such as TRUE or FALSE, or the arguments must be arrays or references that contain logical values
• If an array or reference argument contains text or empty cells, those values are ignored
• If the specified range contains no logical values, the OR function returns the #VALUE! error value.
In summary, OR works as follows:
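Again, as a minimal sketch (cell references hypothetical):
=OR(FALSE, FALSE) returns FALSE
=OR(TRUE, FALSE) returns TRUE
=OR(A1="Yes", B1="Yes") returns TRUE if either cell contains “Yes” – only when every condition fails does OR return FALSE.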
For a more practical example, consider the following summary data table:
Now there is a complex formula:
=IF(OR(AND(B2="yes", C2-D2>=0), AND(B2<>"yes", C2-$C$13>=0)), C2 * IF(B2="yes", E2, $C$15),)
It isn’t quite as bad as it first seems. This is based on the AND case study from earlier, but it also allows for Non-Sales staff to participate in the bonus scheme too. The logical_test in the primary IF statement,
OR(AND(B2="yes", C2-D2>=0), AND(B2<>"yes", C2-$C$13>=0))
is essentially OR(condition1, condition2). The first condition is as before for Sales staff, whereas the second,
AND(B2<>"yes", C2-$C$13>=0)
checks whether Non-Sales staff have exceeded the Non-Sales Staff threshold (cell C13). Do you see that the check for Non-Sales staff is given by B2<>"yes" (B2 is not equal to "yes") rather than B2="no"? This takes me back to my earlier point about ensuring you develop your logical_test correctly. It’s a subtle point, but it will ensure all staff are considered (rather than excluding staff where no entry has been made in column B). The other IF statement,
IF(B2="yes", E2, $C$15)
simply ensures the correct bonus rate is applied to the sales figure.
To summarise so far, sometimes your logical_test might consist of multiple criteria:
=IF(condition1=TRUE, IF(condition2=TRUE, IF(condition3=TRUE, formula,),),)
Here, this formula only returns the value of formula if all three conditions are true (and zero otherwise). This nested IF statement may be avoided using the logical function AND(condition1, condition2, …), which is TRUE if and only if all dependent arguments are TRUE:
=IF(AND(condition1, condition2, condition3), formula,)
which is actually easier to read. A similar example may be constructed for OR also. However, even using these logic functions, formulae may become – or simply look – complex quite quickly. There is an alternative: flags.
In its most common form, flags are evaluated as
=(condition=TRUE) * 1
condition=TRUE will give rise to a value of either TRUE or FALSE. The brackets ensure this condition is evaluated first; multiplying by 1 will provide an end result of zero (if FALSE, as FALSE * 1 = 0) or one (if TRUE, as TRUE * 1 = 1). I know some modellers prefer TRUEs and FALSEs everywhere, but I think 1s and 0s are easier to read (when there are lots of them) and, more importantly, easier to sum when you need to know how many issues there are. Flags make it easier to follow the tested conditions. Consider the following:
In this illustration, you might not understand what the MOD function does, but hopefully, you can follow each of the flags in rows 4 to 7 without being an Excel guru. Row 9, the product, simply multiplies all of the flags together (using the PRODUCT function allows you to add additional conditions / rows easily). This effectively produces a sophisticated AND flag, where all of the formulae are mercifully short. As a brief aside, some readers ask me why I use PRODUCT rather than MIN. That’s a good question, especially given I use MAX (below). I confess it is partly a preference and partly the fact that if you are modelling optimisation problems, MIN can give rise to non-smooth outputs (not a good thing) where PRODUCT does not. If I wanted the flag to be a 1 as long as one of the above conditions is TRUE (that is, I wish to construct an OR equivalent), that is easy too:
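Assuming, for illustration, that the individual flags sit in cells J4:J7 (hypothetical references), the OR equivalent is simply:
=MAX(J4:J7)
This returns 1 as soon as any one flag is 1, and 0 only when every flag is 0 – the OR counterpart to the PRODUCT (AND) flag above.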
Flags frequently make models more transparent and this example provides a great learning point. Often, we mistakenly believe that condensing a model into fewer cells makes it more efficient and easier to follow. On the contrary, it is usually better to step out a calculation. If it can be followed on a piece of paper (without access to the formula bar), then more people will follow it. If more people can follow the model logic, errors will be more easily spotted. When this occurs, a model becomes trusted and therefore is of more value in decision-making.
Be careful though. Sometimes you just can’t use flags. Let me go back to my first example in this chapter – but this time using the flag approach:
Here, the flag does not trap the division by zero error. This is because this formula evaluates to =#DIV/0! * 0 which equals #DIV/0! If you need to trap an error, you must use an IF function.
Meet the IF Siblings
If you are unfamiliar with the following function, you can still probably guess what SUMIF does: it combines SUM with IF to provide conditional summing, that is, where you wish to add numerical values provided they meet a certain criterion. For example, imagine you were reviewing the following data summary:
The function SUMIF(range, criterion, [sum_range]) is ideal for summing data based on one requirement:
• range is the array that you want evaluated by the criterion (in this instance, cells F12:F21)
• criterion is the criterion in the form of a number, expression, or text that defines which cell(s) will be added, such as "X", 1, G26 or "<>"&G27 (this last one means “not equal to the value in cell G27”)
• sum_range are the actual cells to be added if their corresponding cells in range match the criterion. This is an optional argument; if it is not specified, the range is used instead.
So, to find the sales for Business Unit 1 in the above example, you can use the formula =SUMIF(F12:F21, 1, H12:H21) (which is $1,000), or to find the total sales of Product X, the formula could be modified to =SUMIF(G12:G21, "X", H12:H21) (which is $1,200) (note that any text must be in inverted commas).
SUMIF is fine when there is only one condition. However, how would you find the total sales of Product Z in Business Unit 1 using this function? That’s two criteria and SUMIF does not work with multiple conditions. There are various alternatives using other functions, but it is possible to solve this problem simply using SUMIF. It is often possible to cheat with SUMIF by making a ‘mega-criterion’ out of multiple criteria. This works by joining criteria together, usually using the ampersand (‘&’) operator.
Let’s consider our example, slightly revised, from above.
A new column has been inserted (column H), with a formula combining the contents of columns F and G (for example, the formula in cell H12 is =F12&G12). Provided that all possible combinations are unique (that is, no duplicates can be generated), a simple SUMIF can then be applied: =SUMIF(H12:H21, G26&G27, I12:I21) This is by far and away the simplest solution – if it works. It can fall down though (in another example, the concatenation ”111” might refer to Product 1 in Business Unit 11 or Product 11 in Business Unit 1). In this instance, I suggest you should use a delimiter to make the distinction clear, such as G26&” – “&G27, so that you would get 1 – 11 and 11 – 1 for the scenarios described. There are other ways to solve this, but more on that later. There is another “sibling”: COUNTIF. This function counts the number of cells that meet a particular criterion; for example, to count the number of times a particular city appears in a customer contact list. The COUNTIF function employs the following syntax to operate: COUNTIF(range, criterion) The COUNTIF function has the following arguments: • range: this is required and represents the range from which you want to count the cells meeting the specified criterion • criterion: the condition to be met.
It should be further noted that:
• COUNTIF ignores upper and lower case in text strings. The criterion is not case sensitive, so “red” and “RED” will match the same cells
• Wildcard characters are permitted. The question mark (?) and asterisk (*) can be used in criterion. The question mark matches any single character, whereas an asterisk matches any sequence of characters. If you actually want to find a question mark or asterisk, use the tilde (~) in front of the required character
• COUNTIF is pedantically precise. Therefore, make sure your data does not contain erroneous characters. In particular, when counting text values, make sure that text doesn’t contain leading, excess or trailing spaces, inconsistent use of straight / curly quotation marks or non-printing characters. The CLEAN (a function that removes all non-printable characters) and TRIM (a useful text manipulating function that removes excess spaces) functions can eradicate most of these issues
• Range names may be used with COUNTIF if required
• range can be a range in the same worksheet, a different worksheet or even a range in another workbook. However, if you refer to a second or subsequent workbook, these workbooks must be open for COUNTIF to work as intended, otherwise #VALUE! will be returned
• The wrong value may be returned for long strings. The COUNTIF function returns incorrect results when you try to use it to match strings longer than 255 characters. However, there is a workaround available. You may use the CONCATENATE function or the concatenate (&) operator, for example, =COUNTIF(A1:A7, "long string"&"another long string")
• If no value is returned where one is expected, check to see whether the criterion argument should be in quotation marks, for example ">=14".
To illustrate:
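As a minimal sketch, based on the data layout from the SUMIF example above (Products in cells G12:G21):
=COUNTIF(G12:G21, "X")
counts how many times Product X appears in the list, while
=COUNTIF(G12:G21, "<>X")
counts the entries that are anything other than X.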
It should be further noted that the COUNTIF function will not count cells based on cell background or font colour. However, it is useful when calculating weighted averages:
Some modellers would write this formula all in one cell, but it is easier to follow when stepped out as above. The SUMIF formula in cell G28 calculates the total sales for Product Type ‘X’ and the COUNTIF formula in cell G35 calculates the number of occurrences of ‘X’. Calculating the average then becomes trivial:
=IF(G35<>0, G28/G35,)
where an IF statement is used to avoid a #DIV/0! error from potentially occurring.
For the record, there is actually an AVERAGEIF function, which employs the following syntax to operate:
AVERAGEIF(range, criterion, [average_range])
The AVERAGEIF function has the following arguments:
• range: this is required. One or more cells to use for the criterion, including numbers, names, arrays or references that contain numbers
• criterion: this is also required. The criterion in the form of a number, expression, cell reference, or text that defines which cells are averaged. For example, criteria can be expressed as 32, "32", ">32", "apples", B4 or ">="&B4
• average_range: this is optional. The actual set of cells to average. If omitted, range is used. Be careful though: if an average_range is specified that is not of the same dimensions as range, Excel takes the top left cell of average_range as the starting point and then uses the dimensions that correspond in size and shape to range, such as
It should also be noted that:
• Cells in range that contain TRUE or FALSE are ignored
• If a cell in average_range is an empty cell, AVERAGEIF ignores it
• If range is a blank or text value, AVERAGEIF returns the #DIV/0! error value
• If a cell in criteria is empty, AVERAGEIF treats it as a zero value
• If no cells in the range meet the criteria, AVERAGEIF returns the #DIV/0! error value
• You can use the wildcard characters, question mark (?) and asterisk (*), in criteria. A question mark matches any single character; an asterisk matches any sequence of characters. If you want to find an actual question mark or asterisk, type a tilde (~) before the character.
Here’s an illustration:
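As a sketch, assuming the same layout as the weighted average example above (Product Types in one column, Sales alongside):
=AVERAGEIF(G12:G21, "X", H12:H21)
returns the average sales for Product Type ‘X’ in a single step – equivalent to the SUMIF / COUNTIF quotient, but without the separate error trap (although it will return #DIV/0! if there are no ‘X’ entries).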
The IFS Family
We have already looked at how to perform conditional formulae with a single criterion (and how to circumvent this limitation with SUMIF and concatenation). Further, we have also considered IFS, AND, OR and flags. The next logical progression is the “IFS” family of functions, starting with SUMIFS. SUMIFS is similar to the SUMIF function, just with the ability to allow for more conditions (criteria), although the syntax might not look similar when you first inspect it:
=SUMIFS(sum_range, criterion_range1, criterion1, [criterion_range2, criterion2], …)
If you think about it, the syntax is consistent with SUMIF. This function allows various ranges (criterion_range1, criterion_range2, …) to be assessed against multiple criteria (criterion1, criterion2, …). The key difference is that the range to be conditionally summed, sum_range, is the first argument of the function rather than the last. This is so there is never any confusion regarding what is to be totalled – it’s always the first argument of the function.
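For instance, using the earlier Business Unit / Product layout (Business Units in column F, Products in column G, sales in column I, and the two criteria in cells G26 and G27), the two-condition sum becomes:
=SUMIFS(I12:I21, F12:F21, G26, G12:G21, G27)
Each criterion_range is tested against its own criterion and only the rows satisfying both are summed.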
Unlike the solution proffered in the SUMIF section, the helper column (column H) is no longer required. There is a potential issue though – and this affects SUMIF as well. In fact, it’s an issue that harks back to the SUM function. SUM, SUMIF and SUMIFS have an Achilles’ Heel: numbers that look like numbers but are considered text (a common problem with data imported from management information systems) are treated as zero. This can lead to the dreaded right formula, wrong result.
Let me explain, starting with the SUM function. Consider the following example:
In this example, I have totalled the values in cells E3:E7 in two distinct ways: the first uses the aforementioned SUM function, the other has added each cell individually using the ‘+’ operator. Are you thinking you’d be mad to use the alternative (second) approach – especially if there were many more rows? Well, take another look:
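To sketch the difference (hypothetical values 1 to 5 in cells E3:E7, with the 3 in cell E5 now stored as text):
=SUM(E3:E7) returns 12, silently treating the text in E5 as zero
=E3+E4+E5+E6+E7 returns 15, as the ‘+’ operator coerces the numeric text back into a number.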
In this example, cell E5 has been modified. It has been stored as text, even though it looks like the number 3. SUM treats this as having zero value whereas the more convoluted addition carries on regardless. Simplest may not always be bestest. In an example like the one above, this may be easy to spot, but would you stake your life that the sum
is correct? There is a simple way to check using the COUNT function. COUNT counts the number of numbers in a range, so we can use it to spot numbers that aren’t numbers:
Here, the formula in column I highlights when a number is not a number. Note how it reports by exception: if the cell in question contains a number then COUNT(Cell_Reference) equals 1 and =1-COUNT(Cell_Reference) equals zero. Only non-numbers will be highlighted – it’s better to know I have two errors rather than 14,367 values working correctly. Let me now return to SUMIFS with this all borne in mind:
There is only one difference between this example and the earlier illustration: cell I15 has now been entered as text. Therefore, the $400 is not recognised as a value and the summation has been reduced by this amount accordingly. Care needs to be taken with these functions if conditional summations are to be relied upon in a financial model (the same may be said for PivotTables too). There is an alternative – but more on that later in the chapter.
As SUMIF has its analogous SUMIFS, so does the conditional count function, COUNTIF. The multiple-criteria variant, the COUNTIFS function, applies one or more criteria to cells across multiple ranges and counts the number of times all criteria are met. The COUNTIFS function employs the following syntax to operate:
COUNTIFS(criterion_range1, criterion1, [criterion_range2, criterion2]…)
and has the following arguments:
• criterion_range1: this is required and represents the first range in which to evaluate the associated criteria
• criterion1: this is also required. The criteria must be in the form of a number, expression, cell reference or text that define which cells will be counted. For example, criteria can be expressed as 32, ">32", B4, "apples" or "32"
• criterion_range2, criterion2, ...: these arguments are optional but must appear in associated pairs. Up to 127 range / criteria pairs are allowed.
It should be further noted that:
• COUNTIFS ignores upper and lower case in text strings. Criteria are not case sensitive, so “red” and “RED” will match the same cells
• Each range’s criteria are applied one cell at a time. If all of the first cells meet their associated criteria, the count increases by 1. If all of the second cells meet their associated criteria, the count increases by 1 again, and so on until all of the cells are evaluated
• If the criterion argument is a reference to an empty cell, the COUNTIFS function treats the empty cell as a zero value
• Wildcard characters are permitted. The question mark (?) and asterisk (*) can be used in criterion. The question mark matches any single character, whereas an asterisk matches any sequence of characters. If you actually want to find a question mark or asterisk, use the tilde (~) in front of the required character.
Here are similar examples to the COUNTIF function from earlier:
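By way of a sketch, sticking with the Business Unit / Product layout:
=COUNTIFS(F12:F21, 1, G12:G21, "Z")
counts the rows where the Business Unit is 1 and the Product is Z – both conditions must hold for a row to be counted.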
As before, a weighted average may be constructed on multiple criteria:
Again, this could be constructed using the AVERAGEIFS function instead:
The AVERAGEIFS function employs the following syntax to operate:
AVERAGEIFS(average_range, criterion_range1, criterion1, [criterion_range2, criterion2], ...)
and has the following arguments:
• average_range: this is required. One or more cells to average, including numbers, names, arrays or references that contain numbers
• criterion_range1: this is required, consisting of one or more cells to use for criterion1, and can include numbers or names, arrays or references that contain numbers
• criterion1: this is also required. The criterion in the form of a number, expression, cell reference or text that defines which cells are averaged. For example, criteria can be expressed as 32, "32", ">32", "apples", B4 or ">="&B4
• criterion_range2, …: up to a further 126 ranges (with associated criteria) are optional
• criterion2, ...: this is required if the corresponding optional criterion_range is invoked. Syntax should be similar to that used for criterion1.
It should be further noted that:
• If average_range is a blank or text value, AVERAGEIFS returns the #DIV/0! error value
• If a cell in a criteria range is empty, AVERAGEIFS treats it as a zero value
• Cells in range that contain TRUE evaluate as 1; cells in range that contain FALSE evaluate as zero
• Each cell in average_range is used in the average calculation only if all of the corresponding criteria specified are TRUE for that cell
• Unlike the range and criterion arguments in the AVERAGEIF function, in AVERAGEIFS each criterion_range must be the same size and shape as average_range
• If cells in average_range cannot be translated into numbers, AVERAGEIFS returns the #DIV/0! error value
• If there are no cells that meet all the criteria, AVERAGEIFS returns the #DIV/0! error value
• You can use the wildcard characters, question mark (?) and asterisk (*), in criteria. A question mark matches any single character; an asterisk matches any sequence of characters. If you want to find an actual question mark or asterisk, type a tilde (~) before the character.
It is fairly straightforward to use:
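Again as a sketch, on the same layout as before:
=AVERAGEIFS(I12:I21, F12:F21, G26, G12:G21, G27)
averages only the sales where the Business Unit matches cell G26 and the Product matches cell G27 – note that the range to be averaged comes first, just as with SUMIFS.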
There are two other functions that belong in this IFS family. However, unlike their namesakes, MAXIFS and MINIFS have no “singular” alternative. That doesn’t mean you couldn’t just use a single criterion: it’s just that, being newer functions, these two do not require counterparts (which you could argue are more accidents of history). The MAXIFS function has the following syntax:
MAXIFS(max_range, criterion_range1, criterion1, [criterion_range2, criterion2], ...)
This function returns the maximum value among cells specified by a given set of conditions or criteria, where:
• max_range is the actual range of cells in which the maximum is to be determined
• criterion_range1 is the set of cells to evaluate with the criterion specified
• criterion1 is the criterion in the form of a number, expression or text that defines which cells will be evaluated as a maximum
• criterion_range2 (onwards) and criterion2 (onwards) are the additional ranges and their associated criteria. Up to 126 range / criterion pairs may be specified. All ranges must have the same dimensions, otherwise the function returns a #VALUE! error.
MINIFS behaves similarly but returns the minimum rather than the maximum value among cells specified by a given set of conditions or criteria.
This example is preferable to its standard Excel counterpart: {=MAX(IF(G13:G31=H34, IF(H13:H31=H35, IF(I13:I31=H36, J13:J31))))} This formula is known as an array formula (you do not type in the ‘{‘ and ‘}’ braces, but enter the formula using the keystroke CTRL + SHIFT + ENTER). These can be cumbersome and typically, are not readily understood. The point here is that they are no longer needed as you may use MAXIFS and MINIFS instead.
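Based on the ranges in that array formula, the MAXIFS equivalent would presumably read:
=MAXIFS(J13:J31, G13:G31, H34, H13:H31, H35, I13:I31, H36)
entered normally – no CTRL + SHIFT + ENTER required.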
Working with Multiple Criteria: Using SUMPRODUCT as an Alternative
I must admit this is one of my favourite functions in Excel – so much so our company was named after it (time for a shameless plug)! At first glance, the basic form SUMPRODUCT(array1, array2, ...) appears quite humble. Before showing an example, though, let’s look at the syntax carefully:
• We’ll discuss them in more detail later, but a vector for Excel purposes is a collection of cells either one column wide or one row deep. For example, A1:A5 is a column vector, A1:E1 is a row vector, cell A1 is a unit vector and the range A1:E5 is not a vector (it is actually an array, but, again, more on that later). The ranges must be contiguous; and
• This basic functionality uses the comma delimiter (,) to separate the arguments (vectors). Unlike most Excel functions, it is possible to use other delimiters, but this will be revisited shortly.
To show how SUMPRODUCT works, imagine you worked in retail and the electronic register had developed a fault. Consequently, all sales had to be kept in a tally chart, that is, the pricing points were listed in the first column of a spreadsheet and the sales were then noted in the second column. At the end of the day, you would have to calculate the total revenue and reconcile it with the payments received:
The sales in column H are simply the product of columns F and G; for example, the formula in cell H12 is simply =F12 * G12. Then, to calculate the entire amount, cell H19 sums column H. This could all be performed much quicker using the following formula:
=SUMPRODUCT(F12:F17, G12:G17)
That is, SUMPRODUCT does exactly what it says on the tin: it sums the individual products.
I mentioned the comma delimiter earlier. You can multiply the vectors together instead: =SUMPRODUCT(F12:F17 * G12:G17) This will produce the same result. However, there is an important difference. If you think back to our earlier example on SUMIFS, I demonstrated the issue with values that were not numerical in type. SUMPRODUCT can bypass that particular problem:
SUMPRODUCT will work with numbers that aren’t really numbers. However, if you look at the formula in the example, you can be forgiven for not understanding it. Let me explain. Where SUMPRODUCT comes into its own is when dealing with multiple criteria. This is done by considering the properties of TRUE and FALSE in Excel, namely:
• TRUE * number = number (for example, TRUE * 7 = 7); and
• FALSE * number = 0 (for example, FALSE * 7 = 0).
Consider the following example:
We can test columns F and G to check whether they equal our required values. SUMPRODUCT could be used as follows to sum only sales made by Business Unit 1 for Product Z:
=SUMPRODUCT((F12:F21=1) * (G12:G21="Z") * H12:H21)
For the purposes of this calculation, (F12:F21=1) replaces the contents of cells F12:F21 with either TRUE or FALSE depending on whether the value contained in each cell equals 1 or not. The brackets are required to force Excel to compute this first before cross-multiplying. Similarly, (G12:G21="Z") replaces the contents of cells G12:G21 with either TRUE or FALSE depending on whether the value “Z” is contained in each cell. Therefore, the only time cells H12:H21 will be summed is when the corresponding cells in the arrays F12:F21 and G12:G21 are both TRUE: you will then get TRUE * TRUE * number, which equals the said number. Note also that this uses the * delimiter rather than the comma, analogous to TRUE * number.
If you were to use the comma delimiter instead, the syntax would have to be modified thus:
=SUMPRODUCT(--(F12:F21=1), --(G12:G21="Z"), H12:H21)
Minus minus? The first negation in front of the brackets converts the array of TRUEs and FALSEs to numbers, albeit substituting -1 for TRUE and 0 for FALSE. The second minus sign negates these numbers so that TRUE is effectively 1, rather than -1, whilst FALSE remains equal to zero. This variant often confuses end users, which is why I recommend the first version described above.
You can get more sophisticated:
In this scenario, the end user pays invoices only where the invoice number matches the number “checked” on an authorised list. In the illustration above, two invoices (highlighted in red) do not match. SUMPRODUCT can be used to sum the authorised amounts only as follows: =SUMPRODUCT((F12:F21=G12:G21) * H12:H21) The argument in brackets only gives a value of TRUE for each row when the values in columns F and G are identical. SUMPRODUCT and SUMIFS truly part company in the following comprehensive example. Consider the following:
So far, I have only considered SUMPRODUCT with vector ranges. Using the multiplication delimiter (*), it is possible to use SUMPRODUCT with arrays (an array is a range of cells consisting of both more than one row and more than one column). In the above example, SUMPRODUCT has been used in its elementary form in cells J34:O34. For example, the formula in cell J34 is:
=SUMPRODUCT($I$30:$I$33 * J$30:J$33)
and this has then been copied across to the rest of the cells. The comma delimiter could have been used here too. The total costs of this retail bank example could then be calculated as:
=SUMPRODUCT(J22:O22, J34:O34)
However, the formula in cell I41 appears more – and unnecessarily – complicated:
=SUMPRODUCT($I$30:$I$33 * $J$30:$O$33 * $J$22:$O$22)
The use of the multiplication delimiter in this case is deliberate (the formula will not work if the delimiters were to become commas instead). It should be noted that this last formula is essentially
=SUMPRODUCT(Column_Vector * Array * Row_Vector)
where the number of rows in the Column_Vector must equal the number of rows in the Array, and the number of columns in the Array must equal the number of columns in the Row_Vector.
The reason for this extended version of the formula is to divide the costs between Budget and Standard costs in my example. For example, the formula in cell J41 becomes:
=SUMPRODUCT($I$30:$I$33 * $J$30:$O$33 * $J$22:$O$22 * ($H$30:$H$33=J$40))
that is, the formula is now of the form
=SUMPRODUCT(Column_Vector * Array * Row_Vector * Condition)
where Condition uses similar logic to the TRUE / FALSE examples detailed earlier. This is a powerful concept that can be used to replace PivotTables, for instance. You can add more than one Condition, but for the formula to work without issue, include them all multiplicatively at the end of the formula.
There are valid / more efficient alternatives to SUMPRODUCT in some instances. For example, dealing with multiple criteria for vector ranges, the SUMIFS function is up to six times faster, but will only work with Excel 2007 and later versions. Further, it cannot work with arrays where the dimensions differ, such as in the example above. Over-use of SUMPRODUCT can slow down the calculation time of even the smallest of Excel files, but it is a good all-rounder. Used sparingly, it can be a highly versatile addition to the modeller’s repertoire. It is a sophisticated function, but once you understand how it works, you can start to use SUMPRODUCT for a whole array of problems (pun intended!).
Here is an extension of this last example. Let’s assume some of the data has prima facie errors:
Some of our inputs have become garbled. How can we calculate as before but ignore the errors in the data? SUMIF and SUMIFS will ignore errors but, unfortunately, SUMPRODUCT does not. Without using VBA, we wish to develop a single formula that can be copied across to populate the orange cells in the Summary Report (cells J39:K39), so that it calculates the total cost of Budget and Standard products, ignoring the errors and text in the tables above. Note that this should be flexible, so that if any other types of prima facie errors were to occur (such as #VALUE! or #DIV/0!), the formula should ignore any of these other error types as well.
A very “basic” solution would be the following wonder: =($I$15*IFERROR(N($J$15),)*$J$22*($H$15=J$38))+($I$15*IFERROR(N($K$15),) *$K$22*($H$15=J$38))+($I$15*IFERROR(N($L$15),)*$L$22*($H$15=J$38)) +($I$15*IFERROR(N($M$15),)*$M$22*($H$15=J$38))+($I$15*IFERROR(N($N$15),) *$N$22*($H$15=J$38))+($I$15*IFERROR(N($O$15),)*$O$22*($H$15=J$38)) +($I$16*IFERROR(N($J$16),)*$J$22*($H$16=J$38))+($I$16*IFERROR(N($K$16),) *$K$22*($H$16=J$38))+($I$16*IFERROR(N($L$16),)*$L$22*($H$16=J$38)) +($I$16*IFERROR(N($M$16),)*$M$22*($H$16=J$38))+($I$16*IFERROR(N($N$16),) *$N$22*($H$16=J$38))+($I$16*IFERROR(N($O$16),)*$O$22*($H$16=J$38)) +($I$17*IFERROR(N($J$17),)*$J$22*($H$17=J$38))+($I$17*IFERROR(N($K$17),) *$K$22*($H$17=J$38))+($I$17*IFERROR(N($L$17),)*$L$22*($H$17=J$38)) +($I$17*IFERROR(N($M$17),)*$M$22*($H$17=J$38))+($I$17*IFERROR(N($N$17),) *$N$22*($H$17=J$38))+($I$17*IFERROR(N($O$17),)*$O$22*($H$17=J$38)) +($I$18*IFERROR(N($J$18),)*$J$22*($H$18=J$38))+($I$18*IFERROR(N($K$18),) *$K$22*($H$18=J$38))+($I$18*IFERROR(N($L$18),)*$L$22*($H$18=J$38)) +($I$18*IFERROR(N($M$18),)*$M$22*($H$18=J$38))+($I$18*IFERROR(N($N$18),) *$N$22*($H$18=J$38))+($I$18*IFERROR(N($O$18),)*$O$22*($H$18=J$38)) Lovely. It works, but it’s far from pretty. Sounds like me. I will spare you and not explain this monster of a formula, other than to say that the N function returns the value of any contents that appear to be numbers and treats text as zeros and errors as errors. It’s a “brute force and ignorance” approach but it does appear to get the job done. The problem is, what if (a) I add more rows and / or columns and / or conditions and (b) I don’t have a spare six hours to understand how the formula works? It’s very simple to miss a cross-multiplication too. No, I suggest the following – much shorter – solution instead: {=SUMPRODUCT($I$30:$I$33 * IF(NOT(ISNUMBER($J$30:$O$33)), , $J$30:$O$33) * $J$22:$O$22 * ($H$30:$H$33=J$38))}
It’s a short, “simple” array formula (one that requires entering using CTRL + SHIFT + ENTER) – don’t try typing in the brace brackets ({ and }) yourself. SUMIF and SUMIFS are more forgiving of these errors. For example,
the “????” text (which could just as simply be an error) is ignored as it does not need to be summed. Here’s another example.
Here, the #DIV/0! error occurs in the data to be summed, but there is another way to circumvent this issue. Did you know that you can incorporate error results into your conditions? You could try the following, for example:
=SUMIFS(I12:I21, F12:F21, G26, G12:G21, G27, I12:I21, "<>#DIV/0!")
The last argument ensures that the #DIV/0! results will be ignored, regardless of whether they form part of your target data. You could repeat this with any other error results as well, though this may result in a very long formula if you don’t consider shortcuts!
Returning to our problem, my question deliberately used multiple rows and columns (an “array” of data) because this useful function (or its related SUMIF function) will not work in this instance. Remembering that the formula before was essentially
=SUMPRODUCT(Column_Vector * Array * Row_Vector * Condition)
you can see that all data is considered (SUMIF and SUMIFS exclude the erroneous data). This is why there is a problem. The key trick concerns Array: we must check whether the relevant element of the Array contains a number or not. To be clear, blank cells, text or errors could all create issues in my cross-multiplication, so we really want to identify when the cell value is not a number. This is what the formula
IF(NOT(ISNUMBER($J$30:$O$33)), , $J$30:$O$33)
would do – only consider the numerical values. After that, it is simply a case of cross-multiplying and adding the array syntax (this is necessary for Excel to consider errors in an array). If you require more conditions, rows or columns, simply add them. Easy! This needs to replace the original Array reference:
{=SUMPRODUCT($I$30:$I$33 * IF(NOT(ISNUMBER($J$30:$O$33)), , $J$30:$O$33) * $J$22:$O$22 * ($H$30:$H$33=J$38))}
with the braces (‘{‘ and ‘}’) appearing as a result of entering the formula using CTRL + SHIFT + ENTER, so that ISNUMBER will calculate on a cell by cell basis.
Some of you may consider that the expression IF(NOT(ISNUMBER(Array)), , Array) appears a little convoluted and could be readily replaced by IF(ISNUMBER(Array), Array,) – which simply swaps the order of the TRUE and FALSE conditions. That’s true (pun intended), but NOT(ISNUMBER()) is a powerful combination that comes up time and time again in Excel error testing (for example, to highlight cells that do not contain numbers using conditional formatting). Therefore, my use of NOT() to negate TRUE and FALSE was deliberate, so that you know how to report by exception.
So far, I have only considered multiple criteria as AND conditions: what happens if you wish to use an OR relationship instead? That is, I don’t need all criteria specified to hold. Let’s imagine we run a car sales company with four divisions: North, South, East and West. Further, we only sell two types of car: the Mercudi and the Lexota:
Imagine you are the General Manager responsible for the North Division and Mercudi sales. Each month you have to provide a report summarising the total sales you are responsible for. This requires analysis of multiple criteria, but it is an OR, rather than an AND, situation. We need to include sales of North Division and sales of Mercudi. However, if we do it this simply, sales of Mercudi made by the North Division will be double counted:
If we specify the criteria in the spreadsheet as follows:
The formula in this instance would be:
=SUMPRODUCT((F12:F29=G34) * H12:H29) + SUMPRODUCT((G12:G29=G35) * H12:H29) - SUMPRODUCT((F12:F29=G34) * (G12:G29=G35) * H12:H29)
However, there’s a simpler approach. Not many know this obscure but useful little Excel function. SIGN(number) is:
• 1 if number is positive;
• 0 if number is zero; and
• -1 if number is negative.
It’s only when you start combining this function with SUMPRODUCT that you realise how useful it can be. For example, in our scenario above, consider the following formula:
=SUMPRODUCT(SIGN((F12:F29=G34) + (G12:G29=G35)) * H12:H29)
Inside the nested SIGN function, there are two criteria:
• Whether the Division is North (F12:F29=G34); and
• Whether the car sold is the Mercudi (G12:G29=G35).
Each criterion will either be TRUE (1) or FALSE (0), so the possible values inside the SIGN function are zero (neither criterion satisfied), one (only one criterion satisfied) or two (both criteria satisfied). If neither criterion is true, SIGN will return a value of zero; if one or more criteria are true, SIGN will return a value of one and hence sum the relevant values in column H. This is precisely what is required.
With more criteria considered, the simplicity of SUMPRODUCT(SIGN) becomes even more pronounced. For example, consider a similar situation but with four criteria:
In this example, having become a very successful General Manager, you have acquired greater responsibility: not only do you remain responsible for the North Division and Mercudi sales, but you are now also mentor to salesperson Alice and responsible for trying to push credit (finance) sales. As before, each month you have to provide a report summarising the total sales you are responsible for, which now considers four criteria: division, car, salesperson and finance:
If we specify the criteria in the spreadsheet as follows:
The "long" formula in this instance which would ensure the overlaps are not counted more than once would be:

=SUMPRODUCT((F12:F29=G34) * J12:J29)
+ SUMPRODUCT((G12:G29=G35) * J12:J29)
+ SUMPRODUCT((H12:H29=G36) * J12:J29)
+ SUMPRODUCT((I12:I29=G37) * J12:J29)
- SUMPRODUCT((F12:F29=G34) * (G12:G29=G35) * J12:J29)
- SUMPRODUCT((F12:F29=G34) * (H12:H29=G36) * J12:J29)
- SUMPRODUCT((F12:F29=G34) * (I12:I29=G37) * J12:J29)
- SUMPRODUCT((G12:G29=G35) * (H12:H29=G36) * J12:J29)
- SUMPRODUCT((G12:G29=G35) * (I12:I29=G37) * J12:J29)
- SUMPRODUCT((H12:H29=G36) * (I12:I29=G37) * J12:J29)
+ SUMPRODUCT((F12:F29=G34) * (G12:G29=G35) * (H12:H29=G36) * J12:J29)
+ SUMPRODUCT((F12:F29=G34) * (G12:G29=G35) * (I12:I29=G37) * J12:J29)
+ SUMPRODUCT((F12:F29=G34) * (H12:H29=G36) * (I12:I29=G37) * J12:J29)
+ SUMPRODUCT((G12:G29=G35) * (H12:H29=G36) * (I12:I29=G37) * J12:J29)
- SUMPRODUCT((F12:F29=G34) * (G12:G29=G35) * (H12:H29=G36) * (I12:I29=G37) * J12:J29)

It's just so pretty. PhDs are available for all those of you who can follow this formula in a heartbeat. However, SUMPRODUCT(SIGN) is not just shorter, it's easier to follow:

=SUMPRODUCT(SIGN((F12:F29=G34) + (G12:G29=G35) + (H12:H29=G36) + (I12:I29=G37)) * J12:J29)

Simple, isn't it? The world of conditional formulae and multiple criteria may appear complex, but with practice, it doesn't take too long to master and avoid ridiculously lengthy, memory intensive calculations.
CHAPTER 2.2: OFFSET

We need this function for the next section. This is a function modellers either love or hate. It considers disposition or displacement and has the following syntax:

OFFSET(reference, rows, columns, [height], [width])

The arguments in square brackets (height and width) may be omitted. In its most basic form, OFFSET(reference, rows, columns) will select a reference rows rows down (-rows would be rows rows up) and columns columns to the right (-columns would be columns columns to the left) of the reference. For example, consider the following grid:
OFFSET(A1, 2, 3) would take us two rows down and three columns across to cell D3. Therefore, OFFSET(A1, 2, 3) = 16, viz.
OFFSET(D4, -1, -2) would take us one row up and two columns to the left to cell B3. Therefore, OFFSET(D4, -1, -2) = 14, viz.
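The optional height and width arguments extend this idea from single cells to ranges. As a quick sketch – assuming the grid above is six columns of sequential numbers, which is consistent with the two results just quoted – the following would sum the 2 x 2 block starting one row down and one column across from cell A1:

=SUM(OFFSET(A1, 1, 1, 2, 2))

Here, OFFSET(A1, 1, 1, 2, 2) moves to cell B2 and then extends the reference two rows tall and two columns wide (i.e. B2:C3), so SUM would return 8 + 9 + 14 + 15 = 46.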
CHAPTER 2.3: SCENARIO ANALYSIS

It is this OFFSET displacement technique (described above) that can create a scenario table:
Essentially, the assumptions used in the model are linked from cells L17:L24. These values are drawn from the scenario table to the right of the highlighted yellow range (e.g. cells N17:N24 constitute Scenario 1, the "Base" case; cells O17:O24 constitute Scenario 2). The Scenario Number is located in cell H12. Using OFFSET, we can retain all scenarios and select whichever we see fit. For example, the formula in cell L18 (highlighted) is simply =OFFSET(M18, , $H$12), that is, start at cell M18 and displace zero rows and the value in H12 columns across. In the illustration above, the formula locates the cell one column to the right, which is Scenario 1. The advantage of OFFSET over alternatives such as INDEX, CHOOSE and LOOKUP (see later in this chapter) is that the range of data can be added to without having to amend the formula. The image below shows Scenarios 6 and 7 added in an instant. The other functions require a specified range, whereas we can keep adding scenarios without changing the formula or making the model inefficient.
It should be noted that OFFSET is a volatile function, i.e. a function that causes recalculation of the formula in the cell where it resides every time Excel recalculates. This occurs regardless of whether precedent cells / calculations have changed, or whether the formula also contains non-volatile functions. One test to check whether your workbook is volatile is to close the file after saving and see if Excel prompts you to save it a second time (this is an indicative test only). OFFSET is also what is known as a non-auditable function, in that Excel does not recognise dependent cells that are linked via an OFFSET function. For example, in my illustration above, the $3.70 in cell N18 is clearly used. However, if you were to select this cell and trace dependents (ALT + M + D), you would get the following message:
This should not put you off using OFFSET; it is a function that frequently calculates much quicker than the alternative options and its advantages may often outweigh the potential pitfalls.
CHAPTER 2.4: DATA TABLES

Sometimes, you want to flex one or two variables to see how these changes in input affect key outputs. This may be performed using Excel's built-in Data Tables. Data Tables are ideal for executive summaries where you wish to show how changes in a particular input affect a key output. However, you should use them sparingly. If you can achieve the same functionality without using Data Tables, then you should do that:
In this illustration, the key output revenue has been given in cell G11. We want to summarise what happens if we increase (“flex”) this figure by a given percentage, with the inputs specified in cells G17:G26. This can be simply computed by using the formula =$G$11 * (1 + $F17) in cell G17 and simply copying this calculation down. Data Tables should really be used when such simple calculations are not possible, and you want to flex one variable (known as a “one-variable” or “one-dimensional (1-D)” Data Table) or two (known as a “two-variable” or “two-dimensional (2-D)” Data Table). Let’s take a look at each in turn.
1-D Data Tables

This is best illustrated using another example:
It’s not vital you understand what this spreadsheet is doing. It is essentially using inputs in cells G11 and G16:L16 to generate an output in cell G30, as it calculates what cash received in row 16 would be worth now if interest were 8.0% per period (known as the “Net Present Value” (NPV)). Now, for more sophisticated readers, yes, I appreciate this example could be constructed using a similar technique to our revenue example using the NPV function: I just wanted to construct a slightly more complex alternative that could still be followed! Therefore, with a simple Net Present Value calculated for a total of six periods (0 to 5 inclusive), the output for a discount rate of 8.0% (cell G11) is $9,482 (cell G30). But what if I wanted to know how the NPV would change if I varied the input discount rate? It is very easy to construct a table (a Data Table) similar to the one displayed in cells F38:M50 above. The required discount rates are simply typed into cells F39:F50, but the heading in cell G38 is not what it seems.
For a 1-D Data Table to work using a columnar table similar to the one illustrated, the top row of the second and any subsequent columns has to contain the reference to the output cell(s). Many modellers will do just that, putting the headings in the row above instead, and then may (or may not) hide the row containing the output references in order to compensate. There is a crafty alternative (employed above). Using CTRL + 1, ALT + H + O + E, or selecting 'Format Cells…' from the 'Format' drop-down in the 'Cells' grouping of the 'Home' tab of the Ribbon, open the 'Format Cells' dialog. Then, on the 'Number' tab, we can still type the formula(s) in but change the outward appearance of the cell. It is with this borne in mind that cell G38 is formatted as follows:
Here, I have typed in the custom format code "NPV";"NPV". Essentially, what I have done here is replace all non-negative numbers with the text "NPV" and all negative numbers with the text "NPV" too. You might wonder why I have typed this in twice. If the number is negative and the second "NPV" has not been defined, the negative number would be replaced by "-NPV" instead – which is not what we want. Once this formatting has been done and the formula =G30 has been typed into the header in cell G38 (giving it the appearance "NPV"), select cells F38:M50 and go to 'Data Table…' in the 'What-If Analysis' drop-down list in the 'Forecast' grouping of the 'Data' tab on the Ribbon (ALT + A + W + T):
This calls the ‘Data Table’ dialog box:
At this point, confusion often sets in as users are often unsure whether they should be entering details in the ‘Row input cell:’ and / or ‘Column input cell:’ input boxes. The rules are very simple: • Referenced directly, the inputs and outputs must be on the same sheet as the Data Table (although there are ways and means around this) • Use only one input box if you want to flex one input; use both if you wish to flex two • If inputs are in a column in the Data Table, use the ‘Column input cell:’ input box • If inputs are in a row in the Data Table, use the ‘Row input cell:’ input box. Here, my inputs are in a column and I want to use them to substitute for the value in cell G11, so I select cell G11 for the ‘Column input cell:’ input box. Clicking ‘OK’ results in the following summary:
That's it – you have your "What-if?" analysis. It should be noted that at this point you may not insert any rows or columns into the Data Table (or delete any either). This is because the formula {=TABLE(,G11)} has been entered into the output cells of the Data Table (cells G39:M50 here). This is a special formula that cannot be typed in directly, even using CTRL + SHIFT + ENTER, and can only be created or modified by using the Data Table feature. If the table had been across a row instead, ensure that the input values are in the top row, and that the 'headings' are in the first column (that is, transpose the example table, above). Then, you would populate the 'Row input cell:' box instead. 1-D Data Tables do not need to be simply two columns or two rows. It is entirely possible to display the effects on more than one output at the same time, provided you wish to use the same inputs throughout the sensitivity analysis, as follows:
Sometimes, you may find all of the numbers in your Data Table are identical. If this happens, you need to check your calculation settings. To do this, go to Excel Options (File -> Options or ALT + F + T) and then select ‘Formulas’. In the ‘Calculation options’ section, please ensure the ‘Workbook Calculation’ is set to ‘Automatic’:
Any other setting will not calculate Data Tables correctly. The reason why files might be set to anything other than Automatic is because Data Tables can consume a significant amount of memory and slow down workbook calculations – hence the options to disable them.
2-D Data Tables

These Data Tables are similar in idea: they simply allow for two inputs to be varied at the same time. Let's extend the 1-D example as follows:
This example is similar, but only calculates the NPV for a certain number of periods – specified in cell G15. Our 2-D Data Table (which is cells F40:L52, not F41:L52) can answer the question, "What is the NPV of our project over x periods with a discount rate of y%?". If anything, a 2-D Data Table is simpler than its 1-D counterpart since there is little confusion over row and column input cells. Again, the output needs to be in the table; this time it must be in the top left-hand corner of the array. In our example, it is disguised as "Discount Rate" using similar number formatting to that described earlier. The inputs required now form the remainder of the top row and the first column of the Data Table. With cells F40:L52 highlighted, the Data Table dialog box is opened as before:
Since the top row represents the inputs for the Number of Periods, the ‘Row input cell:’ should reference $G$15, whilst the discount rate inputs (‘Column input cell:’) should link to $G$11 once more. Once ‘OK’ is clicked, the Data Table will populate as required – simple!
Data Table inputs should be hard coded

Don't use formulae for inputs in either the first row or column of a Data Table. Let me explain why, by considering the following example:
To be fair, this spreadsheet is arguably too simple to warrant a Data Table output, but I am using it to highlight the dangers of using formulae for inputs. In this example, all cells in yellow are inputs. The calculation in cell H15 is very simple: =H12*H13. But that's not the point here. Cell H20 contains "On", which is used for the formula in cell H22:

=IF(H20="On", H15,)

that is, the formula refers to the Total Revenue in cell H15 if the value in cell H20 is "On". The reason this cell appears to be a heading that says "Total Revenue" is that we have used the number formatting (CTRL + 1) trick again:
The 'Number' formatting is 'Custom' and uses the format code

"Total Revenue";"Total Revenue"

This means that if the value is a non-negative number (that is, zero or a positive number), the value will appear as "Total Revenue" (the text before the delimiting semi-colon), and if it is negative it will also appear as "Total Revenue" (the text after the semi-colon). After the required input values (100 to 110 inclusive, as displayed) have been hard coded into cells G23:G33, the range G22:H33 has been selected, and then a Data Table has been created by selecting 'Data Table…' from the 'What-If Analysis' drop-down in the 'Forecast' group of the 'Data' tab of the Ribbon:
Since the inputs go down a column and the input cell is in cell H12, the resulting ‘Data Table’ dialog has been populated thus:
Assuming workbook calculations are set to 'Automatic' (ALT + T + O), that's all you have to do – simple! So, what's the problem? Consider this revised example:
Here, the columnar inputs (cells G53:G63) have been replaced by a formula:

=IF(G52="", $H$42, G52 + 1)

This seems to be fairly innocuous and, theoretically, should make the worksheet more efficient as inputs do not need to be typed in twice. However, look closer. The values in cells H55:H63 are wrong. This is a common trap. It's dangerous using formulaic inputs in a Data Table. So what went wrong? A 1-dimensional columnar Data Table works procedurally as follows:
1. Take the first input and put it in the input cell (so here, the value in cell G53 – 100 presently – would be copied as a value into cell H42)
2. This would cause the values in the formulaic inputs to update (so cells G53:G63 would be updated to [still] display 100, 101, …, 109, 110)
3. The result (cell H45, $100,000) would be recorded in the first row of outputs (cell H53)
4. The second input – currently 101 (cell G54) – would then be pasted as a value into the input cell (cell H42)
5. This would cause the values in the formulaic inputs to update (so cells G53:G63 would be updated to now display 101, 102, …, 110, 111 – these values have changed)
6. The result (cell H45, $101,000) would be recorded in the second row of outputs (cell H54) (this is why this output remains correct)
7. The third input – now revised to 103, not 102 (cell G55) – would then be pasted as a value into the input cell (cell H42)
8. This would cause the values in the formulaic inputs to update (so cells G53:G63 would be updated to now display 103, 104, …, 112, 113 – these values have changed)
9. The result (cell H45, $103,000, being $103 multiplied by 1,000) would be recorded in the third row of outputs (cell H55) (this is why this output is incorrect)
10. The fourth input – now revised to 106, not 103 (cell G56) – would then be pasted as a value into the input cell (cell H42)
11. This would cause the values in the formulaic inputs to update (so cells G53:G63 would be updated to now display 106, 107, …, 115, 116 – these values have changed)
12. The result (cell H45, $106,000, being $106 multiplied by 1,000) would be recorded in the fourth row of outputs (cell H56) (this is why this output is also incorrect)
13. And so on…
14. When all outputs have been determined, the Data Table input values (cells G53:G63) are then reset to the original values (100 to 110 inclusive).
Explained like this, it's easy to see the problem. If cell G53 had been left as a hard-coded value, or linked to an independent cell elsewhere, this would not have happened. However, people don't get this, and the internet is littered with end users moaning that their Data Tables are wrong and that Excel makes errors. It doesn't; people do. Be careful; use inputs!
Data Tables on different sheets

I have a saying that anything is possible in Excel. Maybe one day I will come unstuck, but today is not that day. The issue is that Excel restricts where the referenced inputs must be located, i.e. they must be positioned on the same worksheet as the Data Table. If you try to reference cells on another worksheet, or become cunning and use range names which refer to cells on another worksheet (a useful workaround on many occasions), you will encounter the following error message:
Most financial modellers will recall the mantra of keeping inputs separate from calculations separate from outputs. Data Tables force you to put outputs on the same worksheet as the inputs which can confuse end users and make it difficult to put all key outputs together. So how can you get around this? My solution assumes you do not wish to hide Data Tables on the input sheet and then link them to another worksheet (this is cumbersome and can make the model less efficient). To make things more “difficult”, I will assume that you have already built your financial model and the Data Tables are to be incorporated as an afterthought. There could be two inputs to incorporate. I will explain how to create one of them (you then just follow the process twice). Firstly, create a “dummy” input cell on the same worksheet as the Data Table. This needs to be protected such that data cannot be entered into this cell. I will assume that this cell is W44 (say) on the Sheet2 worksheet, i.e. the same sheet as the Data Table.
Secondly, link the Data Table (ALT + D + T) to this dummy input (in the illustration here I assume that the Data Table is a 1-dimensional Data Table):
Thirdly, let us assume you actually want the Data Table to link to “Input 1”(cell D4) on Sheet1:
Fourthly, since we have already built the model this input will already be linked throughout the model. Since I do not wish to change all the dependent formulae, I first cut (NOT copy) the input into an adjacent cell:
Fifthly, a copy is pasted back into the original cell (here, this was cell D4):
Finally, the value in cell E4 is replaced with the following formula =IF(Sheet2!W44="", D4, Sheet2!W44) and then formatted / protected to ensure end users do not actually type into this cell:
The Data Table will now work. This is because:
• The Data Table links directly to a cell on the same sheet as the Data Table, but indirectly to the input on the other worksheet;
• Cell E4 on Sheet1 is now the cell that drives all calculations throughout the model, even though it appears to be nothing more than a recently added cell;
• Cell D4 on Sheet1 still appears to be – and acts like – the original input it replaces.
The associated file shows how this might work in practice. Data Tables can be really useful for executive summaries, but there are drawbacks to consider:
• Data Tables can slow down the file calculation time dramatically. For example, if you have just three 2-D Data Tables, each with ten inputs on each axis, the model calculation time could increase by a factor of up to 300 (= 3 x 10 x 10).
Microsoft has recognised this issue and allows you to change Excel’s Calculation option (found in ALT + T + O, under ‘Calculation’) to ‘Automatic except tables’. I strongly recommend you do not implement this option. End users tend to assume Excel is always calculating everything automatically and some do not know how to check / modify this functionality.
Instead, I would build in ‘On / Off’ switches next to the Data Tables themselves. These are transparent and intuitive and have the same effect; and finally
• This approach may not work where formulae dependent on the input cells selected use OFFSET or INDIRECT. The technique can still be employed, but it may be safer not to cut and paste but add an input cell elsewhere instead.
CHAPTER 2.5: INDEX AND MATCH

INDEX and MATCH – as a combination – are two of the most useful functions at a modeller's disposal, although the pairing is in danger of retirement soon with the new XLOOKUP function (see later). They provide a versatile lookup in a way that LOOKUP, HLOOKUP and VLOOKUP simply cannot. The best way to illustrate this point is by means of an example. Here is a common problem. Imagine you have built a financial model and your Balance Sheet – ahem! – contains misbalances. You need to fix it. Now I am sure you have never had this mistake yourself, but you have "close friends" that have encountered this feast of fun: solving Balance Sheet errors can take a long while. One of the first things modellers will do is locate the first period (in ascending order) that has such an error, as identifying the issue in this period may often solve the problem for all periods. Consider the following example:
This is a common modelling query. The usual suspects, LOOKUP and HLOOKUP / VLOOKUP do not work here: • LOOKUP(lookup_value, lookup_vector, [result_vector]) gives the wrong date as the balance checks are not in ascending order (i.e. ascending alphanumerically with no duplicates); whilst • HLOOKUP(lookup_value, table_array, row_index_num, [range_lookup]) gives #VALUE! since the first row must contain the data to be ‘looked up’, but the Balance Check is in row 13 in our example above, whereas the dates we need to return are in row 4 – hence we get a syntax error. There is a solution, however: INDEX MATCH. They form a highly versatile tag team, but are worth introducing individually.
INDEX

Essentially, INDEX(array, row_num, [column_num]) returns a value or the reference to a value from within a table or range (list). For example, INDEX({7,8,9,10,11,12}, 3) returns the third item in the list {7,8,9,10,11,12}, i.e. 9. This could have been a range: INDEX(A1:A10, 5) gives the value in cell A5, etc. INDEX can work in two dimensions as well (hence the column_num argument). Consider the following example:
INDEX(F11:L21, 4, 5) returns the value in the fourth row, fifth column of the table array F11:L21 (clearly 26 in the above illustration).
MATCH

MATCH(lookup_value, lookup_vector, [match_type]) returns the relative position of an item in an array that (approximately) matches a specified value. It is not case sensitive. The third argument, match_type, does not have to be entered, but for many situations, I strongly recommend that it is specified. It allows one of three values:
• match_type 1 [default if omitted]: finds the largest value less than or equal to the lookup_value – but the lookup_vector must be in ascending order, limiting flexibility;
• match_type 0: probably the most useful setting, MATCH will find the position of the first value that matches lookup_value exactly. The lookup_vector can have data in any order and even allows duplicates; and
• match_type -1: finds the smallest value greater than or equal to the lookup_value – but the lookup_vector must be in descending order, again limiting flexibility.
When using MATCH, if there is no (approximate) match, #N/A is returned (this may also occur if data is not correctly sorted, depending upon match_type). MATCH is fairly straightforward to use:
In the figure above, MATCH(“d”, F12:F22, 0) gives a value of 6, being the relative position of the first ‘d’ in the range. Note that having match_type 0 here is important. The data contains duplicates and is not sorted alphanumerically. Consequently, using match_type 1 and -1 would give the wrong answer: 7 and #N/A respectively.
INDEX MATCH

Whilst useful functions in their own right, combined they form a highly versatile partnership. Consider the original problem:
MATCH(1, J13:U13, 0) equals 5, i.e. the first period in which the Balance Sheet does not balance is Period 5. But we can do better than that. INDEX(J4:U4, 5) equals May-20, so combining the two functions:

INDEX(J4:U4, MATCH(1, J13:U13, 0))

equals May-20 in one step. This process of stepping out two calculations and then inserting one into another is often referred to as "staggered development". No, this is not how you construct a financial model late in the evening after having the odd drink or two! Do note how flexible this combination really is. We do not need to specify an order for the lookup range, we can have duplicates, and the value to be returned does not have to be in a row / column below / to the right of the lookup range (indeed, it can be in another workbook, never mind another worksheet!). With a little practice, the above technique can be extended to match items on a case sensitive basis, use multiple criteria and even 'grade'. But watch out: XLOOKUP is on the horizon (please refer to Chapter 10.2)…
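Before XLOOKUP steals the limelight, here's a parting sketch of how the same idea extends to a two-way lookup, with one MATCH supplying the row number and another the column number (the ranges and labels here are hypothetical, purely for illustration):

=INDEX(C2:N13, MATCH("Revenue", B2:B13, 0), MATCH("May-20", C1:N1, 0))

This would return the value at the intersection of the "Revenue" row and the "May-20" column of the array C2:N13.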
CHAPTER 2.6: USING TORNADO CHARTS FOR SENSITIVITY ANALYSIS

I define sensitivity analysis as meaning the flexing of one or at most two variables to see how these changes in input affect agreed key output(s). With respect to constructing a tornado chart, I need to become even more specific. Here, we consider the flexing of only one variable at a time. Tornado charts are a type of bar chart that reflect how much impact varying an input has on a particular output, providing both a ranking and a measure of magnitude of the impact, sometimes given in absolute terms (as in our detailed worked example below) and sometimes in percentage terms.

Example of a Tornado Chart
As you can see in the above mock example, a base line is drawn for a selected output (the vertical line in this graphic) which corresponds to all inputs set at their “base” settings, i.e. with no sensitivities incorporated. The variables are ranked so that the input that causes the most variation in the chosen output is shown first, the assumption that causes the second greatest movement is ranked second, and so on. The ends of the bars show how much the output is affected by the sensitivity, and the end product frequently resembles a ‘tornado’, hence the name for this bar chart. In theory, this chart can show end users which assumptions appear to be the key drivers of a particular output and this can greatly assist management decision-making. There are issues with this rather simple tool. If all inputs are varied similarly (e.g. ±10%) this is often known as a “deterministic” tornado chart, as this determines which inputs have most effect in such circumstances. However, this is often unrealistic as this does not take into account the likelihood of such variations (e.g. foreign exchange rates may vary by ±30%, whereas fixed costs may only range by ±3%, say). When the probability of such a variation is noted (not considered here), this is often known as a “probabilistic” tornado chart instead. Please note these explanations have been made simple deliberately here!
Further, it assumes that the relationship between inputs and outputs is ‘monotonic’, i.e. continuing to increase (or decrease) an input value should not suddenly make the output change direction. For example, increasing costs should continue to decrease profits (they should not suddenly rise). If this is not the case, then the variations may not show the maximum / minimum values of the outputs for the range, and therefore the chart would be utterly meaningless. But enough of this criticism. Let’s assume it all works and go through an example.
Walkthrough Example

The example file contains two key worksheets, the second being a summary Income Statement:
This example is driven by the following inputs:
It is not that important to fully understand how these calculations work. The point here is to demonstrate how to construct a tornado chart. In the image above, note that column I (Sensitivity) is blank, but allows us to sensitise the inputs and hence see how the outputs vary as a consequence. The other key worksheet in the model creates the tornado chart. It contains the sensitivity inputs and 1-dimensional (1-D) Data Tables based on Net Profit After Tax (cell K41, shown above).
For this example to work correctly, the sensitivity inputs (cells I12:I19) should all be set to zero so that the base case is displayed. Also, note that in our example here, all inputs are flexed by ±10%. It is common to use symmetrical variations, but it does not necessarily mean all inputs should be flexed by similar proportions, as the discussion on deterministic versus probabilistic tornados (above) observed. The reader will note that rows 30, 32, 34, 36, 38, 40 and 42 have all been hidden from view and a cursory glance at these rows will make it clear how easy it is to change each flex if required. In our example though, we will keep it simple. Further, it should be recognised that the middle column of the data table is necessary, although it looks superfluous initially. It acts as a check that the output is indeed the “base output” with zero flex, but it is also required to gauge the magnitude of the variation in outputs. This “raw data” needs to be ‘cleaned’. This is done as follows.
The table is replicated so that the variation to the base case is detailed in columns I:K. The spread is calculated in column M (using =ABS(I48 - K48), for example, here), as this is needed to rank the assumptions. Columns F and N are used to make adjustments to this spread if necessary, so that no two spreads will be exactly the same. I do not go into detail as to how this is done here, as this is a personal choice and the technique I have adopted will not work for all scenarios. I encourage readers to review my approach and make up their own minds as to how to avoid having ties. The aim is to ensure that no two spreads are identical, as this causes problems for ranking.
Column O then ranks the adjusted spreads, using the formula =RANK(N48, $N$48:$N$55), with 1 being the largest adjusted spread and 8 the smallest. This “Cleaned Data” table now requires reordering (using INDEX MATCH for example) prior to chart construction:
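For the reordering itself, INDEX MATCH (Chapter 2.5) against the rank column works nicely. As a sketch – assuming, consistent with the above, that the ranks sit in cells O48:O55 and that the rank required (1, 2, 3, …) is listed in a hypothetical column F of the reordered table:

=INDEX(G$48:G$55, MATCH($F60, $O$48:$O$55, 0))

MATCH finds which row of the "Cleaned Data" table carries the rank required and INDEX returns the corresponding value; copied across, the same construction retrieves the labels and both sensitivity columns.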
We are now in a position to create the chart. First, the table (cells G60:K67) should be selected. Then:
• Click on the 'Insert' tab on the Ribbon, then select 'Bar' from the 'Charts' group and 'Clustered' from the '2-D Bar' section (ALT + N + B + ENTER)
• With the chart selected, click on the 'Chart Tools, Layout' tab and then select 'Chart Title' from the 'Labels' group, then 'Above Chart' (ALT + JA + T + down arrow twice + ENTER) for a title of 'Tornado Chart'
• Similarly, type in 'Sensitivity' for the Primary Vertical Axis title (ALT + JA + I + V + down arrow once + ENTER)
• Type in 'NPAT' for the Primary Horizontal Axis title (ALT + JA + I + H + down arrow once + ENTER).
This will generate a rather raw looking chart that will probably not look dissimilar to the following illustration:
The legend should be removed (right click on it and select ‘Clear’ or ‘Delete’ from the shortcut menu) first of all, although it will still be evident that the bars are not aligned, are too small and inverted. You may be tempted to change the ranking so that the chart is displayed correctly, but often the data and the chart go hand in hand in outputs and it makes more sense to have the key driver at the head of the table. To correct these issues, we need to make further changes: • Right click on the vertical axis and select ‘Format Axis…’ from the shortcut menu • Change ‘Axis Labels’ to ‘Low’ in the drop-down box • Check ‘Categories in reverse order’ • Click ‘Close’ • Right click on the horizontal axis and select ‘Format Axis…’ from the shortcut menu • Change ‘Axis Labels’ to ‘Next to axis’ in the drop-down box • Click ‘Close’ • Right click on the data bars and select ‘Format Data Series…’ from the shortcut menu • Use the sliders to select an ‘Overlap’ of +100 and a ‘Gap width’ towards ‘No Gap’, say 20 • Click ‘Close’. The plot area colour, bar colours and gridlines can be formatted as required; similarly, the horizontal axis numbers may be custom formatted too. Eventually, you should end up with a dynamic tornado chart that looks similar to the following graphic:
CHAPTER 2.7: SIMULATIONS ANALYSIS

Risk analysis and quantification of uncertainty are a fundamental part of every decision we make. We are constantly faced with uncertainty, ambiguity and variability. Monte Carlo simulation analysis is a sophisticated methodology which can provide a sampling of all the possible outcomes of business decisions and assess the impact of risk, allowing for better decision-making under uncertainty. "Monte Carlo sampling" is just buzzword bingo for "sample at random" – but it's an important technique. By using this method, you can calculate:
• the expected value (or average)
• the range of values (distribution)
• the likelihood of this range of values being achieved (probabilistic distribution).
And best of all, often it can be done in Excel without add-ins or macros. So what is it? And how do we do it?
Understanding the Issues

Let's start simply. Imagine I toss an unbiased coin: half of the time it will come down heads, half tails:
If I toss two coins, I get four possibilities: two Heads, a Head and a Tail, a Tail and a Head, and two Tails.
In summary, I should get two heads a quarter of the time, one head half of the time and no heads a quarter of the time. Note that (1/4) + (1/2) + (1/4) = 1. These fractions are the probabilities of the events occurring and the sum of all possible outcomes must always add up to 1. The story is similar if we consider 16 coin tosses, say:
Again, if you were to add up all of the individual probabilities, they would total to 1. Notice that in symmetrical distributions (such as this one) it is common for the most likely event (here, eight heads) to be the event at the midpoint. Of course, why should we stop at 16 coin tosses?
All of these charts represent probability distributions, i.e. they display how the probabilities of certain events occurring are distributed. If we can formulate a probability distribution, we can estimate the likelihood of a particular event occurring (e.g. the probability of precisely 47 heads from 100 coin tosses is 0.0666, the probability of less than or equal to 25 heads occurring in 100 coin tosses is 2.82 x 10⁻⁷, etc.). Now, I would like to ask the reader to verify this last chart. Assuming you can toss 100 coins, count the number of heads and record the outcome at one coin toss per second, it shouldn't take you more than 4.0 x 10²² centuries to generate every permutation. Even if we were
to simulate this experiment using a computer programme capable of generating many calculations a second it would not be possible. For example, the Japan Times announced a new computer that could compute 200,000,000,000,000 (200 trillion) calculations per second (known as 200 petaflops!). If we could use this computer, it would only take us a mere two hundred million centuries to perform this computation. That’s almost 50% longer than the universe has been estimated to be in existence. Ain’t nobody got time for that. Let’s put this all into perspective. All I am talking about here is considering 100 coin tosses. If only business were that simple. Potential outcomes for a business would be much more complex. Clearly, if we want to consider all possible outcomes, we can only do this using some sampling technique based on understanding the underlying probability distributions.
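Happily, we don't need the supercomputer for the individual probabilities quoted above: coin tossing follows the Binomial distribution (discussed next), which Excel will evaluate directly. As a quick sketch:

=BINOM.DIST(47, 100, 0.5, FALSE) returns c.0.0666, the probability of precisely 47 heads in 100 tosses, whilst
=BINOM.DIST(25, 100, 0.5, TRUE) returns c.2.82 x 10⁻⁷, the cumulative probability of 25 heads or fewer.

The final argument determines whether the function returns the probability of exactly that outcome (FALSE) or of that outcome or anything smaller (TRUE).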
Probability Distributions

If I plotted charts for 1,000 or 10,000 coin tosses similar to the above, I would generate similarly shaped distributions. This classic distribution, which only allows for two outcomes, is known as the Binomial distribution and is regularly used in probabilistic analysis. The 100 coin toss chart shows that the average (or 'expected' or 'mean') number of heads here is 50. This can be calculated using a weighted average in the usual way. The 'spread' of heads is clearly quite narrow (tapering off very sharply at less than 40 heads or greater than 60). This spread is measured by statisticians using a measure called standard deviation, which is defined as the square root of the average value of the square of the difference between each possible outcome and the mean, i.e.

\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}

where:
σ = standard deviation
N = total number of possible outcomes
Σ = summation
x_i = each outcome event (from first x_1 to last x_N)
μ = mean or average.
The Binomial distribution is not the most common distribution used in probability analysis: that honour belongs to the Gaussian or Normal distribution:
Generated by a complex mathematical formula, this distribution is defined by specifying the mean and standard deviation (see above). The reason it is so common is that in probability theory, the Central Limit Theorem (CLT) states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with finite mean and standard deviation, will approximate to a Normal distribution.
The Normal distribution’s population is spread as follows:
i.e. 68% of the population is within one standard deviation of the mean, 95% within two standard deviations and 99.7% within three standard deviations. Therefore, if we know the formula to generate the probability distribution – and here I will focus on the Normal distribution – it is possible to predict the mean and range of outcomes using a sampling method. The formula for the Normal distribution is given by

f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}

(homework: prove this from first principles).
Bringing it Back to Excel

So the Gaussian – or Normal – distribution is important in the world of simulations analysis. Excel has a function that can calculate it:
• NORM.DIST(x, mean, standard_deviation, cumulative)
When the last argument is set to TRUE, NORM.DIST returns the cumulative probability that the observed value of a Normal random variable with specified mean and standard_deviation will be less than or equal to x. If cumulative is set to FALSE (or 0, interpreted as FALSE), NORM.DIST returns the height of the bell-shaped probability density curve instead. As an illustration:
Check out the example though to see how it can catch out the inexperienced user when trying to assess just the (non-cumulative) probabilities.

Simulations Analysis in Excel

Since it is clear we cannot model every possible combination / permutation of outcomes, the aim is to analyse a representative sample based on a known or assumed probability distribution. There are various ways to sample, with the most popular approach being the "Monte Carlo" method, which involves picking data randomly (i.e. using no stratification or bias). Excel's RAND() function picks a number between 0 and 1 at random, which is very useful as cumulative probabilities can only range between 0 and 1. NORM.INV(probability, mean, standard_deviation) returns the value x such that the probability of a Normal random variable with the specified mean and standard_deviation being less than or equal to x equals the cumulative probability specified. In essence, this is the inverse function of NORM.DIST(x, mean, standard_deviation, TRUE):
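To make the relationship concrete, here is a minimal worked pair (the mean of 50 and standard deviation of 5 are simply assumed for illustration):

=NORM.DIST(60, 50, 5, TRUE) returns c.0.9772, i.e. c.97.7% of observations should lie at or below 60, two standard deviations above the mean, whilst
=NORM.INV(0.9772, 50, 5) makes the return journey and gives back (approximately) 60.

Note that NORM.INV has no cumulative argument: it only ever works with cumulative probabilities.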
Still awake? If so, we are in business if we assume our variable is Normally distributed. We can get Excel to pick a random number between 0 and 1, and for a given mean and standard deviation, generate a particular outcome appropriate to the probability distribution specified, which can then be used in the model as in the following illustration:
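In formula terms, each simulated value is generated along these lines (the cell references are hypothetical):

=NORM.INV(RAND(), $G$12, $G$13)

where cells G12 and G13 would hold that variable's mean and standard deviation respectively. RAND() supplies a random cumulative probability between 0 and 1, and NORM.INV converts it into a value drawn from the specified Normal distribution.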
The mean and standard deviation are easy to calculate – simply list all of your historical data and use the Excel functions AVERAGE for the mean and STDEV.S for the standard deviation (we use STDEV.S as it is only a sample, rather than STDEV.P for the population). Here, three variables, Sales, Foreign Exchange and Gross Margin, all employ the NORM.INV function to generate the assumptions needed to calculate the Gross Profit. We can run the simulation a given number of times by running a simple one-dimensional Data Table. The actual approach is a little crafty though:
Since the variables use the RAND function to generate random numbers, each time the end user presses ENTER or F9, the variables will recalculate (this quality is known as ‘volatility’). I have created a Data Table (ALT + D + T) to create multiple trials (the headings are actually the outputs required using clever number formatting to disguise what they are). Once dynamic arrays become Generally Available, this technique will become even simpler. Since each Data Table entry causes the model to recalculate for each column input, the values will change automatically. On this basis, note that the column input cell in the example above refers to F31 which is unused by any formula on the spreadsheet. The example Excel file we have provided has 1,000 rows (i.e. 1,000 simulations). Since the variables have been generated randomly, this is a simple Monte Carlo simulation – no fancy software or macros required! It only requires a quick collation step to summarise the outputs graphically:
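For that collation step, Excel's FREQUENCY function is one option (ranges hypothetical):

=FREQUENCY(H53:H1052, K53:K72)

where H53:H1052 would contain the 1,000 simulation results and K53:K72 the histogram 'bin' boundaries. Entered across the cells adjacent to the bins (with CTRL + SHIFT + ENTER in versions of Excel without dynamic arrays), it counts how many results fall in each bin, ready for charting.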
Do remember: • Not all variables are Normally distributed. Consequently, using the NORM.INV function may be inappropriate in some instances; • The above example has assumed all variables are independent. If there are interrelationships or correlations between inputs, this simple approach would need to be revised accordingly; and • Working with probabilities is notoriously counter-intuitive. Take care with interpreting results and always remember that the results only represent a sample of possible outcomes (the sample size may be too small for extrapolation to the entire population). If in doubt, consult an expert (hi, our company is available for bar mitzvahs and chargeable work…).
CHAPTER 2.8: VARIANCE ANALYSIS

If there's one thing that can be tedious in Excel, it's creating a variance report. Imagine you had originally created a budget and, as more information came to light, you decided to revise the numbers as follows:
That’s easy enough, right? Put the original budget in the first line and the reforecast data in the second line et voila! You can produce your variance analysis in the bottom row. But that’s not how management want it. That would be too easy. They want it to look like this:
It’s all well and good, but this means a different formula in each cell:
Yuck. That means plenty of opportunities to reference the wrong cell as well as making it impossible to copy the calculation across the row. Thousands of accountants face this thankless task every day – this report can be ridiculously time-consuming, is so easy to calculate incorrectly or misreference, and is not scalable. Is there a better way? It will be a very short section if I say “no” at this point…
Here, I have created a formula in cell F20 that can be copied across the entire row of the Report Table – even if the number of periods were to be extended. It’s a very simple formula: =OFFSET($E$10, MOD(COLUMNS($F19:F19) - 1, COLUMNS($F$19:$H$19)) + 1, ROUNDUP(COLUMNS($F19:F19) / COLUMNS($F$19:$H$19), 0))
I think that might need some explanation! Let me start with the principle. Let’s look closer at the source data:
Imagine your cursor is positioned in cell E10 (the cell with the red X in it): • To get to the January budget data, you would have to move one cell down and one column to the right • To get to the January reforecast data, you would have to move two cells down and one column to the right • To get to the January variance, you would have to move three cells down and one column to the right • To get to the February budget data, you would have to move one cell down and two columns to the right • etc. Do you see? To get to any data in January, you have to move one column to the right; to get to any data in February you have to move two columns to the right, and so on. To get to any budget number you have to move one row down; to get to any reforecast number you have to move two rows down; to get to any variance figure, you have to move three rows down. There’s a function that can help us with this: OFFSET (for more information, please refer to Chapter 2.2) comes to the rescue again. We could get a formula to work as follows:
We need the rows reference in the OFFSET function to go 1, 2, 3, 1, 2, 3, 1, 2, … as the formula is copied across, and we want the columns reference in the OFFSET function to go 1, 1, 1, 2, 2, 2, 3, 3, … Using COLUMNS($F19:F19) as our counter (COLUMNS simply determines the number of columns in the cited range), we can generate both of these sequences easily. To generate the first sequence (1, 2, 3, 1, 2, 3, 1, 2, …), the MOD function works wonders. The MOD function, MOD(number, divisor), returns the remainder after the number (first
argument) is divided by the divisor (second argument). The result has the same sign as the divisor. For example, 9 / 4 = 2.25, or 2 remainder 1. MOD(9, 4) is an alternative way of expressing this, and hence equals one (1) also. Note that the 1 may be obtained from the first calculation by (2.25 - 2) x 4 = 1, i.e. in general: MOD(n, d) = n - d * INT(n / d), where INT() is the integer function in Excel. On this basis, MOD(Counter - 1, 3) + 1 will generate our first sequence. As Counter increases, the values will generate 1, 2, 3, 1, 2, 3, 1, … as required. Therefore, =MOD(COLUMNS($F19:F19) - 1, COLUMNS($F$19:$H$19)) + 1 may be used in our example. The COLUMNS function has been used to avoid both a counter and the hard-coded value of three (3). This is in accordance with our Best Practice modelling principles of Flexibility and Transparency (Chapter 1.1).

The ROUNDUP(value, n) function rounds the amount value up to n decimal places (so zero decimal places rounds up to the next whole number, or integer). Therefore, =ROUNDUP(COLUMNS($F19:F19) / COLUMNS($F$19:$H$19), 0) rounds 1/3, 2/3, 3/3, 4/3, … up to the next whole number (1, 1, 1, 2, …). Putting this all together gives us our horror formula

=OFFSET($E$10, MOD(COLUMNS($F19:F19) - 1, COLUMNS($F$19:$H$19)) + 1, ROUNDUP(COLUMNS($F19:F19) / COLUMNS($F$19:$H$19), 0))

once more. It may seem strange that I appear to be an advocate for such a formula. It is one of those times where you need to balance flexibility with transparency. Adding helper rows or columns may actually make it more difficult for the end user to follow the calculation logic, ironically. If end users simply see the result
they will understand this without even inspecting the contents of the Formula bar.
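For anyone who does inspect the Formula bar, a quick check of the mechanics for the first four columns of the Report Table confirms the two sequences:

Counter 1: MOD(0, 3) + 1 = 1 (Budget); ROUNDUP(1/3, 0) = 1 (first month)
Counter 2: MOD(1, 3) + 1 = 2 (Reforecast); ROUNDUP(2/3, 0) = 1 (first month)
Counter 3: MOD(2, 3) + 1 = 3 (Variance); ROUNDUP(3/3, 0) = 1 (first month)
Counter 4: MOD(3, 3) + 1 = 1 (Budget); ROUNDUP(4/3, 0) = 2 (second month)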
CHAPTER 2.9: BREAKEVEN ANALYSIS

Finally in this chapter, let's finish on a simpler method of reviewing data: breakeven analysis. This does not provide insight into ranges and / or shapes of output distributions, nor provide details on the probability of a particular result being achieved. Instead, it answers another important question: at what level does this project have to perform in order not to make a loss? This type of calculation may be performed on an accruals basis (Income Statement) or a cash basis (Cash Flow Statement). It simply concerns flexing a parameter to ensure negative consequences are avoided. Therefore, it is a type of (specialised) sensitivity analysis. Consider the following scenario:
Against a unit selling price of $25.00, there are two types of costs: • variable: direct costs that are incurred per toilet roll, e.g. materials, labour and overheads • fixed: indirect costs that will be incurred in any case, e.g. manufacturing and marketing costs. Projected profit for 20,000 sales (being the expected sales of $500,000 in cell H22 divided by the unit selling price of $25 in cell H21) is $63,000 after tax – but what would be the minimum
sales amount to avoid a loss (i.e. "break even")? If we are risk averse, we may be more interested in this figure than the expected profit. This could be calculated using Excel's Goal Seek feature, but more information is obtained by calculation. Assuming tax is only paid on profits, breakeven is achieved when the net of sales less the variable costs (the contribution) equals the fixed costs:
From the above image, the number of units to be sold to break even (wipe evenly?) is computed by dividing the fixed costs (cell H19) by the unit contribution (H21 - H13). This provides more insight than using Goal Seek (which would generate the same number). For example, the breakeven number of units may be reduced by: 1. increasing the unit selling price 2. reducing the direct costs 3. reducing the fixed costs. It should also be noted that changing the prevailing tax rate has no effect. Care should be taken if the concept of breakeven is extended to derive the number of units to be sold for a target Net Profit After Tax (NPAT). Consider the following:
In the above screenshot, I have now included a target NPAT (cell H25) of $100,000. Simply adding this figure to the fixed costs is incorrect (NPAT is $70,000). This is because we have forgotten tax: profits are derived post tax, so we need to gross up the target figure by (1 - tax rate), viz.
Note the formula in cell H27:

=(H19 + (H25 / (1 - H23))) / (H21 - H13)

You'd think I'm on commission with all those parentheses (well, I've certainly moved up a pay bracket…). The calculation H25 / (1 - H23) takes the target NPAT (H25) and grosses it up by dividing by one less the tax rate (1 - H23), so that it reverts to the target amount once tax has been deducted. It's a common mistake made in analytical models and is an error that is easily avoided. Don't make the same mistake (keep learning and make others instead).
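To put numbers on it – assuming the tax rate in cell H23 is 30%, consistent with the $70,000 outcome noted above – the gross-up works as follows: the target NPAT of $100,000 requires $100,000 / (1 - 30%) ≈ $142,857 of additional pre-tax contribution; after 30% tax, this delivers $142,857 x 70% ≈ $100,000 of NPAT, as intended. Adding the raw $100,000 to the fixed costs instead would deliver only $100,000 x 70% = $70,000.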
Chapter 3: Forecasting Considerations

For all the technical aspects discussed in the last book, there was little mention of how to derive forecasts and then summarise them. This chapter intends to redress the balance. Managers and analysts need to work together to improve forecasting. They need to spend more time analysing and less time preparing. What I want to talk about is the need for objective forecasting. By this, I mean something that can be constructed simply such that if anyone follows the same process, they will get the same figures. This needs to be a mechanical, objective process. This way, analysts may prepare this data in moments without feeling emotionally attached to the outputs. Furthermore, operational managers can review the trend and state where future numbers are wrong, and all they need to do is explain the variation, i.e. undertake incremental budgeting. There is no need for disagreements or confrontation: both parties should work together as a team. These forecasts may need to be adjusted for seasonality and cyclicality; forecasts may need to be overridden as actual data emerges; forecasting periods may need to be revised to consider delays, changes of deadlines, extensions, etc. This chapter discusses all these sorts of issues. It's a key skill to pick up and it's often overlooked as analysts tend to focus on data preparation and financial modelling instead.
CHAPTER 3.1: SEASONAL / CYCLICAL FORECASTING

Consider the following:
The aim is to develop a technique to identify what would be next in a series, i.e. forecast the future. There are various approaches you could use:
• Naïve method: this really does live up to its billing – you simply use the last number in the sequence, e.g. the continuation of the series 8, 17, 13, 15, 19, 14, … would be 14, 14, 14, 14, … Hmm, great
• Simple average: only a slightly better idea: here, you use the average of the historical series, e.g. the continuation of the series 8, 17, 13, 15, 19, 14, … would be 14.3, 14.3, 14.3, 14.3, …
• Moving average: now we start to look at smoothing out the trends by taking the average of the last n items. For example, if n were 3, then the sequence continuation of 8, 17, 13, 15, 19, 14, … would be 16, 16.3, 15.4, 15.9, 15.9, …
• Weighted moving average: the criticism of the moving average is that older periods carry as much weighting as more recent periods, which is often not the case. Therefore, a weighted moving average is a moving average where, within the sliding window, values are given different weights, typically so that more recent points matter more. For example, instead of selecting a window size, it requires a list of weights (which should add up to 1). As an illustration, if we picked four periods and [0.1, 0.2, 0.3, 0.4] as weights, we would be giving 10%, 20%, 30% and 40% to the last four points respectively, which would add up to 1 (which is what it would need to do to compute the average).
Therefore the continuation of the series 8, 17, 13, 15, 19, 14, … would be 15.6, 15.7, 15.7, 15.5, 15.6, …
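In Excel terms, this is just SUMPRODUCT again. A minimal sketch, assuming the last four observations sit in cells B2:B5 (oldest first):

=SUMPRODUCT({0.1;0.2;0.3;0.4}, B2:B5)

With 13, 15, 19 and 14 in cells B2:B5, this returns 0.1 x 13 + 0.2 x 15 + 0.3 x 19 + 0.4 x 14 = 15.6, the first continuation value quoted above.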
All of these approaches are simplistic and have obvious flaws. We are using historical data to attempt to predict the next point. If we go beyond this, we are then using forecast data to predict further forecast data. That doesn't sound right. We should stick to the next point. Since we are looking at a single point and we can weight the historical data by
adding exponents to the calculation, this is sometimes referred to as Exponential Single Smoothing. A slightly more sophisticated method is called regression analysis: well, that takes me back! This is a technique where you plot an independent variable on the x (horizontal) axis against a dependent variable on the y (vertical) axis. "Independent" means a variable you may select (e.g. "June", "Product A") and dependent means the result of that choice or selection. For example, if you plotted your observable data on a chart, it might look something like this:
Do you see? You can draw a straight line through the data points. There is a statistical technique where you may actually draw the "best straight line" through the data using an approach such as Ordinary Least Squares, but rather than attempt to explain that, I thought I would try and keep you awake. There are tools and functions that can work it out for you. This is predicting a trend, not a point, so is a representative technique for Exponential Double Smoothing (since you need just two points to define a linear trend). Once you have worked it out, you can calculate the gradient (m) and where the line cuts the y axis (the y intercept, c). This gives you the equation of a straight line:

y = mx + c

Therefore, for any independent value x, the dependent value y may be calculated – and we can use this formula for forecasting. Of course, this technique looks for a straight line and is known as linear regression. You may think you have a more complex relationship (and you may well be right), but consider the following:
• Always split your forecast variables into logical classifications. For example, sales may be difficult to predict as the mix of products may vary period to period; for each product, there may be readily recognisable trends
• If the relationship does not appear to be linear, try plotting log x against log y. If this has a gradient of two, then y is correlated with x²; if the gradient is three, then y is correlated with x³, etc.
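Excel will happily estimate m and c for you. A short sketch, with hypothetical ranges:

=SLOPE(B2:B25, A2:A25) returns the gradient m, whilst
=INTERCEPT(B2:B25, A2:A25) returns the y intercept c,

where A2:A25 would hold the independent x values and B2:B25 the dependent y observations. A forecast for a new x value in, say, cell A26 is then simply =SLOPE(B2:B25, A2:A25) * A26 + INTERCEPT(B2:B25, A2:A25).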
With this all borne in mind, let’s return to my example:
Now let’s be honest, anyone who has historical data looking this perfect should be referred to the auditors, but hey, this is for illustration purposes. I have data from August 2014 until last month. I want to extrapolate it beyond 2020 (I want 2020 foresight!). There are several functions that can help us here, with one of the simplest being TREND. TREND(known_y’s, known_x’s, new_x’s, [constant]) projects assuming that there is a relationship between two sets of variables x (independent variable – here, the dates) and y (dependent variable – the sales). It’s preferable to leave constant blank in the TREND function in order to obtain the best fit. For example:
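The precise references depend upon the worksheet, but the formula takes this general shape (the ranges here are hypothetical):

=TREND($H$13:$H$76, $G$13:$G$76, G77)

where H13:H76 would hold the historical sales (known_y's), G13:G76 the corresponding dates (known_x's) and G77 the date being forecast (new_x's). Copied down the forecast column, every future date receives its value from the same fitted straight line.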
Hence, we can extrapolate the data using the TREND function viz.
Ladies and gentlemen, you may have heard of hockey stick projections; well, let me now present you with the swordfish. You extrapolate linearly, you get a straight line. To quote Billie Eilish: well, duh. This isn't good enough. We need to identify the cyclicality of the data. It appears to go through a cycle once every 12 months. This might not always be the case, but the concept remains the same even if the periodicity is not annual. I want to calculate a periodic growth rate objectively. There are various ways I can do this. You might argue with me. That's fine. Feel free to write a brief note and send it to someone who cares. That's the problem here – it's subjective until your organisation defines how it is to be measured. Then, everyone follows that process and it becomes objective. In my example, I am going to compare the sum of the sales over the 12 months ending 30 June 2019 with the forecast sales as calculated using TREND over the 12 months ending 30 June 2020:
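In formula terms, the comparison is nothing more exotic than the following (the range names are hypothetical, purely for readability):

=SUM(TREND_sales_FY20) / SUM(actual_sales_FY19) - 1

i.e. the total TREND-forecast sales for the 12 months ending 30 June 2020 divided by the total actual sales for the 12 months ending 30 June 2019, less one, which is what yields the 7.83% growth rate applied below.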
It is this percentage I will use to grow the forecasts (note that the associated Excel file allows for different periodicities, as long as the cycle remains constant). I then grow each period’s value from its corresponding value one cycle earlier by this percentage (7.83% here). This gives me a more realistic chart:
That looks much better. With practice, this approach doesn’t take that long to prepare. Numbers may be varied from this forecast, with the operational manager only having to explain the deviations. It makes life easier all round. Once the method of assessing inferred growth rates based upon the TREND function has been agreed, together with what normalisations should be made to the historical data, the process becomes much more straightforward. Of course, this method should be applied to each forecast input separately and not just to their aggregation, otherwise confusion occurs due to sales mix changes, new products, cut-off periods, etc.
But there is an even faster way – if you happen to have Office 365 or Excel 2016 (or later)…
Exponential Triple Smoothing (ETS) sounds like a dairy process, but it actually uses the weighted mean of past values for forecasting. It’s popular in statistics, as it adjusts for seasonal variations in data, like in the example above. For those who really need to know, Excel uses a variation of the Holt-Winters ETS algorithm, although to be honest, I think you should get out more.
In Excel 2016, ETS has gone “native”, i.e. it is a standard feature. This includes a set of new functions, such as FORECAST.ETS, together with other supporting functions for additional statistics. Your dataset does not need to be perfect, as the functions will accommodate up to 30% missing data.
But don’t worry about using these functions. Simply highlight the actual data and click on the ‘Forecast Sheet’ button in the ‘Forecast’ group of the ‘Data’ tab of the Ribbon (ALT A + FC):
All you need to do is specify the final forecast period at the prompt and that’s it. It produces a raw data sheet, together with confidence intervals (to demonstrate potential spread in the forecast error), which looks something like this:
It’s objective: you select the data and Excel does all the hard yards for you. If you have explained the concept in layman’s terms (as above), then all you need to do is revise the forecast and explain where you wish to change the output from what is generated. Simple!
You may create this longhand. The FORECAST.ETS function calculates or predicts a future value based on existing (historical) values by using this Exponential Triple Smoothing (ETS) algorithm. The predicted value is a continuation of the historical values at the specified target date, which should be a continuation of the timeline. You can use this function to predict future sales, inventory requirements or consumer trends.
This function requires the timeline to be organised with a constant step between the different points. For example, that could be a monthly timeline with values on the first of every month, a yearly timeline or a timeline of numerical indices. For this type of timeline, it’s very useful to aggregate raw detailed data before you apply the forecast, which produces more accurate forecast results too.
The FORECAST.ETS function employs the following syntax to operate:
FORECAST.ETS(target_date, values, timeline, [seasonality], [data_completion], [aggregation])
The FORECAST.ETS function has the following arguments:
• target_date: this is required. This is the data point for which you want to predict a value. The target_date may be date / time or numeric. If the target_date is chronologically before the end of the historical timeline, FORECAST.ETS returns the #NUM! error
• values: this is required. The values are the historical values, for which you want to forecast the next points
• timeline: this is also required. This is the independent array or range of numeric data. The dates in the timeline must have a consistent step between them, which cannot be zero (0). The timeline isn’t required to be sorted, as FORECAST.ETS will sort it implicitly for calculations. If a constant step cannot be identified in the provided timeline, FORECAST.ETS will return the #NUM! error. If the timeline contains duplicate values, FORECAST.ETS will return the #VALUE! error. If the ranges of the timeline and values are not of the same size, FORECAST.ETS will return the #N/A error
• seasonality: this argument is optional. This is a numeric value with a default value of 1, which means Excel detects the seasonality automatically for the forecast, using positive, whole numbers for the length of the seasonal pattern. Zero (0) indicates no seasonality, meaning the prediction will be linear. Positive whole numbers will indicate to the algorithm to use patterns of this length as the seasonality. For any other value, FORECAST.ETS will return the #NUM! error. The maximum supported seasonality is 8,760 (the number of hours in a year); any seasonality above that number will result in the #NUM! error
• data_completion: this argument is also optional. Although the timeline requires a constant step between data points, FORECAST.ETS supports up to 30% missing data, and will automatically adjust for it. Zero (0) will instruct the algorithm to treat missing points as zeros. The default value of 1 will account for missing points by completing them as the average of the neighbouring points
• aggregation: this is the final optional argument. Although the timeline requires a constant step between data points, FORECAST.ETS will aggregate multiple points which have the same time stamp. The aggregation parameter is a numeric value indicating which method will be used to aggregate several values with the same time stamp. The default value of 0 will use AVERAGE, while the other options are COUNT, COUNTA, MAX, MEDIAN, MIN and SUM:
Function Number		Function
1			AVERAGE
2			COUNT
3			COUNTA
4			MAX
5			MEDIAN
6			MIN
7			SUM
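By way of a hedged illustration (the ranges here are assumed for the sketch, with monthly dates in cells A2:A61 and the corresponding sales in cells B2:B61, rather than taken from any file in this book), a seasonal forecast for, say, 31 January 2021 could be computed as
=FORECAST.ETS(DATE(2021, 1, 31), B2:B61, A2:A61, 12, 1, 1)
Here, the seasonality has been set explicitly to 12 (monthly data with an annual cycle), missing points are completed as the average of their neighbours (data_completion of 1) and duplicate time stamps are averaged (aggregation of 1). Leaving the last three arguments blank would accept the defaults, including automatic seasonality detection.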
As for me, I’ll just press the button and keep in mind the simple idea behind it all. After all, during the forecasting season, I’d still like to go home and see the folks.
CHAPTER 3.2: REVISING FORECASTS
Imagine you had just finalised the budget for a project and (say) it started in Period 3 and ended in Period 8, as pictured:
Suddenly, your boss told you the amounts needed to be reallocated on a “similar basis” but for Periods 4 to 15. That’s fairly straightforward, as this duration is double the original project length, so you would just attribute half of each period’s amount to the new periods, viz.
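For instance (with purely illustrative numbers), if Period 3 of the original budget contained $10, then Periods 4 and 5 of the revised forecast would each receive $5; if Period 4 contained $20, then Periods 6 and 7 would each receive $10, and so on, since each revised period now represents exactly half of one original period.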
But what about a general solution? How would you cope with project advancements or delays and changes in duration at the same time? It sounds pretty horrible, but truth be told, finance staff face these types of challenge day-in, day-out. Assuming no inflationary factors to consider (e.g. time value of money), the problem boils down to pro-rating the original numbers across the new number of periods. The revised start and end dates tell you when the calculations begin, but in essence it is the number of periods in the revised forecast that drives the calculations. You can follow our explanation in the downloadable Excel files that come with this book. Sorry for the algebra, but sometimes that’s what’s needed in a financial model. Let’s assume our original forecast has x periods going from start period t1 to end period tx, and the revised forecast has y periods going from revised start period r1 to revised end period ry.
In this illustration, r1 occurs after t1, but this does not necessarily have to be the case. Regardless of start and finish dates (which simply govern when the calculations are made), there are basically three scenarios:
1. x > y, i.e. the revised forecast duration is shorter than the original one
2. x < y, i.e. the revised forecast duration is longer than the original one
3. x = y, i.e. the durations of both forecast periods are equal (this effectively just moves the forecast period).
Let’s focus on the first scenario for a moment, as it highlights how we could go about calculating the revised forecast. If the original duration is longer than the revised one, then each period of the revised forecast will consider the effects of more than one original period, e.g.
In this graphic, the red boxes / yellow shading represent original periods and the blue boxes / borders denote a revised period. If x > y, then the blue box must straddle at least two red boxes. It could be more though, which is what is depicted here, where we have:
• a start period, which is the proportion of the earliest original period considered
• middle [or full] period(s), which (when x > y) are original periods that must be fully included. There could be more than one. If x < y, then there is no middle (full) period
• an end period, which is the proportion of the final original period considered.
Sounds confusing? Let’s explain with an example:
In the original forecasts, the cashflows of $1 to $8 (big spenders here!) were allocated across the first eight periods for a total of a rather exorbitant $36. However, the revised forecast wanted the same profile over just periods 4 to 6 (three periods). That is, the start date t1 is period 1, x is 8 and the final period tx (t8) is period 8. The start and end dates (r1 and r3, periods 4 and 6 respectively) for the revised forecast simply denote when the forecast starts and stops. The key information is that there are only 3 (y)
periods. This means that each period in the revised forecast includes 8/3 (known as the Period Factor in the attached Excel file), which equals two and two-thirds (2.67) periods of the old forecast data viz.
• Revised Period 4 = Old Period 1 + Old Period 2 + 2/3 of Old Period 3 = 1 + 2 + (2 x 3) / 3 = 5
• Revised Period 5 = 1/3 of Old Period 3 + Old Period 4 + Old Period 5 + 1/3 of Old Period 6 = (1 x 3) / 3 + 4 + 5 + (1 x 6) / 3 = 12
• Revised Period 6 = 2/3 of Old Period 6 + Old Period 7 + Old Period 8 = (2 x 6) / 3 + 7 + 8 = 19.
Our attached Excel file identifies which original periods are used in each revised period,
what the start, middle / full and end periods are,
and what proportions to use of each:
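To spell out one decomposition consistent with the arithmetic above (this is an illustrative summary, not a screenshot of the file, whose exact labelling may differ):
Revised Period		Start part		Full period(s)		End part
4			Old 1 (100%)		Old 2			Old 3 (66.7%)
5			Old 3 (33.3%)		Old 4 and 5		Old 6 (33.3%)
6			Old 6 (66.7%)		Old 7 and 8		– (0%)
In each row, the proportions sum to the Period Factor of 2.67.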
These then cross-multiply the original forecast numbers for the appropriate periods using the SUMIF and SUMIFS functions to get the values explained above. When the revised forecast period is longer than the original one, the problem is slightly simpler as there are no middle / full periods (i.e. no period of original data is ever in just one revised period). Otherwise, the logic remains the same. For those who are interested or are insomniacs, the detail is discussed below, but do feel free to skip it…
Devil’s in the Detail
Let’s use the associated Excel file to talk through the formulae we used.
The first section captures the original forecast (inputs) in cells J13:Q13 and automatically computes the start and end periods using the array formulae MIN(IF) and MAX(IF) (cells G16 and G17 respectively). LU_Original_Forecast_Data represents the inputs in row 13 and
LU_Periods denotes the counters in row 12; the prefix “LU” means nothing more than “Look Up”, to highlight that these range names have been created to ease referencing. These formulae must be entered using CTRL + SHIFT + ENTER, as IF will not otherwise work across a range (an array) of cells (unless you have Dynamic Arrays – more on that in Chapter 10.1).
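As a sketch of what those array formulae might look like (hedged, since the precise conditions depend upon the file’s layout – here I assume a period counts as used if its input is non-zero):
{=MIN(IF(LU_Original_Forecast_Data<>0, LU_Periods))}
{=MAX(IF(LU_Original_Forecast_Data<>0, LU_Periods))}
The braces are not typed; they appear automatically when the formula is committed with CTRL + SHIFT + ENTER. The next section is the Revised Forecast assumptions: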
This collects the required start and end periods in cells G27 and G28, together with an error check in cell H28 to ensure that the end period is not before the start period. The first part of the next section simply collates all of the dates to be used:
The key calculation here is the Period Factor (cell H55), which divides the original forecast duration by the revised forecast duration. This represents the number of original periods in each revised period and is pivotal to all of the calculations.
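In terms of the cells described above, a hedged sketch of this calculation (the exact formula in the file may differ) would be
=(G17 - G16 + 1) / (G28 - G27 + 1)
i.e. the number of original periods divided by the number of revised periods – 8 / 3 = 2.67 in the worked example. The next part of this section works out how the original periods are reallocated to the revised periods: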
The Revised Flag (row 63) uses the formula
=AND(J$62>=$H$50, J$62<=$H$51)
to flag the periods falling between the revised start and end. The Full Part % (row 73) is then calculated as
=MIN(IF(AND(J$69>=J$68, J$68 * J$69<>0), MIN(Period_Factor, 1) * (J$69 - J$68 + 1), ), Period_Factor - J$72)
Erm, lovely… Again, once you get your head wrapped around it, it’s not so bad. The two IF conditions required (inside the AND expression) check that the periods are not zero and that the end is not before the beginning (as discussed above). If this test is passed, it takes the MIN(Period_Factor, 1) (you cannot count more than the forecast amount in an original period)
and multiplies this by the number of full original periods in the revised period. This is then restricted so that the sum of the Start Part % and the Full Part % cannot exceed the Period_Factor. This number is calculated only to keep the End Part % honest. Talking of which…
The End Part % (row 74),
=MOD((Period_Factor - SUM(J72:J73)) * J$63, 1)
just mops up the rest of the Period_Factor where the flag is active. This is equal to the section highlighted:
This concludes the percentages needed. We have now identified which periods are the Start, Middle and End, and what proportions we require for the Start and End. “All” we have to do is multiply it out:
I say “all” because we’ve left the best to last…
=(SUMIF(LU_Periods, J$67, LU_Original_Forecast_Data) * J$72) + SUMIFS(LU_Original_Forecast_Data, LU_Periods, ">="&J$68, LU_Periods, "<="&J$69) + (SUMIF(LU_Periods, J$70, LU_Original_Forecast_Data) * J$74)
The first SUMIF picks up the start period’s value and multiplies it by the Start Part %, the SUMIFS adds the full periods in their entirety, and the final SUMIF takes the end period’s value multiplied by the End Part %.