English Pages 312 Year 2021
Business Intelligence Demystified: Understand and Clear All Your Doubts and Misconceptions About BI
Anoop Kumar V K
www.bpbonline.com
FIRST EDITION 2022
Copyright © BPB Publications, India ISBN: 978-93-91030-087
All Rights Reserved. No part of this publication may be reproduced, distributed or transmitted in any form or by any means or stored in a database or retrieval system, without the prior written permission of the publisher, with the exception of the program listings, which may be entered, stored and executed in a computer system, but cannot be reproduced by means of publication, photocopy, recording, or by any electronic and mechanical means.
LIMITS OF LIABILITY AND DISCLAIMER OF WARRANTY
The information contained in this book is true and correct to the best of the author’s and publisher’s knowledge. The author has made every effort to ensure the accuracy of these publications, but the publisher cannot be held responsible for any loss or damage arising from any information in this book. All trademarks referred to in the book are acknowledged as properties of their respective owners, but BPB Publications cannot guarantee the accuracy of this information.
Dedicated to Late Shri V.K. Kumar
My father, for inspiring me with his actions and righteousness
About the Author
Anoop Kumar V K is a BI professional with over 15 years of experience in BI. Since June 2020, he has been working as a freelance data and BI consultant. He is also the founder and managing director of PublicBI UG, based in Munich.
He started his corporate BI career with Infosys in 2006. Since then, he has implemented, led, and managed several end-to-end BI projects across various locations, industries, and roles, using a variety of tools and technologies, for customers such as Target, Georgia-Pacific, BT, Telstra, BG Group, and E-on Energy Trading.
In 2015, Anoop joined Wirecard’s Issuing Division in Munich as a Business Intelligence Business Analyst. He led the BI team, came up with the agile KABI methodology, and played a key role in building the division’s BI solution from scratch in less than a year after the team was formed. In 2017, while still working for Wirecard as a Principal Data Analyst, he founded his own company, PublicBI UG, in Munich with the idea and passion to make better use of public data (a superset of open data). In 2018, PublicBI won 2nd prize at EU Datathon 2018 in Brussels for the PublicBI EUProc solution, after which he was invited to present the solution at various locations. In his last permanent role, as the head of a department within the Data Services division at Wirecard Munich, he built, led, and managed a global, central department consisting of 3 teams of project managers, business analysts, and data analysts.
Outside work, Anoop likes to be with family and friends, travel, read, blog about BI, meditate, and write his thoughts in his personal blog at http://www.akclarity.com. He can be contacted through his BI website - https://www.akvkbi.com or LinkedIn.
About the Reviewers
Harel Sagiv is a BI expert with a rich professional history that includes over 30 years of implementing, developing, and leading ERP projects, IT Master Plans, and BI projects. He specializes in designing BI architectures that are focused on providing business added value while ensuring a robust data infrastructure for future requirements.
In the last 15 years, Harel was responsible for the design and delivery of over 10 full life cycle SAP BI solutions for large companies in Israel, Europe, and the USA. On every project he has been committed to empowering and mentoring BI developers. Harel is an Industrial Engineer and holds a master’s degree in Business Administration.
Pavan Kumar Bandaru has 20+ years of experience in IT and has worked with various Fortune 500 customers across the world in building Business Intelligence and analytical platforms for deriving better insights. As a technology leader, his ideas on strategic and tactical solutions have helped many industry vertical leaders strategize their data needs effectively. He is a strong security and compliance advisor to organizations in delivering techniques related to information protection and data loss prevention.
Pavan is an active speaker, guest lecturer at various institutions, author, technical reviewer, and a blogger on emerging technologies.
Acknowledgement
There are many people I want to thank for all the direct and indirect support they have given me while writing this book. First and foremost, I thank my wife and my entire family for putting up with me while I spent many weekends and nights writing. I could never have completed this book without their support. Again, I thank my wife, Rekha, for her support in data collection, research, and the first-level review of this book, and for providing me with inputs from a reader’s perspective.
My gratitude goes to both the technical reviewers, Pavan Kumar Bandaru and Harel Sagiv, firstly for accepting the roles of technical reviewers and then for taking time out of their busy schedules and carrying out excellent technical reviews of the first 5 chapters and the last 5 chapters respectively. I think it’s important to state here that while I fully acknowledge and appreciate Rekha’s support in gathering some of the information, and Pavan’s and Harel’s technical reviews and valuable inputs, I take full responsibility for all content that I have chosen to finally include or exclude and for the order in which it has been structured.
I would also like to thank companies such as Pixabay (for images) and Google (for Docs, Sheets, Slides, etc.), as their publicly available services have helped me a lot during the writing of this book. I thank my colleagues, ex-colleagues, friends, ex-employers, clients, and partners for all the interactions on BI topics, and all the well-wishers across the world for encouraging me to write this book. Finally, I would like to thank the entire team at BPB Publications, and Siddhant Jain, to whom I owe a great deal for providing me with such valuable and excellent comments and suggestions.
Preface
Even though Business Intelligence (BI), the way it is in use now, has existed since the 1990s, there is still a lot of confusion about it. Since 2016, I have answered hundreds of questions about BI in one of the online Q&A forums and blogs. While answering these questions, I noticed that most of them arose because people have misunderstood BI and because there is a lack of credible information. Most of the answers and articles that one can find online are by content marketers, vendors, and experts in SEO (search engine optimization), but probably with no hands-on experience in managing, designing, or building BI solutions. It actually started to bother me that a lot of articles and answers about BI were, and still are, written by people who have never worked on BI or used BI.
I wanted to support people by providing experience-based answers, but these answers in Q&A forums, even after receiving several votes, were still not easy to find in the long list of answers from both reliable and unreliable sources. That’s when I clearly understood that while there is no dearth of books, articles, blogs, and Q&A forums about BI, there is definitely a shortage of consolidated, unbiased, and credible information about BI. This is what motivated me, a BI professional, to take time out and dedicate it to writing a book about BI and clarifying all of the misconceptions and myths about BI that I am aware of.
Therefore, the primary goal of this book is to clarify and demystify several myths, misinformation, and misconceptions about BI based on industry experience, and to provide information about BI in an unbiased and simplified way. To do that, some topics have to be dealt with in more detail than others, so you will notice that different topics are intentionally dealt with at different levels of detail.
As the focus of the book is to clarify those concepts that are currently misunderstood, some of the simple, well-understood, well-documented, and non-confusing concepts in BI, such as OLTP vs OLAP, types of OLAP, etc., are not explained in this book. This is a book that enlightens the reader with the reality of BI where reality has been distorted. In terms of positioning, this is neither a beginner’s book nor an advanced one; it lies somewhere in between. You may also find contradictions between this book and other sources written by non-practitioners. No attempt has been made to intentionally concur or disagree with others; the entire effort and focus has been solely to provide facts and clear up misconceptions.
It is not expected that you will become a BI expert by reading this book. But it will definitely help you lay a very strong foundation and provide the right perspective about BI, which can help you in multiple ways, such as driving the right BI initiatives, allocating the right budget, building a better BI career, deciding if you want to get into BI or not, hiring the right team member, growing and managing your BI team better, building your BI solutions better, managing your BI projects successfully, choosing the right vendor, etc. Once you get a good hold of the concepts, you will be able to understand any new related concepts quicker because you will understand why they are coming up, and you will also be able to challenge the buzzwords. You will be in a position to come up with improvements too.
If you are only after surface-level knowledge so that you can work in a project as a BI developer or in some other role, you don’t need to read this book; you may find it an overdose. But if you are interested in getting the overall picture of what happens in a BI project, this book is for you. The target audience could be anyone who is interested in BI: any IT or non-IT professional, BI team members, BI aspirants, managers, business owners, founders, entrepreneurs, decision makers, students, etc. Irrespective of whether you are at C-level or a BI developer, an aspirant or a student, you will be able to get something out of this book. If you want to know BI, you could read this book. If you think you know BI, you should read this book. If you don’t know whether you know BI or not, you must read this book.
Finally, I would like to mention that a lot of effort and thinking has gone into capturing as much information from my experience as possible. As of date, apart from the effort spent by the reviewers, publishers, etc., I have spent over 800 hours on this effort.
I will consider the effort fruitful if at least some of you benefit from this book and recommend it to your colleagues, management, friends, and family.
To get the full benefit from this book, I would recommend that you go through the chapters in sequential order. However, if you are totally new to BI, you may glance through Chapter 10 after Chapter 1 to get a quick idea of some of the BI concepts and then continue with Chapter 2. Note that Chapter 8 has been intentionally placed where it is, even though I was advised to move it to the end. I didn’t want to end the book with a topic that, for some, for now, may seem off-track. In each chapter, some of the concepts that are relevant at that stage are explained first, and then some of the misconceptions and misinformation related to those concepts are clarified.
Chapter 1 clarifies what exactly BI is, what it is not, and what the different wrong notions about BI are. It provides a clear definition of BI and explains in detail each of the terms used in that definition. It then clarifies some of the myths related to BI and ends by clarifying the details of the coinage of the term business intelligence.
Chapter 2 describes in detail the different uses of BI and the processes that support the main uses. It touches upon the side benefits and importance of BI. The confusions and misconceptions around business analytics, data analytics, and data mining are clarified in this chapter. It ends by covering the evolution of BI.
Chapter 3 introduces various types of BI based on various parameters. It then clarifies some of the myths about self-service BI and real-time BI.
Chapter 4 first explains the main phases in a BI journey and then explains the challenges faced in each of those phases.
Chapter 5 introduces a typical internal BI team structure, describes some of the BI organizational models, and then introduces the main roles in BI. Confusions around some of the roles in BI are clarified in this chapter.
Chapter 6 explains the different components that add to the cost of BI and provides sample calculations for the total cost of ownership and ROI (return on investment) for BI. It also clears up some of the misconceptions about the costs of BI and ROI for BI.
Chapter 7 presents several ideas to achieve success with BI. The ideas are grouped into sets based on the groups to which they are addressed.
Chapter 8 may seem like a diversion from the theme of this book. However, as mentioned above, it is intentionally placed after Chapter 7, “Ideas for Success with BI”, as it introduces Individual Business Intelligence, which focuses on how an individual can also benefit from BI. This, too, is an idea that you can use to achieve success with BI.
Chapter 9 describes various BI architectures, highlights the pros and cons of each, and also clarifies that BI architecture does not have to be complex.
Chapter 10 introduces various technologies, tools, and concepts that are commonly used in BI. It also clarifies the confusion around different categories of tools and demystifies some of the concepts, including the data lake.
Downloading the coloured images:
Please follow the link to download the coloured images of the book: https://rebrand.ly/0ca5bf
Errata
We take immense pride in our work at BPB Publications and follow best practices to ensure the accuracy of our content and to provide our subscribers with an engaging reading experience. Our readers are our mirrors, and we use their inputs to reflect and improve upon human errors, if any, that may have occurred during the publishing processes involved. To let us maintain the quality and help us reach out to any readers who might be having difficulties due to any unforeseen errors, please write to us at: [email protected]
Your support, suggestions, and feedback are highly appreciated by the BPB Publications’ Family.
Did you know that BPB offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.bpbonline.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at: [email protected] for more details. At www.bpbonline.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on BPB books and eBooks.
BPB is searching for authors like you
If you're interested in becoming an author for BPB, please visit www.bpbonline.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
The code bundle for the book is also hosted on GitHub at https://github.com/bpbpublications/Business-Intelligence-Demystified. In case there's an update to the code, it will be updated on the existing GitHub repository. We also have other code bundles from our rich catalog of books and videos available at https://github.com/bpbpublications. Check them out!
PIRACY
If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author
If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit www.bpbonline.com.
REVIEWS
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at BPB can understand what you think about our products, and our authors can see your feedback on their book. Thank you! For more information about BPB, please visit www.bpbonline.com.
Table of Contents 1. What is Business Intelligence.................................................................................... 1 Structure................................................................................................................... 2 Objectives................................................................................................................. 2 Introducing BI......................................................................................................... 2 Real examples of misconceptions about BI......................................................... 3 Reasons for misconceptions about BI.................................................................. 4 Definition of Business Intelligence..................................................................... 7
Key terms in the definition of BI.............................................................................. 7
Working of BI........................................................................................................ 12 Realities of BI......................................................................................................... 13 BI is a concept................................................................................................... 13 BI doesn’t solve problems on its own................................................................ 14 Insights from BI is one of the inputs for decision making................................ 15 Ideal BI solutions are rare................................................................................. 16 BI solutions serve a variety of users................................................................. 17 BI is not always expensive and a multi-year project........................................ 20 Demystify coinage of BI....................................................................................... 20 Conclusion............................................................................................................. 22 Points to remember.............................................................................................. 22 Multiple choice questions.................................................................................... 23 Answers............................................................................................................ 25 Questions............................................................................................................... 25 2. Why do Businesses Need BI?................................................................................... 27 Structure................................................................................................................. 27 Objectives............................................................................................................... 
28 Introducing Walget – A retail chain example................................................... 28 Main uses of BI...................................................................................................... 31 Decision making............................................................................................... 32 Business performance management.................................................................. 34 Finding opportunities and identifying problems.............................................. 35 Processes that support the main uses of BI....................................................... 36 BI reporting....................................................................................................... 36
xiii
BI reporting versus operational data transfer........................................................ 37
Analytics........................................................................................................... 38 Data mining...................................................................................................... 40 Reporting versus analytics versus data mining in BI...................................... 41 Importance of BI................................................................................................... 44 Why BI is a must have?........................................................................................ 46 Benefits of BI.......................................................................................................... 47 Side benefits of BI.............................................................................................. 48 Which sectors use BI?........................................................................................... 50 Users of BI and purposes.................................................................................. 51 Evolution of BI...................................................................................................... 53 Conclusion............................................................................................................. 56 Points to remember.............................................................................................. 56 Multiple choice questions.................................................................................... 57 Answers............................................................................................................ 59 Questions............................................................................................................... 60 3. Types of Business Intelligence................................................................................ 61 Structure................................................................................................................. 
61 Objectives............................................................................................................... 62 BI types based on various parameters............................................................... 62 BI types based on type of analytics.................................................................... 63 BI types based on types of decisions.................................................................. 65 BI types based on solution hosting................................................................... 66 BI types based on solution ownership............................................................... 68 BI types based on type of software license........................................................ 70 BI types based on data freshness....................................................................... 74
Myths about real-time BI....................................................................................... 75
BI types based on sectors.................................................................................. 76 BI types based on departments......................................................................... 77 BI types based on BI integration approach....................................................... 77 Varieties of BI implementation........................................................................... 79 Agile BI............................................................................................................. 79 Out-of-the-box BI.............................................................................................. 80 Self-service BI................................................................................................... 82 Myths about self-service BI.................................................................................... 83
xiv
Concluding remarks on SSBI................................................................................. 84
Conclusion............................................................................................................. 85 Points to remember.............................................................................................. 86 Multiple choice questions.................................................................................... 88 Answers............................................................................................................ 90 Questions............................................................................................................... 90 4. Challenges in Business Intelligence ...................................................................... 91 Structure................................................................................................................. 92 Objectives............................................................................................................... 93 Main phases in a BI journey................................................................................ 93 Initiation phase................................................................................................. 94
Trigger for BI initiatives......................................................................................... 95
Implementation phase....................................................................................... 96 Live phase.......................................................................................................... 97
Further development.............................................................................................. 97 Enhancements........................................................................................................ 97 Maintenance and support...................................................................................... 97 Migration............................................................................................................... 98
Challenges faced in the initiation phase............................................................ 99 Resistance to the BI initiative........................................................................... 99 Building a good business case for BI.............................................................. 101 Acquiring sponsors and promoters................................................................. 101 Getting it prioritized....................................................................................... 102 Challenges in the implementation phase........................................................ 102 Data and information challenges.................................................................... 103 People challenges............................................................................................ 105 Process challenges........................................................................................... 106 Technology challenges..................................................................................... 107 Challenges in the live phase.............................................................................. 108 Challenges faced by BI users........................................................................... 108 Challenges faced by BI technical team............................................................ 109 Conclusion........................................................................................................... 112 Points to remember............................................................................................ 113 Multiple choice questions.................................................................................. 113 Answers.......................................................................................................... 116 Questions............................................................................................................. 116
xv
5. Roles in Business Intelligence............................................................................... 117 Structure............................................................................................................... 117 Objectives............................................................................................................. 118 Setting the context.............................................................................................. 118 Typical BI team structure................................................................................... 119 BI organizational models................................................................................ 120 NGDE.................................................................................................................. 121 SGDE................................................................................................................... 122 CGCE................................................................................................................... 123 CGDE................................................................................................................... 124
BI roles and responsibilities.............................................................................. 125 Technical roles in BI........................................................................................ 126
Techno-functional roles in BI.......................................................................... 133
Management roles in BI................................................................................. 137
Business Intelligence Administrator.................................................................... 126 Business Intelligence Architect............................................................................ 127 Business Intelligence Developer........................................................................... 129 Business Intelligence Quality Assurance Engineer............................................. 132 Business Intelligence Analyst ............................................................................. 133 Business Intelligence Business Analyst............................................................... 135 BI Analyst vs BI Business Analyst...................................................................... 136 C-Level Role ........................................................................................................ 139 Head of Business Intelligence............................................................................... 140 Business Intelligence Team Lead ......................................................................... 140
Exclusions....................................................................................................... 141
Data Steward........................................................................................................ 141 Data Migration Engineer..................................................................................... 141 Project Manager................................................................................................... 141 Suffixes and Prefixes............................................................................................. 142
Conclusion........................................................................................................... 142 Points to remember............................................................................................ 142 Multiple choice questions.................................................................................. 143 Answers.......................................................................................................... 145 Questions............................................................................................................. 146 6. Financials of Business Intelligence...................................................................... 147 Structure............................................................................................................... 147 Objectives............................................................................................................. 148
    Cost of BI
    People cost
    System cost
    Total cost of ownership
    Team options
    Hardware options
    Software options
    ROI for BI
    ROI for BI – Complex
    ROI for BI – Simple
    Examples of saving time
    ROI for BI – Side benefits
    Conclusion
    Points to remember
    Multiple choice questions
    Answers
    Questions
7. Ideas for Success with BI
    Structure
    Objectives
    Ideas for management
    Approaches
    BI team setup
    Ideas for BI teams
    Approaches
    Ideas for prioritization
    Ideas for BI users
    Unconventional ideas for BI teams
    Approaches for development
    Ideas to deal with data quality
    Conclusion
    Points to remember
    Multiple choice questions
    Answers
    Questions
8. Introduction to IBI
    Structure
    Objectives
    What is IBI?
    Points and connections
    Trigger for IBI
    How to start with IBI?
    Generic steps
    Specific steps
    Learnings
    Conclusion
    Points to remember
    Multiple choice questions
    Answers
    Questions
9. BI Architectures
    Structure
    Objectives
    BI architecture - Explained
    Examples of BI architecture
    Data-in-place BI architecture
    Data repository-based BI architecture
    Sample BI Architecture
    Conclusion
    Points to remember
    Multiple choice questions
    Answers
    Questions
10. Demystify Tech, Tools and Concepts in BI
    Structure
    Objectives
    Technologies and tools
    Technologies commonly used in BI
    Tools commonly used in BI
    Why so many technologies and tools used in BI?
    Where’s the boundary of BI?
    DV versus RAP
    Data visualization tool
    BI reporting and analytics platform
    ETL versus ETL Tool
    Pros of ETL tool over hand coding
    Cons of ETL tool over hand coding
    Concepts
    Concepts commonly used in BI
    Data mart and data warehouse
    Data mart
    Data warehouse
    Data lake
    What exactly is data lake?
    Use cases of data lake
    Myths about data lake and data warehouse
    Machine learning usage in BI
    Conclusion
    Points to remember
    Multiple choice questions
    Answers
    Questions
Abbreviations
References
Index
Chapter 1
What is Business Intelligence
Business Intelligence (BI) means different things to different people. The reasons for this are manifold, as we will see later in this chapter. For learners, the result is confusion: they are not sure which definition to trust or which one to learn and, with all the well-marketed and biased information out there, not sure how to separate myths and misconceptions from the facts. So, there is a definite need to reintroduce BI, explain what it really is and what it is not, and focus on the concept of BI rather than on specific tools and technologies. This chapter attempts to address these issues by providing an unbiased definition and explanation of BI concepts based on industry experience.
This chapter is very important as it lays the foundation for the rest of the book, so it is highly recommended that you do not skip it. After reading this chapter, you should be able to define BI clearly and explain each of the important terms used in the definition. This chapter will also clear up some of the misconceptions about BI, give reasons for the confusion, throw light on some of the realities, and help you understand what exactly BI is and what it is not. At the end of the chapter, we will look at the details of the coinage of the term BI.
The reader is expected to have at least a basic understanding of business and information technology (IT). Those who have some knowledge of BI will be able to appreciate the myths, misconceptions, and issues dealt with in this chapter. For those who are new, it might be difficult at first to appreciate the misconceptions mentioned in this chapter, yet you will benefit from it and get a good start by learning the concepts.
Structure
The following topics will be covered in this chapter:
• Introducing BI
  o Real examples of misconceptions about BI
  o Reasons for misconceptions about BI
  o Definition of BI
    ▪ Key terms in the definition of BI
• Working of BI
• Realities of BI
  o BI is a concept
  o BI doesn’t solve problems
  o Insights from BI is one of the inputs for decision-making
  o Ideal BI solutions are rare
  o BI solutions serve a variety of users
  o It doesn’t cost millions and multiple years
• Demystify coinage of BI
Objectives
After studying this chapter, you should understand some of the misconceptions and realities about BI and the reasons for such misconceptions. You should also understand the concept of BI, know a clear definition of it, and grasp the contextual meaning of the key terms in that definition. You will also learn about some of the inputs necessary for decision-making and become familiar with the architecture of a contemporary BI solution.
Introducing BI
As mentioned earlier, it has become difficult to separate the myths and misconceptions from the facts. This is probably one of the main reasons why you are reading this book: to demystify BI and clear up such misconceptions. If you were to ask 10 people in the IT industry what BI is, there is a chance that you would hear 10 or even more different answers. Some of the answers may be partially correct, some totally incorrect, and if you are lucky you might get one or two correct answers.
Some of the answers that you may hear are as follows:
• BI is just a frontend tool to get reports
• BI is a tool to get copies of online transaction data
• BI is the same as data visualization
• BI is the same as business analytics, or is a subset of business analytics, or business analytics is a subset of BI
• BI is a portal that provides all information to enable decision-making
• BI is everything to do with data, including big data analytics. You may hear many variations of this answer as well.
Even though some segments of business users (IT professionals, managers, etc.) understand BI sufficiently and use it effectively, there is a fair share within those segments who neither understand it nor are able to separate the misconceptions from the facts. Let’s take a look at some real-world examples.
Real examples of misconceptions about BI
A senior vice president (SVP) of a large company once wrongly assumed that the “I” in the term “BI” stands for information (by which he meant data) instead of intelligence. The SVP, like a few others, also thought that the core responsibility of the BI team was to carry out operational data transfers between core systems in the enterprise for day-to-day operations, and that the team had nothing to do with the decision-making process. His view was that a management information system (MIS) should be used for decision-making. He was partially right: in the past, MIS was used for this purpose, mainly for top management.
A few years ago, an experienced business analyst (BA) in a product company asked me (I had just joined as a Business Intelligence Business Analyst), “You are the BI guy, right? Do you guys work on improving the performance of the operational databases so that the core applications run faster? Is that what you and your team plan to do? Will your team monitor the queries that are hitting the application databases?” In this case, the BA had got it wrong by equating the BI role with that of a database administrator (DBA).
In another instance, a business user on the client side of a BI project was not aware that he could create his own reports using the Business Objects (BO) tool (a widely used enterprise reporting and analytics platform), to which he had had access for years. The BO tool had been deployed four years earlier and was connected to a data warehouse. He wrongly assumed that the data warehouse was only for data storage purposes. He was pleasantly surprised when I demonstrated to him that he could actually create his own reports and carry out ad-hoc data analysis.
In another company, where the BI adoption rate was very high (over 90% of the office staff used BI), more than 50% of the BI users assumed that all of the information and insights presented in the portal were developed by the portal development team, a team that had nothing to do with the main part of the BI work. These BI users were unaware of the BI team’s existence even though the BI team had over 20 members.
As in these examples, there are other real-world cases where both BI and BI teams have been misunderstood. I have observed such misunderstandings about BI in companies, on social media (LinkedIn, for example), in blogs, and in Q&A forums. We could continue listing such examples, but I think you already get the point that there is misunderstanding, confusion, and a lack of clarity about BI. Figure 1.1 picks up points from the preceding examples and contrasts the misconceptions with the facts, using thumbs-down and thumbs-up symbols respectively.
Figure 1.1: Right and wrong ideas about BI
Reasons for misconceptions about BI
Connecting all of the preceding examples, the question we should ask is: why is there so much confusion in the industry about BI? Why is there such a difference in how people perceive BI? Broadly, there are four explanations.
First, have you heard the story of the blind men and an elephant?[15] Six blind men have never known what an elephant looks like. One day, each of them feels a different part of the elephant’s body, as shown in Figure 1.2, and each describes the elephant differently from the others, based on his own limited experience. That is exactly the first reason. People describe BI based on their limited experience; they haven’t seen or used the whole of it, but only a part of it.
Figure 1.2: Blind men and an elephant
Second, market players, especially software vendors, training institutes, and consulting companies, expand, contract, or modify the definition to position their products and services in the market and make them stand out. For example, we can notice false claims such as “our tools deal with unstructured data whereas BI doesn’t deal with unstructured data”, “BI is limited to descriptive analytics only, whereas our tool offers predictive analytics”, and so on. Furthermore, almost every other day a new technical term is introduced, and a few of these go on to become buzzwords. Once there is a buzzword, new BI vendors and IT service companies, with an army of consultants, promote it and take it forward by offering tools, solutions, and services around it. They pitch it to their clients, and some clients fall for it and go ahead with implementation even if that particular business had no need for the new technology, tool, or software. Soon after, every other company, irrespective of whether it is a product company or a service-based company, starts its own initiative so as not to be left behind. Big data is a good example of such a buzzword. The cycle that repeats for each new technical term is depicted in Figure 1.3:
Figure 1.3: The cycle of new technical terms
So, it naturally becomes very difficult for people not to get confused. And most practitioners, who have worked on real projects, have built BI solutions, and continue to work in BI, don’t usually take the time to write and clear up the confusion.
Third, a segment of stakeholders, mostly business users, wrongly assume or are made to believe that BI is just a reporting tool or a portal from which they can get their reports. As the frontend (reporting tool, portal, data visualization tool, etc.) is the only tool that they usually access, they wrongly assume that the frontend is the whole BI solution. Nothing could be further from the truth. It is almost the equivalent of believing that cars are manufactured in the showrooms where they are sold, just because that is where we buy them. We all know, even if we do not know the whole process, that many stages (for example, sourcing raw materials, manufacturing parts, assembling, and painting) are involved in the background before a car is made available for sale at the showroom. Similarly, to create the reports that business users need, and to provide a reporting and analytics platform (RAP), several backend processes are involved in transforming the data into relevant information and insights in a scalable and efficient way. All of these together constitute a BI solution. Just as a car dealer is not equivalent to a car manufacturer, a reporting tool is not equivalent to a BI solution. A reporting tool or a portal is not in itself a BI solution; it is a part of the BI solution, and remember that it is usually the only part of the solution that business users get to see and use.
Fourth, unfortunately, there is no authorized body or organization that provides a standard definition of BI. Multiple organizations have defined BI, but their definitions are not standardized.
Forrester[16] defines BI as “A set of methodologies, processes, architectures, and technologies supported by organizational structures, roles, and responsibilities that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision making that contribute to improving the overall enterprise performance”. This definition is different from Gartner’s 2016 definition of BI,[17] “BI is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance”. At the time of writing, Gartner has renamed BI to Analytics and Business Intelligence (ABI)[18], with exactly the same definition that it provided for BI in 2016. In this book, we will continue to refer to ABI as BI; that is, analytics is a part (subset) of BI, because we use BI as an umbrella term that encompasses business analytics and data analytics. Throughout this book, we will always refer to BI in the sense of an umbrella term. As we have seen, Gartner’s definition of BI is different from Forrester’s. Similarly, other organizations have defined BI differently from both Gartner and Forrester. If these different definitions and terminologies overwhelm you, don’t worry: once you understand the concept, you’ll be able to define it in your own words.
Understanding the concept is more important. Bear in mind that fancy new names will continue to pop up, but the concept will remain the same. The core concept of BI is to use data to derive information and insights in order to support decision-making to improve the business. The technology used to implement a concept will always continue to change and evolve, but the concept remains the same. Let me explain this point with a simple analogy. Let’s assume a vendor stores its customers’ master data in an Excel file and refers to it as customer master data. If the customer data becomes so large that it can no longer be stored in an Excel file but only in a database (RDBMS), we would still call it customer master data and not large data. We don’t change the name of a concept based on the technology or tool that implements it.
The technology to implement BI, as with any other concept, is constantly evolving. It has become important, now more than ever, to provide a definition that is business-oriented, current, and clear: a definition that any businessperson can relate to and easily remember. I base the following definition on my hands-on experience of implementing BI solutions at various companies across locations that have used BI to improve their business.
Definition of Business Intelligence
Business Intelligence is an umbrella term that refers to the overall process in which information and insights are derived from data on a scalable, efficient, and on-going basis and made available to decision makers to support them in data-driven decision-making in order to improve their business.
Key terms in the definition of BI
We will now explore in detail each of the key terms in the preceding definition:
• Process: The definition of BI itself should already make it clear that BI is a concept and a process, and not a technology or a prescribed set of tools. Process here is used as an overarching term that has several processes under it. It includes technologies, business-specific strategies, tools, methodologies, architectures, best practices, and, most importantly, the people in the business. If we consider BI as a black box, then what goes into the black box (the input) is data, and what comes out (the output) is information, which then leads to insights. Figure 1.4 shows BI as a black box with data as the input and information and insight as the output of the BI process.
Figure 1.4: BI as a black box
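The black-box view in Figure 1.4 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a real BI solution: the order records, the function names (derive_information, derive_insight, bi_process), and the choice of "orders per weekday" as the information of interest are all made up for this example.

```python
from collections import Counter
from datetime import date

# Hypothetical raw data: order records as an order system might store them.
# All values are invented for illustration.
ORDERS = [
    {"order_id": 1, "order_date": date(2021, 3, 1)},   # Monday
    {"order_id": 2, "order_date": date(2021, 3, 1)},
    {"order_id": 3, "order_date": date(2021, 3, 2)},   # Tuesday
    {"order_id": 4, "order_date": date(2021, 3, 2)},
    {"order_id": 5, "order_date": date(2021, 3, 3)},   # Wednesday
    {"order_id": 6, "order_date": date(2021, 3, 4)},   # Thursday
    {"order_id": 7, "order_date": date(2021, 3, 4)},
    {"order_id": 8, "order_date": date(2021, 3, 5)},   # Friday
    {"order_id": 9, "order_date": date(2021, 3, 5)},
    {"order_id": 10, "order_date": date(2021, 3, 5)},
]

# date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def derive_information(records):
    """Turn raw records into information: number of orders per weekday."""
    counts = Counter(WEEKDAYS[r["order_date"].weekday()] for r in records)
    return dict(counts)


def derive_insight(information):
    """Study the information: which weekday has the fewest orders?

    Only weekdays that appear in the data are compared.
    """
    return min(information, key=information.get)


def bi_process(records):
    """The black box: data goes in, information and insight come out."""
    information = derive_information(records)
    insight = derive_insight(information)
    return information, insight


information, insight = bi_process(ORDERS)
print(information)
print("Weekday with the fewest orders:", insight)
```

In a real BI solution each step would be far more involved, but the shape stays the same: data is the input, and information and insight are the outputs.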
• Data: In the context of BI, data is the raw form: the transactional or operational records or stored values from which information and insights can be derived after processing. Data can be a collection of numbers, text, images, audio, video, etc. Data can be stored in any form, size, and location. Data is generated whenever an event or transaction occurs, or simply reflects the current state or status of some object of interest. Data is not limited to internal (proprietary) business data; data here refers to all sorts of data from any source, including external, third-party, or market data, that is relevant for the business. Internal data, for example, could be employee-related data (name, address, phone number, gender, etc.), products and services data, or customer-related data (server logs, web clicks, app usage, call centre data, product reviews, ratings, etc.). Figure 1.5 showcases some of the formats and sources of data:
Figure 1.5: Some of the formats and sources of data.
As there is a misconception that BI is limited to structured and internal data, let me clarify and emphasize that data can be collected from various sources such as customer management systems, billing systems, HR systems, websites, apps, log files, devices, social media, etc. External data, on the other hand, includes market or syndicated data, for example, Nielsen retail data, or open data from any other data provider. Anything and everything that can be used to derive business-relevant information is data. The much-hyped, so-called big data is also data. Some people who are new to the field of data talk about data and big data as if big data were something outside of data. That is simply and logically not correct; big data is, at most, a subset of data. It is important to note that in reality there is no such thing as small data or big data: all of the data that can be used to derive information and insights about a business, no matter where or how it is generated, whether structured or unstructured, is an input to the BI process. Data can be stored in files (CSV, Excel, XML, JSON, etc.), in databases (SQL, NoSQL, distributed), or in any other format. Data could be human-readable, only machine-readable, or both. Note that there is a subtle difference between the everyday usage and the technical usage of the term data. What we are calling data here is often referred to as information in everyday usage, whereas in technical (IT) usage it is referred to as data. For a layperson, data in BI is the raw (unprocessed) data.
• Information: When context and meaning are added to data and it is arranged appropriately, it becomes information. In other words, data has the potential to provide information when context and meaning are added to it. Data is the foundational layer on which information is built. Context is provided through metadata (information about data), and meaning is provided through an explanation and description of how to use the data and what to use it for. Data is processed to derive information from it. Note that, unlike other processes in which raw material is transformed into a product and no longer exists afterwards, raw data continues to exist even after information is derived from it.
• Insight: Insight is a deeper understanding or knowledge gained from processing (analyzing, synthesizing, correlating, drilling up/down, combining, verifying, validating, etc.) of information.
It is the knowledge that is gained after carefully studying the information; it is not apparent or obvious at first glance. Insight is the knowledge on which important decisions are based. By analyzing the data that we have collected on any subject, we can find trends, patterns, and correlations, identify outliers, and understand more about the subject. It is important to note that insights are contextual and situational, and they can be subjective. An insight for one category of users may not necessarily be an insight for another category of users. For example, an insight for an HR manager about employee behavior need not be an insight for a sales manager or a marketing manager in the same company, as their responsibilities are different. Insight is the second-level output of BI; the first-level output is information. Insight is obtained when users, starting from the information derived from data, iteratively look for more answers and details in BI. Conclusions and decisions are made based on insights, and actions are triggered based on insights. In this book, the terms data, information, and insight are used according to the explanations provided above.
• Business: The term business in the BI definition is not limited to commercial enterprises. It is used in a generic sense, referring to all organizations that use BI to improve their operations, including government bodies, not-for-profit organizations, and even organizations such as police departments. In this book, the terms business and organization are used interchangeably.
• Intelligence: It is that which is known about a subject or a situation. In BI, the subject is business. The US Department of Defense Dictionary of Military and Associated Terms defines intelligence as “The product resulting from the collection, processing, integration, evaluation, analysis, and interpretation of available information concerning……”. Based on this, we can state that in BI, intelligence is the product resulting from the collection, processing, integration, evaluation, analysis, and interpretation of available information concerning a business.
• Data-driven: Data-driven decision-making means that the decision makers (mostly management) of businesses take decisions supported by information and insights derived from data and not just based on feelings or intuition. This in no way implies that decisions based on feelings or intuition are always wrong or that decisions based on BI are always right. When organizations grow, it becomes increasingly difficult, if not impossible, for the active founder, the CEO, or the top management alone to make all of the decisions. The responsibility for decision-making at different levels is delegated to employees at appropriate levels. In such cases, how can organizations ensure that the decision makers make the right decisions to improve the business most of the time? How can organizations increase the probability that the decisions are right?
BI enables decision makers to make more insightful and fact-based decisions, thereby building a data-driven culture in the business.
• Improve: The word improve in the BI definition has been overlooked by many. Suppose data is used only to carry out the regular operations of the business. For example, a customer orders a book from an online bookstore, and the bookstore uses this data to process the order and deliver the book to the right address. This usage of data (the customer address in this case) to deliver the book doesn’t fall under BI. No decision is made to improve the existing business. It is a regular business operation or transaction. Capturing orders and delivering books are core functions of the business in this example. When we use data for BI, we are using data for more purposes than it was originally meant for. BI is an add-on to the core functions. The expectation for
using BI is to improve the business; this includes improvements in products, processes, services, and employee performance, gaining new markets, etc. In the online bookstore example, the company could use sales data to identify patterns and use the knowledge gained to improve its business. Let’s assume that the bookstore identifies a pattern in its sales data: there are considerably fewer orders on Wednesdays. It can then use this knowledge to take some action, for example, launch a promotion with discounts on books sold on Wednesdays in order to increase sales.
• On-going, scalable, and efficient: BI is not a one-off activity that businesses can carry out for a day or a week and then forget about. It is a regular and continuous process. Information and insights are derived regularly (at least daily, if not more frequently), and the health of the organization is monitored. If you give it some thought, BI in simple terms is the automation or semi-automation of the information and insights generation process. Just think about how it was done before BI, or how it is done in companies that don’t use BI. In most cases, the decision-making process is or was as depicted in Figure 1.6:
Figure 1.6: An example of a decision-making process before BI
When managers needed to make decisions, they either used their intuition or ordered their staff to collect information and generate insights. The staff then went about gathering data manually from various systems, analyzed it, and provided the information and insights to the manager. When the information and insights were not sufficient or did not meet the manager’s expectations, the staff
was asked to look further. This cycle, as depicted in Figure 1.6, would repeat a few more times until the managers were able to get some reliable information and insights. As you can see, this process was neither scalable nor efficient. This is the real pain point that BI is able to address. It automates data collection, storage, data preparation, and presentation. Information is kept ready before it is asked for. BI is a proactive way of managing a business. BI is both scalable and efficient. BI can save days, weeks, or months of manual work, thereby enabling users to do their job better.
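The online bookstore example in the Improve point above can be sketched in a few lines of Python. This is only an illustrative sketch: the order data, the function names `orders_per_weekday` and `weak_days`, and the rule "fewer than half of the average daily orders" are all assumptions made for this example and not part of any particular BI tool.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical data: number of orders on each day of one week,
# starting on Monday, 1 March 2021.
orders_per_day = [5, 6, 1, 5, 6, 7, 5]
order_dates = [
    date(2021, 3, 1 + offset)
    for offset, n_orders in enumerate(orders_per_day)
    for _ in range(n_orders)
]


def orders_per_weekday(dates):
    """Turn raw order dates (data) into order counts per weekday (information)."""
    counts = Counter({day: 0 for day in WEEKDAYS})
    counts.update(WEEKDAYS[d.weekday()] for d in dates)
    return dict(counts)


def weak_days(counts, ratio=0.5):
    """Return weekdays whose order count is below `ratio` times the daily average.

    The 0.5 default is an arbitrary threshold chosen for this sketch.
    """
    average = sum(counts.values()) / len(counts)
    return sorted(day for day, n in counts.items() if n < ratio * average)


counts = orders_per_weekday(order_dates)
print(weak_days(counts))
```

With the sample data above, the only day flagged is Wednesday, which is the kind of pattern the bookstore could act on with a promotion. The decision itself still rests with the decision maker.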
Working of BI
A simplified explanation of how BI works was provided in Figure 1.4. If we consider BI as a black box, then data is the input, whereas information and insights are the output. Let’s expand on the black box example: add some data sources on the left (input) side and the users on the right (output) side. The result is shown in Figure 1.7:
Figure 1.7: Working of Business Intelligence
The earlier black box now contains business-specific strategies, infrastructure, technologies, tools, applications, best practices, architecture, and more. The first output (information) is expected to provide a quick overview of the business and trigger ideas or questions in the minds of the users. Based on those ideas or questions, BI users can explore further to get more details or information. This is an iterative
process, where after a few iterations (marked as n times in Figure 1.7) the users get the insights based on which they can arrive at conclusions and make decisions in the hope of improving their business. If the users don’t get ideas or questions when they look at the first output (information) of BI, then either they already know everything that the BI output is showing them or there is something wrong with the BI setup. Note that, as mentioned earlier, BI is not a one-off process. It is a continuous journey from data to information to insight to decisions to business improvements, as depicted in Figure 1.8:
Figure 1.8: Data to information to insight to decisions to business improvement
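The iterative exploration described above (overview first, then drilling down on whatever raises a question) can be illustrated with a small Python sketch. The sales records, the region and store names, and the helper `total_by` are hypothetical and exist only for this illustration; a real BI frontend would offer the same idea through filters and drill-down features.

```python
from collections import defaultdict

# Hypothetical sales records: (region, store, revenue in EUR).
sales = [
    ("North", "N1", 120.0), ("North", "N2", 110.0),
    ("South", "S1", 40.0), ("South", "S2", 95.0),
]


def total_by(records, key_index, where=None):
    """Sum revenue grouped by one column, optionally filtering records first."""
    totals = defaultdict(float)
    for record in records:
        if where is None or where(record):
            totals[record[key_index]] += record[2]
    return dict(totals)


# Iteration 1: the overview (information) triggers a question:
# why is one region lagging behind?
by_region = total_by(sales, key_index=0)
weakest_region = min(by_region, key=by_region.get)

# Iteration 2: drill down into that region to look for the answer.
by_store = total_by(sales, key_index=1,
                    where=lambda record: record[0] == weakest_region)
weakest_store = min(by_store, key=by_store.get)

print(weakest_region, weakest_store)
```

Each iteration narrows the question. Whether the result counts as an insight depends on the user and the context, as explained in the definition of insight.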
Now that we have covered the definition of BI and its working, let’s look at some of the realities and misconceptions about BI.
Realities of BI
As we look at the realities of BI, we will also clear up some of the myths and misconceptions relevant at this point. The six realities of BI relevant to this chapter are:
• BI is a concept
• BI doesn’t solve problems on its own
• Insights from BI are one of the inputs for decision-making
• Ideal BI solutions are rare
• BI solutions serve a variety of users
• BI is not always an expensive, multi-year project
BI is a concept
BI is a concept; it doesn’t prescribe any particular technology, tools, methodologies, or project/product management techniques. BI is agnostic to technologies, tools, and methodologies. In most cases, BI is also not an off-the-shelf software package or tool that you can simply buy, deploy, and expect to work magically. To make it very clear, it is not necessary for all of the components of a BI solution to be sourced from the same vendor. We will see more on the components of a BI solution in later chapters, specifically in Chapter 9 and Chapter 10. Just to give you a quick idea, in one of the client projects that I worked on in 2009, the BI solution consisted of IBM DataStage (an ETL tool), SAP BusinessObjects (an enterprise reporting and analytics
platform) with direct web access for ad hoc queries for internal users, Microsoft SQL Server (the database for the data warehouse), and an in-house web portal for delivering static reports to both internal users and customers, among other smaller components. The data warehouse was built using a top-down (Bill Inmon) approach with a staging layer, an integration layer, and a customer data mart layer. This project initially used a waterfall project management methodology and later switched to an agile methodology. There were mainly two technical teams, one for new development and enhancements, and another responsible for production support. Both teams had specialists in the ETL tool and the reporting and analytics platform (RAP).

Important note: Technical terms such as data warehouse, data mart, ETL, various data layers, etc. are covered in Chapters 9 and 10. If you are not at all familiar with these terms, it is recommended to first glance through the explanations provided in Chapters 9 and 10 and then come back to Chapter 1.

A BI solution is not just a simple tool. It is not a software package that a business user can easily purchase, install, and use right away, as some vendors market it. Software marketed by such vendors will not necessarily meet the needs of a business user. BI is a concept. Various parts or components of a BI solution can be bought; however, there is still work to be done to put them all together and make the solution ready for use, unless it is BI as a service (BIaaS) for a specific product.
BI doesn’t solve problems on its own
BI doesn’t solve problems on its own. Just having a BI solution doesn’t mean it will solve all of your business problems. BI helps in identifying the existence of problems. As we have seen in the earlier section on the working of BI, the first output from BI is information. BI presents information to the user in the form of a summary or performance overview of the business. Various trends, patterns, correlations, anomalies, and outliers are presented in a comprehensible way to make it easier for the user to notice any problem areas that need attention. Depending on how well a BI system is built and how granular the data available in the data warehouse is, the BI system can show where exactly the problem lies. A BI solution won’t necessarily answer the questions of why exactly that problem has occurred and how to fix it, unless the answer is available in the data collected. The data collected in BI (the data warehouse) is limited by what has been implemented up until that point. So, if there wasn’t enough forethought to collect all of the relevant data, the BI system will be unable to answer those questions right away. If there was forethought and the data is available in BI, BI can answer those questions too. In any case, the problems have to be solved by the decision makers putting the actionable insights provided by BI into action. If decision makers decide not to act, fully knowing the consequences, obviously just having a BI solution won’t solve
problems or improve the business. Such problems are beyond the scope of BI and need the involvement of higher management to be fixed.
Insights from BI are one of the inputs for decision-making
If we were to state that companies make all their decisions based on insights from BI, we would be wrong. This fact might come as a surprise to some who are new to business and are used to hearing or reading about analytics and insights for decision-making as if there were no other inputs. That is not how a real business works. Quite a lot of decisions in businesses still continue to be based on gut feelings alone. Certainly, more and more businesses are moving towards a data-driven decision-making approach. But even in such businesses, insights from BI are one of the inputs in the decision-making process. Sometimes, when a decision needs to be made, the decision maker relies on BI to understand the situation first, and sometimes insights from BI trigger the decision-making process. However, insights from BI, even though they are key inputs, are not the only input and do not always result in a decision. There are other inputs too that should be considered before arriving at a decision.

Let’s imagine a company that is in discussions to acquire another company. Will such information be available in the BI system? No, it’s confidential information available only to a select group of people who are involved in the acquisition process. Or what about information regarding a company’s plans to launch a new product in the next few months? Will this information be available in the BI system? Most likely, no. Most of the data about a subject or product comes into production or operational systems only after the product is launched. Before launch, only a small set of configuration data might be loaded and available. As explained above, it is not practical to have every single piece of information required for business decision-making within a BI solution. Apart from information that is not readily available in a BI system, the other main inputs are:
1. Business strategy: A company may knowingly go ahead with a short-term loss-making deal to onboard a strategic customer as part of its business strategy to expand in a new geographical market.
2. Experience: A senior manager who has vast experience in mergers and acquisitions may decide not to proceed with an acquisition involving a particular group of companies based on past experiences, regardless of how good the financial prospects may seem on paper.
3. Intuition: We see this all the time; a lot of people make decisions based on intuition.
The intention here is not to state that using other inputs is right or wrong, but to highlight the reality. So, to summarize, in businesses that use BI, we notice that other inputs are also considered before a decision is made, as depicted in Figure 1.9:
Figure 1.9: Inputs for decision making
Ideal BI solutions are rare
Ideal BI solutions are rare and mostly exist only in theory. An ideal BI solution is one that contains data from all lines of business across the globe, including data about all products, customers, employees, partners, suppliers, etc., and that is built such that all relevant information is made available to all of the right business users at the right time, so that the solution is able to answer all the questions all the time. Don’t assume that there will be a well-built BI solution for the entire organization that has all the data the business requires to make all of its decisions. If you have worked for any large company, you will have noticed that there can be multiple BI solutions within the same company. Some understandably have multiple BI solutions because of inorganic growth, but others have them due to poor planning, inter-departmental politics, siloed approaches, departmental initiatives vs enterprise initiatives, and so on. When companies acquire other companies, the acquired company most often comes with its own BI tech stack, and it is not easy to migrate to a common BI tech stack. It can take years, or it may be decided to retain the acquired BI tech stack for cost reasons. Another point to note is that not all BI solutions are alike. Some organizations have built just the minimum capability such
as static reporting delivered via an intranet or an internal portal, or reports sent out to users via email with no other access. Other organizations have implemented solutions that contain several capabilities: capabilities that enable users to interact with, view, refresh, create, and publish their own reports and dashboards, and capabilities that enable collaboration between users, ad hoc data analysis, etc. Some have also included web analytics, app analytics, and other analytics products in the BI landscape. An architecture diagram for a contemporary BI solution is depicted in Figure 1.10:
Figure 1.10: A sample architecture diagram for contemporary Business Intelligence
We will cover more details on the topic of BI architecture in Chapter 9: BI Architectures. Figure 1.10 gives us a sense of how the high-level architecture of a BI solution looks. Just bear in mind that the ‘Prepared data layer’ depicted in the figure is a logical representation that can be implemented using various technologies. Also note that, in the case of a large company, there may be multiple instances of such BI solutions, for example, one per line of business, one per region, or one per subsidiary company. Moreover, it is very much possible that the tech stack differs between BI implementations within the same company.
BI solutions serve a variety of users
BI solutions are not only for top management; they serve a variety of users. Depending upon which functionalities are available in the BI landscape, a variety of users can take advantage of them. We can classify users broadly into two user types: basic users and power users.
A basic user may use the same feature as a power (advanced) user, but the degree to which each uses it differs. This is similar to two users of MS Excel: one may use only the basic features of Excel whereas the other may use advanced features, but both are considered users of MS Excel. A basic user, after acquiring some experience, may graduate to the power user group. In one company, a Vice President (VP) may be a basic user, and in another company of the same size in the same industry a VP could be a power user. Data analysts, BI analysts, and anyone in a job role whose main responsibility is to analyze data are always power users. Usually, those whose main role is not analyzing data fall under the basic user type. More details on BI roles are provided in Chapter 5: Roles in Business Intelligence.

Apart from human users, there can be other systems that use the output of a BI solution. For example, a rules engine in an insurance company could depend on pre-aggregated results provided by the BI solution to determine whether an existing customer is eligible for a lower premium. Or an email marketing system could depend on BI to narrow down a target group of customers. In such cases a technical account (system user) is used. Technical accounts go by different names in different companies; basically, a technical account is a non-human user.

Table 1.1 illustrates a typical mapping of some of the functionalities of BI to the corresponding user types:

Functionality                       Basic user   Power user   System user
Static reports                      Yes          Yes          No
Interactive dashboards              Yes          Yes          No
Ad hoc analysis                     No           Yes          No
Analytics general                   Yes          Yes          No
Self-service reports & dashboards   No           Yes          No
Data mining tools                   No           Yes          No
App analytics                       Yes          Yes          No
CRM analytics                       Yes          Yes          No
Web analytics                       Yes          Yes          No
Data cleansing                      No           Yes          Yes
Rules engine                        No           No           Yes

Table 1.1: Functionalities mapped to user types
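To make the idea of a system user more concrete, here is a minimal Python sketch of how a marketing system might consume a pre-aggregated result from BI to narrow down a target group. Everything in it is an assumption made for illustration: the transaction records, the 182-day window, the 1200 EUR threshold, and the function names `spend_in_window` and `target_group` do not come from any specific BI product.

```python
from datetime import date, timedelta

# Hypothetical transactions: (customer_id, transaction_date, amount in EUR).
transactions = [
    ("C1", date(2021, 1, 10), 700.0),
    ("C1", date(2021, 4, 5), 600.0),
    ("C2", date(2021, 5, 20), 300.0),
    ("C3", date(2020, 6, 1), 2000.0),  # falls outside the window used below
]


def spend_in_window(records, as_of, window_days=182):
    """BI side: pre-aggregate total spend per customer over a rolling window."""
    start = as_of - timedelta(days=window_days)
    totals = {}
    for customer, day, amount in records:
        if start <= day <= as_of:
            totals[customer] = totals.get(customer, 0.0) + amount
    return totals


def target_group(aggregates, threshold_eur=1200.0):
    """System-user side: select customers whose spend exceeds the threshold."""
    return sorted(c for c, total in aggregates.items() if total > threshold_eur)


aggregates = spend_in_window(transactions, as_of=date(2021, 6, 30))
print(target_group(aggregates))
```

The split into two functions mirrors the division of work: the BI solution prepares the aggregate once, and the consuming system only applies its own rule to the prepared result through a technical account.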
Let’s take one example to understand clearly how a basic user’s usage of a BI tool differs from a power user’s usage of the same tool. Figure 1.11, from the PublicBI EUProc BI solution (publicbi.com), shows an example of a BI dashboard:
Figure 1.11: Example of a BI dashboard
PublicBI EUProc is a BI solution that helps businesses and the general public to find trends, patterns, correlations, and anomalies in EU public procurement[25]. A basic user of this dashboard views and interacts with it. For example, a basic user uses the filters at the top to drill down and find trends and patterns for specific values or combinations of values, and downloads the dashboard. A power user, in addition to everything that a basic user does, would also create new charts or edit existing charts, create new metrics, create new dashboard pages, or create totally new dashboards based on the same data sources on which the dashboard is built, or even include new data sources and merge them with the existing ones. Power users may also clean the data before using it in the dashboard. In most cases, top management will be basic users and analysts will be power users. It should now be clear that BI serves a variety of users.
BI is not always an expensive, multi-year project
No, not every BI solution costs millions or takes multiple years to build. Businesses are very dynamic these days. Gone are the days when businesses had the luxury and patience to implement multi-year BI projects to get the first result out. And gone are the days when businesses kept their organization structure fixed for multiple years. Anecdotally, companies now carry out organizational changes at least once a year. A 2013 McKinsey survey found that most companies made large-scale organizational changes more often than once every three years, which had been the norm. In addition to organizational changes, companies are acquiring other companies or are being acquired, and companies are introducing new products and product lines at a faster pace than ever before. With these kinds of trends, it doesn’t make sense for most businesses to venture into building BI solutions that take years to build and cost large sums of money. If they do, there is a high chance of failure for two main reasons:
1. As organizations restructure themselves, some of the projects and programs get scrapped or are placed on hold, as the original sponsor is no longer in the same role and there probably is no other sponsor.
2. The original requirements become obsolete.

There was and there is a real need to build BI solutions faster and at lower cost. The good news is that, with advancements in cloud services and a whole range of offerings from various leading cloud service providers, a first version of a usable BI solution can be implemented in a couple of months, if not in weeks, with zero capital expenditure (CapEx). Hence, there is actually no need to spend millions upfront or run multi-year projects to implement BI solutions before results are visible. In Chapter 7: Ideas for Success with BI, several ideas to speed up building a BI solution are covered.

Many more such misconceptions about BI are known; however, we cannot clarify all of them in this chapter without covering some more topics of BI.
As mentioned earlier we will continue to clear misconceptions in the relevant chapters. In the next section, we will clear the confusion on who actually coined the term Business Intelligence.
Demystify coinage of BI
When I began my journey in the field of BI in 2006, BI was already a hot topic in the IT industry. So, surely the term was coined well before I entered the field. Some companies had already implemented BI solutions backed by solid data warehouses and were enhancing them with additional data marts, others had projects in progress and some
big companies were still considering whether they should go ahead with a BI project or not. Large IT service companies, Infosys for instance, were restructuring their organizations to build a centralized BI department with hundreds, if not thousands, of BI consultants, replacing the previous structure in which BI talent was spread across various verticals such as retail, banking, insurance, telecommunications, etc.

To arrive at a conclusion on the coinage of BI, I had to do quite a bit of research. This part of the chapter took the most time, as I had to rely on what is available in reliable books and articles rather than on my experience. Business Intelligence as an umbrella term, referring to a set of concepts and methods to improve business decision-making using data, was coined by Howard Dresner in 1989[1][2].

Some sources[3][4] claim that Richard Miller Devens coined the term Business Intelligence in 1865. I would like to, with all due respect, disagree and contest that claim. The sources attribute the coinage to an extract about Henry Furnese in the book “Cyclopaedia of Commercial and Business Anecdotes”[5] authored by R.M. Devens. Reading that extract, it is evident that the author has not used business intelligence as a subject or field related to business data; rather, he used the term “intelligence” to mean news. It is also interesting to find that the extract about Henry Furnese appears in the book “The Banker’s Magazine and Statistical Register”[6], published in the year 1850 (15 years earlier than R.M. Devens’ book), but without the word “business” in it. So, one could argue that R.M. Devens first used the words “business” and “intelligence” together to mean news (Note: the word news was referred to in a context other than business) that could provide an advantage to Henry’s business by his knowing it earlier than others (an unfair advantage, as news was fabricated too).

Similarly, another source[8] claims that the term business intelligence was first coined by the IBM researcher Hans Peter Luhn in 1958, based on his paper “A Business Intelligence System”[7]. While there is no denying that the term “Business Intelligence System” was coined by H.P. Luhn, anyone who has read the aforementioned paper would agree that H.P. Luhn proposed a specific system that would disseminate information automatically to overcome the communication challenges faced by businesses in those times. It did not address the aspects of data analysis, data analytics, decision support, decision-making, or deriving information and insights from data. BI as a concept that we use today is different from the specific business intelligence system that H.P. Luhn proposed. Just as some words in the English language (or any language, for that matter) can have multiple meanings, the business intelligence system proposed by H.P. Luhn has a different meaning from any of the BI systems or solutions that we have built based on the concept of business intelligence. Therefore, the conclusion that Howard Dresner coined the term business intelligence is more accurate.
Conclusion
With that take on the coinage of business intelligence, we come to the end of the chapter. Initially I had planned to cover the evolution of BI in this chapter as well; however, I found that the evolution of BI will be better understood by learners after they understand why businesses actually need BI. So, we will cover the evolution of BI as a section in the next chapter, which mainly deals with the question of why businesses need BI. In this chapter, we have covered the definition of business intelligence and explained every important term in that definition. As we progress through the chapters, we will go into more detail on some of the aspects that we have not covered in detail here. The focus in this chapter was to get a good conceptual understanding of BI, clear up some of the common misconceptions about BI, bring out some of the realities of BI, and clarify that BI is a concept and is not limited by tools or technologies. A high-level overview of the working of BI was provided. And finally, the topic of the coinage of business intelligence has been clarified. It’s important to note that new names for the same concepts, or new technologies to implement the same concepts, will keep popping up, but once we have a strong foundation in the concepts, we should be able to deal with the changes without much difficulty. Also, it really doesn’t matter who actually coined the term business intelligence, as long as we understand the concept and use it to improve the business. In the next chapter you will learn why businesses need BI.
Points to remember
Some of the key points to remember are as follows:
• BI is a concept or a process.
• BI is not limited by type of data, technologies, tools, or methodology. BI is technology, tools, and methodology agnostic.
• BI is essential for business improvement.
• BI is not just the frontend that BI business users use. There is a lot in the backend that business users don’t see (they don’t have to).
• When context and meaning are provided to data, it becomes information.
• The word business in business intelligence is not limited to commercial (for-profit) enterprises.
• BI provides one of the main inputs for decision-making, but not all decisions in businesses are based on BI alone.
• There can be more than one BI solution in an enterprise.
• A variety of users, from basic to advanced, are served by BI.
• BI solutions can now be built in weeks without any CapEx.
• BI is an umbrella term referring to the overall process in which information and insights are derived from data in a scalable, efficient, and on-going basis, and made available to decision makers to support data-driven decision-making in order to improve the business.
Multiple choice questions
1. Business Intelligence is a
   a) Product
   b) Process/concept
   c) Technology
   d) Tool
2. Which of these organizations could use business intelligence?
   a) A retail chain
   b) Police department
   c) Not-for-profit organizations
   d) All of the above
3. Business Intelligence deals with
   a) Internal business data
   b) External data
   c) Social media data
   d) All of the above
4. What data is available in BI?
   a) Any data that business decides to use for BI purposes
   b) Only financial data
   c) Only marketing data
   d) Only daily transaction data
5. There is a recurring requirement to notify customers about new offers when the sum of their transactions at a retail store exceeds 1200 EUR in a 6-month period. Which user type of BI should be used?
   a) Basic user
   b) Power user
   c) System user
   d) None of the above
6. Which of these statements is most accurate about a data-driven organization?
   a) All decisions are based on BI
   b) No decisions are based on BI
   c) Some decisions are based on BI
   d) Most decisions are based on BI
7. Who coined the term business intelligence in the way it is used now?
   a) R M Devens
   b) Howard Dresner
   c) Hans Peter Luhn
   d) Henry Furnese
8. You find a series of numbers in a spreadsheet in a shared folder; there are no headers nor any documentation. In the context of BI, how would you classify it?
   a) Data
   b) Information
   c) Insight
   d) None of the above
9. Business Intelligence
   a) Includes data visualization
   b) Includes business analytics and data mining
   c) Is automation or semi-automation of information and insight generation process
   d) Includes all of the above
10. Which of these is a buzzword?
   a) Artificial Intelligence
   b) Big data
   c) Data mining
   d) Data
11. “There is an increase of 2% in number of customers visited per day for the last 10 days”, to a store manager this is
   a) Data
   b) Information
   c) Insight
   d) None of the above
Answers
1. b
2. d
3. d
4. a
5. c
6. d
7. b
8. a
9. d
10. b
11. b
Questions
1. What are some of the misconceptions of BI that you have come across?
2. What is the definition of business intelligence?
3. What are the four main reasons that there is so much confusion about BI?
4. Why do business users wrongly assume that BI is only the frontend?
5. What is the core concept of BI?
6. What do data, information, and insight mean in the context of BI?
7. In the context of BI, how is intelligence defined?
8. What are the real pain points that BI addresses?
9. Other than insights from BI, what are the other inputs considered for decision-making, and why? Explain with an example.
10. Explain how decisions are made in organizations that don’t use BI.
11. What are some of the reasons for companies to end up with multiple BI solutions?
12. What are the differences between a power user and a basic user?
13. Why doesn’t it make sense for most businesses in current times to invest in BI solutions that cost millions of dollars and take multiple years to build?
Chapter 2
Why do Businesses Need BI?
In the previous chapter, business intelligence (BI) was introduced and, among other topics, a high-level overview of the working of BI was provided. In this chapter, we will explore in detail why businesses actually need BI. If you are a business owner, an entrepreneur, or part of a management team, this chapter will give you enough knowledge to understand why your business or business unit actually needs BI and help you understand what you are perhaps missing by not using BI in your organization. Readers will acquire knowledge on important topics such as the key uses of BI, its additional benefits and importance, and the users of BI and the purposes they use it for, thereby building a good foundation in the concepts of BI. When summed up, all of these points answer the question of why businesses actually need BI. This chapter has been structured such that it first introduces the reader to the ways in which BI is currently used in various organizations and the processes that support the main uses. At the end of the chapter, the evolution of BI is covered, as it helps to relate to the approaches that were used in the past to achieve goals similar to those achieved with BI today. The confusions and misconceptions around business analytics, data analytics, and data mining are clarified in this chapter. Also, a clear distinction is made between BI reporting and operational data transfer.
Structure
This chapter will cover the following topics:
• Introducing Walget – A retail chain example
• Main uses of BI
  o Decision making
  o Business performance management
  o Finding opportunities and identifying problems
• Processes that support main uses of BI
  o BI Reporting
    ▪ BI Reporting versus operational data transfer
  o Analytics
  o Data mining
  o Reporting vs analytics vs data mining in BI
• Importance of BI
• Why BI is a must have?
• Benefits of BI
  o Side benefits of BI
• Which sectors use BI?
  o Users of BI and purposes
• Evolution of BI
Objectives
After completing this chapter, readers will understand the main uses of BI, the processes that support those uses, the differences between those processes, and the differences between BI reporting and operational data transfer. They will understand the importance of BI for businesses, its side benefits, and the reason why BI has become a must-have for almost all businesses. Readers will also learn about the various sectors or industries and types of businesses that use BI, the BI user groups, and the purposes for which those user groups use BI. Finally, readers will understand the evolution of BI.
Introducing Walget – A retail chain example
To be able to relate to the concepts discussed in this chapter and for better understanding, there is a need for us to connect to different business scenarios and examples. So we will introduce a fictional company. For the examples to be most
effective and reach a wide audience, they should come from a setting that most of us are familiar with. What better example than one from the retail sector, such as a supermarket? Therefore, we will consider a fictional company called Walget Supermarkets (Walget). Walget will be used wherever appropriate throughout the rest of the book. Even though Walget is fictional, the examples treat it as though it were a real company. We will consider that Walget is an international supermarket chain with more than 1,000 stores across hundreds of cities, operating in multiple countries with more than 100,000 full-time employees. Walget also has an online store and a recently launched mobile app. Walget was started in the 1980s. It has grown several times over in revenue and headcount, both organically and by acquiring several of its competitors. Walget currently has multiple BI solutions, mainly because of its acquisitions and also due to specific needs of the business. A simplified management structure of Walget is provided in Figure 2.1.
Figure 2.1: Reporting structure of the supermarket chain
As we can see in Figure 2.1, each store has a store manager, each regional manager has around 5 to 10 store managers directly reporting to them, each country manager has 5 to 10 regional managers reporting to them, and all of the country managers report to the global manager. Now we know the business and a simplified management structure of Walget. What about data? What kind of data is available at Walget? A simple event (transaction) of a customer buying groceries from any of the Walget stores generates a lot of interesting data such as date and time of purchase, terminal used for payment, the
employee who served the customer, amount and currency of purchase, products purchased, type of payment (card, app, or cash), card type and brand, store location, loyalty card number, discount, tax rate, loyalty points gained, returns, etc. Apart from the daily transaction data, data about stores, employees, suppliers, products, prices, inventory, campaigns, etc., is also maintained in different systems for regular business operations. All of this is still in its raw form and is referred to as data. Table 2.1 provides an overview of various groups of data stored by Walget.
• Customers: Demographic; Purchase history; Return history; Subscriptions; Behavior (both at brick and mortar and online stores); Social media
• Employees: Demographic; Salary; Attendance and leaves; Training; Education; Performance; Manager; Referrals; Job application; Social media
• Products: Product categories; Prices and costs; Performance; Margins; Offers; Inventory
• Stores: Managers; Location; Sales; Performance; Costs; Revenue; Taxes; Opening hours; Compliance; Social media
• Suppliers: Suppliers’ contact details; Payments; Invoices; Returns; Offers; Contracts
• Warehouse: Capacity; Stocks
• Transactions: Sales; Returns
• Loyalty and campaigns: Campaigns; Membership management
• Syndicated data: Nielsen; IRI
• Transport: Trucks; Home delivery vehicles; Schedules
Table 2.1: Some of the groups of data stored by Walget
As every business is different, different sets of data are collected by different businesses. The details in Table 2.1 should not be considered as a comprehensive list of data entities but as one that covers the most common data entities in a retail chain. Starting from the next section, we will refer to Walget as and when necessary. Let’s now start with the main uses of BI.
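The fields listed earlier for a single purchase can be pictured as one raw record. The following is a minimal sketch only: the class name, field names, and sample values are illustrative assumptions and not an actual Walget schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from decimal import Decimal
from typing import List, Optional


@dataclass
class Purchase:
    """One raw checkout event. All field names are hypothetical."""
    purchased_at: datetime
    store_id: str
    terminal_id: str
    employee_id: str
    amount: Decimal
    currency: str
    payment_type: str                       # "card", "app" or "cash"
    product_ids: List[str] = field(default_factory=list)
    loyalty_card: Optional[str] = None      # absent for non-members
    discount: Decimal = Decimal("0")
    tax_rate: Decimal = Decimal("0")


# A made-up purchase; on its own it is data, not yet information.
sample = Purchase(
    purchased_at=datetime(2021, 3, 5, 17, 42),
    store_id="S-0042",
    terminal_id="T-07",
    employee_id="E-1138",
    amount=Decimal("23.80"),
    currency="EUR",
    payment_type="card",
    product_ids=["P-100", "P-205"],
    loyalty_card="L-778899",
    tax_rate=Decimal("0.07"),
)
print(sample)
```

A single record like this answers no business question by itself; it becomes useful only after many such records are combined and summarized.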
Main uses of BI
To answer this chapter’s main question, why businesses actually need BI, the most practical and simple way is to first understand the applications of BI in modern-day businesses. Once we know the main uses and benefits of BI, it will be easy to understand why other businesses also need it. There are three main uses of BI: supporting decision making, managing business performance, and finding new, hidden opportunities while identifying problems proactively. Every organization is different, and not all organizations use BI for all three purposes, but this doesn’t mean that they won’t in the future. It just means that the organization has prioritized some use cases over others. Table 2.2 gives a summarized view of each of the three main uses of BI.
Business Intelligence
Decision making
• Strategic
• Tactical
• Operational
Business performance management
• Track key performance indicators (KPIs) and metrics.
• Monitor business at every level (team, department, and corporate).
Finding business opportunities and identifying problems
• Analyze data to answer specific questions.
• Find previously unknown trends, patterns, correlations, and anomalies to figure out next steps.
Table 2.2: Main uses of BI
In the following sections, we will discuss each of the three main uses in detail. But before proceeding, let’s clarify the difference between KPIs (key metrics) and metrics (business metrics). All KPIs are metrics, but not all metrics are KPIs. Only a select few metrics that indicate business performance are grouped as KPIs. In a business, there can be hundreds of metrics, for example, number of users, number of visitors, number of likes, number of comments, number of enquiries, etc. Measuring these metrics is important; however, they do not necessarily indicate that a business is growing profitably or meeting its business objectives. Measuring KPIs allows a business to track whether it is growing profitably and whether it is on track to achieve its business objectives. KPIs are the most important metrics by which the performance of the organization can be
measured. KPIs can be at various levels. There can be departmental KPIs, team-level KPIs, enterprise-level KPIs, etc. Also, a metric considered a KPI at one level may not necessarily be considered a KPI at another level. For example, a department-level KPI such as the training completion compliance ratio for the training department could be considered only a metric, not a KPI, at the enterprise or corporate level. KPIs are derived from metrics, and KPIs at lower levels could be used as inputs to derive KPIs at higher levels in the organization. Usually, the number of KPIs is between 5 and 10. In the case of Walget, the number of employees and the number of customers are metrics, whereas sales per employee could be considered a KPI if the business decides that it is one of the most important metrics to monitor regularly to ensure that the business is on track.
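To make the relationship concrete, the sales per employee example can be sketched as a value derived from two plain metrics. The figures and names below are made up for illustration; whether such a ratio counts as a KPI remains a business decision, not something the code determines.

```python
def sales_per_employee(total_sales: float, employees: int) -> float:
    """Derive a candidate KPI from two underlying metrics."""
    if employees <= 0:
        raise ValueError("employees must be positive")
    return total_sales / employees


# Hypothetical monthly metrics for a single store.
metrics = {"total_sales": 1_200_000.0, "employees": 40, "customers": 15_000}

kpi = sales_per_employee(metrics["total_sales"], metrics["employees"])
print(f"Sales per employee: {kpi:,.2f}")  # prints "Sales per employee: 30,000.00"
```

Note that the number of customers is also measured here but does not feed this particular KPI: it stays a metric.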
Decision making
With Walget as a backdrop, let’s explore the first use of BI: business decision making. Decision making here refers to the support that BI provides to decision makers (managers in this case) in making business decisions based on information and insights, at all levels and for all types of decisions: strategic, tactical, and operational. Table 2.3 lists a few scenarios from Walget where BI could support decision making:
Decision making
Strategic
• Which products should we invest in for the long term?
• In which locations should we launch new stores?
• Which stores should we close down in the long run?
Tactical
• Which promotion/campaign should we repeat or launch to meet yearly targets?
• How many stores should we upgrade to new technology in phase 1?
Operational
• Which products should we stock for the holiday season?
• How many part-time staff should we hire for the weekend?
• Which products should be part of the clearance sale and what should be the discount percentage?
Table 2.3: Scenarios where BI is useful for decision making
In the preceding scenarios, the strategic decisions are made by the country managers together with the global manager, the tactical decisions are made by the regional managers, and the operational decisions are made by the respective store managers. Similarly, in other businesses, decision makers could be anyone from a CEO to department heads or operational staff. The scope of the decision changes based on
the roles. In the past, BI was used only to address strategic questions and was therefore used solely by top management; modern businesses, however, use BI at all levels. So, the scope of decisions could be at different levels: team, department/unit, or enterprise. Some general examples of decisions to be made are as follows:
• Which services should we develop further?
• Which sector/customer segment should we pay more attention to?
• Which market should we focus on?
• Which projects, programs, or initiatives should we continue, scrap, or put on hold?
• Which departments should we focus on to reduce the attrition rate?
• Which tools/technologies need replacement to support current needs?
Let’s now consider two examples, one in which Walget and another in which a business-to-business (B2B) company uses BI to support its decision-making process.
Situation 1: Walget has to select a software as a service (SaaS) vendor from a list of vendors on the basis of best-fit price ranges. The price ranges quoted by vendors depend on the number of users that would use the SaaS application. To determine this, the procurement manager needs to know how many of the employees are potential users. At this point, the procurement manager could use BI to forecast the monthly/yearly growth rate of the number of users from actual numbers and, based on this, decide which vendor fits Walget’s requirement. Here, the procurement manager is able to narrow down the list of SaaS vendors and eventually select (decide on) a vendor based on the information provided by BI.
Situation 2: A B2B company intends to upgrade its top 10 customers to premium support to fulfil a strategic objective. The top 10 customers could be based on revenue, profit, some other metric, or a combination of metrics that the company has decided.
Using BI, it would be relatively easy and quick for the company to identify the right set of top 10 customers, as the information is already available in the BI solution.
In both cases, we see that BI supports the decision makers with facts rather than gut feelings alone. However, as discussed in the previous chapter, decisions are not based solely on BI but depend on other factors and inputs as well. The decision to select a particular vendor in situation 1 may also depend on parameters that are not available in BI; for example, one of the vendors could be a recommendation from the CEO based on a personal relationship. In the second situation, one of those 10 customers could be dropped from the list and the 11th included instead because the relationship manager knows, from information that is not in the BI system, that one of those companies is closing down shortly because of a financial crisis. These examples should also make it clear that BI supports decision makers in making business decisions and that it doesn’t decide on its own.
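Situation 1 can be sketched in a few lines. This is a simplified illustration under stated assumptions: the user counts, vendor names, and price tiers are invented, and a constant compound growth rate stands in for whatever forecasting method a real BI solution would apply.

```python
import math
from typing import List, Optional, Tuple


def monthly_growth_rate(history: List[int]) -> float:
    """Average compound monthly growth rate across the observed months."""
    periods = len(history) - 1
    return (history[-1] / history[0]) ** (1 / periods) - 1


def forecast_users(current: int, growth: float, months: int) -> int:
    """Project the user count forward, rounding up to whole users."""
    return math.ceil(current * (1 + growth) ** months)


def cheapest_fit(vendors: List[dict], users: int) -> Optional[Tuple[float, str]]:
    """Return (monthly price, vendor name) of the cheapest tier covering `users`."""
    offers = [
        (price, vendor["name"])
        for vendor in vendors
        for max_users, price in vendor["tiers"]
        if max_users >= users
    ]
    return min(offers) if offers else None


# Hypothetical actuals: potential users over the last three months.
history = [800, 880, 968]

# Hypothetical quotes: each tier is (maximum users covered, price per month).
vendors = [
    {"name": "Vendor A", "tiers": [(1000, 2000.0), (5000, 7000.0)]},
    {"name": "Vendor B", "tiers": [(2500, 4000.0), (10000, 9000.0)]},
    {"name": "Vendor C", "tiers": [(3000, 5000.0)]},
]

growth = monthly_growth_rate(history)
expected_users = forecast_users(history[-1], growth, months=12)
choice = cheapest_fit(vendors, expected_users)

print(f"Forecast users in 12 months: {expected_users}")
print(f"Best-fit offer: {choice}")
```

The output is only one input to the decision. As the text notes, factors outside the BI system can still change which vendor is finally selected.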
Business performance management
The second main use of BI is in business performance management. At every successful business, managers at various levels monitor the performance of the business unit they are responsible for. This is to keep track of what’s happening in the business and to ensure that the performance is in line with their own and the management’s expectations (SLAs, objectives, etc.). Using the right set of metrics and KPIs, managers get a quick overview of the performance of the business without necessarily having to go through the details. BI automates the process of collecting, storing, and preparing data in an efficient and reliable way, so that the predefined metrics and KPIs can be derived and presented to the managers. BI supports two parts of business performance management: business performance monitoring and business goals achievement tracking. Table 2.4 shows examples of both parts in the context of Walget.
Business performance management
Business performance monitoring: to understand what’s happening in the business.
• How many shoppers per day?
• How many employees are at work and how many are on leave?
• Sales trend: hourly, daily, weekly, etc.
• How many orders were received?
• How many orders are/were processed?
• And other business metrics at various levels.
Business goals achievement tracking: to track whether the performance is in accordance with the set objectives and goals.
• Tracking KPIs and metrics at various levels.
• Tracking adherence to SLAs.
• Are sales increasing by 1% every day?
• Are item returns less than 10% of daily sales?
• Are fraudulent transactions less than the set percentage?
• Is the training completion percentage of all employees above or equal to 95%?
• Is the on-time delivery percentage more than 95%?
Table 2.4: Examples of use of BI in business performance management
Even though business performance monitoring and business goals achievement tracking may seem similar, there is a difference. In business performance monitoring, the idea is to continuously monitor whatever is happening in the business, that is, there is no set target, whereas business goals achievement tracking includes a comparison between the set targets and the actuals.
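The distinction can be illustrated with the item returns example from Table 2.4. In this sketch the daily figures are made up and the function names are hypothetical; the point is only that monitoring reports actuals, while goal tracking compares them with a target.

```python
from typing import Dict, List, Tuple

# Hypothetical daily figures for one store: (sales, value of returned items).
daily: Dict[str, Tuple[float, float]] = {
    "Mon": (50_000.0, 3_250.0),
    "Tue": (40_000.0, 4_480.0),
    "Wed": (60_000.0, 4_800.0),
}

RETURNS_TARGET_PCT = 10.0  # goal: returns stay below 10% of daily sales


def returns_pct(sales: float, returns: float) -> float:
    """Returned value as a percentage of sales."""
    return 100.0 * returns / sales


def monitor(figures: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Monitoring: report what happened, with no target involved."""
    return {day: round(returns_pct(s, r), 1) for day, (s, r) in figures.items()}


def missed_targets(figures: Dict[str, Tuple[float, float]], target_pct: float) -> List[str]:
    """Goal tracking: compare actuals with the target and list the misses."""
    return [day for day, (s, r) in figures.items() if returns_pct(s, r) >= target_pct]


print(monitor(daily))
print(missed_targets(daily, RETURNS_TARGET_PCT))
```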
Finding opportunities and identifying problems
The third main use of BI is finding opportunities and identifying problems proactively that would otherwise have been missed or remained unknown. The approach is either to explore and analyze the data to find something specific (answer a specific question) or to explore and analyze it with no specific question, looking for what may be hidden in the data. The main difference between using BI for business performance management and using BI for finding opportunities and identifying problems is that in the former it is clearly known what needs to be measured and how it fits into the business strategy, whereas in the latter the expected outcome is unknown. A few Walget examples of finding opportunities (O) or identifying problems (P) are listed as follows:
• A data analyst/BI analyst at Walget proactively finds or identifies:
  o A pattern that every alternate Wednesday between 4pm and 6pm there is a drop in sales across hundreds of stores. (P)
  o That after every successful campaign, the next 2 campaigns (promotions) across stores are not as effective as expected. (P)
  o That every alternate Friday there is a significant drop in employee work hours. (P)
  o That customers who were given gift cards mostly came to the stores with family, which led to a significant increase in the average basket size of such customers. (O)
  o That job ads posted on Thursdays had a higher chance of finding the right candidate than those posted on any other weekday. (O)
  o Which customers are likely to churn (stop buying from Walget). (P)
  o Which store is likely to make a loss. (P)
Now the main uses of BI should be clear. All three main uses of BI explained in this section are supported by the processes of reporting, analytics, and data mining, both independently and in combination. In the next section, we will cover each of these three processes.
Processes that support the main uses of BI
As mentioned earlier, three important processes support the uses of BI: BI reporting, business analytics, and data mining. These three processes in turn depend on sub-processes such as data warehousing (DWH), data integration, metadata management, etc. However, we will limit the explanation to the first-level processes and not go into the sub-processes, as these are beyond the scope of this chapter. The sub-processes are covered in Chapter 10: Demystify Tech, Tools and Concepts in BI. Figure 2.2 depicts the three main uses of BI supported by BI reporting, analytics, and data mining, both independently and in combination.
Figure 2.2: Processes that support main uses of BI
Let’s go through each of the three processes in detail to understand what it involves and how it supports the main uses of BI.
BI reporting
BI reporting refers to the process in which BI reports are designed, developed, and delivered to decision makers. It provides a quick and comprehensible overview of the business. In BI reports, data is organized into information summaries with KPIs, business metrics, charts, tables, etc. BI reports enable multi-dimensional representation of data. In this book, BI reports and BI reporting are simply termed reports and reporting respectively, unless mentioned otherwise. These reports support managers in fast decision making, managing business performance, and finding opportunities or identifying problems. BI reporting is achieved by various means such as simple static reports, dashboards, or interactive dashboards. Simple reports and dashboards may be created by business (non-technical) users using data visualization tools as well as reporting and analytics platforms; complex reports, however, are usually developed by technical people such as BI frontend developers. In some organizations, there is a lack of clarity between BI reporting and
operational data transfers. Unfortunately, in such cases everything that has to do with data is considered as a BI report, which creates quite a lot of unwanted difficulties in achieving the goals of BI. So, let’s demystify the misconception about operational data transfer in the following section.
BI reporting versus operational data transfer
The most important difference between BI reporting and operational data transfer (sometimes referred to as operational reporting) lies in the purpose or intention, seen from the perspective of the business entity that creates the reports. What do we mean by this? Let’s assume that one of Walget’s suppliers, ABC, requests Walget to provide sales data for ABC’s products, and Walget agrees to provide it in XML format on a daily basis. ABC may need the data for BI or for non-BI purposes (operational purposes such as billing or reconciliation), but from Walget’s perspective, transferring the data to ABC on a daily basis is definitely not BI; it is simply a file transfer or operational data transfer. Any operational data transfer (sharing of data extracts) between systems, even within a business entity, is likewise not considered BI reporting but operational data transfer. It is important to note that operational decision making (a type of decision making in BI) is different from operational reporting. The differences between BI reporting and data transfer are captured in Table 2.5:
BI reporting
• Used for one or more of the three main uses of BI: decision making, business performance management, and finding opportunities and identifying problems.
• Provides information and insights. Data is usually summarized in charts, tables, KPIs, metrics, etc., and presented as information.
• The user or recipient of a BI report is a human (a business decision maker) and not an application or a system. Reports must be in a human-readable format and must be easily comprehensible to a business user.
• Delivered via interactive portals such as intranet, SharePoint, emails, and BI reporting and analytics platforms, in the form of static reports, dashboards, or interactive dashboards.
Data transfers
• Used for operational purposes such as billing a customer, reconciliations, sharing data with regulatory authorities, transferring data to customers, suppliers, partners, etc.
• Provides data. Data is usually in the lowest possible granularity (at event or transaction level). The number of records could be in the hundreds to millions or more. The point to take away is that it is not fit for direct use by decision makers.
• The recipient is usually not a human unless it is a case where the process lacks automation. A non-BI report or data extract is meant for machines, so it should be machine readable.
• Delivered via FTP, SFTP, or other similar means in formats such as CSV, XML, JSON, etc.
Table 2.5: BI Reporting vs Operational Data Transfer
It’s important to note that operational data transfer has been excluded intentionally from the main uses of BI. This is because operational data transfers, such as providing data extracts or dumps to regulators, partners, customers, suppliers, etc., are not a main use of BI; in fact, in most cases they are a misuse of the BI solution to cover up the shortcomings of other solutions. Just because BI solutions have clean data doesn’t mean that BI solutions and BI teams should be misused for operational data transfers. This issue is listed in Chapter 7: Ideas for Success with BI.
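The contrast in Table 2.5 can be sketched with the same handful of transactions used in two ways. The records, field names, and functions below are hypothetical, and JSON is used instead of the XML from the ABC example purely to keep the sketch short.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical transaction-level records (lowest granularity).
transactions: List[dict] = [
    {"txn_id": 1, "supplier": "ABC", "product": "ABC Cola", "qty": 2, "amount": 3.00},
    {"txn_id": 2, "supplier": "ABC", "product": "ABC Chips", "qty": 1, "amount": 1.50},
    {"txn_id": 3, "supplier": "XYZ", "product": "XYZ Soap", "qty": 1, "amount": 2.00},
    {"txn_id": 4, "supplier": "ABC", "product": "ABC Cola", "qty": 4, "amount": 6.00},
]


def data_extract(rows: List[dict], supplier: str) -> str:
    """Operational data transfer: every matching record, machine readable."""
    return json.dumps([row for row in rows if row["supplier"] == supplier])


def sales_summary(rows: List[dict]) -> Dict[str, float]:
    """BI reporting: the same data summarized for a human reader."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["product"]] += row["amount"]
    return dict(totals)


extract = data_extract(transactions, "ABC")   # would be sent to ABC's system
summary = sales_summary(transactions)         # would be shown to a manager

print(extract)
for product, total in sorted(summary.items()):
    print(f"{product:<10} {total:>8.2f}")
```

The extract keeps every record and is meant for another system, while the summary discards detail so that a person can read it at a glance.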
Analytics
There are so many interpretations of analytics that it’s hard not to be confused. You will notice that there are hundreds of questions in Q&A forums about the differences between analytics, data analytics, business analytics, BI, and department-, function-, or domain-specific analytics such as HR analytics, fraud analytics, sales analytics, web analytics, retail analytics, healthcare analytics, etc. To add to the confusion, there are hundreds of blogs out there that are optimized very well for search engines (I hope someday search engines are smart enough to rank search results based on authenticity as well) and that propagate either false or partial information. My request and recommendation to learners is to check who the author of the blog is, what their credentials are, what the intention of the article is, who the sponsor is, what the sources are, etc.
Let’s now demystify analytics. Analytics as an umbrella term is another name for BI.[29] What academics refer to as analytics, IT professionals refer to as BI. Analytics is short for business analytics[30][31] or data analytics. Note that “business” is not limited to commercial enterprises; as explained in the previous chapter, it is used in its generic sense. Hence, our first conclusion is that analytics, business analytics, and data analytics are used interchangeably and all three mean the same. When any of these three terms is used in the broader sense in this book, it refers to BI.
When analytics is considered as a process within BI, that is, in the narrow sense, it excludes reporting and data mining. It is the process of deriving insights by analyzing and exploring data and information. It is a quantitative, fact-based analysis to answer specific questions (asked by business users) and to make predictions. In this sense, analytics is classified into three types: descriptive, predictive, and prescriptive analytics.
Descriptive analytics is about slicing, dicing, and analyzing a subject or a recurring event in every possible dimension and understanding the influence of each dimension independently and in combination. While reporting deals with the macro level and high-level summaries, analytics deals with the micro level. When analytics is prefixed with a department, function, domain, or sector, it is specific to that department, function, domain, or sector. Google Analytics, as of writing, is an example of descriptive as well as web analytics. In the future, Google Analytics could include predictive and prescriptive analytics.
Predictive analytics is about using insights from descriptive analytics to predict the future behavior of a subject or an event. For example, the price of car insurance offered to a customer could depend on the likelihood of the customer meeting with an accident, based on their past driving data. As we are going through the COVID times, let’s look at one example using COVID data. Figure 2.3 showcases an interactive dashboard that can be considered a starting point for both descriptive and predictive analytics.
Figure 2.3: Example of descriptive and predictive analytics, source: covid-stats.de
If we look at the covid-stats.de dashboard in Figure 2.3, we can see both descriptive facts about Germany’s COVID-19 cases, such as the number of people infected (Gesamtfälle), the number of deaths (Todesfälle), and the distribution by gender (the pie charts in the dashboard) and by age (Alterskohorten), and predictions about the doubling time (Verdopplungszeit in Tagen, that is, doubling time in days) based on current trends.
Prescriptive analytics refers to using data-driven insights for decision making to select the best option that would lead to desired outcomes. The results of prescriptive analytics influence operational processes. For example, flight tickets, hotel rooms, credit card maintenance fees, etc., are priced using optimization methods from prescriptive analytics.
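As a small worked example of moving from descriptive to predictive analytics, a doubling time like the one on the dashboard can be estimated from observed case counts. This is a sketch under a strong assumption of constant exponential growth, using invented numbers; it is not a description of how covid-stats.de calculates its figure. With an average daily growth rate r, the doubling time in days is ln(2) / ln(1 + r).

```python
import math
from typing import List


def average_daily_growth(cases: List[int]) -> float:
    """Descriptive step: average compound daily growth over the observed days."""
    days = len(cases) - 1
    return (cases[-1] / cases[0]) ** (1 / days) - 1


def doubling_time_days(daily_growth: float) -> float:
    """Predictive step: days until cases double if the growth rate holds."""
    if daily_growth <= 0:
        raise ValueError("cases are not growing, so they never double")
    return math.log(2) / math.log(1 + daily_growth)


# Invented cumulative case counts for four consecutive days.
cases = [1000, 1100, 1210, 1331]

growth = average_daily_growth(cases)
doubling = doubling_time_days(growth)
print(f"Average daily growth: {growth:.1%}")
print(f"Estimated doubling time: {doubling:.1f} days")
```

The first function only describes what has already happened; the second turns that description into a statement about the future, which is valid only as long as the trend continues.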
Data mining
In data mining, the data analyst (data miner) digs into heaps of data without a predefined question, uncovers hidden trends, patterns, anomalies, and correlations, and presents this new-found information and insight in an informative way to the business stakeholders, sometimes along with recommendations for the next steps. Thereafter, it is left to the decision makers to decide what they would like to do with the new-found insight. Data mining is defined as the process of efficiently discovering valuable, non-obvious information and insights from a large collection of data.
Data mining is important because business users may not always ask all the possible questions; there could be some unasked questions or some dimensions that have been overlooked. So, the approach is to find patterns, trends, and correlations that are not apparent and are unknown, and then to figure out what to do with that new information and insight. There are no predetermined goals other than a generic idea to mine the data. Data mining may not always uncover new insights. The approach is to try to find every possible piece of hidden information and insight and extract as much value as possible from the data. If there are no new findings, the process carried out is documented and the activity is closed.
If the data mining process does produce findings, the BI analysts/data analysts hand over the findings and recommendations to the relevant business managers. Business managers then figure out how this information and insight can be used for cross-selling, upselling, improving efficiency, closing revenue leakages, decreasing customer churn (number of customers lost), targeted marketing, etc. If the cost of implementation is too high and the return on investment (ROI) is negative, they may even drop the idea or postpone it and carry out further analysis to find something more viable.
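The idea of mining without a predefined question can be sketched as a scan over every product category, flagging any whose buyers churn far more often than customers overall. The customer records, thresholds, and function name below are invented for illustration, and this simple comparison stands in for the statistical and machine learning methods that real data mining tools apply.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical records: (customer id, categories bought, churned afterwards?).
customers: List[Tuple[str, Set[str], bool]] = [
    ("c1", {"paint", "brushes"}, True),
    ("c2", {"paint"}, True),
    ("c3", {"paint", "milk"}, True),
    ("c4", {"paint"}, True),
    ("c5", {"paint", "bread"}, False),
    ("c6", {"milk", "bread"}, False),
    ("c7", {"milk"}, False),
    ("c8", {"bread"}, False),
    ("c9", {"milk", "bread"}, False),
    ("c10", {"bread", "brushes"}, False),
]


def flag_categories(
    rows: List[Tuple[str, Set[str], bool]],
    min_buyers: int = 3,
    lift: float = 1.5,
) -> Dict[str, float]:
    """Flag categories whose buyers churn at least `lift` times the overall rate."""
    baseline = sum(1 for _, _, churned in rows if churned) / len(rows)
    buyers: Dict[str, int] = defaultdict(int)
    churned_buyers: Dict[str, int] = defaultdict(int)
    for _, categories, churned in rows:
        for category in categories:
            buyers[category] += 1
            if churned:
                churned_buyers[category] += 1
    flagged = {}
    for category, count in buyers.items():
        rate = churned_buyers[category] / count
        # Ignore categories with too few buyers to say anything useful.
        if count >= min_buyers and rate >= lift * baseline:
            flagged[category] = rate
    return flagged


print(flag_categories(customers))
```

No one asked about any particular category beforehand; the scan simply surfaces what stands out. What to do with such a finding is still left to the business managers.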
All three processes, reporting, analytics, and data mining support each of the main uses of BI as well as influence each other. A finding from the data mining process could trigger further analytics work to be carried out or a new metric or KPI to be added in the standard reports. Information spotted in a report may trigger further
analytic work. In the next section, we will look at the differences between reporting, analytics, and data mining.
Reporting versus analytics versus data mining in BI
In the previous sections we have covered the details about reporting, analytics and data mining in BI. The differences between BI reporting, analytics, and data mining is provided in Table 2.6: Process
At the start
Reporting
Reports are designed, developed, and delivered to decision makers to provide a quick and comprehensible overview of the business. Often, the information and insights are fed or pushed to the decision makers instead of them creating it or actively getting it themselves.
Metrics and KPIs are predefined and agreed upon before reports are produced.
Analytics
Data mining
Analysts and decisions makers explore data and information to gain insights. It is an interactive and iterative process. They use insights to predict future behavior and prescribe decisions that is expected to lead to desired outcomes.
Analyst (data miners) discover hidden information and insights and pass it on to decision makers (usually business managers).
Focuses on specific and known business problems or issues. For example, finding out current customer churn rate or fraud rate and reducing it by 2%.
Problem is not defined. Finding hidden patterns, trends, and correlations that are not apparent and are unknown. Figuring out what to do with that new information and insight.
42
How
Reporting: Decision makers are provided with reports in which data is organized into information summaries with KPIs, business metrics, charts, tables, etc. Simple reports may be created by non-technical users, but complex reports are developed by technical staff.
Analytics: Interactive dashboards with drill down/up, filters, etc., could be a starting point, and business users are able to use these. However, there is no upper limit on the tools used; it depends on the amount and complexity of the data to be analyzed. Both statistical and machine learning models are used. Specialists are required.
Data mining: Simple analysis of data is not considered data mining, as that level of analysis is already covered in reporting and in the initial stage of analytics. So, data mining is for technical specialists such as data analysts and not for non-technical users, unless they are supported by no-code data mining tools. The tools used in data mining are similar to those used in analytics.
Level of data
Reporting: Mainly deals with macro level data to derive macro level information and insights. Example: Walget store level information such as total sales per day, profit, number of customers, number of employees, utilization ratio, number of products sold, stock levels, etc.
Analytics: Deals with micro level data to derive micro level information and insights. Example: Walget customer data analytics, covering buying patterns, choices, what influences customers’ purchasing choices, which age group prefers what and when to purchase, how a certain type of marketing will impact the buying choices of the customer, etc.
Data mining: Deals with micro level data to derive micro level information and insights. Example: An analyst mines the log files of all Walget store servers. No reports are available, no metrics or KPIs have been defined, and no specific questions have been asked. The analyst discovers a pattern: every Wednesday morning, the servers across a region reach 80% CPU capacity. In another case, an analyst mines customer purchase data and finds that more than 80% of the customers who bought paint churned (stopped visiting the store) within a month of buying paint.
Level of decisions
Reporting: Macro level business decisions. Example: which location should be considered for the next store?
Analytics: Usually, decisions at a micro or operational level, such as an event, a transaction, or an instance of a subject. For example, identifying, marking, or blocking fraudulent transactions based on past data.
Data mining: Depends on the outcome of the data mining process. Some findings may result in changes at the macro level and others at the micro level.
Table 2.6: Reporting vs analytics vs data mining in BI
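The paint example in the data mining column can be made concrete with a small sketch. The purchase log, the function name churn_rate_after, and the 30-day window below are all hypothetical; the book does not describe how Walget stores or analyzes its data, so this only illustrates the kind of check an analyst might run once a suspicious pattern has surfaced.

```python
from datetime import date, timedelta

# Hypothetical purchase log: (customer_id, product, purchase_date).
PURCHASES = [
    ("c1", "paint", date(2021, 1, 5)),
    ("c1", "brush", date(2021, 1, 5)),
    ("c2", "paint", date(2021, 1, 10)),
    ("c2", "milk", date(2021, 3, 1)),
    ("c3", "paint", date(2021, 1, 12)),
    ("c4", "milk", date(2021, 1, 3)),
    ("c4", "milk", date(2021, 2, 20)),
]


def churn_rate_after(product, purchases, window_days=30):
    """Share of buyers of `product` whose last store visit falls within
    `window_days` of their first purchase of that product."""
    first_buy = {}   # customer -> date of first purchase of `product`
    last_visit = {}  # customer -> date of most recent purchase of anything
    for customer, item, day in purchases:
        last_visit[customer] = max(last_visit.get(customer, day), day)
        if item == product:
            first_buy[customer] = min(first_buy.get(customer, day), day)
    if not first_buy:
        return 0.0
    window = timedelta(days=window_days)
    churned = sum(
        1 for customer, bought in first_buy.items()
        if last_visit[customer] <= bought + window
    )
    return churned / len(first_buy)


paint_churn = churn_rate_after("paint", PURCHASES)
print(f"Churn among paint buyers: {paint_churn:.0%}")
```

In real data mining nobody would know in advance that paint is the product to look at; the analyst would compute such a rate for every product and inspect the outliers. A real analysis would also have to exclude customers whose purchase is too recent to judge.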
Though there are differences between reporting, analytics, and data mining, as shown in Table 2.6, in practice there is a lot of overlap between these processes, and many people use one of these terms when they actually mean another. There are also misconceptions that a report should only contain past data and that reporting is not as important as analytics. There is no such limitation, and it is wrong to assume that reporting is less important than analytics. For example, one of Walget’s store managers would like to know the sales forecast for one of the products. However, he is not familiar with the tools and lacks time, so he asks a data analyst to provide him with a report. The data analyst analyzes the past data, builds a prediction model, and deploys it as part of a report. The store manager receives the report every day with the sales forecast. The report that he receives by email is of course static, but it does contain predicted values. If the analyst had deployed an interactive dashboard instead of a static report, the store manager would have been able to use it interactively. The point is that reporting doesn’t always refer to past data. Reporting is also no less important than analytics and data mining. In fact, if an organization hasn’t already built basic reporting capabilities, all this talk of analytics and data mining is nothing but a fantasy.
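To illustrate that a static report can carry predicted values, here is a minimal sketch. It assumes a simple moving average as a stand-in for the analyst’s prediction model; the function names, the three-day window, and the sales figures are hypothetical and not taken from any Walget system.

```python
def moving_average_forecast(daily_sales, window=3):
    """Forecast the next day's sales as the mean of the last `window` days."""
    if len(daily_sales) < window:
        raise ValueError("not enough history for the chosen window")
    recent = daily_sales[-window:]
    return sum(recent) / window


def render_report(product, daily_sales, window=3):
    """Build the plain-text body of a static daily report."""
    forecast = moving_average_forecast(daily_sales, window)
    recent = ", ".join(str(units) for units in daily_sales[-window:])
    lines = [
        f"Daily sales report: {product}",
        f"Last {window} days (units): {recent}",
        f"Forecast for tomorrow: {forecast:.1f} units",
    ]
    return "\n".join(lines)


# Hypothetical sales history for one product, oldest day first.
history = [120, 135, 128, 140, 150]
report = render_report("Product A", history)
print(report)
```

The report text is static once generated, yet its last line is a prediction. Swapping the moving average for a statistical or machine learning model would change only moving_average_forecast, not the way the report is delivered.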
Importance of BI
How important is BI for a business? What if there was no BI? In this section, we will take a look at the importance of BI. Simply put, BI is as important to a business as a dashboard is to a car, if not more.
Figure 2.4: BI to business is like a dashboard for a car
Yes, BI is as important to a business as a dashboard is to a car, if not more. The more intelligent the car dashboard is, the better the results you get. Can a car be driven without a dashboard? Yes, it can. Can a car be driven better when a dashboard shows you how you are driving, how the car is performing, and how much fuel is left? Yes. This is exactly what BI delivers to businesses: useful information with which businesses can be run better. Businesses can obviously run without BI; however, with BI they can run better. And in this competitive world, where businesses need to operate better than their competitors, there is definitely a need for BI.
Just as the dashboard of a car indicates the speed at which the car is going, the optimum speed, the time in which you will reach your destination, etc., BI indicates how the business is performing and gives a sense of how it will perform in the future based on past trends. This information allows managers to take the necessary actions to achieve the objectives of the business. For a moment, imagine that there was no fuel level display on the dashboard: the driver would have to get out of the car every so often and manually check whether there is enough fuel. Automating the process of continuously measuring the fuel level and displaying it on the dashboard makes the information available to the decision maker, the driver in this case, who can then decide whether or not to visit a fuel station. It not only saves time but also spares the stress, anxiety, worry, unnecessary stops, delays, inefficiencies, and possible accidents due to the car running out of fuel abruptly. Similarly, BI not only saves time (an easy-to-measure benefit) for the decision makers but also has intangible benefits, such as sparing them the stress, anxiety, worry, unnecessary breaks, delays, inefficiencies, and possible shutdown due to running out of resources abruptly.
Note that we have used a couple of keywords in the above paragraphs: measure and automation. If you think about it, at a fundamental level, BI is all about measuring. It is about measuring the performance of the business so that it can be improved. The famous quote “if you can’t measure it, you can’t improve it”, commonly attributed to Peter Drucker, a popular management consultant, educator, and author, fits perfectly here. BI acts as a measuring tool for a business. How can managers manage something well or improve it if they don’t have an overview of it or can’t comprehend it? BI helps managers comprehend the business by giving them a complete overview. This could be done manually as well, just as a driver can manually check the fuel level, but if managers spent all their time manually measuring (collecting the required data, storing it, securing it, putting it together, analyzing it), they would have little or no time left for their main job. That’s where BI comes to the rescue and supports managers. Having a BI solution means that there is enough automation to ensure that all of the data collection, storage, and integration is taken care of and that the data is ready for analysis and decision making. So, the whole point of BI is automating the process of deriving information and insights from data.
Now let’s run through a couple of questions again. Can businesses exist without BI? Yes, they can, and they do. Can businesses be profitable without BI? Yes, they can be. Can businesses grow from a small company to a large company (more than 250 employees and US$100 million in revenue) without BI in this generation? Probably not.
If a small company wants to scale to a large company, then the business needs BI. My effort [20][21] to find companies that don’t use BI but have reached yearly revenue of more than US$100 million led to zero results; I couldn’t find a single company that meets those criteria. On the other hand, we can find thousands of small businesses that have never used BI. That finding tells us that BI is required for growth, for scaling a company to a large company. Consider a very small business, for example a mom-and-pop store or a small office run by a single person, where the owner has direct contact with every customer, is available to customers at all times, listens to their feedback directly, and monitors the business personally. Such a business probably needs no BI solution outside of the owner’s mind, as the owner knows what to buy and store, what to sell, what to offer, which customers should get special offers, which customers can be trusted with credit, which products are moving fast, which products have higher margins, etc. BI, in the way we know it, is needed only if there is a plan to grow the business and to run it efficiently. In the case of Walget, it simply cannot run efficiently and grow without BI. The top-level managers need to know which locations are performing well and where to invest. The store manager needs to know which products to stock up on and when. BI is a highly important and critical capability for any business. In fact, it is a must have, as we will see in the next section.
Why is BI a must have?
In the 1990s and early 2000s, mainly large companies invested in BI projects and programs. BI had become a “must have” for large companies because of competition, whereas for a small and medium enterprise (SME) it was a “good to have”, as most SMEs couldn’t venture into a BI program due to the barriers of high costs and long implementation times. Now the situation has changed: even SMEs consider BI a “must have”. This shift from “good to have” to “must have” can be attributed to the factors listed in Table 2.7:
Factor: Description
Lower costs and cloud solutions: The cost of data acquisition and data storage has drastically reduced, thereby enabling most organizations to store almost all of the data that they think could be utilized. With several variants of cloud solutions, it is now possible for businesses to start BI programs with minimum capital expenditure (CapEx) or without any CapEx, and with very low operating costs compared to traditional on-premises approaches.
Free and open-source software: Several free and open-source software packages are available that fulfil various functions in the BI process. This enables smaller companies to start their programs without much cost.
Outsourcing: As the outsourcing model has expanded to various functions of a business, companies started outsourcing BI projects too, thereby bringing in a greater number of specialists and hence reducing timelines and costs.
Competition: As more SMEs adopt BI solutions for competitive advantage, all businesses must adopt BI solutions to compete.
Proof of successful implementations: There are many examples of successful implementations of BI and of the advantages it has provided to businesses in every sector. The resistance within businesses to venturing into BI programs has considerably declined, if not totally ceased.
Table 2.7: Why BI is a must have for most businesses
Benefits of BI
The overall intended benefit of BI is improvement in business, whatever the business might be. If a BI solution is not bringing improvements to the business, then there is absolutely no point in continuing with that solution. The need for BI and its importance should be clear by now: BI is a necessity for improving the business, and it is required for a business to scale. Let’s now see some of the benefits offered through BI solutions:
• Ensures information is provided to the right person at the right time in the right format. In the old days, the problem was not having enough data; now, while there is no dearth of data, there is a lack of high-quality, business-critical information. It is challenging to identify the right set of data and then get to the right data and information because of the overwhelming amount of data captured and processed in the business. BI abstracts the complexity and provides the much-needed information and insights to the decision makers.
• Tracks KPIs and metrics related to various stakeholders of the business such as customers, employees, partners, suppliers, distributors, and competitors.
• Supports management in understanding all parties involved in the business better, in understanding what’s happening in the business, in staying on top of the business, and in managing it better.
• Ensures that all stakeholders involved in the business get consistent information, that is, information that is consistent across departments and functions. Anyone who has been in business is aware of data inconsistency issues, for example, when the data that the controlling department has about a subject area differs from what marketing has for the same subject area, which in turn is totally different from what sales has.
• Ensures data is reliable by processing (cleansing, transforming, harmonizing, etc.) it before use.
• Enables data-based, fact-based decision making across the enterprise at all levels.
• Enables a data-driven, insights-to-action approach.
• Enables a faster and better reporting process.
• Improves efficiency across the organization.
• Enables businesses to be competitive.
• Supports reducing costs and expenditure while increasing revenues.
• Supports the identification, prevention, and stoppage of revenue leakages.
• Enables businesses to find growth markets, products, and services.
• De-risks the business by tracking key metrics related to regulatory and compliance mandates.
• Allows decision makers to focus on their core work and use information as and when required instead of running after people to gather data and information at the eleventh hour.
When we review each of the benefits listed, we find that every single point leads to improvement of the business.
Side benefits of BI
Apart from all of the specifically intended uses, there are also some very important and interesting side benefits that businesses can achieve by adopting BI solutions. These are called side benefits because the original plan may not have included them as benefits of a BI initiative, or the primary goal of the BI implementation could be something else. There can be some overlap between what we have seen in the benefits of BI and the side benefits of BI. The point I am trying to convey is that, whether or not a business intends to achieve these benefits, it will get them anyway if it implements a BI solution. Table 2.8 lists some of the side benefits:
Side benefit: Description
Reduction in fraudulent activities: Fraudulent activities by employees, customers, suppliers, etc., are hindered because of the increase in transparency. For example, if a Walget store had no BI, the management wouldn’t be able to figure out whether the store manager had partnered with some suppliers for his own personal gain at Walget’s expense.
Detecting revenue leakages and inefficiencies: As part of the BI process, a business may uncover some revenue leakages and inefficiencies in processes and people. For example, a Walget store could identify that some of the perishable products were over-ordered, lost track of in the warehouse, and undersold.
Improvement in data quality, increase in compliance, better equipped for audits: To provide an overview to the management, the quality of data has to be good. This inherent requirement to bring data to a high-quality state drives change in the organization. Processes which previously remained unchanged are updated and upgraded to meet better data quality requirements. Better data quality ensures that a business is better equipped for audits and also raises the level of compliance. Better quality data also helps ensure that the right decisions are made.
Financial health transparency: The financial health of a business will be transparent to all authorized stakeholders. When there is no BI, there are businesses where the employees, including middle level management, have no clue about the financial health of the line of business they are in.
Reduction in data hoarding and reducing delay in access: By implementing BI programs, the right people (even those who join later) get the right information at the right time, which reduces cases of data hoarding. For example, it will become difficult for employees of one business unit to block employees of other business units from accessing data. With BI, as processes are established and automated, the right people have access to the right set of data and are not denied access because of corporate politics or professional jealousy.
Better utilization of other teams: Any other teams that were inefficiently used in an unplanned manner to resolve ad hoc requests for getting information out of source systems can now focus on their core work, as BI takes care of all information needs efficiently. For example, in some organizations, the database administrators (DBAs) have no choice but to query the databases to answer ad hoc questions from management, which takes the DBAs’ time and focus away from their core work. With a proper BI solution in place, DBAs and other such teams can focus on their core work.
Customer communication and notifications: As part of a BI solution, data is usually cleaned and centralized. Marketing teams could use this clean data to communicate with customers and avoid missing some customers or sending duplicate communications. For example, Walget would like to target those customers who bought a particular product between certain dates to inform them about a problem with that product.
Using DWH data in OLTP: There is no harm in using the data collected and stored for BI purposes as an input to the online transaction processing (OLTP) process, as long as it doesn’t slow down or break the OLTP process. For example, the aggregate total amount spent by a customer in the last year could be precalculated in the data warehouse and then displayed on the Walget cash counter screen, so that the store cashier can see it and give a discount or voucher based on the yearly amount during the purchase.
Table 2.8: Side benefits of BI
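The last row of Table 2.8 can be sketched as follows. This is only an illustration under stated assumptions: the transaction records, the function names yearly_spend and voucher_for, and the 1,000 threshold are hypothetical, and an in-memory dictionary stands in for whatever precalculated table the data warehouse would expose to the cash counter.

```python
from datetime import date, timedelta

# Hypothetical warehouse extract: (customer_id, purchase_date, amount).
TRANSACTIONS = [
    ("c1", date(2020, 1, 10), 900.0),  # older than one year, ignored below
    ("c1", date(2021, 3, 1), 600.0),
    ("c1", date(2021, 9, 15), 500.0),
    ("c2", date(2021, 11, 2), 200.0),
]


def yearly_spend(transactions, as_of):
    """Batch step (warehouse side): total spend per customer over the
    365 days ending on `as_of`."""
    cutoff = as_of - timedelta(days=365)
    totals = {}
    for customer, day, amount in transactions:
        if cutoff < day <= as_of:
            totals[customer] = totals.get(customer, 0.0) + amount
    return totals


def voucher_for(customer_id, totals, threshold=1000.0):
    """Checkout step (OLTP side): a cheap lookup, no aggregation at the till."""
    return totals.get(customer_id, 0.0) >= threshold


totals = yearly_spend(TRANSACTIONS, as_of=date(2021, 12, 31))
print(voucher_for("c1", totals), voucher_for("c2", totals))
```

Keeping the aggregation in the batch step and only a lookup at checkout is what prevents the OLTP process from slowing down: the cash counter never scans the transaction history itself.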
There could be several other side benefits that are not listed above. The point is, there are not just known and intentional benefits but also unintended benefits of having a BI solution in place. BI can do no harm in an ethical business. Corrupt businesses shy away from BI as it tends to expose corruption. Do all industries or sectors use BI? Let’s look at this aspect in the next section.
Which sectors use BI?
The short answer to the question of which sectors use BI is that every sector or industry makes use of BI. I haven’t come across a business where BI cannot be used. This answer is based on the fact that there is still a growing demand for BI professionals across industries. There are at least 3 fact-based ways to find out which businesses use BI:
1. Search for BI jobs on any IT job portal (worldwide or in a specific geography) and check which sector each business operates in.
2. Search for BI professionals on professional networking sites such as LinkedIn and find out which sector they work in.
3. Go to the website of any reasonably sized company, or of a company in its growth stage (growing from an SME to a large company), and search for BI job openings; most likely you will find one.
In 2017, I created a report based on live BI job openings. At that point in time, I found that almost every industry was recruiting people for BI positions. A random set of 300+ live jobs from the https://www.jobsbi.com portal was considered. Figure 2.5 shows the percentage of BI job openings across various industries as of 2017.
Figure 2.5: BI jobs across various industries in 2017, Source: www.reportpedia.com
Do note that many sectors could be missing from the above chart, as the sample set considered was only 300+ jobs. Again, in April 2020, a random sample of 200 live BI job openings across various locations was collected (to ensure that I was not biased, in both 2017 and 2020 I did not collect the BI jobs on my own but outsourced it to my wife), and again it was found that there are BI job openings across almost all sectors, including non-profit organizations (NPOs), fashion, insurance, FMCG, market research, mining, staffing, tourism, judiciary, entertainment, and more. Also note that BI is used across various types of businesses, from family-owned businesses to large public companies, and from business-to-business (B2B) and business-to-consumer (B2C) to business-to-business-to-consumer (B2B2C).
Opinion: In my view, some form of BI is a must have for B2C and B2B2C businesses from day 1, whereas BI becomes a must have for B2B as the business grows.
Users of BI and purposes
Who are the users of BI, and what specific purposes do they use it for? In this section, we will look at the categories of BI users within an organization and the specific purposes for which they make use of BI. This is different from the user types explained in Chapter 1: What is Business Intelligence? To refresh your memory, in the previous chapter users were categorized as beginners or basic users and power or advanced users, based on their level of proficiency in using the BI solution. Usually the CXOs, if they use BI solutions, are at the basic level, and users such as BI analysts, business analysts, data analysts, and some middle level managers are at the power user level. Now we will look at categories of users based on job function. Within each of these functions, the users often include managers (for example, sales managers), heads, VPs, etc. In Table 2.9, a few user groups and examples of purposes (use cases) are provided. This is not an exhaustive list; it is only meant to give an idea of the use cases for each user group.
Department: Use cases
Sales
• Tracking product, service, or program revenues
• Comparing business performance across verticals, horizontals, customer segments, and sectors
• Calculating and tracking commissions for sales managers
Marketing
• Campaign management
• Monitoring and measuring web visits
• Subscriptions and opt-in management
Human Resources
• Employee cost and utilization ratio calculation
• Ensuring legal and regulatory compliance
• Managing job application systems, tracking selection or rejection ratio
• Forecasting growth in employee count
Finance and Billing
• Budgeting, cost centre reporting and planning
• Accruals management
• Profit and loss reports
• Revenue management
Product/Service Management teams
• Monitoring product or service performance
• Comparing across product/service categories
• Prioritizing product and service development
Account Management teams
• Monitoring account performance
• Revenue tracking and comparisons across accounts
• Prioritization of accounts
Operations
• Incident ticket backlog management and prioritization
• SLAs management
• Tracking call centre stats
• Tracking technical performance indicators
Compliance
• Tracking compliance
Higher management
• Overall business management
Table 2.9: User groups and example use case scenarios
It is important to understand that even within the user groups mentioned in Table 2.9, there could be 2 sets: direct users and indirect users. For example, in some companies, some of the CXOs may prefer not to use a BI dashboard directly due to lack of time and instead delegate it to their subordinates or analysts and get the required information from them. In this case, the CXOs are indirect users of BI, whereas the analysts and other delegates are direct users of BI.
The main part of this chapter ends here. Now that we have covered the main uses of BI, the importance of BI, and the user groups of BI, we will be able to understand the evolution of BI as explained in the following section.
Evolution of BI
The evolution of BI is actually quite a complicated topic. If you search for references on the evolution of BI, you will find that you are pointed in many different directions and that there are quite a lot of inconsistencies between the explanations. This is because there is still no clear agreement in the industry about what exactly fits within BI and what is outside its scope. Without such agreement, how can anyone correctly describe its evolution? That’s why it is important to first establish which concepts actually come under the umbrella term BI, so that we can connect back to its roots in a more meaningful way. And this is exactly what we have achieved with the current flow of contents. In the previous chapter, we covered the definition of BI and busted a few myths, and in this chapter, we covered the various purposes for which BI is currently used, the processes that support those uses, and the groups of BI users, among other topics. This gives an understanding of what comes under BI without limiting your understanding to a specific technology or tool.
A different approach has been followed in this book. We have not dealt with technology concepts such as OLAP, ETL, data warehouse, data mart, data integration, data visualization, data virtualization, etc. Some people go wrong in understanding BI because they directly connect or limit a concept to a specific technology or tool that they have dealt with in one organization. The problem with that approach is that once you connect a concept in a limited way to a specific technology or tool, every other technology or tool seems to fall outside the concept even though in reality it very much fits within it. By limiting a concept to a specific tool or technology, the evolution is incorrectly traced back on the basis of the tool or technology and not on the basis of the concept. As we have dealt extensively with the concept of BI in the first chapter and in this chapter, you have a solid foundation to build on and can explore further based on the concepts in an unbiased way.
In the 1980s, the Executive Information System (EIS) was developed, with roots in the Management Information System (MIS) and Management Decision System (MDS) of the 1960s. Note: Even in 2010, BI systems in some companies were still being referred to as MIS. Data warehousing replaced EIS. EIS was meant to be used only by the executives (top management) of the organization; it died for lack of an underlying infrastructure to support its fancy frontend screens. [13] The original name for data warehousing was Decision Support System.[10] In the 1990s, data warehousing, which forked out from the Decision Support System (DSS) as data-driven DSS, became Data Warehousing and Business Intelligence (DW/BI).[11] As BI was used as an umbrella term, data warehousing became a part (subset) of BI, and the discipline of DW/BI came to be known as BI. The promoters of DSS considered BI to be data-driven DSS[1]. Now there is an ongoing endeavour by market players to present so-called big data analytics as the evolution of BI. In Chapters 9 and 10, we will learn more about the topic of data warehousing.
For now, just keep in mind that BI solutions can be backed by a data warehouse, and that there are also BI solutions that don’t have a data warehouse. Over the years, a lot of changes have taken place in the technology, tools, and architecture of BI; however, the overall concept has remained the same. BI is still a generic term used to describe the process of leveraging an organization’s internal and external information assets for making better business decisions[10] and managing businesses better. Most data warehouses are now in the terabyte range, if not petabytes, up from just gigabytes in the 1990s. Table 2.10 lists some of the major changes since the 1990s. The table does not indicate that the new functionalities replaced all of the older ones; they mostly complemented the older or existing functionalities. We will cover most of these changes as we go through various topics in the following chapters.
From 1990 • Data warehousing • OLAP
• Dimensional modelling
• Static reports on portal • Desktop based software
• Mainly hand-coding of ETL flows (ETL tools from mid 1990s) • Corporate Information Factory
From 2000 • Mainly ETL tools instead of hand-coding ETL flows
• Web based reporting tools • Data Warehouse Appliances
From 2010 to now (2020) • Mobile BI
• Columnar, and NoSQL DBs
• Cloud BI (BIaaS, DWaaS and other combinations)
• Embedded BI
• Data visualizations and data discovery tools
• Semantic layer
• Data Lake
• Real-time BI
• Machine Learning capabilities
• Data Vault
• Dashboards
• Advanced analytics
• OLAP (ROLAP, MOLAP and HOLAP)
• Automated and Augmented Insights
• Agile BI
• Off-the-shelf subject-specific or tool-specific BI • Self-Service BI • Analytics • BI for all
• Using output of BI not only for decision making but also for recommendations • Data Virtualization • Conformed Dimensions
• Master data management
• Integration with R packages and other tools • Google-like search functionalities • In-memory DBs
• Translytical data platforms • Analytics and Business Intelligence • Voice-based analytics
• Data culture
• Data literacy
• Social media data • IOT data
Table 2.10: Some of the major changes in the evolution of BI
Note that the term “big data” has been intentionally excluded from the list above because big data is actually nothing more than a marketing term: it is a buzzword with no specific definition, only a vague one based on the 3Vs and 5Vs, which are relative terms. For BI, as we have seen in the previous chapter, all data (including so-called big data) from which information and insight can be derived is considered input to the BI process, and therefore all data is simply considered data.
Conclusion
In this chapter, we have covered the main uses of BI and highlighted its importance. We have briefly covered the differences between the analytics and data mining processes within BI and have demystified the misconceptions about data analytics, business analytics, etc. We have also introduced Walget as an example so that concepts can be related to a particular business. Apart from the main benefits of BI, the side benefits of BI have also been covered. It is clear that businesses need BI: businesses of all types and sizes in every industry need BI, and for almost every role within an organization there is a BI use case. By first focusing on the concepts, we were able to trace the predecessors of BI and understand the evolution of BI better. I hope this chapter motivates some business owners to consider implementing BI to put their business on a fast track. If you’re not sure which type of BI you should consider, it will become clear in the next chapter, where we will discuss the types and variants of BI.
Points to remember
Some of the key points to remember are listed as follows:
• All KPIs are metrics, but not all metrics are KPIs. There can be hundreds of metrics for a business; however, there are usually fewer than 10 KPIs.
• The three main uses of BI are decision making, managing business performance, and identifying problems or finding opportunities.
• There is a difference between business performance monitoring and business goals achievement tracking.
• BI is used for all types of decision making.
• Reporting, analytics, and data mining support decision making, managing business performance, and identifying problems or finding opportunities.
• Operational data transfer is not the same as BI reporting.
• Business analytics, analytics, and data analytics are one and the same.
• When used as an umbrella term, business analytics is the same as BI.
• Analytics is classified as descriptive, predictive, and prescriptive.
• Data mining is about discovering previously unknown information and insights from large amounts of data.
• Businesses can exist without BI, especially very small businesses such as mom-and-pop stores.
• With BI, businesses can run better.
• In simple terms, BI is nothing but automation of the process of deriving information and insights from data.
• The benefits of BI outweigh its disadvantages, if any.
• BI also has a few important side benefits.
• BI is now a must have for most businesses. Almost every kind of business in every sector uses BI.
• BI traces its roots to a combination of data warehousing, DSS, EIS, MIS, and MDS.
Multiple choice questions 1. Businesses use BI for a) Decision making
b) Business performance management
c) Finding opportunities and identify problems d) All of the above
2. BI is used for decision making of which types a) Only strategic decision making
b) Strategic, tactical, and operational decision making c) Only tactical
d) Strategic and tactical decision making 3. A Walget store wants to decide which day they should start the sale (discount offer), which type of decision does this fall into? a) Strategic b) Tactical
c) Operational
d) None of the above
58
Business Intelligence Demystified
4. A Walget store missed the daily sales target, which part of business performance management will definitely address this?
a) Business performance monitoring
b) Business goals achievement tracking
c) Both of the above
d) None of the above

5. Can any business exist and be profitable without BI?
a) No, not possible at all
b) Yes, in most cases
c) Yes, in some cases
d) None of the above

6. Which one is the most correct about data-driven organization?
a) All decisions are based on BI
b) No decisions are based on BI
c) Some decisions are based on BI
d) Most decisions are based on BI

7. Reduction in fraudulent activities, is this an intended benefit or unintended benefit/side benefit of BI for a business?
a) Intended benefit
b) Unintended benefit
c) Always unintended
d) Both intended and unintended benefit

8. KPI stands for?
a) Key performance intelligence
b) Key performance indicator
c) Key performance information
d) Key performance insight

9. In case of Walget, how would you classify “Number of products sold”?
a) A key performance indicator
b) A performance indicator
c) A metric
d) Any of the above
10. Nielsen supplies market data to Walget on daily basis, from Nielsen’s perspective, this is considered as?
a) Business intelligence
b) Analytics
c) Operational data transfer
d) Any of the above

11. Google Analytics is an example of?
a) Descriptive Analytics
b) Web Analytics
c) Business Intelligence tool
d) All of the above

12. Data mining is carried out by?
a) Data analysts
b) BI report developers
c) Data warehouse developers
d) All of the above
Answers
1. d
2. b
3. c
4. b
5. c
6. d
7. d
8. b
9. c
10. c
11. d
12. a
Questions
1. What are the 3 types of decisions under decision making? Give one example for each.
2. What are the 3 main uses of BI? Explain with an example.
3. Why do some businesses not use BI for all 3 main uses of BI?
4. What are the differences between metrics and KPIs?
5. What is the difference between business performance monitoring and business goals achievement tracking?
6. What are the differences between reporting and operational data transfer?
7. Differentiate between reporting, analytics, and data mining.
8. What is the point of data mining when we already have BI reporting and analytics?
9. Can any business exist without BI? If yes, then why do we need BI? If no, why not?
10. What sets of data are stored in a traditional bank? Create a table similar to Table 2.2.
11. What are the side benefits of BI? Why are they grouped under side benefits when they can also be considered under benefits?
12. Why has BI become a “must have” even for SMEs?
13. What was the reason for the failure of EIS?
Chapter 3
Types of Business Intelligence
This chapter introduces you to various types of BI. After reading this chapter you should be able to differentiate between various BI types and varieties of BI implementations. In case you are planning to start a BI initiative but are unsure about which type of BI has to be implemented, this chapter should clear any such doubts or confusion. Also, some of the misconceptions about real-time BI and self-service BI are demystified. It is expected that the reader has completed both the previous chapters, as references are made to some of the concepts covered there, and we continue to use the Walget example. It is also expected that the reader is familiar with the terms open-source software and proprietary software.
Structure
This chapter is structured as follows:
• BI types based on various parameters
  o BI types based on type of analytics
  o BI types based on types of decisions
  o BI types based on solution hosting
  o BI types based on solution ownership
  o BI types based on type of software license
  o BI types based on data freshness
    ▪ Myths about Real-time BI
  o BI types based on sectors
  o BI types based on departments
  o BI types based on BI integration approach
• Varieties in BI implementation
  o Agile BI
  o Out-Of-The-Box BI
  o Self-service BI
    ▪ Myths about Self-service BI
Objectives
The objectives of this chapter are to get to know the various parameters based on which BI can be classified and to understand the various types of BI; to understand the advantages and disadvantages of Business Intelligence as a Service (BIaaS) over self-managed BI, and of open-source BI over proprietary BI; and to demystify some of the myths about self-service BI and real-time BI, including when real-time BI is required and when it is not.
BI types based on various parameters
The types of BI depend on the parameter we choose for classification. There are various prefixes and suffixes that are attached to BI. Based on the type of analytics it can be classified into three different types, based on the type of decisions it can be classified into another three different types, and so on. This section tries to cover as many types as possible, as listed below, but keep in mind that more types are possible. Mobile BI is not considered a classification but only an additional feature available in any of the following types, and therefore it is not listed below. Distinction can be made between different BI types based on:
• Type of analytics
• Types of decisions
• Solution hosting
• Solution ownership
• Types of software license
• Data freshness
• Sectors
• Departments
• Integration approach
In the following sections each of the BI types is explained.
BI types based on type of analytics
BI can be classified as descriptive BI, predictive BI, and prescriptive BI based on the type of analytics capabilities it includes. This classification is based on the type of questions that the BI solution can answer. Some classifications add diagnostic analytics to the list as one of the types, but in reality, diagnostic analytics is a part or an extension of descriptive analytics. Diagnostic analytics answers the follow-up questions that come up from the first level of output produced by descriptive analytics. The first level of descriptive BI provides information such as trends, patterns, correlations, anomalies, etc., about what has happened. The next level of descriptive BI, that is, diagnostic analytics, tries to answer the questions related to why something has happened. Diagnostic BI provides insights. As explained in Chapter 1, What is Business Intelligence, BI is an iterative process. At first it triggers ideas or questions in the minds of the decision makers; then, by interacting with the BI solution, for example, by drilling up/down, filtering, slicing, and dicing the information, the decision maker is able to find out why something has happened. So, we can exclude diagnostic BI as a separate type and include it as an extension of descriptive BI, as shown in Figure 3.1. In this book, where descriptive BI is mentioned, it includes diagnostic BI.
Figure 3.1: BI types based on type of analytics
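The step from descriptive to diagnostic analysis described above can be sketched in a few lines of Python. This is only an illustration: the regions, stores, and transaction counts are hypothetical, and `total_by` is a helper invented for this sketch rather than part of any BI tool.

```python
from collections import defaultdict

# Hypothetical data: (region, store, number_of_transactions)
transactions = [
    ("North", "N1", 120),
    ("North", "N2", 130),
    ("South", "S1", 125),
    ("South", "S2", 480),
]

def total_by(rows, key_index):
    """Sum the transaction counts grouped by the chosen column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)

# Descriptive: what happened? Totals per region reveal a spike.
by_region = total_by(transactions, 0)
spike_region = max(by_region, key=by_region.get)

# Diagnostic: why did it happen? Filter to that region and drill down to stores.
in_spike_region = [row for row in transactions if row[0] == spike_region]
by_store = total_by(in_spike_region, 1)

print(by_region)
print(spike_region, by_store)
```

The first aggregation answers "what happened", and the filter plus drill-down narrows down "why", which is why the two are treated here as one type.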
In the past, most of the popular BI frontend tools were limited to descriptive analytics. For predictive analytics and prescriptive analytics use cases businesses had to buy or build additional specific tools. However, now[23] most of the BI frontend tools include both predictive and prescriptive analytics capabilities too. As discussed in Chapter 1, What is Business Intelligence, BI is a concept and a process, and it is not limited to any particular tool, technology, or feature. As new capabilities emerge due to technological advancements which further support deriving information and insights from data, they are automatically included under the BI umbrella. With these changes, this categorization of BI based on type of analytics is blurring and won’t be necessary in the next few years, as all three types of analytics capabilities will be available by default in all BI frontend tools. However, for the sake of completeness it has been included here. Chapter 2, Why do businesses need BI, has already covered the three types of analytics, so they are not explained again here. Table 3.1 compares the types of BI based on the type of analytics.
| | Descriptive BI | Predictive BI | Prescriptive BI |
|---|---|---|---|
| Capabilities | Provides information and insights about what has happened based on available data. And with the extension of diagnostic capabilities, it provides insights on why something has happened. | Predicts or forecasts the future based on past (available) data, based on known trends and patterns. Derives value of an unknown variable based on relationships identified in available data. | Prescribes which options should be opted for course correction or alignment with requirements. Provides information about which is the best option to take to get to the desired result. |
| Examples | How many customers shopped at Walget store in location xyz last Sunday? | How many customers are expected to shop at Walget store in location xyz next Sunday? | What should be the discount percentage on product abc to get maximum profits in the next three months? |
| | Why did one of the regions have a spike in the number of transactions? | Which Walget customers are most likely to switch to online shopping from brick-and-mortar shops? | How many temporary workers should be recruited for the predicted peak season? |

Table 3.1: BI types based on type of analytics
In Table 3.1 we have used the words predict and forecast. Most people use these terms interchangeably. However, there seems to be a difference between forecasts and predictions. A forecast considers the current state and calculates the future state, and a time series (continuous time dimension) is involved. The state or value of the subject in question (weather, sales, the economy, etc.) at a specific point in the future depends on its current state or value and on its subsequent states or values at equal time intervals up to that point. A prediction, in contrast, need not consider the current state or involve calculation, and a time series may not be involved. Predictions could also be about discrete events: for example, whether a specific customer will default on a loan, which country will win the football world cup, who will win the election, or which item a customer will purchase after purchasing another item. So not all predictions are forecasts, but all forecasts are predictions. In other words, predictions are generic, and forecasts are the subset of predictions in which the current state is considered and the future state is calculated for equal time intervals.
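The distinction can be made concrete with a small Python sketch. Both functions are hypothetical illustrations: the constant growth rate in `forecast` and the thresholds in `predict_default` are assumptions made for this example, not models recommended by the book.

```python
def forecast(current_value, growth_per_step, steps):
    """Forecast: start from the current state and compute the state at each
    equal time interval up to the target point in time."""
    values = []
    value = current_value
    for _ in range(steps):
        value = value * (1 + growth_per_step)  # next state depends on the previous one
        values.append(value)
    return values


def predict_default(missed_payments, debt_to_income):
    """Prediction of a discrete event: no time series is involved.
    The thresholds are made up for illustration."""
    return missed_payments >= 2 or debt_to_income > 0.5


weekly_sales = forecast(100.0, 0.10, 3)   # three equal weekly steps from today's value
will_default = predict_default(3, 0.2)    # a yes/no answer about one customer
print(weekly_sales, will_default)
```

The forecast returns one value per time step and each step builds on the one before it, whereas the prediction returns a single answer without stepping through time.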
BI types based on types of decisions
In the past, BI was mainly intended for top-level management making strategic decisions. Over the years, top management across businesses and sectors have realized the value of BI, and at the same time the cost of implementing and operating a BI solution has reduced drastically. These two factors have largely enabled all levels of management to use BI in most businesses. Now, even day-to-day operational decisions by first-line managers are made with the support of BI. In general, the number of BI users increases as we go from strategic to tactical to operational decision making, as depicted in Figure 3.2.
Figure 3.2: BI users at different levels
In Chapter 2, Why do businesses need BI, we discussed the types of decisions that are made using BI. This classification of BI is simply based on the different types of decisions (strategic, tactical, and operational) which the BI solution is able to support. It’s not uncommon to find multiple BI solutions within the same organization to support each type of decision, or the same BI solution built in such a way that it supports all types of decision making for all relevant and authorized users. In Table 3.2 we reuse the questions that were shared in Chapter 2 to indicate the type of questions the BI solution should be able to answer with the backdrop of Walget.

| Strategic | Tactical | Operational |
|---|---|---|
| Which products should we invest in the long term? | Which promotion/campaign should we repeat or launch to meet yearly targets? | Which products should we stock for the holiday season? |
| In which locations should we launch new stores? | How many stores should we upgrade to new technology in Phase 1? | How many part-time staff should we hire for the weekend? |
| Which stores should we close down in the long run? | In which location should we conduct more training? | Which products should be part of the clearance sale? What should be the discount percentage? |

Table 3.2: BI types based on type of decision making
Note that the types of decisions a BI solution can support depend on the availability of the required data in the BI solution.
BI types based on solution hosting
Based on whether a BI solution is hosted on-premises or in the cloud, it can be classified as on-premises BI or cloud BI, respectively. BI solutions consist of four main logical layers as listed below:
• Data acquisition layer
• Data processing layer
• Data storage layer
• Information presentation layer
In some BI solutions, right after the data is acquired it is first processed and then stored, whereas in other solutions the acquired data is stored first without processing and then processed later within the storage layer. The former is the well-known extract, transform, and load (ETL) approach and the latter is the extract, load, and transform (ELT) approach, which has already started gaining popularity. The four main logical layers of a BI solution, which fit both approaches, are depicted in Figure 3.3.
Figure 3.3: 4 logical layers of BI
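The difference between the ETL and ELT orderings can be sketched as follows. This is a minimal illustration with assumed names: `transform`, `run_etl`, `run_elt`, and the dictionary standing in for the storage layer are all hypothetical, and real pipelines would use dedicated tools for each layer.

```python
def transform(rows):
    """Hypothetical processing step: drop incomplete rows and normalize names."""
    return [
        {"product": r["product"].strip().lower(), "qty": r["qty"]}
        for r in rows
        if r["qty"] is not None
    ]


def run_etl(source_rows, storage):
    # ETL: process first, then store only the processed result.
    storage["processed"] = transform(source_rows)


def run_elt(source_rows, storage):
    # ELT: store the raw data first, then process it inside the storage layer.
    storage["raw"] = list(source_rows)
    storage["processed"] = transform(storage["raw"])


source = [{"product": " Milk ", "qty": 2}, {"product": "Bread", "qty": None}]
etl_storage, elt_storage = {}, {}
run_etl(source, etl_storage)
run_elt(source, elt_storage)
print(etl_storage)
print(elt_storage)
```

Both orderings end with the same processed data; the ELT variant additionally keeps the unprocessed rows in storage, so they can be reprocessed later.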
Based on where these four logical layers are hosted, on-premises or on the cloud, BI solutions can be broadly classified into three different types:
1) On-premises BI
2) Cloud BI
3) Hybrid BI
In cloud BI there are multiple variants based on cloud models such as infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS). Within the hybrid BI type there are multiple subtypes based on where each of the logical layers of a BI solution is hosted. In Table 3.3, we differentiate between the three types and also show two of the many possible variants of the hybrid BI type.
| Layer | On-premises BI | Cloud BI | Hybrid BI–1 | Hybrid BI–2 |
|---|---|---|---|---|
| Data acquisition | On-premises | Cloud | On-premises | On-premises |
| Data processing | On-premises | Cloud | On-premises | Cloud |
| Data storage | On-premises | Cloud | Cloud | Cloud |
| Information presentation | On-premises | Cloud | Cloud | Cloud |

Table 3.3: BI types based on solution hosting
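The classification rule behind this table is simple enough to express in code. The following Python sketch is an illustration only; the function name `classify_hosting` and the string labels are assumptions made for this example.

```python
LAYERS = (
    "data acquisition",
    "data processing",
    "data storage",
    "information presentation",
)


def classify_hosting(hosting_by_layer):
    """Return the BI type for a mapping of layer name -> 'on-premises' or 'cloud'."""
    hosts = {hosting_by_layer[layer] for layer in LAYERS}
    if not hosts <= {"on-premises", "cloud"}:
        raise ValueError(f"unknown hosting location(s): {hosts}")
    if hosts == {"on-premises"}:
        return "On-premises BI"
    if hosts == {"cloud"}:
        return "Cloud BI"
    return "Hybrid BI"  # any mix of the two locations


# Hypothetical solution: data is acquired on-premises, everything else runs in the cloud.
example = {layer: "cloud" for layer in LAYERS}
example["data acquisition"] = "on-premises"
print(classify_hosting(example))
```

Any solution in which at least one layer is hosted differently from the others falls under hybrid BI, which is why many hybrid variants are possible.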
Note that it is very much possible to run multi-cloud BI solutions. For example, the data acquisition layer, data processing layer, and data storage layer could be on cloud services provided by a certain cloud services provider (e.g. AWS - Amazon Web Services) different from the provider (e.g. GCP - Google Cloud Platform) where the information presentation layer is hosted. However, there needs to be a compelling reason to choose a multi-cloud approach as it introduces relatively more complexity for administration, operations, and maintenance, and the level of complexity varies based on the chosen cloud model.
BI types based on solution ownership
Based on BI solution ownership, BI can be broadly classified into self-managed BI and BIaaS. BIaaS is a commercial model in which a vendor provides a BI solution as a fully managed service to multiple customers on a subscription basis. When companies use BIaaS they don’t need to retain the technical part of the BI team in the company, as all of the technical BI work is carried out by the service provider. The customer could still retain BI analysts and data analysts who use the BI solution but don’t usually build it themselves. In the case of self-managed BI, the BI solution is owned by the business (Walget, for example). Even if some or all of the components are hosted in the cloud, it could still be termed a self-managed BI solution if the administration and management of the cloud services is done by the business itself. A self-managed BI solution could be managed entirely by an in-house BI team, by a combination of in-house BI and external teams, or entirely by external teams, for example by employees of an IT service provider, as depicted in Figure 3.4:
Figure 3.4: Types of BI based on solution ownership
There are both advantages and disadvantages of using BIaaS over self-managed BI. These are captured in Table 3.4:

| | Self-managed BI | BIaaS |
|---|---|---|
| Time | Relatively longer implementation time. | Shorter implementation time. |
| Cost | Expensive. Much higher total cost of ownership. | Cheaper. Lower total cost of ownership. |
| Staffing | Dearth of skilled BI team members. Resources split between core business applications and BI. | Abundance of BI specialists therefore allowing businesses to focus on their core business application. |
| BI maturity | Low to medium. Not realizing full potential in data through BI. | Medium to high. Based on experience in similar markets or businesses, the specialists are able to derive more value from data and provide more value through BI solutions. |
| Best practices | Lesser number of implementations of best practices. | Higher number of implementations of best practices. |
| Scale | More difficult to scale unless using cloud services. | Easier to scale as BIaaS is mostly based on cloud. |
| Data security | Reduced risk of data security. Data does not have to be shared outside the business. | Increased risk of data security. Data has to be shared to servers that are managed by a third party. |
| Dependency | Lower dependency on third party providers. | Very high dependency on third party service providers. |

Table 3.4: Self-managed BI vs BIaaS
BI types based on type of software license
Based on the type of software licenses used in a BI solution, BI can be classified broadly into three types:
1. Proprietary BI or commercial BI
2. Open-source BI (OSBI)
3. Mixed-source BI
As we have seen earlier, a BI solution mainly consists of four logical layers. This classification of BI types is based on the license type of the software used to fulfil the requirements of each of the four main logical layers. Let’s take a look at the differences between each of the BI solutions based on the type of software license in Table 3.5.

| Proprietary BI | Open-source BI | Mixed-source BI |
|---|---|---|
| Only commercial or proprietary or closed-source software used for all layers. | Only open-source software used for all layers. | Combination of both proprietary software and open-source software used in the same BI solution. |
| For each layer it could be different proprietary software from different vendors. | For each layer it could be a different open-source software. | Note that in the same BI solution, for the same logical layer, a combination of open-source software and proprietary software can be used. |

Table 3.5: BI types based on software license
Open-source BI: A BI solution that is built using only open-source software is referred to as OSBI. To be clear, open-source software does not mean that the software is free, but that a business such as Walget can have access to the software’s source code and modify it as required. There is of course software that is both open-source and free (zero cost) which can also be used in building a BI solution, and it would still be referred to as open-source BI. An increasing number of companies are now building BI solutions using entirely open-source software.

Proprietary BI: When all BI solutions were built using proprietary software, the term proprietary BI did not exist, as there wasn’t a need to differentiate. With the advent of open-source software and its usage in BI solutions, a need arose to distinguish between OSBI and proprietary BI. In my own experience, we did not use the term proprietary BI because in every project at every organization it was always proprietary software that was used for all layers. The solution was simply referred to as the BI solution.

Mixed-source BI: A BI solution includes multiple pieces of software. One of the responsibilities of a BI architect is to recommend or select one or more suitable and cost-effective software products that fulfil the requirements of the data acquisition layer, processing layer, storage layer, and information presentation layer. For one of the layers open-source software could be the best option, and for others proprietary software might work better. The choice of software depends on various aspects that are specific to the organization. Some BI solutions which previously had only proprietary software are being complemented with open-source software, and the other way around. In mixed-source BI there can be several variants based on usage of open-source or proprietary software or both in the four layers of the BI solution.

Table 3.6 provides examples of different software that can be used to fulfil the requirements of each of the logical layers of a BI solution in each category. For mixed-source BI, the table is limited to two variants, but other variants are also feasible.
| Layer | Proprietary BI | Open-source BI | Mixed-source BI (Variant 1) | Mixed-source BI (Variant 2) |
|---|---|---|---|---|
| Data acquisition | IBM DataStage, Informatica, Ab initio | Pentaho Business Analytics CE, Talend Open Data Studio | IBM DataStage, Informatica | Pentaho Business Analytics CE, Talend Open Data Studio |
| Data processing | IBM DataStage, Informatica, T-SQL | Pentaho Business Analytics CE, Talend Open Data Studio | IBM DataStage, Informatica | Pentaho Business Analytics CE, Talend Open Data Studio |
| Data storage | MS SQL Server, Oracle Exadata | MariaDB ColumnStore, Apache Druid | MariaDB ColumnStore, Apache Druid | MS SQL Server, Oracle Exadata |
| Information presentation | SAP Business Objects, MicroStrategy | Pentaho Business Analytics CE | Pentaho Business Analytics CE | SAP Business Objects / MicroStrategy |

Table 3.6: BI types based on software license
There are both advantages and disadvantages of using OSBI over proprietary BI as captured in Table 3.7:

| | OSBI | Proprietary BI |
|---|---|---|
| Cost | Zero or low cost for software. Higher people/talent costs as more skilled workforce is required to deal with complexities. As there is no license cost, cost does not increase as the number of BI users increases. | High cost for software. Lower people/talent costs as proprietary BI tools are usually easier to use and abstract complexity. Usually, there is a license per BI user and therefore cost increases as the number of BI users increases. |
| Vendor | No vendor lock-in. | Vendor lock-in. |
| Bugs and feature request | No dependency on vendors. If a business has a technical team, the team can fix bugs and add new features. | Dependency on vendors to fix bugs or add new features. |
| Customization | Better suited for customizations. Modules that are not required can be removed. | Difficult to customize. Some requirements can be met through configuration. Some modules even if not required cannot be removed. Unpacking the software is usually not an option as warranty can be lost. |
| Capabilities | In most cases OSBI software contains lower number of features compared to proprietary software. Usually, UI, especially GUI, are not good enough for use by business users. Requires some development effort to make it usable. | Usually comes with more features than open-source software. User interface is well developed and intuitive. |
| New features | New features are usually released later on or are included in the community editions later than commercial versions. | Features are released and made standard features based on demand and internal innovation by vendors. |
| Path to action | Usually, the software has to be enhanced before use. Time needs to be allocated for developing the tool before it can be put to use. | As most of the features are already packaged as part of the software, the BI team can focus on using the tool instead of developing the tool. |
| Staff skill level | Need staff who are able to code/program. | Need staff who are able to design using the tool and configuration. |
| Support | In most cases only voluntary community support is available. Unreliable support. In some cases, third party support on paid basis is available. | Usually high-quality reliable support through software vendors is available. Additionally, third party support on paid basis is available. |
| Documentation | Usually, only poor or scattered documentation is available. Documentation is mainly for technical audience. | Good, reliable and centralized documentation is available. Documentation available for both business users and technical users. |
| Reliability | In general, less reliable compared to proprietary software. | More reliable than OSBI. |
| Scalability | Usually requires upgrading to enterprise editions (supported editions). | Built for large businesses by default. |
| Integration with other tools | Usually doesn’t come integrated with other tools. | Good support for integration with other commonly used tools such as LDAP, SSO, SharePoint, Excel, etc. |

Table 3.7: OSBI vs Proprietary BI
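The cost comparison between OSBI and proprietary BI can be illustrated with a rough sketch: OSBI trades higher people costs for the absence of per-user licenses, so which option is cheaper depends on the number of BI users. The functions and all figures below are made-up assumptions for illustration, not benchmarks or vendor prices.

```python
def osbi_cost(users, talent_cost):
    # No license fee in this sketch, so the cost does not depend on the user count.
    return talent_cost


def proprietary_cost(users, talent_cost, license_per_user):
    # Per-user licensing makes the cost grow with the number of BI users.
    return talent_cost + license_per_user * users


# Hypothetical yearly figures (illustrative only).
OSBI_TALENT = 300_000          # higher talent cost to deal with complexity
PROPRIETARY_TALENT = 200_000   # lower talent cost, tools abstract complexity
LICENSE_PER_USER = 500

for users in (100, 400):
    print(
        users,
        osbi_cost(users, OSBI_TALENT),
        proprietary_cost(users, PROPRIETARY_TALENT, LICENSE_PER_USER),
    )
```

With these assumed figures the proprietary option is cheaper at 100 users and more expensive at 400 users, which shows why the comparison has to be made for the expected user base rather than in general.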
BI types based on data freshness
Data freshness is the duration between the time of data generation as part of any business transaction/event and the time data is available for BI purposes as shown in Figure 3.5:
Figure 3.5: Data freshness
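Data freshness as defined above is simply a time difference, which the following Python sketch computes. The function names and timestamps are hypothetical; the one-second threshold reflects the "less than a second" duration that this chapter associates with real-time BI.

```python
from datetime import datetime, timedelta


def data_freshness(generated_at, available_at):
    """Duration between data generation and its availability for BI."""
    return available_at - generated_at


def is_real_time(freshness, threshold=timedelta(seconds=1)):
    """Treat sub-second freshness as real-time."""
    return freshness < threshold


# Hypothetical sale recorded at 10:00 on 1 March 2021.
sale = datetime(2021, 3, 1, 10, 0, 0)
streamed = data_freshness(sale, sale + timedelta(milliseconds=200))
nightly = data_freshness(sale, datetime(2021, 3, 2, 2, 0, 0))  # loaded by a 02:00 batch

print(streamed, is_real_time(streamed))
print(nightly, is_real_time(nightly))
```

The same transaction counts as real-time data in the streamed case and as 16-hour-old data in the nightly batch case; only the time of availability differs.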
Based on how fresh the data is (seconds, minutes, hours, or days old), BI solutions can be classified as real-time BI and traditional BI/non-real-time BI. In real-time BI there is a negligible duration (less than a second) between a business transaction and the availability of that transaction’s data for BI purposes. To avoid confusion: it is not only that transaction’s data that is required but all the transaction data of the past accumulated up to that point. Again, this is one more classification that exists more for historical reasons than because it reflects the current reality. Historically, most BI implementations were limited to a batch-oriented process, that is, the previous day’s data was processed daily and stored in the data warehouse to be made available to decision makers. As demand increased for fresher (lower-latency), up-to-date data, various solutions were developed and implemented to process and store data in real time and make it available to BI users with minimum data freshness duration. While we are on this topic, let’s clear up some of the myths related to real-time BI, listed below:
1. Real-time BI is a new concept
2. Real-time BI is a must have
3. Strategic decisions require real-time BI
4. Without real-time BI there is no use of BI solution
There may be more myths about real-time BI; however, clarifying these four is expected to clarify the others as well.
Myths about real-time BI

Myth 1: Real-time BI is a new concept

Some businesses had implemented real-time BI solutions many years ago. For example, even before 2011 there were vanilla BI solutions like BMC Analytics for BSM that provided real-time BI. In 2009, in a fleet management company that I worked at, we had something called Real Time Reports (RTR) as part of the BI solution. It was a set of reports that contained information about workshops and technicians’ allocations, which was actually refreshed every 15 minutes, as the minimum time slots were 15 minutes. These examples are provided to drive home the point that real-time BI is not something new; it has been around for many years. However, not all businesses had implemented real-time BI, and in most cases it wasn’t actually required for those businesses or was not viable.

Myth 2: Real-time BI is a must have

No. Some businesses don’t need to have real-time BI. Relevant information should be made available at the speed at which decisions can be made. For example, if the users of BI are limited in their capacity and can look at the information only once a day to compare the daily trend of the last 7 days, there is no point in refreshing the data continuously, especially when additional investment is required to get that capability.

Myth 3: Strategic decisions require real-time BI

No. Strategic decisions don’t necessarily require real-time BI. Real-time BI supports operational decision-making more than any other type of decision-making. Of course, real-time BI can be used for other types of decision-making, such as tactical and strategic, too, but it is not mandatory to have data in real time for such decisions; batch load or traditional BI will also suffice. For example, Realtime reports within Google Analytics fall under operational BI; the user of these Realtime reports is a web analyst. You don’t expect a CEO or a CFO of a multi-billion-dollar company to continuously monitor web usage all day long. For strategic decisions, it is not the instant data that is required, but historical, clean, integrated, and trustable data.

Myth 4: Without real-time BI there is no use of BI solution
While real-time reporting and dashboarding capabilities add to the capabilities of BI solutions and enable immediate action as events occur, BI solutions backed by batch-fed data warehouses are still very much relevant and useful for management. Some decisions can only be based on data accumulated and calculated over a longer term. The recent event[26] of the TikTok app rating falling from 4.5 stars to 1.2 stars is a good example. The rating was down for only a few days before Google intervened. Depending on the type of decisions that need to be made, impacted businesses will either include or exclude that 1.2-star rating. If all decisions were made based only on the 1.2-star rating at that point (short-term data), it could lead to wrong results. Let’s look at the preceding example in a generic way with data. In Figure 3.6, we have visualized the rating data of two mobile apps, App1 and App2, over a 6-month period.
Figure 3.6: Example of rating of two apps
Let’s assume that the rating of App2 fell to 2 stars on 7th May 2020. Even then, the average for May doesn’t go below 4.41, assuming that App2 had at least 4.5 stars on all other days. So, concluding on 7th May, based only on recent data, that App1 is better than App2 could lead to wrong actions for a strategic decision, whereas for an operational decision it might make perfect sense. There is more uncertainty about data that can still change through the day than about older data that is less likely to change. So for tactical and strategic decisions it is acceptable if the data is a day old rather than up to date, which keeps non-real-time BI relevant.
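The arithmetic behind the May average can be checked with a few lines of Python. The sketch assumes the lower bound stated in the text, that is, exactly 4.5 stars on every day of May except the 7th.

```python
days_in_may = 31
normal_rating = 4.5   # assumed rating on every other day, as in the text
dip_rating = 2.0      # rating on 7th May

ratings = [normal_rating] * days_in_may
ratings[6] = dip_rating  # index 6 corresponds to 7th May

may_average = sum(ratings) / days_in_may  # (30 * 4.5 + 2) / 31
print(round(may_average, 2))
```

The result is 137/31, roughly 4.42, which is consistent with the statement that the monthly average doesn’t go below 4.41. A single bad day barely moves the accumulated figure, while a real-time view on 7th May would show only the 2-star rating.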
BI types based on sectors
BI solutions are also classified based on sectors or industries. These are BI solutions that are built specifically for customers in those sectors, for example, BI for banking, healthcare, manufacturing, retail, energy and utilities, telecommunication, travel, etc. Some of the BI vendors offer out-of-box reports, dashboards, and other capabilities that are specific to these sectors. Businesses can start with out-of-box features for BI purposes, but note that some customization is usually unavoidable, because every organization is different even within the same sector. Some service-based companies offer sector-specific BI solution frameworks and accelerators, developed based on years of experience of implementing solutions for multiple businesses, which speed up BI solution implementation and steer it in the right direction. These are also marketed as sector-specific BI. Depending on its specific needs, a business may decide to implement a generic BI solution, a sector-specific BI solution, or both.
BI types based on departments
BI solutions are also classified based on the targeted department or function. These are BI solutions that are built specifically for a department such as marketing, sales, human resources, customer care, IT operations, finance, supply chain, etc. Similar to BI types based on sectors, some BI vendors offer out-of-box reports, dashboards, and other capabilities that are specific to those departments. The benefit of these specific BI solutions (both sector-specific and department-specific) is that a business can start with pre-built dashboards and reports right away and customize them as per its needs instead of starting from scratch, thereby saving time.
BI types based on BI integration approach
Based on integration approach, BI can be classified as embedded BI and standalone BI. The Atlassian Jira dashboard that comes with the Jira application (Jira is a powerful work management tool for all kinds of use cases, from requirements and test case management to agile software development) is a good example of embedded BI. Similarly, any application where data is visualized for analysis within the application, such as HR analytics or supply chain analytics, is also classified as embedded BI. Embedded BI is about providing BI capabilities as part of the business application instead of as a separate BI application. In the case of standalone BI, the BI application is different from the business application (source system/s) in which the data is generated. In Table 3.8, we’ll compare embedded BI with standalone BI.

| | Embedded BI | Standalone BI |
| --- | --- | --- |
| Type of decision-making | Mostly operational, sometimes tactical and strategic. | All 3 (strategic, tactical, and rarely operational). |
| Data warehouse | Usually no data warehouse. | Most often backed by a data warehouse. |
| Residence of data | Same data as the business application data, or aggregated data which co-resides with the main business application data. | Data is copied over from the source application to the data warehouse in most cases. The BI frontend sources data from the data warehouse and usually not from source applications. |
| Data integration | Usually there is no data integration across multiple applications. Limited to data from one application. | Data is acquired from multiple data sources and therefore data is integrated before use. |
| Scope of BI capabilities | Only as many BI capabilities as required to deal with the data within the application. | BI capabilities are generic. Can be used on data from any of the data sources. |
| Data freshness | Real-time. Data is available as soon as the transaction is completed. | Batch, near-real-time, and real-time. To achieve real-time data access, the BI solution is directly connected to the OLTP application data, an approach that is usually avoided but not impossible. |
| Sector specific or generic | Mostly sector, domain, or application specific. | Mostly generic and rarely sector specific. |
| Type of analytics | All 3 (descriptive, predictive, and prescriptive). | All 3 (descriptive, predictive, and prescriptive). |
| Advantages | Users don’t have to switch between applications. BI capabilities seem to be part of the source application itself. Data used for BI is always in sync with application data. | Data specialists deal with data challenges and build scalable solutions. Reusable with other applications’ data. Better suited for BI usage. Easier to include other data sources and get a better overview. |
| Disadvantages | Application performance can be impacted due to BI usage. Siloed information. Limited BI capabilities. | Dependency on a separate (BI) team that is usually different from the team that develops the source application, and therefore possible delays due to other priorities of the BI team. Chances of data not being in sync between the source application and the BI solution. |

Table 3.8: Embedded and standalone BI comparison
Varieties of BI implementation
In the previous section we dealt with various types of BI based on different parameters. In this short section we will look at varieties of BI implementation which cannot be easily grouped into any of the types dealt with previously. The varieties of BI listed below are based on how the BI solutions are built.
• Agile BI
• Out-of-the-box BI
• Self-service BI
Let’s now go through the details of each of these types of BI.
Agile BI
Similar to how other IT solutions were developed, BI solutions were also built using the Waterfall methodology. BI projects were known as complex, multi-year projects that most often were delayed, did not end well, were scrapped midway, or did not yield the expected results. Even those that succeeded built static reports that would perfectly answer pre-defined questions but would require a code change and a deployment to answer a previously unasked question. So, there were problems with both the development approach and the deliverables/functionalities available in the BI solution. Agile BI is, in a way, a solution to those two problems. Based on my own experience, it was around the end of the 2000s that agile BI started gaining popularity. Agile BI is basically a combination of applying agile development principles to develop BI solutions in smaller but scalable modules with quicker ROI, responding to business changes with quicker implementation compared to traditional approaches, and building functionalities such as self-service BI and
interactive dashboards. This enabled business users to respond to business questions quicker than traditional ways. There is absolutely no question that agile BI is the way to go ahead with BI solution implementation. In Chapter 7: Ideas for Success with BI, we will cover more on this topic, especially about agile KABI.
Out-of-the-box BI
BI solutions that come pre-packaged with a set of reports, dashboards, and other BI functionalities for specific applications such as CRM, ERP, HR modules, etc., and are ready for use right after installation and configuration are known as out-of-the-box BI (OOTB BI). This is different from the approach of custom development of a BI solution. BMC Analytics for BSM and Oracle OOTB BI for financials and HR are examples of OOTB BI. BMC Analytics for BSM is separate from the BMC Remedy application. The main advantage of OOTB BI is that it saves time and effort for businesses, as most of the reports and dashboards which would otherwise have to be custom developed are prebuilt; for example, BMC Analytics for BSM has over 100 prebuilt reports. These 100+ reports could easily take at least 200 person days if they had to be built from scratch. The other advantage is that these reports are built by people (vendors) who understand the underlying data model of the application for which the OOTB BI is built, and therefore the reliability of the reports is higher. As and when the applications are upgraded, the OOTB BI is also upgraded by the vendor.
Side note: While I was writing this book, it was interesting to find out that BMC Analytics for BSM and BMC Dashboards for BSM, both based on SAP Business Objects, are nearing their end-of-life and are already being replaced with BMC Remedy Smart Reporting based on the Yellowfin platform. The point to grasp here is that BMC, by keeping the main application (Remedy) separate from the OOTB BI that it offered, is able to change the OOTB BI’s underlying platform from Business Objects to Yellowfin with probably no change to the main application.
Let’s summarize the differences between OOTB BI and custom-built BI solutions in Table 3.9:

| OOTB BI | Custom-built BI |
| --- | --- |
| Pre-built reports, dashboards, and other BI functionalities, ready for use immediately after installation and configuration. | Built from scratch using either proprietary software or open-source software or a combination of both. |
| Usually specific to an application or a few applications that are packaged together, such as an HR module and CRM from the same vendor. | Usually built as a generic solution to work with any of the source applications’ data. |
| Saves businesses the time they would have otherwise spent. | Takes time to build. |
| Higher one-time costs for the software, as it also includes the costs for the prebuilt reports, dashboards, and other functionalities. However, the costs in comparison to custom-built will be lower, as the vendor sells it at a lower price due to sale volumes (the vendor can sell it to multiple customers). | Lower one-time costs for the software, as no pre-built reports, dashboards, etc., are available. However, higher development cost, as it is custom-developed specifically for an organization. |
| As solutions are built by vendors who are specialists in that field, they leverage their expertise. | Staff developers have to go through a learning curve. |
| Dependency on the vendor for updates and upgrades. | No dependency on the source application vendor. |

Table 3.9: OOTB BI vs Custom
In case you are wondering, isn’t OOTB BI the same as embedded BI? No, it is not. But as there could be some confusion about the differences between embedded BI and OOTB BI, let’s clear those up in Table 3.10:

| Embedded BI | OOTB BI |
| --- | --- |
| Integrated (tightly coupled) with the main application. | Loosely coupled with the main application after it is bought. |
| It is usually part of the main application. Pre-integrated by the vendor. | Is a separate module. Data integration and not application integration. Integrated by the customer, by professional services of the vendor, or by IT services/system integrator companies. |
| Is provided by the same vendor as the source application. | Could be provided/sold by any vendor and not necessarily the vendor of the source application. |
| Specific for that application (source application). | Could be for one or more source applications. |

Table 3.10: Embedded BI vs OOTB BI
In Table 3.10, what do we mean by provided by the same vendor or any vendor? For example, Walget depends on an employee attendance tracking solution to monitor the attendance of its employees. This attendance tracking solution is provided by vendor X. If X integrates (embeds) a BI module into the attendance tracking solution and provides it to Walget as part of that solution, that’s embedded BI, as it is developed by the same vendor. If vendor Y collaborates with vendor X and builds a separate BI module that can be bought separately and configured to work with vendor X’s attendance tracking solution, that’s OOTB BI.
Self-service BI
Self-service BI (SSBI) was hyped by some of the BI vendors as the magic bullet that would solve all of the problems faced by the end users of BI. Starting from around 2010, business managers were almost convinced by the claims of a few BI vendors that with SSBI there is no longer a need for a BI team, and that end users can have all that they want without having to wait for BI teams to build reports, dashboards, etc. But even now in 2020 we can see that BI teams are very much in demand and the demand continues to increase. So, what happened with SSBI? Let’s first define SSBI, and then answer some questions and dispel some myths related to it.
Self-service BI is a set of features in a BI solution using which authorized end users are able to analyze data interactively to get to information and insights on their own. Capabilities include data profiling, data visualization, creating new artefacts (reports, dashboards, etc.), editing existing artefacts, adding new data sources of simple to medium complexity, scheduling reports, drilling down/up, exploring data, etc., without breaking any of the existing artefacts, without having to code, and without involving the services of the IT or BI team. Here, end users refers to non-IT users such as marketing managers, sales managers, finance heads, and VPs. SSBI is one of the means through which a BI solution becomes an agile BI solution.
SSBI has become a standard feature in almost every BI solution. However, there are quite a few myths associated with SSBI. Here we will try to clear up at least some of these myths or misconceptions, listed as follows:
• Self-service BI is deployed and operated by non-IT business teams.
• Self-service BI is a new set of features and not available in traditional tools.
• Once SSBI is installed and configured there is no need for a BI team or IT.
• Objective of SSBI is to free up BI team/IT team.
• SSBI works on a standard laptop and therefore no server installation is required.
• All SSBI users have access to all data.
Let’s now go through each of these.
Myths about self-service BI
Remarks on each of the myths about self-service BI are provided as follows:
Myth 1: Self-service BI is deployed and operated by non-IT business teams.
Incorrect. End users make use of SSBI; they are not responsible for deploying, configuring, operating, or maintaining the SSBI platform. Installation, configuration, operation, and maintenance are all activities performed by IT or BI administrators.
Myth 2: Self-service BI is a new set of features and not available in traditional tools.
Incorrect. Self-service BI is not a new set of features; some of the features were available even before 2010. If some businesses still don’t provide SSBI functionalities to their users, it’s not because of the tool, it’s simply because of bad implementation or internal policies. This is the very reason I have intentionally not provided a table that shows differences between traditional BI and SSBI. For example, from my own experience, Business Objects Webi was available as far back as 2008; using Deski (thick client) and Webi (web-based), end users were able to view, edit, and create their own reports, drill down/up, and slice and dice the data. Some of the advanced users were even modelling the semantic layer.
Myth 3: Once SSBI is installed and configured there is no need for a BI team or IT.
Incorrect. The IT team, especially the BI team, is still required. No end user (business user) is usually going to develop ETL jobs to load a database on a daily basis. It is not the end user’s responsibility to clean the data for all use cases. Using SSBI, end users are able to clean/transform data to a certain degree using the data wrangling features, but complex transformations, historization of data, etc., are not something the end user is meant to do. BI teams continue to design and implement data warehouses, and to configure and administer them such that only authorized users have access to data.
Myth 4: Objective of SSBI is to free up BI team/IT team.
False claim. Even if this were to become true in the future, it would not be useful in my view. The argument that SSBI ensures that a business user can free up the time of an IT person is equivalent to the argument that you can cook your own food in a restaurant to free up the time of the cook. What else is the cook supposed to do? To stretch the example a bit more, “you can also purchase the ingredients to cook at the restaurant to free up the time of the restaurant owner or purchase manager”. Some questions that business users should ask themselves are, “Isn’t a VP’s or EVP’s or CXO’s time more expensive than a BI team member’s in most cases?”, “Would senior management personnel decide to spend their time in front of the system analyzing data or would they rather delegate it to a specialist like a BI analyst or data analyst?”,
“So why should a business user free up time of BI or IT team?” The objective of SSBI is to enable business users to get to the right information at the right time.
Myth 5: SSBI works on a standard laptop and therefore no server installation is required.
Not entirely correct. Yes, some of the SSBI tools can be installed on a standard laptop and used as long as the data is within the limits (CPU, RAM, and storage capacity) that the laptop can process. As soon as there are billions of rows in the dataset, these tools hang or crash. Most BI tools have a web-based interface that lets users use SSBI functionalities with the compute power and RAM capacity of the server. Currently a standard end user laptop may have 16 GB RAM, whereas a decent BI server has between 256 GB and 1 TB RAM. So, for many use cases of BI, a standard laptop will not be sufficient and there is a need to install SSBI platforms on servers.
Myth 6: All SSBI users have access to all data.
Incorrect. SSBI is also enabled by the IT or BI team and access to data is governed by the data governance policies. Even if users install a desktop-based tool on their own they still cannot have access to all of the data in a company unless connections are permitted/authorized to the data sources. Any company of a decent size would ensure that only authorized users have connections to the right set of data. So, this notion that SSBI enables all users to have access to all data is really misleading.
Concluding remarks on SSBI
While SSBI is very important, puts more power in the hands of business users, provides a lot of value, and should definitely be included in all BI solutions, to state that BI teams or IT teams are not required is far-fetched. SSBI ensures that nontechnical users don’t have to depend on the BI or IT team for a lot of their day-to-day information needs. For example, if a new dashboard or report has to be created based on a previously built data foundation layer (tables are connected using the right keys, correct relationships are maintained, etc.), or based on a CSV or Excel file, the end user (that is, the BI user) doesn’t have any dependency on the BI team and can go ahead and create it. But if a new data source, for example, a new CRM system, has to be integrated into the data warehouse, end users depend on the BI team. Currently almost all BI vendors offer some sort of SSBI capabilities. Usually the BI team sets up/enables this self-service by creating governed and metadata-based reporting using various tools like Business Objects, MicroStrategy, Cognos, etc., at the frontend (user access) of the BI solution. Usually there are other tools (ETL, RDBMS) at the backend of the BI solution. The features that are generally available to an end user as part of SSBI are listed as follows:
• Intuitive user interfaces with drag and drop functionality
• View, create, edit, save, and download artefacts such as reports and dashboards
• Possibility to create metrics, dimensions, aggregates, groups, filters, etc.
• Schedule reports
• Connect to approved data sources and upload data files
• Carry out all types of analytics (descriptive, predictive, and prescriptive)
• Share and collaborate with other users
• Prepare data for visualization and analysis
• Blend data from multiple data sources
And finally, let’s look at SSBI from the perspective of one of Walget’s end users with an example. The store manager is one of the end users of BI. His responsibilities include managing the store staff and dealing with customers, customer complaints, suppliers, local authorities, etc. Let’s assume that there exists a dashboard that shows the top 5 products sold in that store on a daily, weekly, monthly, quarterly, and yearly basis. Assume that the following situation arises: for reasons beyond his control, one of the top 5 products is no longer supplied by the supplier. He would now like to know which is the best replacement for that missing product. He needs to ensure that store profit is not impacted negatively by this change in product. He is aware that other stores of Walget have tried three different replacements for that product. Which of the 3 products has done well? If the data of other stores is already available in the BI platform and approved for his use for comparison, then using SSBI capabilities he will be able to drill down and find the best replacement for the product, backed by actual data, without having to wait for Walget’s BI team to find the answer, and will thereby be able to make quicker decisions. In another scenario, let’s assume that Walget decided to migrate its database from one provider (deployed on-premises) to another database provider (database-as-a-service).
All of the backend changes to adapt the data warehouse to the new database are carried out by the BI team and not by the store manager. This example should help you in understanding the important role SSBI plays and at the same time also clarify the limitations or scope of SSBI.
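The store manager’s comparison in the first scenario boils down to a simple aggregation, which an SSBI tool would typically let him perform through drag and drop. The sketch below only illustrates that logic in plain Python; the store IDs, product names, and profit figures are entirely made up, and the function name is hypothetical rather than part of any BI tool.

```python
from collections import defaultdict

# Hypothetical weekly profit (USD) recorded by other stores for the three
# replacement products. All values are invented for illustration.
sales = [
    {"store": "S01", "product": "Replacement A", "profit": 1200.0},
    {"store": "S02", "product": "Replacement A", "profit": 1100.0},
    {"store": "S03", "product": "Replacement B", "profit": 1500.0},
    {"store": "S04", "product": "Replacement B", "profit": 1300.0},
    {"store": "S05", "product": "Replacement C", "profit": 900.0},
    {"store": "S06", "product": "Replacement C", "profit": 1000.0},
]


def average_profit_by_product(rows):
    """Return the average profit per product across all stores."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["product"]] += row["profit"]
        counts[row["product"]] += 1
    return {product: totals[product] / counts[product] for product in totals}


averages = average_profit_by_product(sales)
best_product = max(averages, key=averages.get)
print(best_product, averages[best_product])
```

With this sample data the second replacement has the highest average profit. The point is that the data foundation (the rows themselves) is prepared and governed by the BI team, while the question is asked and answered by the end user.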
Conclusion
In this chapter we have covered various types of BI based on different parameters such as type of analytics, type of decisions, solution hosting, solution ownership, type of software license, data freshness, sectors, departments, and based on BI integration approach. It’s useful to be aware of these types to be in a position to choose the right type. It is to be noted that even though considerable effort has been made to cover all possible BI types, there are possibilities that a couple of types or
variants are unintentionally missed out. We have also covered differences between some of the types of BI and also learnt the advantages and disadvantages of some types over others. And finally, quite a few misconceptions related to real-time BI and self-service BI have been clarified. This chapter together with the previous two chapters lays a good foundation on BI by covering detailed definition of BI, need for BI and types of BI, respectively. After having understood the importance of BI and so many types of BI available to choose from, do you see any challenges in organizations going ahead with a BI project or a program or an initiative? Do you think there can be any reason why some employees could even be against BI initiatives? Before starting with next chapter, can you think through and come up with a list of challenges in BI? In the next chapter we will deal with some of the common challenges faced in business intelligence.
Points to remember
Some of the key points to remember are listed as follows:
• Types of BI depend on the parameter we choose for classification.
• Based on type of analytics, BI is classified as:
  o Descriptive BI
  o Predictive BI
  o Prescriptive BI
• Diagnostic BI is an extension of descriptive BI. It answers why something has happened.
• Forecasts are predictions that are based on the current state, are calculated, and involve a time dimension. All forecasts are predictions, but not all predictions are forecasts.
• Based on type of decisions, BI solutions are classified as:
  o Strategic BI
  o Tactical BI
  o Operational BI
• All three types of decisions could be possible from the same BI solution too.
• Four main logical layers of BI are:
  o Data acquisition layer
  o Data processing layer
  o Data storage layer
  o Information presentation layer
• Based on solution hosting, BI solutions are classified as:
  o On-premises BI
  o Cloud BI
  o Hybrid BI
• When the components or logical layers of BI are spread across on-premises and cloud it’s called hybrid BI.
• There are several variants within cloud BI and within hybrid BI.
• Based on solution ownership BI solutions are classified as self-managed BI and BIaaS.
• Based on software license type BI solutions are classified as:
  o Proprietary BI
  o OSBI
  o Mixed source BI
• Data freshness, in short, is the duration between the time of generation of the data and the time by when that data is processed, ready and available for BI purposes.
• “Real-time BI is a must have for every business” is a myth.
• There are sector specific BI solutions, for example, Retail BI.
• There are department specific BI solutions, for example, Human Resources BI.
• Based on BI integration approaches they are classified as Embedded BI and Standalone BI.
• There are more varieties of BI implementations such as: Agile BI, OOTB, and Self-service BI.
• OOTB BI is different from embedded BI.
• Self-service BI is a very important capability in all BI tools.
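The definition of data freshness in the list above can be made concrete with a short calculation. This is a minimal sketch using Python’s standard `datetime` module; the function name `data_freshness` and the example timestamps (a transaction at 9 PM, data available at 6 AM the next day) are assumptions chosen purely for illustration.

```python
from datetime import datetime, timedelta


def data_freshness(generated_at, available_at):
    """Return the delay between data generation and its availability for BI."""
    if available_at < generated_at:
        raise ValueError("Data cannot be available before it is generated.")
    return available_at - generated_at


# Hypothetical example: a transaction is recorded at 9 PM and the nightly
# batch load makes it available in the BI solution at 6 AM the next day.
generated = datetime(2020, 6, 1, 21, 0)
available = datetime(2020, 6, 2, 6, 0)

freshness = data_freshness(generated, available)
print(f"Data freshness: {freshness}")
```

For a daily batch load, the worst case is a transaction that happens just after the previous load has completed, so the maximum data freshness depends on both the load schedule and the transaction time.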
Multiple choice questions
1. “What is the percentage increase in the average spend per customer at Walget in the last month compared to previous year same month?” Which type of analytics answers this question?
   a) Descriptive
   b) Predictive
   c) Prescriptive
   d) All of the above
2. Walget is about to open a new store and they want to ensure they have the right number of staff for the first week. Which type of analytics should they use?
   a) Descriptive
   b) Predictive
   c) Prescriptive
   d) All of the above
3. Management at Walget is deciding about a multi-year multi-million-dollar supplier contract. Which type of BI supports this decision-making process?
   a) Operational BI
   b) Tactical BI
   c) Strategic BI
   d) None of the above
4. In one of the Walget stores, a store manager finds out that one of the products will expire in the next 2 days. Which type of BI supports the store manager in arriving at a clearance price for that product?
   a) Operational BI
   b) Tactical BI
   c) Strategic BI
   d) None of the above
5. Which of these is not one of the logical layers in a BI solution?
   a) Data acquisition layer
   b) Data storage layer
   c) Data processing layer
   d) Data presentation layer
6. If the data acquisition layer and data processing layer are in the cloud and the data storage layer and information presentation layer are located on-premises, what type of BI is it?
   a) On-premises BI
   b) Cloud BI
   c) Hybrid BI
   d) None of the above
7. Which type of decision-making is mostly hindered without real-time BI?
   a) Strategic
   b) Tactical
   c) Operational
   d) None of the above
8. Walget would like to find out why there is a higher attrition rate in one of the regions. Which type of analytics should they use?
   a) Descriptive
   b) Predictive
   c) Prescriptive
   d) All of the above
9. Walget’s BI solution is up to date with data by 7 AM local time every day. If a transaction happens at 5 PM local time, what is the maximum data freshness of that transaction?
   a) 2 hours
   b) 2 days
   c) 1 day
   d) 14 hours
10. Walget’s BI solution is partially down because of unplanned maintenance and only today’s data is available. Should top management rely on today’s data for strategic decisions?
   a) Yes, this is the best they have today
   b) No, today’s data may not be in sync with the trend
   c) It depends on the style of management
   d) None of the above
Answers
1. a
2. d
3. c
4. a
5. d
6. c
7. c
8. a
9. d
10. b
Questions
1. Why is diagnostic BI not included along with the other three types of BI?
2. Why are all predictions not forecasts? Explain with an example.
3. What can be the reasons for choosing a multi-cloud approach for a BI solution?
4. Which one should Walget choose: proprietary BI or open-source BI? And why?
5. Explain three known myths about self-service BI.
Chapter 4
Challenges in Business Intelligence
Even though BI solutions are critical for almost every business, and businesses should have one from the very beginning, not every business does. Some organizations have deployed BI solutions successfully and use them effectively, whereas some have deployed them but haven’t been able to use them effectively. There are also organizations that are still struggling to implement BI solutions after approval, while some are just waiting for an approval to start. According to a 2017-2018 survey by Gartner,[27] more than 87 percent of the organizations were classified as having low BI maturity. Every business, even one that uses BI effectively, has faced and continues to face some BI-specific challenges at every phase. This chapter introduces the various challenges faced by organizations in every phase of a BI journey. After going through this chapter, you should have a good understanding of the challenges that organizations may face in every phase of their BI journey. What you can do with this knowledge depends to a large extent on your role. For example, if you are a CXO, VP, a newly appointed head of BI, or someone who is about to start a BI initiative, it will help you to account for some of these challenges in your plans and estimations. On the other hand, if you are a student or a BI job aspirant, being aware of these challenges will help you in your future job. First, we’ll introduce the main phases of a BI journey and then discuss the challenges faced in each of these phases. We continue to use
Walget (introduced in Chapter 2: Why do businesses need BI) as an example. It will be easier to understand this chapter if you already understand software development processes and project management.
Structure
This chapter is structured as follows:
• Main phases in a BI journey
  o Initiation phase
    ▪ Trigger for BI initiatives
  o Implementation phase
  o Live phase
    ▪ Further development
    ▪ Enhancements
    ▪ Maintenance and Support
    ▪ Migration
• Trigger for BI initiatives
• Challenges in the initiation phase
  o Resistance to the BI initiative
  o Building a good business case for BI
  o Acquiring sponsors and promoters
  o Getting it prioritized
• Challenges in the implementation phase
  o Data and information challenges
  o People challenges
  o Process challenges
  o Technology challenges
• Challenges in the live phase
  o Challenges faced by BI users
  o Challenges faced by BI technical team
Objectives
Understanding the BI capabilities ladder and getting an overview of the three main phases in a BI journey, activities involved in those phases and the challenges faced during each of those phases.
Main phases in a BI journey
By BI journey, we mean the journey that starts when an organization first begins building BI capabilities and continues for as long as the organization exists. The BI journey takes an organization from the bottom-most stair to the topmost stair of the BI capabilities ladder as depicted in Figure 4.1:
Figure 4.1: BI capabilities ladder
Even at the time of writing (June 2020) there are many organizations that are still at the bottom-most stair of the BI capabilities ladder. One of the aims of this book is to increase awareness about BI among such organizations and help them climb the ladder and enable them to compete with bigger players. Strictly speaking there are no hard cut-offs as to what would be termed as low, moderate, and high user adoption. However, to provide some guidance, anything below 30% user adoption rate is low, 30 to 60% is moderate, and anything from 60% and above can be considered high. This can be calculated as:
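A minimal sketch of one way to compute and classify the user adoption rate is shown below. It assumes that the rate is the share of intended BI users who actively use the solution, expressed as a percentage; this definition, the function names, and the treatment of exactly 60% as high are assumptions made for illustration, while the 30% and 60% thresholds come from the guidance above.

```python
def user_adoption_rate(active_users, intended_users):
    """Percentage of intended BI users who actively use the solution.

    Assumed definition: active users / intended users * 100.
    """
    if intended_users <= 0:
        raise ValueError("intended_users must be a positive number.")
    return 100.0 * active_users / intended_users


def adoption_band(rate_percent):
    """Classify an adoption rate using the rough thresholds from the text."""
    if rate_percent < 30:
        return "low"
    if rate_percent < 60:
        return "moderate"
    return "high"  # 60% and above; boundary handling is an assumption


# Hypothetical example: 45 of 150 intended users actively use the BI solution.
rate = user_adoption_rate(45, 150)
print(f"{rate:.1f}% -> {adoption_band(rate)}")
```

What counts as an active user (for example, at least one login per month) has to be agreed within the organization before the figure can be compared over time.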
Apart from one of the organizations, all other organizations that I have worked at were at or above the 5th stair in the BI capabilities ladder. The only difference
between them and the one that was below the 5th stair was that all the others were large companies with over a few billion dollars in revenue per year, while the one at the lower stair reported less than half a billion in yearly revenue. I should emphasize that there is no straightforward correlation whereby all billion-dollar-plus companies have excellent BI capabilities and all companies with smaller revenue do not. For example, a US-headquartered background screening solution provider, First Advantage,[28] with revenue of less than 1 billion USD, has built a state-of-the-art reporting and analytics solution not just for internal use but also provides parts of this solution to its customers. So, having a good BI solution or not has to do not with the size of the revenue but with the vision.
Now let’s look at the different phases of a BI journey. In a BI journey there are broadly three phases, and the order is as depicted in Figure 4.2:
1. Initiation phase
2. Implementation phase
3. Live phase
Figure 4.2: The 3 main phases in a BI journey
In practice, there are several phases, however, our intention is not to detail the different phases of a software development life cycle or all the phases of project management. Here, the intention is to provide a broad view of the BI journey that an organization follows so that we are able to map the challenges per phase accordingly. We could of course directly list all of the challenges without discussing these phases, however, that would make it quite difficult to understand for people who are not aware of BI or IT solution implementations. Knowing these phases helps set the context, allowing readers to understand the challenges better. Let’s take a look at each of the three main phases of the BI journey.
Initiation phase
The initiation phase does not have a fixed duration. It can last from weeks to years, depending entirely on the organization, its needs, and its priorities. If there is an urgent need for BI and the resources are available, the organization could start immediately; however, in most cases there aren’t enough resources for BI initiatives to start immediately.
As the name suggests, the initiation phase starts when the idea or vision to build a BI solution or add BI capabilities is initiated within the organization. This phase ends when a BI initiative has been approved by the management as depicted in Figure 4.3:
Figure 4.3: The 3 main phases in a BI journey with start and end points
Approval for a BI initiative could mean different things for different organizations, such as those listed below:
• Budget allocation for building BI capabilities
• Initiation of a BI program or a project
• Approval to create a BI team
• Approval to start a request for proposal (RFP)/request for information (RFI)/request for quote (RFQ) process to procure products or services to build BI capabilities
• Resources are identified and reserved or earmarked for BI
• A simple “go-ahead” or an in-principle agreement from management for BI
• Hiring of the first BI team member or raising of a purchase order
Now, you may be asking yourself, who actually initiates a BI journey and when? Or what is the trigger that leads someone to initiate a BI journey? How does it all start in an organization? The next section answers these questions.
Trigger for BI initiatives
In Chapter 2: Why do businesses need BI, we dealt extensively with the need for BI, that is, we covered the “why” part of it. So, we will not be covering the reasons
why a business needs BI again. We will only go through those specific trigger points which lead to the initiation phase. The triggers that lead to BI initiatives could be one or more of the following:
• Management triggers it as advised by one or more parties such as the advisory board, external consultants, market research firms, auditors, etc.
• Management triggers it after being convinced by BI vendor(s).
• Management triggers it following a change in management, where the new management has experience in using BI and knows the importance of BI in an organization.
• Implementation of a core business application or software solution which includes embedded BI or OOTB BI.
• Requested or demanded by customers (B2B or B2B2C business).
• Requested by internal departments.
• Management triggers it based on knowledge of competitors’ or other businesses’ BI endeavors.
• Requested by suppliers as other clients are providing it.
• Initiatives by employees based on their experience at other organizations.
There could be other trigger points too which are not captured in the above list; the main point is to get a rough idea of how BI initiatives are triggered.
Implementation phase
The term implementation is used here to mean the whole end-to-end process of building a BI solution, starting from the planning stage until the solution goes live. As we saw in Figure 4.3, the implementation phase starts when the management has approved the kick-off of a BI initiative, and it ends when a minimum usable solution has been deployed in the production/live environment and made available to BI users. This phase includes activities such as planning, hiring BI team members, business analysis/requirements engineering, designing, procuring hardware and software, installation, establishing connectivity, data modelling, development/coding/programming, various types of development/IT team testing activities, BI user testing activities, release planning, and deployments. The aforementioned activities apply when BI capabilities are built entirely in-house. In the case of outsourcing, most of these processes are outsourced except for managerial activities. In the case of mixed models, where in-house BI team members and outsourcing partner team members collaborate and work together, the activities are shared between both in-house team members and outsourcing partner team
members. The extent to which the activities are shared depends on the specifics of the contracts between the organizations.
Live phase
The live phase begins when the minimum usable solution (MUS) is live, that is, when the BI solution is deployed in the production environment and is available to BI users. This phase of the BI journey continues as long as the organization exists. I haven’t heard of or seen a case where a company has got rid of BI after using it. As explained in Chapter 1: What is Business Intelligence, BI is not a one-off activity; it is a continuous process. So, once the MUS is live, it goes through further development, enhancements, maintenance (fixes and upgrades), and possibly migrations and replacements. Let’s briefly look at the activities within these categories. It must be noted that, in practice, different organizations interpret these categories differently, and therefore the following explanation reflects one of the most common interpretations based on experience.
Further development
All of the development activities that were mentioned in the implementation phase are repeated here. It could be to add new sets of data from existing data sources, integrating new data sources, developing new dashboards, reports, new capabilities, and more. For example, let’s say Walget has four data sources such as point of sale (POS), customer relationship management (CRM), inventory management, and job application system (JAS). Out of these four data sources, the first three were considered as mandatory data sources for MUS and the solution went live with only those three as the data sources. In further development, the data source JAS was integrated into the solution.
Enhancements
Enhancements are improvements to, or additions of features to, the existing solution. Using the previous example of Walget, enhancements include data quality improvements for existing data sources (POS, CRM, and inventory management), changes to dashboards and reports, creation of reports, metrics/measures, dimensions/attributes, performance improvements, etc.
Maintenance and support
The activities in this category include bug fixes, applying patches, and upgrading of hardware and software. In many organizations, support and maintenance activities are carried out by the same team. As part of support activities, the team ensures that the applications are available, incidents and problems are resolved, and new versions are deployed. For those who don’t understand what bug fixes, patches, and
incidents mean in the context of BI, let’s look at a few short examples to give you an idea.
• Bug fixes: Essentially, this means fixing a bug, that is, making the necessary changes to the system to resolve an issue detected in it. Example: A report crashes when data is in a language other than English. If there was a requirement that reports should be able to handle other languages as well, then this issue would be considered a bug as it doesn’t meet the requirement. The bug fix would be to make changes to the report or the underlying database to ensure that the report doesn’t crash when data is in other languages.
• Patches: When there is an issue, or the possibility of an incident occurring, in the production environment, a patch (temporary fix) may need to be applied to ensure that the system continues to work. For example, let’s assume that a database is almost full (reaching its maximum capacity), due to which daily batch jobs running in the production environment may fail because no more data can be loaded once the database is full. A patch here would be to run a script that deletes data from copies of old backup tables that are no longer required. A proper fix, such as developing a batch job that periodically auto-purges older irrelevant data, might take longer. So, a patch is a temporary fix.
• Incident: An incident is an unplanned interruption. Let’s assume that the daily batch load failed and therefore the previous day’s data is missing in the data warehouse. BI users are unable to use the dashboards without the previous day’s data and therefore they raise an incident stating that the previous day’s data is missing in the dashboards.
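The patch described above (freeing space by removing old backup copies) could look roughly like the following sketch. It uses an in-memory SQLite database so that it runs anywhere. The `bkp_` prefix, the `YYYYMMDD` suffix in table names, and the function name are assumptions made for this illustration, not a convention taken from any particular BI tool.

```python
import sqlite3
from datetime import date, datetime


def drop_old_backup_tables(conn, cutoff, prefix="bkp_"):
    """Drop backup tables named <prefix>..._<YYYYMMDD> that are dated before cutoff.

    The naming convention is an assumption made for this sketch.
    Returns the names of the dropped tables, sorted.
    """
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    dropped = []
    for (name,) in rows:
        if not name.startswith(prefix):
            continue  # not a backup copy: never touch it
        try:
            stamp = datetime.strptime(name[-8:], "%Y%m%d").date()
        except ValueError:
            continue  # no recognizable date suffix: leave the table alone
        if stamp < cutoff:
            # Table names cannot be bound as parameters, so the name is quoted.
            conn.execute(f'DROP TABLE "{name}"')
            dropped.append(name)
    conn.commit()
    return sorted(dropped)


# Hypothetical tables: one live table and two backup copies.
conn = sqlite3.connect(":memory:")
for table in ("sales", "bkp_sales_20190101", "bkp_sales_20200601"):
    conn.execute(f'CREATE TABLE "{table}" (id INTEGER)')

dropped = drop_old_backup_tables(conn, cutoff=date(2020, 1, 1))
print(dropped)
```

A script like this is run once, by hand, to get through the immediate problem; the proper fix would be a scheduled purge job with an agreed retention policy.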
Migration
Usually, people forget to include this category. Because a BI solution has so many components that make up the BI architecture, as we have seen in Chapter 1: What is Business Intelligence, we regularly face situations where BI migration activities are carried out. It could be a full-stack migration from one set of tools to another, or a migration of only one or more of the tools used in the BI solution. For example, let’s assume that Walget uses IBM DataStage as its ETL (data integration) tool, and for internal reasons it has been decided that Walget should migrate to an open-source ETL tool such as Pentaho or Talend. This will lead to a migration activity from DataStage to Pentaho or Talend. Note that the migration activity might itself be a separate project; however, it is still part of the live phase of a BI journey.
We have intentionally not gone into too much detail about the different phases and have limited the content to the level of detail needed to understand the challenges in BI corresponding to those phases.
Let’s now get to the challenges.
Challenges faced in the initiation phase
These are some of the challenges faced by any of the BI journey initiators, such as the head of a business unit, a consultant, etc. They include the following:
• Dealing with internal resistance to the BI initiative
• Building a good business case for BI
• Acquiring sponsors and promoters for the BI initiative
• Getting the BI initiative prioritized
While the last 3 challenges in the above list are quite self-explanatory, the 1st one needs a more detailed explanation. So, in the following pages we will look at the 1st challenge in detail and the last 3 will only be discussed briefly.
Resistance to the BI initiative
In my experience, I haven’t seen any employee openly opposing a BI initiative in any of the organizations. However, in some companies, I have noticed that there is some resistance, and some employees deliberately try to delay BI initiatives as much as possible. Why would there be any resistance from employees when BI is supposed to help and support them in their work? Some of the reasons for resistance are actually genuine concerns while some are not. To some of you who are yet to enter corporate life, these reasons may look fictitious, but let me assure you that each of the reasons discussed here, unless marked otherwise, is real. As mentioned in the beginning of this chapter, it is important to be aware of these challenges so that you can navigate them better. Let’s go through some of the reasons, provided in Table 4.1, due to which employees may resist BI initiatives.

| Reasons | Description |
| --- | --- |
| Concerns over high costs | As with any other initiative, there is a cost involved, and BI was always associated with high cost. Therefore, some employees resist BI initiatives because of their past knowledge about its high cost. In the past, companies did in fact have to spend millions of dollars to implement BI solutions. However, as clarified in Chapter 1: What is Business Intelligence, this is no longer the case. But not everyone is aware of this change, so they continue to resist BI initiatives. |
| Concerns over long duration | BI solutions implemented in the past are known for implementation durations extending over multiple years or never ending at all. Again, as mentioned in Chapter 1: What is Business Intelligence, this is no longer the case. |
| Organization culture/Change aversion | An organization may have largely got used to working the way it has been working for a long time and doesn’t want to change or adapt. It doesn’t realize that there is a need for change in the way of working to improve efficiency. |
| Job insecurity | This is one of the major reasons. Some employees resist due to a fear of losing their jobs, especially due to automation. As discussed in Chapter 1: What is Business Intelligence, the BI process is largely about automation or semi-automation of the information and insights generation process to support decision makers. |
| Fear of gaps and inefficiencies being exposed | Some employees or departments fear that the gaps and inefficiencies in the current systems, processes, finances, way of working, etc., could be exposed as part of the BI initiative. |
| Fear of not being able to continue with fraudulent activities | When organizations have BI capabilities, chances of fraud reduce as transparency increases. Employees, including management, who indulge in fraudulent activities try to delay any such initiatives so that they can carry on with their fraudulent activities. When quarterly or annual reports are based on data from a well-built BI solution, it can help prevent fraud. |
| Fear of losing importance | When BI is implemented with proper data governance, everyone who is supposed to get access to information gets access, and there is no need for BI users to request favours from someone to get the information. Those employees who were in such positions, on whom the rest of the employees had to depend for information and insights, fear losing their importance in the organization. |
| Concerns about high failure rates | According to some articles across several websites, up to 60 to 70% of BI projects have failed. So, this argument could be used to block a BI initiative in an organization. Note: None of these articles state what exactly they mean by failed or how they have actually calculated it. |
| Excel is enough | Some employees resist BI initiatives stating that they already use MS Excel, they are familiar with it, it works for them, and there is no need for anything more. |

Table 4.1: Reasons for resistance to BI initiatives
We can see from the reasons for resistance provided in Table 4.1 that most of them serve personal agendas rather than the interest of the organization.
Building a good business case for BI
It’s not easy to build a compelling business case for BI. As we have learnt in previous chapters, BI is meant for improving the business and managing it better. However, to improve something, it should be clearly known what the current situation is and where the company sees itself in the short, medium, and long term. What are the company’s or the business unit’s vision, mission, goals, objectives, and strategy? For example, if the goal of Walget is to become the number one retailer in country X, and the objectives are set as:
1. Year-on-Year increase in revenue by 20%
2. Obtaining 4.5 or higher out of 5 in customer satisfaction
then a compelling business case can be built for a BI solution by articulating how it is a necessity to effectively manage Walget and not only achieve the objectives but also outperform them. To do this, the initiator should have sufficient business knowledge or should have support and cooperation from management. As we saw in Chapter 2: Why do businesses need BI, there are quite a lot of tangible and non-tangible long-term benefits of BI, but a lot of these benefits can only be derived based on the goals and objectives of the department or the enterprise. In Chapter 6: Financials of Business Intelligence, we will cover the details of calculating the cost and return on investment of BI projects.
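To make the two example objectives concrete, here is a minimal sketch of how they could be tracked once the underlying figures are available. The function names and the sample revenue and satisfaction figures are hypothetical and used only for illustration; year-on-year growth is computed in the usual way, as the change relative to the previous year.

```python
def yoy_growth_percent(previous_revenue: float, current_revenue: float) -> float:
    """Year-on-year revenue growth as a percentage of the previous year."""
    if previous_revenue <= 0:
        raise ValueError("previous_revenue must be positive")
    return 100.0 * (current_revenue - previous_revenue) / previous_revenue


def check_objectives(previous_revenue, current_revenue, satisfaction,
                     growth_target=20.0, satisfaction_target=4.5):
    """Compare actuals with the two example objectives from the text."""
    growth = yoy_growth_percent(previous_revenue, current_revenue)
    return {
        "yoy_growth_percent": growth,
        "growth_objective_met": growth >= growth_target,
        "satisfaction_objective_met": satisfaction >= satisfaction_target,
    }


# Hypothetical figures (revenue in millions), purely for illustration.
result = check_objectives(previous_revenue=500.0, current_revenue=610.0,
                          satisfaction=4.4)
print(result)
```

In a business case, the point is less the arithmetic than showing that management cannot reliably see these numbers today without a BI solution.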
Acquiring sponsors and promoters
One of the challenges during the initiation phase is to get the top management’s support and backing to pursue the initiation phase. This support is not to be confused with approval for the project; it is just the approval to do the groundwork of coming up with a business case, understanding company goals, objectives, etc. The other related challenge is to get the time of the top management: it is usually tough to get their time, and they then have to be convinced in a very short time. Without top management support and backing, middle management and others may not be willing to cooperate and support in the initiation phase.
Getting it prioritized
In the short run, a business can continue to operate without BI. BI projects are internal projects. It may so happen that the management focuses on urgent operational topics and ends up deprioritizing BI initiatives. For example, they may prioritize custom development of an application requested by a customer over an internal BI initiative. The problem is that the custom development never ends: by the time one customer request is met, new ones come up and hinder BI initiatives.
Challenges in the implementation phase
Challenges in the implementation phase refer to those challenges that are faced by the BI team(s) once they have started the implementation phase. To understand these challenges better, we need to think from the perspective of a Head of BI or a similar position, that is, someone at management level who is responsible for the BI solutions. The level or complexity of a challenge depends on whether the BI initiative is a department-level initiative or an enterprise-level initiative. If the BI solution’s scope is limited to a single department, the challenges are obviously relatively smaller. BI solutions whose scope is limited to a specific software product (for example, Jira), a specific dataset (for example, web stats), a specific customer, etc., could face fewer challenges than generic BI solutions (Enterprise BI) involving multiple source applications, multiple data sources, multiple customers (B2B), etc. While we go through this list of challenges, let’s assume as a backdrop that Walget didn’t have a BI solution in the past and at some point decided to build one. As part of its first project, Walget decided to derive information and insights from sales, employees, and suppliers’ data, as shown in Figure 4.4, to enable managers across all levels to make data-based decisions:
Figure 4.4: Walget example of first BI implementation
The challenges faced in the BI implementation phase can be grouped under 4 main categories as listed below:
• Data and information challenges
• People challenges
• Process challenges
• Technology challenges
Some of the challenges actually belong to more than one of the four categories, with a strong association to one category and a weak association to the others. Each challenge is explained under the category to which it most strongly belongs. Additionally, to highlight that a challenge also belongs to another category, it is marked as (DI) Data and information, (PE) People, (PR) Processes, and (TE) Technology accordingly.
Data and information challenges
Challenges that are mostly related to data and those that are related to lack of information about data (metadata) are grouped under this header. The list of challenges in this category is as follows:
• Data quality issues (DI, PE, PR)
• Data governance challenges (DI, PE, PR)
• Data privacy and security (DI, PE, PR, TE)
• Data acquisition challenges (DI, TE)
• Lack of information (DI, PE, PR)
• Lack of business knowledge (DI, PE, PR)
Each of the challenges in this category is explained as follows:
• Data quality issues (DI, PE, PR): Data quality issues are one of the biggest and most hidden challenges, as the depth of the challenge becomes known only later. The BI team usually becomes the reluctant owner of this topic. Until a BI or some other data initiative starts, data quality typically doesn’t get checked or assessed; it’s only after the BI implementation starts that the poor quality of data is exposed. It is also important to be aware that when data quality issues are actually mitigated by a BI team, the effort doesn’t get noticed by management and others, but when the reports and dashboards have bad data quality, the BI team gets the blame. It’s a thankless job to clean data. A couple of examples of data quality issues are as follows:
Situation: An employee at Walget changes his name; because of a process issue, the name is updated in only 1 of the 5 systems.
Data quality issue 1: Now in one system he is considered as one person whereas in all the other 4 systems he is considered as 2 different persons as both his names are available.
Data quality issue 2: As his old records are not updated with the new name, his historic records and current records appear as if they are about two different people.
• Data governance challenges (DI, PE, PR): These are challenges especially related to data ownership and master data management. Which set of data should be considered as master data? Who is the data owner? How should one deal with a situation where employees are unwilling to take up the responsibilities of a data owner? For example, which of the 5 HR systems’ data should be considered as the master data of employees? Who should be the data owner of the HR systems?
• Data privacy and security (DI, PE, PR, TE): This challenge could also be placed under data governance. But with regulations on data privacy such as GDPR, it has become important enough to deserve a separate mention. The challenge in implementing a BI solution lies in ensuring that maximum information and insights are gained from data without violating any of the regulations and directives on data privacy or any of the data security mandates. On one hand, if companies focus only on getting maximum insights from data and violate data privacy regulations or data security mandates, they may have short-term benefits but can end up being fined huge amounts. On the other hand, if companies don’t derive insights from data, they will lose out to the competition by not having a competitive edge.
For example, to understand the maximum spend or total spend by a distinct customer at Walget, Walget has to ensure that all of each customer’s transactions are combined based on a unique identifier of the person, irrespective of whether the customer made the purchase at a store, through the website, or through the mobile app, using different payment options in each case. To be able to do this, the customer identity (distinctness) needs to be maintained and at the same time data privacy should also be ensured.
• Data acquisition challenges (DI, TE): Data for the same topic could be spread across multiple applications. For example, there could be multiple employee management (HR) systems depending on regional or country-specific needs, and one system could have more or less data than another. Some applications could be on-premises whereas some on
the cloud in different cloud models. BI teams will have to explore options and contact suppliers, vendors, and partners to understand the details about the data.
• Lack of information (DI, PE, PR): This covers issues such as lack of transparency about the business processes and lack of documentation on source applications and systems. Some of the applications could be legacy applications without any documentation, and the people who built them could have already left the organization. There is data but no information, or not enough metadata (business metadata and technical metadata).
• Lack of business knowledge (DI, PE, PR): Lack of domain- or sector-specific and business-specific knowledge such as business model, operations, key data assets, etc. Again, this is a lack of business metadata.
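Two of the Walget examples above can be sketched in code. The first function flags employees who appear under more than one name across systems, assuming all systems share a stable employee ID. The second totals spend per customer across channels while replacing the customer ID with a keyed hash (HMAC-SHA256), which is one possible pseudonymization technique chosen for this sketch. All names, IDs, and the secret key are hypothetical, and whether such pseudonymization satisfies a given regulation has to be assessed separately.

```python
import hashlib
import hmac
from collections import defaultdict


def find_name_conflicts(records):
    """records: iterable of (system, employee_id, name).

    Returns {employee_id: sorted list of names} for IDs seen with more than one name.
    """
    names = defaultdict(set)
    for _system, employee_id, name in records:
        names[employee_id].add(name.strip().lower())
    return {eid: sorted(found) for eid, found in names.items() if len(found) > 1}


def pseudonymize(customer_id: str, secret_key: bytes) -> str:
    """Replace a customer ID with a keyed hash so that totals stay joinable."""
    return hmac.new(secret_key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


def total_spend(transactions, secret_key: bytes):
    """transactions: iterable of (customer_id, channel, amount)."""
    totals = defaultdict(float)
    for customer_id, _channel, amount in transactions:
        totals[pseudonymize(customer_id, secret_key)] += amount
    return dict(totals)


# Hypothetical data: the name change reached only one of the HR systems.
employees = [
    ("HR-1", "E42", "Sam Old"),
    ("HR-2", "E42", "Sam New"),
    ("HR-3", "E42", "Sam Old"),
    ("HR-1", "E07", "Kim Lee"),
]
conflicts = find_name_conflicts(employees)
print(conflicts)

key = b"example-secret-key"  # hypothetical; a real key would be stored securely
transactions = [
    ("C1", "store", 40.0),
    ("C1", "website", 25.5),
    ("C1", "mobile app", 10.0),
    ("C2", "website", 5.0),
]
totals = total_spend(transactions, key)
print(totals[pseudonymize("C1", key)])
```

The same key always yields the same pseudonym for a customer, which is what keeps the customer distinct across store, website, and app purchases without exposing the original identifier in the reporting layer.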
People challenges
All of the challenges related to people, such as leadership issues and team challenges, are grouped under this heading. The list of challenges in this category is as follows:
• Inexperienced leadership (PE)
• Executive sponsor continuity (PE)
• Petty politics (PE, PR)
• Team challenges (PE)
Each of the challenges in this category is explained below.
• Inexperienced leadership (PE): Lacking vision or long-term goals, that is, lack of visionary leadership. People at the top push for quick-and-dirty solutions instead of building scalable, robust solutions using an agile methodology. Dealing with inexperienced leadership, especially with those who don’t have much idea about the potential of data, is quite a challenge.
• Executive sponsor continuity (PE): Ensuring that there is continuity in executive sponsorship and support. The support with which the project kicks off can die out over a period of time. Getting top management to continue the support is a challenge. As reorganization is quite common, the sponsor may move to another department or location and the replacement may not be in favor of the BI initiative.
• Petty politics (PE, PR): Interdepartmental petty politics, data hoarding, and lack of cooperation from other departments, business units, or sister companies. In one of the organizations where I worked, the central or enterprise BI team was in the same business unit as a business product unit (A). The VP of business unit A was also the VP of the enterprise BI team. Business unit B, which
had a different VP, did not get any support from the enterprise BI team, and none of the requirements from business unit B were taken up by the enterprise BI team.
• Team challenges (PE): Lack of skilled BI team members. There is a huge demand for BI talent. Keeping the BI team motivated and retaining the existing team is also a challenge. An ideal candidate in general is one who meets the requirements as listed:
o Domain or sector knowledge
o Business specific knowledge
o The exact tools (for example, MicroStrategy, DataStage, Tableau) and technology (data integration, columnar databases) expertise
o Team player
o Soft skills
Usually, we find it hard to get candidates who fulfil all of these requirements, so we select the one who fulfils most of them.
Process challenges
All of the challenges that are mostly related to processes in an organization are grouped under this header. The list of challenges in this category is as follows:
• Wrong processes applied (PE, PR)
• Lack of funding (PR)
• Prioritization challenges (PR)
• Coordinated delivery (PR)
• Project over product approach (PE, PR)
• Project complexity (PE, PR)
Each of the challenges in this category is explained as follows:
• Wrong processes applied (PE, PR): Dealing with overcomplicated development standards set across the organization which may not be useful or applicable for BI projects. Examples include carrying out a full-blown, heavy release process for reports or dashboards, or having to convince non-BI architects not to apply the same principles of software development to BI solution development.
• Lack of funding (PR): Managing to deliver equal or better-quality solutions with a lower budget compared to other application development teams. Often, BI is an afterthought; the core applications’ development teams get
most of the budget, understandably, because that is what runs the business, and relatively lower budgets are earmarked for the BI team; but when it comes to expectations, the same or better quality is expected.
• Prioritization challenges (PR): Prioritizing BI initiatives appropriately. BI team members and teams are allocated to non-BI tasks, for example, data migration for an application upgrade, or the BI program/initiative is permanently or temporarily deprioritized.
• Coordinated delivery (PR): Coordinated delivery, that is, across BI teams and together with source application teams. For example, let’s assume that the sales software application in Walget is maintained and enhanced by team A, the HR software application by team B, and the supplier management software application by team C. Whenever any of the teams A, B, or C makes changes to the corresponding application and there is an impact on the BI solution, the BI team(s) has/have to ensure that it incorporates the data changes and delivers the changes together with the source teams. The priorities of core application development teams could be different from the priorities of the BI team.
• Project over product approach (PE, PR): Focusing more on project closure than on a product/solution development approach, and thereby descoping some of the essential requirements or delivering ineffective, non-working, or unusable solutions.
• Project complexity (PE, PR): Management of the complexity and size of the project, program, or initiative. It is a challenge to keep the scope fixed. Unlike in other projects, in BI the future users of a solution usually don’t know what exactly is required until they have seen the 1st version of what they asked for, and therefore changes in requirements are inevitable.
Technology challenges
The number of technology challenges can become a big list. Here, content is limited to the most common technology challenges with respect to BI. The list of challenges in this category is as follows:
• Variety of requirements (TE)
• Dependencies (PR, TE)
• Timelines for 1st deliverable (TE, PE)
• Too many technologies and tools distractions (PE, PR, TE)
• Performance challenges (TE)
Each of the challenges in this category is explained as follows:
• Variety of requirements (TE): Fulfilling a wide variety of requirements for stakeholders at different levels of the hierarchy, from first-line managers to top-level management. For example, some BI users expect information to be fed to them, while others prefer to access the BI portal themselves. So, the solution has to be built considering a variety of users and their requirements.
• Dependencies (PR, TE): Dependencies on the central IT team for provisioning of infrastructure such as hardware, software, platforms, network, etc. Usually, BI teams are not at the top of the priority list for central IT teams. Selection of tools and technologies usually has to be in accordance with enterprise IT recommendations, but enterprise IT may not be familiar with, or ready to support, BI technical requirements.
• Timelines for 1st deliverable (TE, PE): Managing stakeholder expectations with respect to the timeline of the first deliverable is quite a challenge, as the first deliverable, the minimum usable solution (MUS), is only possible after a minimum foundation (for example, a data warehouse) has been built.
• Too many technologies and tools distractions (PE, PR, TE): Dealing with team members or management getting carried away by buzzwords, hype, and vendor marketing, and investing in the wrong tools and technologies.
• Performance challenges (TE): Building systems that can deal with vast amounts of data and at the same time meet high expectations of data freshness. Building systems that can handle not only the current size of data but also future growth. Ensuring that the solution works for global users in different time zones.
Challenges in the live phase
The challenges faced in the live phase of a BI journey are grouped into two categories: challenges faced by BI users and challenges faced by the BI technical team.
Challenges faced by BI users
To understand these challenges, we need to think from the perspective of a BI user, for example, a sales manager, an HR manager, or a marketing manager. Table 4.2 captures most of these challenges faced during the BI live phase in no particular order:
Challenges
People
Process
Limited access to the BI solution. Too many restrictions. Too much control by the BI or IT team. Delay in getting access to BI solutions. Some of the features and capabilities are disabled.
Technology
X
X
Delay in processing data (in case of batch processing) and therefore data is available a day or more later.
X
While the advertised or marketed solutions work well with thousands of records, when the number of records is in the millions or billions, the solution hangs or takes more time than users can wait.
X
Status of the BI solution, status of the daily data load, information about data quality is not available.
X
Lack of proper self-service BI solutions, so for any ad hoc analysis users again have to depend on BI technical teams or data analysts. Not all data that is required for analysis is available. For example, when a new system goes live, it may take a few weeks or months before that system’s data is integrated into the data warehouse.
X
Collaboration with other BI users through the BI portal is not possible or options are limited.
X
Table 4.2: Challenges faced by BI users in the live phase
Challenges faced by BI technical team
To understand these challenges, we need to think from the perspective of a BI team member, a BI team lead, the Head of BI, etc. Note that within a BI team or department there could be different sub-teams for development, enhancements, maintenance and support, and migration. Table 4.3 captures most of the challenges faced by the BI team during the BI live phase, in no particular order:
Challenges
People
Low user adoption rate - Getting potential users to use the BI solution is one of the biggest challenges. They are used to working with spreadsheets and are reluctant to switch to a different solution. Users prefer to get data extracts so that they can work in Excel, or download data from BI portals and work in Excel sheets. Because of this, users miss out on all of the benefits of using a BI solution.
X
Providing BI user training for beginners and other users, especially when users do not have the time to undergo training. In almost every company you will notice that some people are so busy that they do not have the time to learn something that would eventually help them become more efficient in their work.
X
Process
X
Changes in source systems without communication or late communication to BI teams. Changes in source systems without informing the BI team can break the technical processes as a BI solution is dependent on source applications for data.
X
As multiple components/tools make up the BI solution, continuous upgrade of the solution is a challenge. Every so often one or more of the components needs an upgrade.
X
In some BI implementations there are so many data sources that, at any given point in time, some of them are undergoing changes, which mandates a change in the BI solution. It is very difficult for BI teams to keep up.
X
Technology
X
X
Using BI solutions for non-BI purposes. Users may try to find ways to use BI solutions in operational (non-BI) use cases, thereby sometimes mandating high availability of the solution, which otherwise wouldn’t be required.
X
Data quality issues continue to cause problems in this phase too. When data quality issues are not fixed at source (source applications) and no processes are introduced to prevent data quality issues, the BI team will continue to face this challenge.
X
As and when new data sources are integrated, resolving data ownership issues, ensuring data privacy and data security policies are not violated continues to be a challenge.
X
Maintaining high availability. Ensuring that the application is always available for users across the world.
X
X
As and when new data sources are integrated, ensuring data consistency is a challenge. A new source may have better quality data compared to previously loaded data, and decisions and measures must then be taken to ensure there is consistency in the data.
X
Dealing with too many requests (requirements) to BI teams, to the point that the team doesn’t even have time to analyze the requirement to be able to provide even an estimate.
X
X
Businesses are now very dynamic, and changes (for example, process, organization, product, or service changes) are happening at such a pace that it becomes difficult for the BI team to keep up with the changing business.
X
X
X
Growth in data is not linear but almost exponential. The BI team has to ensure that the systems are able to handle the growing load.
As the testing of a BI deliverable (for example, a report or a dashboard) is more focused on confirming that the data in the deliverable is correct than on functionality testing, the BI team has to actually check it in the production environment to confirm its correctness. As data in production could be huge, verifying large sets of reports can be challenging.
X
X
X
Table 4.3: Challenges faced by BI teams in the live phase
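To make the point about near-exponential data growth in Table 4.3 concrete, the following is a minimal sketch of a compound growth projection. The function names, the starting volume, the growth rate, and the capacity figure are all hypothetical assumptions chosen for illustration; they are not measurements from any real BI solution, and real growth rarely follows a constant rate.

```python
def projected_volume(current_tb, annual_growth_rate, years):
    """Volume after `years` of compound growth at a constant yearly rate (an assumption)."""
    return current_tb * (1 + annual_growth_rate) ** years


def years_until_capacity(current_tb, annual_growth_rate, capacity_tb):
    """Whole years until the projected volume first exceeds the available capacity."""
    if annual_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    years = 0
    volume = current_tb
    while volume <= capacity_tb:
        volume *= 1 + annual_growth_rate
        years += 1
    return years


# Hypothetical figures: 10 TB today, growing 50% per year, 40 TB of capacity.
print(projected_volume(10, 0.5, 2))
print(years_until_capacity(10, 0.5, 40))
```

With linear growth of 5 TB per year, the same 40 TB would last six years; under the compounding assumption above it is exceeded sooner, which is why the BI team has to plan capacity ahead of demand.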
Conclusion
As we have seen in this chapter, there are quite a lot of challenges in every phase of a BI journey. We can overcome these challenges with various strategies and ideas, which we will see in Chapter 7: Ideas for Success with BI. Challenges are inevitable; every initiative has its own set of challenges. That doesn’t in any way mean that BI projects or initiatives will fail. Some articles on various websites claim that 60 to 70% of BI projects fail. These statements are good only for making headlines (read: clickbait). We cannot trust those articles because no evidence has been provided, not even a description of what they mean by failure in a BI program or project. There are many open questions. For example, was there a survey? If yes, how many companies were surveyed? Who actually conducted the survey? On what basis were the companies selected? For those businesses where it is claimed that BI projects failed, did all other projects in that business succeed and only the BI project fail? What was the time period of the study? There are no answers to any of these questions. So, let’s ignore those claims unless there is evidence, focus on the real challenges, and be prepared.
In this chapter we have covered most of the common challenges faced in a BI journey. We haven’t covered generic IT or software development challenges and have tried to limit ourselves to BI-specific challenges. Some of these may be common to other implementations as well. Before describing the challenges, the different phases of a BI journey were explained. As mentioned in the introduction of this chapter, what you do with the knowledge of these challenges depends entirely on your role. Most of the challenges are role specific, that is, a challenge for one role may not seem like a challenge for another role. However, being aware of these challenges helps one role appreciate the other role’s challenges. Once you understand the different roles in BI, you will be able to appreciate these challenges even better. In the next chapter we will explore various roles in BI.
Points to remember
Some of the key points to remember are as follows:
• Different companies are at different levels of BI maturity, starting from “Not aware of BI” to “Excellent BI solution and high user adoption” as per the BI capabilities ladder.
• Three phases of a BI journey are:
  1. Initiation
  2. Implementation
  3. Live
• Live phase continues as long as the organization exists.
• There can be multiple trigger points that lead to the beginning of a BI journey.
• Implementation phase in a BI journey means the end-to-end process from planning to delivery.
• Further development continues as part of the live phase. Enhancements, maintenance and support, and migration activities are also carried out as part of the live phase of a BI journey.
• Every phase in a BI journey has challenges.
• In the live phase, both BI users and the BI technical team face different sets of challenges.
• How you use this knowledge about challenges in BI depends entirely on your role.
Multiple choice questions
1. What percentage of BI projects fail?
a) 20 to 30%
b) 40 to 50%
c) More than 60%
d) None of the above
2. Which of these statements is most true?
a) All BI projects face some or the other challenges
b) Every BI project has the same set of challenges
c) Challenges are same irrespective of the roles
d) Every BI project will have all of the challenges
3. Which of the following is not one of the phases in a BI journey?
a) Initiation
b) Implementation
c) Live
d) None of the above
4. Initiation phase ends when
a) A BI project is started
b) A BI program is started
c) An RFP process has started for BI
d) Any of the above
5. MUS stands for
a) Minimum utility service
b) Minimum usable service
c) Minimum usable solution
d) Minimum use of solution
6. Initiation phase in a BI journey means
a) Project initiation phase
b) There is a vision to build BI capabilities
c) Installation of BI solution
d) Program initiation phase
7. Implementation phase in a BI journey includes
a) Project planning
b) Procuring infrastructure such as hardware and software
c) Software development
d) All of the above
8. In which phase of BI journey is MUS available for BI users?
a) Initiation phase
b) Implementation
c) Live
d) All of the above
9. At Walget, the POS machines are being replaced and this means there is a change in the database and data model of the POS application from which data is flowing to the data warehouse. BI team integrates the new POS application in which phase?
a) Implementation phase
b) Live
c) None of these
d) Both A and B
10. BI projects are initiated by
a) Management
b) Consultant
c) Any employee
d) Any of the above
11. Who may resist a BI initiative?
a) Head or VP of a department
b) Operational staff
c) Excel specialist
d) Any of the above
12. At Walget, there are 10000 potential BI users, and 9000 are actual BI users, what is the BI user adoption rate?
a) 9%
b) 90%
c) 99.9%
d) 0.9
Answers
1. d
2. a
3. d
4. d
5. c
6. b
7. d
8. c
9. b
10. d
11. d
12. b
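The arithmetic behind question 12 can be sketched as follows. This is a minimal illustration that assumes the adoption rate is simply actual users divided by potential users, expressed as a percentage; the function name is hypothetical.

```python
def adoption_rate(actual_users, potential_users):
    """BI user adoption rate in percent, assuming rate = actual / potential."""
    if potential_users <= 0:
        raise ValueError("potential_users must be positive")
    return 100 * actual_users / potential_users


# Walget example from question 12: 9000 actual users out of 10000 potential users.
print(f"{adoption_rate(9000, 10000):.0f}%")
```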
Questions
1. Why are many businesses still not aware of BI?
2. What are some of the trigger points for BI initiatives?
3. What are the reasons for internal resistance to BI initiatives?
4. What are the challenges faced during the initiation phase of a BI journey?
5. Why do some employees resist BI implementation?
6. What are the four categories of challenges faced during the implementation phase of a BI journey?
7. What is GDPR and how does it become one of the challenges in implementing a BI solution?
Chapter 5
Roles in Business Intelligence
Whether you are building a BI team and would like to know which are the most important BI roles, a student who would like to understand the main roles in BI, or a BI job aspirant deliberating which role to consider, this chapter will provide you with enough detail to understand the main roles in BI and help you in your endeavor. This chapter will probably be more interesting for students who want to learn about the various BI roles than for those who have already worked in BI. It could also be interesting for managers or top management, and for people who are going to manage or build BI teams in the future. In this chapter we will first look at a simple and typical BI team structure, and then cover different BI organizational models. We will then learn about different BI roles and their responsibilities, after which the confusion around a few roles, such as BI analyst and BI business analyst, is clarified. For those who are already in BI, this could be a good refresher on the roles and responsibilities, and provide you with a framework to understand where your organization stands in terms of BI organizational models and BI roles.
Structure
We will cover the following topics in this chapter:
• Setting the context
• Typical BI team structure
  o BI organizational models
      NGDE
      SGDE
      CGCE
      CGDE
• BI roles and responsibilities
  o Technical roles in BI
      Business Intelligence Administrator
      Business Intelligence Architect
      Business Intelligence Developer
      Business Intelligence Tester
  o Techno-functional roles in BI
      Business Intelligence Analyst
      Business Intelligence Business Analyst
      BI Analyst vs BI Business Analyst
  o Management roles in BI
      C-Level Role
      Head of Business Intelligence
      Business Intelligence Team Lead
  o Exclusions
Objectives
Getting to know a typical BI team structure and understanding the characteristics of different BI organizational models. Learning about various BI roles, different job titles for the same roles, and responsibilities in each category in a BI team. Understanding the difference between a BI analyst and BI business analyst, and between a data analyst and a BI analyst.
Setting the context
In an organization, BI roles, like other roles, can be staffed by hiring and building internal teams, contracting freelancers, or partnering with a service provider. In this chapter, we will discuss roles in the context of an internal team, that is, we are not considering BI roles at IT service provider companies.
By BI roles, we are referring to the team members who implement, maintain, and operate a BI solution and analyse data, for example, a BI analyst whose core job is to deal with the process of deriving information and insight from data. We are not discussing general BI users such as a sales manager, marketing manager, HR manager, Walget’s store manager or regional manager, etc., whose core responsibility is something else but who use BI as part of their job. Technical roles such as database administrator (DBA), network engineer, or server administrator are also not covered, as these roles are not BI-specific roles.
Typical BI team structure
As every organization is different, every BI team is also structured differently. Even within the same organization, BI teams can be structured differently. We will begin by looking at a typical BI team structure in a small to medium enterprise (SME). In large companies, there can be multiple teams of this type and size. Figure 5.1 depicts a typical BI team structure in an SME.
Figure 5.1: Typical BI team structure
In a typical BI team structure there is a development track that consists of business analysts, frontend developers, backend developers, and testers/quality assurance engineers, a support track that mainly consists of administrators, and a few individual contributors. The leads of the development track and the support track, and the individual contributors, report to the Head of BI. It is important to note that a single person may carry out more than one role, and two or more people may carry out a single role. Every combination that you can imagine can be found in one company or another. The BI team structure provided in Figure 5.1 is reflective of what we currently see in companies. Multiple such BI teams can operate in a large company. What happens when there are multiple such BI teams within an organization? How are the teams organized? How is the work (projects) distributed between teams? To find answers to such questions, let’s go through the BI organizational models.
BI organizational models
Large organizations may have hundreds, and in some cases more than a thousand, employees working in a BI department. For example, in 2014, Target (an American retail corporation) had around 1600 team members in their BI and analytics department.[32] When there are multiple BI teams within an organization, the BI organization structure can be broadly grouped into one of the four BI organizational models explained in the following paragraphs. The four organizational models of BI are listed below in increasing order of maturity and efficiency:
• No governance and decentralized execution (NGDE)
• Some governance and decentralized execution (SGDE)
• Centralized governance and centralized execution (CGCE)
• Centralized governance and decentralized execution (CGDE)
Note that the abbreviations NGDE, SGDE, CGCE, and CGDE are not commonly used abbreviations; they have been introduced in this book only for convenience and better understanding. These models are reflective of what currently exists and are not to be considered prescriptive. If we refer to a model as only centralized, it is not clear whether the execution is also centralized or only the governance aspect of the model is centralized. Therefore, the models are based on the approach followed for governance of BI and the approach followed for execution. Execution refers to all the activities in the implementation and live phases that we saw in Chapter 4: Challenges in Business Intelligence.
BI governance activities include:
• Prioritizing BI projects, programs, and initiatives and ensuring effective usage of BI.
• Driving BI adoption across the organization.
• Selecting BI tools and technologies as well as managing BI vendors for the entire organization.
• Organizing BI training for both BI team members and users.
• Enabling collaboration among BI teams by establishing platforms and channels.
• Establishing standards for BI development (coding, documentation, minimum number of environments, etc.), testing, hiring, etc.
• Developing frameworks (e.g. metadata logging) and templates (e.g. deployment checklist, project onboarding checklist, requirements specification template, etc.).
• Guiding and ensuring that BI teams across the organization follow the standards, make use of the frameworks and templates, adhere to the policies, and share best practices through documentation, meetups, newsletters, announcements on the portal, etc.
Note: In some organizations, the support and maintenance function is also carried out by a centralized team, whereas the development activities are carried out by decentralized teams. However, support and maintenance activities are not to be considered BI governance activities.
Governance activities can be carried out by a team of its own (a dedicated team for governance activities only) or by a virtual team or an official group with a few members from various BI teams across the organization. The governance function can be taken up by the Business Intelligence Competency Center (BICC), the BI Center of Excellence or Center of Expertise (BI COE), the Analytics COE, the Data and Analytics COE, or the enterprise BI teams. Now let’s take a look at the characteristics of each of the four BI organizational models.
NGDE
The NGDE model can also be called the fully siloed model or simply the chaos model. The characteristics of the NGDE model are as follows:
• In this model every BI team is autonomous. Every team has its own set of tools, technologies, processes, standards, priorities, and even its own job specifications, which differ from the other BI teams’ job specifications for the same role. If there is anything common between any two teams, it’s usually a coincidence and unplanned.
• The teams in this model belong to different departments and report to different top-level leaders (C-level). For example, some BI teams may operate under the CFO, whereas others operate under the COO, CMO, etc.
• Teams fulfil only the requirements of the department they are in.
• There is no collaboration between the BI teams. Please note that within a BI team there could be good teamwork and collaboration; however, there is no sharing of best practices and learnings across different teams.
• This BI organizational model is a result of a lack of centralized BI leadership. From a high-level (corporate-level) view it can be noticed that there is chaos, but the BI teams themselves may not realize it, as they do not have an overview of all the different BI projects and initiatives that are happening simultaneously.
• High cost of operations due to poor or inefficient management, leading to situations such as the purchase of redundant hardware and software as well as under-utilization or inefficient utilization of each team’s capabilities.
• Mismatch in BI budget allocation. The teams that are supposed to be working on the most critical and important initiatives may not be the ones that get the required budget allocation.
SGDE
The SGDE model can also be called the mostly decentralized model. The characteristics of the SGDE model are as follows:
• Similar to NGDE, every BI team is autonomous. Every team has its own set of tools, infrastructure, technologies, processes, standards, priorities, and even job specifications that are different from the other BI teams’ job specifications for the same role. However, unlike NGDE, if there is anything common between any two teams, that’s mostly because one team has adopted practices from the other, as some channels are established for collaboration between the teams.
• Similar to NGDE, teams belong to different departments and report to different top-level leaders.
• Teams fulfil only the requirements of the department they are in.
• There is some governance through informal channels (that is, there is no budget, no dedicated head count, and no authority to enforce), usually created by proactive members from different teams. There is some collaboration between teams, with channels opened by the teams themselves. For example, they share best practices and learnings during meetups and knowledge sharing sessions, but there is no authority to ensure that the best practices learnt from other teams are actually applied.
• This BI organizational structure is also not planned and could be a result of inorganic growth. During restructuring of the overall organization, BI teams may not receive the priority they deserve. Companies, when they acquire or merge, usually prioritize integration of core applications and systems before BI applications can be consolidated. If a company is often acquiring new businesses, there may be no time to focus on consolidating BI applications. BI teams and leadership both realize that there is chaos from the higher-level view but are in no position to make big changes. This often leads to teams trying their best to collaborate with other teams using available channels.
• Again, cost of operations is high due to poor or inefficient management.
• Mismatch in BI budget allocation continues in this model as well. Teams working on the most critical and important initiatives may not receive sufficient budget allocation. However, thanks to collaboration, the teams may at least be aware of the budget allocated to another team, unlike in NGDE.
CGCE
The CGCE model can also be called the fully centralized or unitary model. The characteristics of the CGCE model are as follows:
• Compared to NGDE and SGDE, the CGCE model is the polar opposite. All BI members are part of one large BI team and use the same set of tools, technologies, processes, standards, etc. Temporary sub-teams are created by allocating some of the BI team members for the project duration. Once the projects are completed, the sub-team members are reallocated to another project, but they always remain part of the larger BI team. All BI team members report to the same top-level leader.
• Team members are centrally hired and managed, and there are standard job profiles and defined career paths.
• BI projects are centrally prioritized and only then assigned to the sub-teams.
• Teams fulfil requirements of any of the departments across the organization.
• Full collaboration exists across the entire BI team through officially established channels. Sharing of best practices and learnings is actively promoted.
• This BI organizational model is fully planned and is a result of good leadership and consolidation of BI teams.
• Low overall cost of operations due to savings from broader use and reuse of tools and technology across the organization and efficient utilization of team capabilities.
• There is an overall budget for the entire BI department, which is allocated to different projects based on their importance. Therefore, the projects that are most critical and important get the right budget allocation.
• While in theory this model seems to be ideal, in practice, onboarding the requirements, executing them, and governing at the same time can become a big challenge, as all the requirements are routed centrally. The number of requirements can overwhelm the BI department, and execution can take up more time, leaving less time for effective governance.
• Whenever business situations or requirements mandate changes in BI priorities, many projects are impacted, as all the projects are centrally executed and depend on the same resources. Impact analysis can take a lot more time compared to decentralized execution models.
• Some business units may start to feel that they are being neglected and that they don’t get the priority or the attention they deserve.
CGDE
The CGDE model can also be called the partially centralized or federal model. The characteristics of the CGDE model are as follows:
• Similar to CGCE, every BI team uses the same set of tools, technologies, processes, job profiles, etc.
• Standards are established by the centralized BI governance function/group. However, the execution is decentralized. Teams are specific to business units or locations and report to different top-level leaders.
• Once the teams are assigned a focus area (which could be either a subject area such as marketing, HR, sales, etc., or source applications such as ERP, CRM, etc.), only the projects related to that focus area are prioritized at the corresponding team level.
• Teams mainly fulfil only the requirements of the departments they are in but pitch in to support other teams when the need arises. As tools, processes, and other resources are shared across teams, team members are able to move freely and contribute quickly without a long ramp-up time.
• Similar to CGCE, collaboration exists across the larger BI team through officially established channels. Sharing of best practices and learnings is promoted actively.
• This BI organizational structure is also fully planned and a result of good leadership and consolidation of BI teams.
• Low overall cost of operations due to savings from broader use and reuse of tools and technology across the organization and efficient utilization of team capabilities, as team members can be moved across teams.
• Budget is usually team specific at a business unit level.
• Governance activities are separated from BI execution. The advantage of this model is that members of the governance team are not involved in the day-to-day execution activities and therefore have the capacity to actively take part in governance activities.
It is to be noted that no BI organization structure is permanent. Every company continues to make changes to its BI team structure based on changing business needs. The BI organizational model could be different at different points in time. For example, when an organization grows from a local start-up into a global company by acquiring companies in different parts of the world, initially the BI organization model could be NGDE, and then gradually, as the company matures in its processes and operations, it may move to the SGDE, CGCE, or CGDE model.
Another point, which is a unique aspect of a BI department, is that there are no written rules that govern where a BI department/function should sit within an organization. All varieties can be seen. In some organizations, BI is fully under the marketing department; in some it’s under IT; in others it’s under Finance, or it is a separate enterprise department reporting directly to the C-level, etc. Various arrangements are being tried out. Note that the roles and responsibilities that we will explore in the next sections hold for all BI organizational models.
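As a compact recap, the four organizational models can be viewed as combinations of a governance approach and an execution approach. The following is a minimal sketch of that classification; the enum and function names are hypothetical and introduced only for illustration, and combinations that this chapter does not name return nothing.

```python
from enum import Enum
from typing import Optional


class Governance(Enum):
    NONE = "no governance"
    SOME = "some governance"
    CENTRALIZED = "centralized governance"


class Execution(Enum):
    DECENTRALIZED = "decentralized execution"
    CENTRALIZED = "centralized execution"


# The four models named in this chapter, keyed by (governance, execution).
MODELS = {
    (Governance.NONE, Execution.DECENTRALIZED): "NGDE",
    (Governance.SOME, Execution.DECENTRALIZED): "SGDE",
    (Governance.CENTRALIZED, Execution.CENTRALIZED): "CGCE",
    (Governance.CENTRALIZED, Execution.DECENTRALIZED): "CGDE",
}


def classify(governance: Governance, execution: Execution) -> Optional[str]:
    """Return the model abbreviation, or None for a combination the chapter does not name."""
    return MODELS.get((governance, execution))


print(classify(Governance.CENTRALIZED, Execution.DECENTRALIZED))
```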
BI roles and responsibilities
BI roles can be grouped into three categories: technical, techno-functional, and management roles. We will not discuss generic roles that fall under these groups but only the BI-specific roles. Note that every BI role is important, every role adds value, and every role has its own set of challenges; there is no such thing as one role being better or worse than another. A team member who has carried out only one role may wrongly assume that their role is the most difficult or a thankless job; however, if given the chance to experience another role, a team member might learn the challenges that exist for other roles. The reason I am emphasizing this point is that I have seen too many questions on this topic in various forums where there is a presumption that one BI role is somehow less or more challenging than another.
The roles are introduced in alphabetical order, or in the order of hierarchy when it comes to management roles. Each role should also be considered representative of the equivalent job titles mentioned under it. The responsibilities stated here should not be considered a comprehensive or mandatory set of responsibilities for the role. Not all responsibilities are always part of the role. There can be overlaps, a role may or may not exist, and there can be different arrangements depending on the organization structure and team structure. Depending on the size of the organization, a role could be an additional responsibility, a shared role, one person, or a team in itself. The intent is to provide a generally true set of responsibilities under each role.
Technical roles in BI
Team members in technical roles are the ones who actually create the technical architecture, technical designs, develop (code, program, create), test, support, and maintain the BI solution.
Business Intelligence Administrator
Other job titles equivalent to the BI administrator role are as follows:
• BI Production Support Engineer
• BI Support Engineer
• BI Application Engineer
• BI Operations Engineer
• BI Application Manager
A BI administrator usually reports to a team lead of BI, a BI support track lead, or the Head of BI. The responsibilities of a BI administrator include:
• Installation and configuration of the BI tech stack in all environments (multiple non-production environments such as development, integration, performance, and user acceptance testing (UAT), and also production environments). Installation and configuration is usually not a daily activity but a less frequent one. It is generally carried out once during the initial stage and thereafter on a need basis (deployment in a new location, upgrade, tool migration, datacentre migration, server crash, etc.).
• Daily (regular) monitoring of the BI tech stack (all tools in all four layers - data acquisition, data storage, data processing, and information presentation) to ensure that the applications are up and running. Data loads into data lakes and the data warehouse are monitored daily (nightly) to ensure that data is available as per expectations.
• User administration (for example, adding new users, providing the right permissions based on roles such as developers, BI basic users, and advanced users, and maintaining the lists of users, groups, and profiles) and supporting the users. Announcing and communicating downtimes and maintenance periods. Keeping users informed about any upcoming major changes that may have an impact.
• Carrying out maintenance and updates on the BI tech stack on a need basis. It is their responsibility to maintain and enhance documentation about data, and to keep the admin and support procedures up to date.
• Coordinating with server administrators, network engineers, DBAs, and other support admins. Carrying out pre-deployment reviews, releases, hotfixes, troubleshooting, and incident and problem management.
• Supporting ad hoc data enquiries directly from BI users and other stakeholders, especially when there are no BI analysts or data analysts to support such requests. Usually, only admins have direct access to production databases, typically with read-only access to the OLTP databases and read and write access to the databases of the BI solution.
• Staying up to date with new and emerging data technologies and providing support in evaluating new technologies, tools, processes, and practices. BI administrators are required to guide development teams, review deliverables, and ensure that the deliverables from the development teams meet the technical standards and run in the production environment without crashing it.
• Identifying and recommending process improvements, best practices, and performance improvements, and suggesting alternatives for continuous enhancement of the BI solution.
• When there are no BI vendor managers, coordinating with the BI vendors to ensure the licenses are properly managed.
• Forecasting the capacity and licenses required, based on the utilization ratio of servers and platforms such as the ETL tool server, data warehouse, reporting and analytics platforms, etc., and planning accordingly.
• Understanding the end-to-end BI solution and being acquainted with the intra-dependencies within the BI solution and the interdependencies between the BI solution and source systems.
One challenging aspect of being a BI administrator that should be emphasized is that BI administrators may have to be available at all times, especially at night, to fix any issues that occur in the nightly batch runs.
Note: Although there are real-time data flows, even now most of the heavy ETL jobs run during the night, that is, the period when the utilization of the operational systems is at its lowest.
Tip: If you are a person who cannot get back to sleep after being woken during the night, then I would recommend that you not consider this role.
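The daily monitoring of nightly data loads described in this section can be pictured as a small check that runs each morning. The following is a minimal sketch, assuming that load metadata (table name, finish time, rows loaded) is available from the ETL tool or a control table; the `LoadStatus` structure, the table names, and the 24-hour limit are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class LoadStatus:
    table: str
    finished_at: Optional[datetime]  # None if the job has not finished
    rows_loaded: int


def find_failed_loads(statuses, now, max_age=timedelta(hours=24)):
    """Return (table, reason) pairs for loads that need attention."""
    problems = []
    for s in statuses:
        if s.finished_at is None:
            problems.append((s.table, "not finished"))
        elif now - s.finished_at > max_age:
            problems.append((s.table, "stale"))
        elif s.rows_loaded == 0:
            problems.append((s.table, "no rows loaded"))
    return problems


# Hypothetical morning check after the nightly batch run.
now = datetime(2021, 6, 2, 7, 0)
statuses = [
    LoadStatus("dw.fact_sales", datetime(2021, 6, 2, 3, 15), 120000),
    LoadStatus("dw.dim_customer", datetime(2021, 5, 31, 3, 10), 500),
    LoadStatus("lake.web_events", None, 0),
    LoadStatus("dw.fact_returns", datetime(2021, 6, 2, 3, 40), 0),
]
problems = find_failed_loads(statuses, now)
for table, reason in problems:
    print(f"{table}: {reason}")
```

In practice the result would feed an alert or a ticket rather than a print statement, so that the administrator on call is notified during the night.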
Business Intelligence Architect
Job titles equivalent to a BI architect include:
• Data Architect
• BI Solution Architect/Specialist
A BI architect usually reports to the Head of BI and works across multiple BI teams. The responsibilities of a BI architect are as follows:
• Assessing business and technical requirements; designing, validating, and maintaining business intelligence architectures (on-premises and in the cloud); and maximizing the potential of data assets efficiently. As a stakeholder of technical requirements, it is a BI architect’s responsibility to ensure that BI teams fulfil not only the functional requirements but also the non-functional requirements, especially those related to scalability, reusability, performance, maintainability, and data quality. They are also responsible for approving or rejecting design decisions made by other team members.
• Being acquainted with the latest BI tools and technologies available in the market. Proposing the most suitable BI tools (frontend and backend) and technologies that align with the company's strategy. Suggesting, recommending, or deciding which tools to buy and which to build in-house.
• Collaborating with business analysts and other stakeholders, documenting and managing technical and architectural specifications, and advising on architectural issues. Recommending solutions to improve existing systems. A BI architect acts as a liaison for all BI systems on technical topics with all stakeholders. They translate and articulate technical decisions into expected business outcomes and convey them to the management.
• Profiling data from source systems, and designing solutions to integrate data from various source systems into data lakes, warehouses, and marts and to present this data to end users using data visualization software. For example, analyzing and deciding whether to have a direct connection to source systems or a file-based transfer. In either case, various stakeholders may have to be convinced with sound reasons for the option selected. The source team may not be happy with a direct connection, so the decision has to be made case by case, and the BI architect has to gather the support of management by providing the reasons.
• Carrying out data modeling (designing the data models), especially dimensional modeling, when there is no dedicated data modeler. Analyzing the data entities and making decisions such as whether to use a star schema, snowflake schema, or hybrid schema (note: these topics are covered in chapters 7 and 10), which types of fact and dimension tables should be built, etc.
• Preparing documentation, including solution designs, data models, technical standards, and guidelines. Supporting and guiding the development team. Usually, a full-time BI architect is not expected to take part in the day-to-day development work but is expected to set the technical direction for the team.
• Designing and sharing best practices, carrying out timely reviews and technical audits of all modules of BI, and ensuring that modules meet the standards. A BI architect is a key member, and usually the decision-maker, in the BI governance function that sets the standards.
• Staying up to date with the latest advancements in BI and data technologies. Attending BI conferences, sharing learnings with the team, and increasing the knowledge base.
A BI architect should have an overview of the BI technical landscape, thereby ensuring that no duplicate, redundant, or incompatible modules are built. By working across multiple teams, a BI architect should also pick up best practices from teams, pass them on to other teams, and, where authorized, get those best practices implemented.
When this role is not staffed appropriately, the BI architect can become a bottleneck, and teams may start doing their own thing when it comes to technical decisions. I have noticed in at least a couple of projects that the BI architect was not actually leading the team on the technical front but was catching up with the BI team (documenting what had already been developed), which obviously is not the intention of the role and doesn’t serve the purpose. It is the BI architect’s responsibility to ensure that the solutions built not only satisfy current data sources and load volumes but also consider future loads and data sources that are to be integrated.
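To make the star schema versus snowflake schema decision mentioned in the responsibilities more concrete, here is a minimal sketch of a star schema built in an in-memory SQLite database. The table and column names are hypothetical and chosen only for illustration; the detailed treatment belongs to the chapters on data modeling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,
    product_name  TEXT,
    category_name TEXT   -- star: kept in the dimension; snowflake: moved to its own table
);
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    calendar_date TEXT,
    month         TEXT
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    quantity    INTEGER,
    amount      REAL
);
""")
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20210101, "2021-01-01", "2021-01"), (20210201, "2021-02-01", "2021-02")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 20210101, 2, 2000.0), (2, 20210101, 1, 300.0), (1, 20210201, 1, 1000.0)],
)

# One join from the fact table to a denormalized dimension answers the question.
sales_by_category = conn.execute("""
    SELECT p.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category_name
    ORDER BY p.category_name
""").fetchall()
print(sales_by_category)
```

In a snowflake variant, `category_name` would live in a separate category table referenced from `dim_product`, which reduces redundancy at the cost of an extra join in queries like the one above.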
Business Intelligence Developer
Job titles equivalent to a BI developer include:
• BI Engineer/Specialist
• BI Data Engineer/Developer
• Data Warehouse Developer/Engineer/Specialist
• Business Analytics Specialist/Developer/Engineer
• Data Specialist/Developer/Engineer
• Data Analytics or Data and Analytics Engineer/Developer
• Full Stack BI Developer/Engineer
• BI Backend/Frontend Developer/Engineer
• Software Developer/Engineer – BI focus
• Data Visualization Engineer or BI and Data Visualization Developer
• A specific focus such as an ETL, Data Integration, Reporting, or Analytics Developer/Engineer
• A specific BI tool such as a DataStage or MicroStrategy Developer/Engineer
Most of these roles are also currently promoted as big data developers, engineers, etc. Note that the responsibilities can differ depending on whether the BI developer is expected to work on part of the solution (for example, frontend or backend) or the whole (end-to-end or full stack). In large BI teams, we have ETL or data integration developers (backend developers), data modelers, and reporting and analytics platform developers (frontend developers). A developer is usually part of one agile BI development team reporting to a BI team lead or the Head of BI.
Responsibilities of a BI developer, considering they work on end-to-end solutions, include:
• Participating in the entire BI software development life cycle, including solution design, development, code review, testing, deployment (or handover to support teams for deployment), and bug fixes. The methodology of development varies from team to team and organization to organization. A developer follows whatever methodology is agreed.
• Creating high-level designs (when there are no BI architects), low-level designs, and data models (when there are no data modelers), and implementing BI and data warehouse solutions based on requirement specifications, non-functional requirements, the defined architecture (usually defined by BI architects), and applicable standards, to provide the business with reporting, dashboards, and a self-service analytics platform.
• Developing ETL/ELT processes (jobs, data flows, packages, or data pipelines for both streaming and batch data) using GUI-based ETL or data integration tools such as Informatica, DataStage, BODS, etc., and/or programming languages such as Python, Java, etc., and query languages such as SQL, PL/SQL, T-SQL, etc., to integrate new data sources into data repositories such as data lakes (staging), data warehouses, data marts, data vaults, etc., in the cloud, on-premises, or both, based on technical specifications.
• Developing the semantic layer (in simple terms, a layer that maps technical terms or metadata to business terms or metadata), dimensions/attributes, metrics/facts, reports, dashboards, data visualizations, and advanced analytics solutions.
• Developing automated data quality checks to ensure high data quality of deliverables. This is applicable when there are no BI quality assurance engineers in the team.
• Taking ownership of technical process documentation.
• On the testing front, a developer’s responsibility is usually limited to unit testing and peer code review. Integration testing, performance testing, and quality assurance may be handled by quality engineers or testers, as seen in the next technical role.
In most projects, an ETL developer (backend developer) may not have direct access to BI users, and the other way around, so the BI user may not even be aware of ETL developers. BI users interact directly with either a BI business analyst or a frontend (BI reporting and analytics platform) developer. But that doesn’t mean that ETL developers are in any way less important than the client-facing roles. ETL development is most often underestimated. Even with the availability of ETL tools such as Informatica, DataStage, etc., the bulk of the effort involved in deriving information and insight from data lies in ETL development, especially the “T” in ETL: transforming the data according to business rules. Business rules can range from very simple to very complex, for example, from transforming null to zero to calculating the percentage increase in the number of transactions based on a campaign or promotion. Usually, quite a lot of data quality issues are noticed after ETL development has started, and then there is an unwritten expectation to fix all of the data quality issues in the ETL process itself, thereby increasing the ETL development effort.
For a reporting and analytics platform developer, the challenge is to envision what a BI user may not have thought through while providing requirements. Most often, BI users can say what they really want only after they have seen or used the first version of the report or dashboard.
The other challenge is to accommodate as many BI users’ requirements as possible, as some of these can be subjective. For example, one user says “I need only top 10” whereas another requests “I need to be able to find top N (5, 10, 50, etc.)”; one says “The chart should have only blue and red color” whereas another demands the company’s brand colors; and some users demand finished reports and dashboards whereas others demand a governed set of metrics and dimensions that they can use to build their own reports and dashboards, or a combination of both.
It is also important to note that a BI developer is not the same as a software engineer or software developer who develops BI-related software. A BI developer works on deriving information and insights from data, whereas a software developer (BI tool developer) develops a tool or software that can be used by BI developers, users, etc. For example, a BI developer uses a tool or language such as MicroStrategy, Pentaho Data Integration, Python, or Java to work with or process data, whereas a BI tool developer could be a Java developer who creates the MicroStrategy or Pentaho Data Integration software itself. The key difference is the purpose: one uses software to derive information and insights from data, while the other develops the software itself and is not interested in the data. A BI developer is data-focused whereas a Java developer, in this case, is application-focused.
Some organizations may build all or some of the BI software in-house instead of buying it. In the last two decades, in only one of the several projects that I have worked on did the organization want to build a reporting tool of its own; for everything else, such as ETL tools, it bought the software. In such cases, a BI developer role includes developing the tool first and then using the tool to work with data, and the role can easily be split between different people.
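The two business rules used as examples in this section, transforming null to zero and calculating the percentage increase in transactions for a campaign, can be sketched in code as follows. This is an illustrative sketch rather than a prescribed implementation: the field names (`txn_before`, `txn_during`) and the decision to return no percentage when the baseline is zero are assumptions that a real project would agree with the business.

```python
def null_to_zero(value):
    """Simple rule: treat a missing count as zero."""
    return 0 if value is None else value


def pct_increase(before, after):
    """More involved rule: percentage increase from before to after.

    Returns None when the baseline is zero, because the increase is undefined.
    """
    before, after = null_to_zero(before), null_to_zero(after)
    if before == 0:
        return None
    return (after - before) / before * 100


def transform(rows):
    """The 'T' step: apply both rules to every extracted campaign row."""
    return [
        {
            "campaign": row["campaign"],
            "txn_before": null_to_zero(row["txn_before"]),
            "txn_during": null_to_zero(row["txn_during"]),
            "txn_pct_increase": pct_increase(row["txn_before"], row["txn_during"]),
        }
        for row in rows
    ]


# Hypothetical extracted rows; None stands for a NULL in the source system.
extracted = [
    {"campaign": "Spring Sale", "txn_before": 200, "txn_during": 250},
    {"campaign": "New Store", "txn_before": None, "txn_during": 40},
]
loaded = transform(extracted)
for row in loaded:
    print(row)
```

Even this small example shows where effort accumulates: the edge case of a zero baseline is a business decision, not a technical one, and such decisions multiply as rules become more complex.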
Business Intelligence Quality Assurance Engineer
Job titles equivalent to the BI quality assurance engineer are as follows:
• BI Tester
• Quality Engineer – BI focus or Quality Assurance Engineer – BI focus
• Data Warehouse Tester
• Data Test Engineer
• Software Tester – BI focus
• QA Tester or QA Automation Tester – BI focus
• A specific focus such as ETL, Reporting, or Analytics Tester
• A specific BI tool such as DataStage or MicroStrategy Tester
In many BI projects and teams, there is no specific role of a BI tester or quality assurance engineer, as the developers test the deliverables among themselves in all of the non-production environments. The validation in the production or live environment is done by the business analyst or directly by the users. The users of the BI solution are usually internal users (management), and therefore defects in the BI solution do not directly and immediately impact the end consumers of the products and services of the business. At one extreme, there may be no tester at all in small teams, and at the other, in big projects there can be a team dedicated to BI testing, especially in data or tool migration projects. A BI tester may be part of an agile development team reporting to a development team lead or may be part of a separate testing team reporting to a test lead.
Responsibilities of a BI tester include:
• All of the general testing-related activities that are carried out by any other (non-BI) tester, such as creating the test strategy and test plans, building and setting up test data, testing, defect management, test automation, performance testing, integration testing, regression testing, and maintaining the test suite, scripts, etc., are applicable to a BI tester too. The activity to be highlighted for this role is data setup. While most other testers mainly focus on functionality and application testing, a BI tester focuses more on data-level testing, for example, whether the right set of data is included or excluded in the ETL run, whether a record gets overwritten or creates a new entry, and so on.
• Finding data issues and performance issues related to the data loads and report or dashboard refresh times, that is, ensuring that the deliverables meet not only the functional requirements but also the non-functional requirements with respect to performance, efficiency, and reusability.
• Ensuring that the base metrics and derived metrics are calculated as per specifications. Taking responsibility for analysis of the data quality in the BI solution.
• Writing simple to complex SQL queries to validate data, along with other automated ways of testing.
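The data-level testing described above, such as checking whether the right records were loaded and whether a rerun overwrote a record or created a new entry, is often expressed as SQL checks between staging and target tables. The sketch below uses an in-memory SQLite database; the table names `stg_orders` and `dw_orders`, the columns, and the three checks are hypothetical examples, not a complete test suite.

```python
import sqlite3


def build_demo_db(dw_rows):
    """Create a staging table with three orders and a target table with dw_rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stg_orders (order_id INTEGER, amount INTEGER)")
    conn.execute("CREATE TABLE dw_orders (order_id INTEGER, amount INTEGER)")
    conn.executemany(
        "INSERT INTO stg_orders VALUES (?, ?)", [(1, 100), (2, 250), (3, 75)]
    )
    conn.executemany("INSERT INTO dw_orders VALUES (?, ?)", dw_rows)
    return conn


def run_checks(conn):
    """Run reconciliation checks and return a name -> passed mapping."""

    def scalar(sql):
        return conn.execute(sql).fetchone()[0]

    return {
        # Were all staged records loaded, and no extra ones?
        "row_count_matches": scalar("SELECT COUNT(*) FROM stg_orders")
        == scalar("SELECT COUNT(*) FROM dw_orders"),
        # Does the base metric reconcile between source and target?
        "amount_total_matches": scalar("SELECT SUM(amount) FROM stg_orders")
        == scalar("SELECT SUM(amount) FROM dw_orders"),
        # Did a rerun create a new entry instead of overwriting the record?
        "no_duplicate_keys": scalar(
            "SELECT COUNT(*) FROM (SELECT order_id FROM dw_orders "
            "GROUP BY order_id HAVING COUNT(*) > 1)"
        )
        == 0,
    }


clean = run_checks(build_demo_db([(1, 100), (2, 250), (3, 75)]))
rerun_bug = run_checks(build_demo_db([(1, 100), (2, 250), (3, 75), (3, 75)]))
print(clean)
print(rerun_bug)
```

Checks of this kind lend themselves to automation after every load, which is one way the automated data quality checks mentioned for developers and testers can be realized.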
Techno-functional roles in BI
Team members in techno-functional roles act as a bridge between technical teams or technology and business users or management.
Business Intelligence Analyst
Job titles similar to a BI analyst include:
• Data Analyst or Business Data Analyst
• Reporting and Analytics Specialist
• Data Scientist (not all of the profiles)
• Data Analytics Specialist
• Analytics and Insights Specialist
• Business Analytics Specialist or Data Analytics Specialist
• BI and Data Analytics Specialist
• BI Officer
• BI & Reporting Analyst/Officer
• Business Insights Analyst
• A specific domain or sector such as Retail or Healthcare BI Analyst
• A specific function such as People, Sales, or Partner Analytics Specialist
• BI Manager (This is different from a manager – Business Intelligence or other such managerial positions)
• Business Analytics Manager
• BI Solution Manager
Currently, there are two different interpretations of the BI analyst role, and the confusion is evident. If you were to check what a BI analyst does in 10 different organizations, or refer to current job openings for BI analysts, you would notice that there are indeed two different interpretations, and both can be right. Let’s call them Type 1 and Type 2. In short, Type 1 has end-to-end responsibility, that is, first building the solution and then analyzing the data using the solution, whereas Type 2 deals mainly with the data analysis part, that is, they are not involved in building the BI solution. All responsibilities of Type 2 are also part of Type 1. In Type 1, there is an expectation to also own or participate in the development of the BI application, whereas in Type 2, the expectations are limited to analysis using available BI and data capabilities and communication to stakeholders.
The main responsibilities of a BI analyst are listed below for both types. Where a point applies to only one of the types, this is indicated; otherwise, the point applies to both.
It is important to note that you may come across a few BI analyst job openings that actually describe the role of a business analyst. That is a mistake: such a role should be called Business Intelligence Business Analyst or Business Analyst for Business Intelligence (BIBA), not BI analyst. Even though there are some overlapping responsibilities between a BI analyst and a BIBA, their core responsibilities are different. Hence, the main responsibilities of a BIBA are not covered under BI analyst. The differences between a BI analyst and a BIBA are covered after the BIBA role is explained in this chapter.
Responsibilities of a BI analyst include:
• Creating engaging visualizations and intuitive, scalable reports and dashboards that turn both quantitative and qualitative data into critical information and insights/knowledge that management can use to make sound business decisions.
• Defining metrics and KPIs, proactively tracking and maintaining them to make sure that the business measures what is relevant, and proactively engaging stakeholders.
• Coming up with recommendations that directly address business objectives based on all types of analytics. Participating in decision-making meetings and supporting them with insights.
• Cooperating with data engineers, data scientists, data analysts, and other stakeholders across the organization to ensure that the right information is available and accessible.
• Assisting in the onboarding, coaching, and training of business unit end users in the use of BI tools.
• Gathering business data in many different ways, including competitor data and industry trends. With the data collected, helping to develop a picture of the company's competitiveness compared to other players in the market.
• Participating in the design, development, deployment, and testing of BI/DW applications and solutions. [Type 1 only]
• Setting up processes and frameworks so that the data solutions have high availability and low maintenance cost. [Type 1 only]
In summary, whether it is Type 1 or Type 2, in my view this is one of the best roles when you want the best of both worlds. This role is in touch with technology and at the same time close to the management, for example, discussing recommendations with the management and making a direct impact. The insights and inputs provided by a BI analyst can make a big positive difference to the business, optimize and improve processes, save millions of dollars, etc. Usually, BI analysts are expected to have very good domain/business/sector knowledge. A BI analyst may initially analyze data directly in the staging area or directly in data sources (OLTP) when the data is not available in data warehouses. A BI analyst may also manually create reports for a few iterations and, when the requirements are clear, pass them on to the development team.
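As a small illustration of defining and tracking KPIs, the sketch below derives two KPIs from base metrics and flags the ones that miss their target. The KPI names, targets, and monthly figures are all hypothetical; which KPIs matter and what their targets should be is exactly what a BI analyst works out with the business.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    value: float
    target: float
    higher_is_better: bool = True

    def on_track(self):
        """Compare the value with the target in the direction that counts as good."""
        if self.higher_is_better:
            return self.value >= self.target
        return self.value <= self.target


def rate(part, whole):
    """Derived metric: part as a percentage of whole (0.0 when whole is zero)."""
    return part / whole * 100 if whole else 0.0


# Hypothetical base metrics for one month.
monthly = {"visits": 20000, "orders": 500, "returns": 40}

kpis = [
    Kpi("Conversion rate (%)", rate(monthly["orders"], monthly["visits"]), target=2.0),
    Kpi(
        "Return rate (%)",
        rate(monthly["returns"], monthly["orders"]),
        target=5.0,
        higher_is_better=False,
    ),
]
off_track = [k.name for k in kpis if not k.on_track()]
print(off_track)
```

A list such as `off_track` is the kind of starting point from which an analyst would investigate causes and prepare recommendations for a decision-making meeting.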
Business Intelligence Business Analyst
Job titles similar to a BI business analyst are as follows:
• MIS Business Analyst
• BI Product Owner or Business Analytics Product Owner
• Data Warehouse Business Analyst
• Business Analyst with BI/DWH focus
• Business Analyst Analytics or Business Analyst - Reporting and Analytics
• Business Analyst – Data Management
• Business Analyst – BI and Data Management
• Technical Business Analyst – Business Intelligence or Data Warehouse
Product Owner here refers to the agile team product owner and not the commercial product owner. A BIBA’s main focus is to enable others (business users, customers, etc.) to use BI solutions rather than to use them personally. The BIBA is the bridge between the business users and BI development teams and is usually part of the BI team.
Responsibilities of a BIBA include:
• Engaging with business stakeholders at all levels to gather, elicit, and analyze requirements, and documenting and maintaining detailed requirements for new and additional BI solutions such as dashboards, reports, analytics solutions, etc.
• Translating business requirements to technical requirements, usually in the form of epics and user stories in an agile BI team.
• Performing detailed gap analysis, identifying data sources, creating solution concepts and mock-ups, and working with stakeholders to develop the best possible solution.
• Acting as a technical product owner and envisaging requirements and concepts to inspire agile BI teams to deliver reusable deliverables. Building a delivery roadmap for the team.
• Coordinating between the BI team and business users, customers, other business analysts (source data teams), and production support teams.
• Prioritizing the requirements based on business and department strategy and building a product roadmap. Communicating plans for iterations to the stakeholders.
• Providing effort estimations, clarifying business questions from technical teams, validating deliverables (for example, reports, dashboards, data) before delivery, ensuring timely and high-quality delivery, conducting demos, onboarding new users, and training users. Carrying out ad hoc data analysis based on business needs.
• Analyzing source systems, their processes, and data flows, and documenting them. Supporting the data modeler and BI architect in developing the data model.
• Creating and maintaining user manuals and other documentation.
As with any other BA, one of the main challenges for a BIBA is dealing with accusations from both sides. On one side, customers/business users complain that the BA is not pushing the development team enough, and on the other side, development teams complain that the BA is pushing more than the team can handle.
BI Analyst vs BI Business Analyst
As quite a lot of people confuse the BI analyst and BIBA roles, the key differences between them are provided in Table 5.1.
BI Analyst | BIBA
One of the main users of BI solutions. | Main focus is to enable others to use BI solutions.
Usually part of the business departments such as marketing, sales, etc. | Usually part of the BI team.
Usually an individual contributor role. | Usually part of the development team.
Carries out detailed and complex data analysis, including all varieties of analytics. | Carries out simple to medium-level data analysis. Most of the time is spent gathering requirements, translating them to technical requirements, enabling users to use the BI solution, etc.
The BI development team is not fully dependent on this role for getting the requirements. | The BI development team is blocked if requirement specifications are not provided by the BIBA. That is, the BI development team depends fully on the BIBA to provide the requirements.
Builds BI applications hands-on (Type 1 only). | Does not build applications hands-on but writes specifications for the development team to build them.
Table 5.1: Main differences between BI Analyst and BIBA
Now the difference between a BI analyst and BIBA should be clear. Let’s now take a look at the management roles in BI.
Management roles in BI
As with any other field or department, in BI too, there are management and leadership roles ranging from Team Lead and Head to C-Level. Between Head and C-Level there may be roles such as VP, SVP, etc., depending on the size and specific needs of an organization. The responsibilities of these roles in BI are comparable to similar roles at similar levels in other departments. The responsibilities related to general management, collaboration with senior management, leadership, etc., are common to other management roles and are not covered in this chapter; only responsibilities specific to BI are mentioned. Note that C-Level roles in BI are applicable only in those companies that have adopted the CGCE or CGDE BI organizational models. To get an understanding of the management roles, let’s look at the management roles in BI at Walget. Figure 5.2 highlights the management roles in BI at Walget with one example function, Marketing.
Figure 5.2: Management roles in BI at Walget
As seen in Figure 5.2, the CDAO is responsible for the overall BI function of the entire Walget group. There is only one CDAO for the Walget group. A VP of BI is responsible for fulfilling all BI requirements of one global department that cuts across the group of companies of Walget. There are multiple VP of BI positions, one for every global department; the example provided in Figure 5.2 is the marketing department. A Head of BI is responsible for fulfilling the BI requirements of one department within one company of the Walget group. There are multiple Head of BI positions, one for every department, and similarly there are multiple BI team lead positions, one for every BI team. A BI team lead, in this case, is responsible for fulfilling the BI requirements of one subject area (e.g., campaign management data) required by one department (marketing in this case) within a company of the Walget group.
As with any management job, the level in the hierarchy at which the manager operates determines the area and scope of responsibilities. For example, a team lead provides vision, roadmap, leadership, etc., at the team level, whereas a Head provides these at the department level, and so on. The same goes for budget responsibility: one is at the team level, another at the department level, and another at the corporate level. In the following paragraphs, details related to three of the management roles in BI are provided:
C-Level Role
Job titles in C-Level roles in BI include:
• Chief Analytics Officer (CAO)
• Chief Data Officer (CDO)
• Chief Data and Analytics Officer (CDAO)
• Chief Business Intelligence Officer (CBIO)
Data, analytics, or BI-specific C-Level roles such as Chief Analytics Officer, Chief Data Officer, Chief Data and Analytics Officer, and Chief Business Intelligence Officer are not yet as common as other C-Level roles such as COO, CIO, CTO, etc. In many companies there are no C-Level roles specific to BI; in such cases a VP or Head of BI reports directly to one of the existing C-Level roles such as a CIO, CMO, CFO, etc. CAO seems to be the most popular title and CBIO the least popular, based on the number of people currently in those roles and the number of job openings. We can attribute this difference to the misconceptions about BI that we have dealt with and clarified in the previous chapters. Some have wrongly assumed that BI is limited to reporting and visualization, and therefore they would like to have a title that clearly indicates analytics, without recognizing that BI in the broader sense includes analytics. In the future, when there is more clarity, we could see a rise in CBIO roles.
The BI-specific responsibilities of all the BI management roles, at different levels with different scope, are as follows:
• Overseeing the overall BI function within the organization (enterprise, department, or team level).
• Driving the vision for the reporting and analytics strategy and KPI framework.
• Hiring the right BI talent, procuring the budget, aligning BI with the organization’s goals, driving the BI roadmap, providing guidance and high-level technical direction, carrying out strategic technology planning, and leading, defining, growing, and managing BI team/s.
• Delivering enterprise reporting, analytics, and self-service capabilities utilizing enterprise BI tools.
• Taking ownership of and/or supporting the definition of KPIs and metrics.
• Bringing their own vision for the future of the BI team and acting as an internal spokesperson, that is, being a data and BI champion.
• Translating business vision and priorities to BI requirements and priorities.
Head of Business Intelligence
Job titles similar to a Head of BI include:
• Head of BI and Analytics
• Director of BI/Business Analytics/Analytics
• Data Engineering Director - Analytics
Some points to note are as follows:
• In some companies we may find a VP of BI/Business Analytics/Analytics, especially when there are multiple subsidiaries; a VP could then be in charge of all BI talent in a subsidiary.
• In small companies, a Head of BI could be the final authority on BI topics, reporting directly to C-Level (CFO, CTO, etc.), and there may not be a CAO or CDAO. In large organizations, a Head of BI is in charge of multiple BI teams and leads a BI department.
• The rest of the points are the same as for the C-Level role, except that the scope is at a different level (lower than C-Level but higher than team lead).
Business Intelligence Team Lead
Job titles similar to a BI team lead include:
• Business Analytics or Analytics Team Lead
• Data Warehouse and Reporting Team Lead
• Business Analyst with BI/DWH focus
Some points to note are as follows:
• Usually, a BI team lead leads one of the BI teams within one of the departments in an organization. The team size can vary. I have seen teams with only four members and teams with 20+ members.
• The team lead may lead a team of BI frontend developers, a team of BI backend developers, an end-to-end BI development team that includes both BI frontend and backend developers, a team of support members (administrators or support engineers), or a team of BI quality engineers.
• A team lead could be partly hands-on in one of the technical or techno-functional roles while additionally managing the team.
• The rest of the points are the same as for the C-Level role, except that the scope is at a different (lower) level and a team lead may additionally provide technical BI leadership too.
Exclusions
Some roles or job titles are intentionally excluded from the set of BI-specific roles. Those roles and the reasons for their exclusion are discussed in the following sections.
Data Steward
A data steward actually belongs to the data governance department. In some BI teams we may come across a data steward role. This is usually because the organization lacks a data governance department, and as the BI team works with data, the data steward is placed in a BI team for want of anywhere else. The same reason explains why we find a data quality engineer in a BI team.
Data Migration Engineer
Moving raw data from one system to another system for non-BI purposes sometimes may end up with a BI team. Note that it is not BI work. If a data migration engineer’s job is only to build scripts/jobs to transfer data between two Online Transaction Processing (OLTP) systems then that is not a BI role. Just because a data migration engineer uses one of the tools (for example, an ETL tool) that the BI team uses, doesn’t make that person a BI professional. Of course, at a later point in time, a data migration engineer can move to a BI team based on their data migration experience and become a BI developer. A BI role should play a part in deriving information and insight from data. Moving data from one place to another for operational purposes is not part of that process. Similar to how every person who uses a camera is not a professional photographer, every person who uses an ETL tool is not necessarily a BI professional.
Project Manager
Responsibilities of a BI project manager are similar to those of any other IT solutions project manager. As there is not much that is BI specific or unique to be explained, the role has been excluded. Exclusion in no way suggests that project manager is not an important role. In fact, having a project manager who has good BI experience or is knowledgeable in BI can be a great advantage for the BI team and improves the chances of completing the project successfully. In a project-based team structure, the team reports to the project manager, and therefore a project manager has full control and authority to steer the team in the right direction to achieve the project goals. In a matrix organization, a project manager may not have enough authority to lead the BI team as per project needs.
Suffixes and Prefixes
These are not roles on their own; they are suffixes and prefixes that indicate the seniority of a role. Suffixes and prefixes that are usually part of the job title, and which may be misunderstood as a role, are excluded. For example, Senior, Junior, Lead, Principal, etc., are all intentionally excluded.
Conclusion
In this chapter we have covered the most common and important roles in BI. As mentioned earlier, every role is important and adds value. It wouldn’t be practical, and wouldn’t serve much use, to cover every single BI job title (as opposed to role). The intention is to provide a sufficient level of detail about the roles in BI so that learners are able to understand and differentiate between the roles that are required for their specific situations. Unfortunately, there is no standardization of job titles in BI; you may therefore notice people with two different BI job titles performing the same role in different organizations, and people with the same job title performing different roles in different organizations. Hopefully, this book, and this chapter in particular, reduces some of the confusion about BI roles. This chapter also clarified why some roles, such as a data migration engineer, shouldn’t be considered BI roles. If you have the task of building a BI team, you should now be in a better position to analyze the needs of your organization, assess where your organization stands, and decide which roles to hire. In the next chapter you will learn the financial aspects of BI, such as how to calculate the cost of business intelligence and how to calculate the return on investment (ROI) of BI initiatives.
Points to remember
Some key points to remember are as follows:
• BI roles discussed in this chapter are only those roles that are main roles in implementing and operating BI solutions and whose main job is to analyze data. General BI users are not included.
• Every BI role has its own set of challenges; no role is less or more challenging.
• BI organizational structures vary between businesses, even between those within the same type (sector or domain) of business.
• BI organizational models can be grouped into four models: NGDE, SGDE, CGCE, and CGDE.
• CGCE and CGDE are planned models, whereas NGDE and SGDE result from a lack of planning and a lack of centralized BI leadership.
• BI governance activities could be carried out by a dedicated team or a virtual team or a group.
• Under which department should BI be? Or should BI be its own department at C-level? These topics are still not settled. All varieties can be seen.
• BI roles can be grouped into 3 categories: technical, techno-functional, and managerial.
• BI developer is not the same as the software developer/BI tool developer. A BI developer works on deriving information and insights from data, whereas a software developer (for example, a Java developer) develops a tool that can be used by BI developers, users, etc.
• Roles that are involved only in moving data between OLTP systems for non-BI purposes, such as data migration engineers, are actually not BI roles.
• BI analyst role is not the same as the BI business analyst. A BI analyst’s main focus is to use BI to derive information and insight, whereas the main focus of a BI business analyst is to gather business (BI users’) requirements and translate them into technical requirements for the BI development team to build the BI solutions.
• There are multiple job titles for almost every role in BI.
Multiple choice questions
1. A typical BI team consists of
   a) BI Developer
   b) BI Business Analyst
   c) BI Team Lead
   d) All of the above
2. Which of these is not a BI-specific role?
   a) BI Analyst
   b) Data Analyst
   c) Data Migration Developer
   d) Chief Analytics Officer
3. Which of these is not a BI-specific role?
   a) ETL Developer
   b) Data Quality Engineer
   c) Microstrategy Developer
   d) Business Analytics Engineer
4. Which of these is not a type in BI organization model?
   a) NGDE
   b) SGDE
   c) CCGE
   d) CGDE
5. Which of these is an unplanned BI organization model?
   a) NGDE
   b) SCDE
   c) CGCE
   d) CGDE
6. BI governance function could be carried out by
   a) BICC
   b) COE
   c) Enterprise BI team
   d) Any of the above
7. BI teams may be within
   a) Marketing or Sales
   b) Finance
   c) IT
   d) All of the above
8. Which of these is not a category in the BI roles?
   a) Technical
   b) Techno-functional
   c) Process
   d) Management
9. In which of these models do the employees themselves try to establish some sort of BI governance without any dedicated team, without budget, etc.?
   a) NGDE
   b) SGDE
   c) CGCE
   d) CGDE
10. Which of these is not a purely technical role?
   a) BI Application Manager
   b) BI Manager
   c) BI Operations Engineer
   d) BI Quality Assurance Engineer
11. Which of these activities is usually a responsibility of a BI architect?
   a) Data Modeling
   b) Incident management
   c) Write test cases
   d) Gather business requirements
12. Which of the below roles is not usually involved in BI user training?
   a) BI Analyst
   b) ETL Developer
   c) BI Business Analyst
   d) BI Product Owner
13. Which of these activities is most often underestimated?
   a) ETL development
   b) Report development
   c) User training
   d) Testing
Answers
1. d
2. c
3. b
4. c
5. a
6. d
7. d
8. c
9. b
10. b
11. a
12. b
13. a
Questions
1. Why is data migration engineer not a BI role?
2. What are the differences between the four BI organizational models?
3. What are the differences between the responsibilities of a BI analyst and a BIBA?
4. Why is it not mandatory for every BI team to have a BI tester or quality engineer?
5. In your view, which of the four BI organizational models is better, and why?
6. What are some of the activities of the governance function in BI?
7. In your view, which department should the BI team belong to? And why?
Chapter 6
Financials of Business Intelligence
As in the case of any other initiative, there are obviously financials involved in the case of BI as well. If you are a decision maker considering a BI initiative in your organization, or anyone who is interested in understanding the financials of a BI initiative, then this chapter is for you. In this chapter, we will take a detailed look at the different components that typically add to the cost of BI, the total cost of ownership, and return on investment (ROI). Along the way, we will also clear up some of the misconceptions about the costs of BI and ROI for BI. In Chapter 2: Why do businesses need BI, we already covered the benefits of BI. We will now look at some of those benefits from a financial perspective. After studying this chapter, you should be able to calculate the financial metrics for your specific case. In Chapter 5: Roles in Business Intelligence, a typical BI team was introduced; we will calculate the people cost for that team structure.
Structure
In this chapter, we will cover the following topics:
• Cost of BI
   o People cost
   o System cost
   o Total cost of ownership
• ROI for BI
   o ROI for BI - Complex
   o ROI for BI - Simple
   o Examples of saving time
   o ROI for BI - Side benefits
Objectives
The objectives of this chapter are to understand the various components that add to the cost of BI and to learn how to calculate the people cost and system cost for a BI solution. By the end of the chapter, it should also be clear why some approaches make it more difficult to calculate ROI for BI, and you will have learned an easier approach to calculating ROI for BI initiatives.
Cost of BI
As Harel Sagiv, the technical reviewer of this chapter, rightly commented as part of his review, “The BI service is part of the IT unit, and its budget will most likely be mixed in the IT budget. Parts of the BI infrastructure and the professional staff will be shared with other IT services. Moreover, the IT budget varies according to sector, size, other internal reasons (for example 'lean' companies) and other external influences (for example, using cloud infrastructures and services). So, it is really very difficult to estimate the cost and budget of a BI service”. Yes, it’s quite difficult to come up with a general estimated cost for BI, and that’s why we will limit the cost calculation to an in-house BI team that has a budget of its own.

When we say cost of BI, we are referring to the total cost of ownership (TCO). TCO includes both capital expenditure (CapEx) and operational expenditure (OpEx) costs, and we are referring to the TCO of the end-to-end BI solution and not just the TCO of the frontend BI tool. I have to emphasize that point because quite a lot of articles out there calculate the TCO of BI solutions considering only the frontend (reporting and analytics platform) part while ignoring the need for backend infrastructure such as an ETL tool and a database. This is actually quite misleading. Imagine if top-level decision makers were to use MS Excel or any other frontend tool directly on data sources to get the required information, especially when the data sources (OLTP applications) are dealing with millions if not billions of daily transactions. Getting the information which the decision maker requires may involve joining tens of database tables, parsing hundreds of XML files, data cleansing, data transformation, and other tasks. Obviously, a decision maker will not consider this setup a solution. This is the kind of half-baked solution we would deliver if we considered only the frontend part of the solution.
For BI, we should include the end-to-end solution cost. By bringing clarity about overall costs, my intention is obviously not to discourage businesses from adopting BI but to create awareness about the real TCO so that realistic expectations are set. It is important for the management to be aware of all the costs so they’re able to decide which combination best fits their organization’s needs and meets budget constraints. We will attempt to capture most of the costs in this chapter.

Similar to any other IT solution, a BI solution involves both people and system (hardware and software) cost. As we have seen in the previous chapters, BI solutions other than out of the box (OOTB) and embedded BI solutions have to be built specifically as per the organization’s data and requirements. We will now look in detail at the people and system cost of building a BI solution or, to be more specific, equipping a business with BI capabilities.

Note: As there is a trend of not using the words human resource, we will stick to people.
People cost
People cost includes the cost of hiring, managing, and retaining a BI team. This includes the BI development track, BI support track, BI management roles, and other IT support staff; for example, a portion of the DBA’s capacity could be reserved for supporting the BI solution. There are also costs related to training BI team members and BI users. In addition to salaries and employee benefits, there can also be travel-related costs. Other costs include IT equipment such as laptops, workstations, etc.

Let’s calculate the people cost for a typical BI team for a period of 5 years. Assume that Walget’s subsidiary, WalgetABC, is establishing a BI team in Munich to cater to the needs of the European region. Assume that there are two tracks of development in the BI team. The following Table 6.1 provides the split of number of people per role.

| Role | Number of people |
|---|---|
| Head of BI | 1 |
| BI Dev Track Leads | 2 (1 for each track of development) |
| BI Support Track Lead | 1 |
| BI Developers | 6 (3 in each track of development) |
| BI Quality Assurance Engineers | 2 (1 for each track of development) |
| BIBAs (BI Business Analysts) | 2 (1 for each track of development) |
| BI Administrators | 2 |
| BI Architects | 1 |
| BI Analysts (Type 2 – Refer to Chapter 5, Roles in Business Intelligence) | 2 |
| Total no. of BI team members | 19 |
Table 6.1: BI roles and number of people
The major part of the people cost is salaries. We will use salary as the base to calculate other costs such as hiring costs, employee benefits, IT equipment, etc. Salary costs can vary based on location, sector, organization, seniority (experience), etc. As we’re assuming that WalgetABC is based in Munich, we will consider costs with respect to Germany as of 2020, but again, these are only to provide a rough estimate and should not be considered definitive. There are companies in Munich that pay developers less than 50,000 EUR annually and companies that pay above 100,000 EUR, but as our focus is to get a rough estimate of costs, we will consider average salaries for each role and calculate the salary costs as given in Table 6.2:

| Role and number of people | Salary costs in EUR |
|---|---|
| Head of BI – 1 | 110000 X 1 = 110,000 |
| BI Dev Track Leads – 2 | 90000 X 2 = 180,000 |
| BI Support Track Leads – 1 | 85000 X 1 = 85,000 |
| BI Developers – 6 | 65000 X 6 = 390,000 |
| BI Quality Assurance Engineers – 2 | 60000 X 2 = 120,000 |
| BIBAs – 2 | 65000 X 2 = 130,000 |
| BI Administrators – 2 | 65000 X 2 = 130,000 |
| BI Architects – 1 | 78000 X 1 = 78,000 |
| BI Analysts – 2 | 65000 X 2 = 130,000 |
| Total salary costs for 1 year (first year) | 1,353,000 |
Table 6.2: Salary costs for a BI team for WalgetABC in Germany as of 2020 for 1 year
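As a rough sketch, the arithmetic behind Table 6.2 can be reproduced in a few lines of Python. The head counts and average salaries are the ones listed above; the names `TEAM`, `salary_cost_per_role`, `total_salary_cost`, and `headcount` are purely illustrative. Summing the rows as listed gives 1,353,000 EUR, which the next table rounds up to roughly 1,400,000 EUR.

```python
# Head count and assumed average annual salary (EUR) per role, as listed in Table 6.2.
TEAM = {
    "Head of BI": (1, 110_000),
    "BI Dev Track Lead": (2, 90_000),
    "BI Support Track Lead": (1, 85_000),
    "BI Developer": (6, 65_000),
    "BI Quality Assurance Engineer": (2, 60_000),
    "BIBA": (2, 65_000),
    "BI Administrator": (2, 65_000),
    "BI Architect": (1, 78_000),
    "BI Analyst": (2, 65_000),
}


def salary_cost_per_role(team):
    """Return the yearly salary cost per role (head count x average salary)."""
    return {role: count * salary for role, (count, salary) in team.items()}


def total_salary_cost(team):
    """Return the yearly salary cost of the whole team."""
    return sum(salary_cost_per_role(team).values())


def headcount(team):
    """Return the total number of team members."""
    return sum(count for count, _ in team.values())


if __name__ == "__main__":
    for role, cost in salary_cost_per_role(TEAM).items():
        print(f"{role}: {cost:,} EUR")
    print(f"Team size: {headcount(TEAM)}")
    print(f"Total salary cost (first year): {total_salary_cost(TEAM):,} EUR")
```

Replacing the salaries or head counts in `TEAM` with your own figures is enough to get a first estimate for a different location or team structure.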
Now, let’s look at other costs that contribute to people cost. The total per employee cost includes salary, benefits, equipment, seating, training, hiring, and admin costs. Again, these costs are not definitive but only for guidance and to provide a rough estimation. Let’s calculate the total cost related to people for a typical BI team in Germany as of 2020 as shown in Table 6.3:

| People Costs - Items | Assumptions | Costs in EUR |
|---|---|---|
| Gross salary | From Table 6.2 (includes social security, health insurance, taxes, and job loss allowance) | ~1,400,000 |
| Other benefits | Bonuses, allowances, etc., assumed 20% of gross salary | 280,000 |
| IT Equipment | 1000 EUR per BI employee per year, therefore 1000 X 19 | 19,000 |
| Seating space and facilities | 1500 EUR per BI employee per year, therefore 1500 X 19 | 28,500 |
| Training, subscriptions, books, conferences, etc. | 1500 EUR per BI employee per year, therefore 1500 X 19 | 28,500 |
| Admin | 1500 EUR per BI employee per year, therefore 1500 X 19 | 28,500 |
| Hiring | 5000 EUR per BI employee (4 replacements every year assumed), therefore 5000 X 4 | 20,000 |
| Total BI team costs for the first year | | 1,804,500 |
Table 6.3: People cost for a BI team for WalgetABC in Germany for the first year as of 2020
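The line items of Table 6.3 can be expressed as a small function, shown here as a minimal sketch. The rates and per-employee amounts are the assumptions stated in the table; the function name `first_year_people_cost` and its parameter names are hypothetical and chosen only for readability.

```python
def first_year_people_cost(
    gross_salary=1_400_000,      # rounded gross salary used in Table 6.3
    team_size=19,                # BI team members from Table 6.1
    replacements_per_year=4,     # assumed number of hires per year
    benefits_percent=20,         # bonuses, allowances, etc.
    it_equipment=1_000,          # EUR per employee per year
    seating=1_500,               # EUR per employee per year
    training=1_500,              # EUR per employee per year
    admin=1_500,                 # EUR per employee per year
    hiring_cost=5_000,           # EUR per hire
):
    """Return the first-year people cost items (EUR) as a dictionary."""
    return {
        "Gross salary": gross_salary,
        "Other benefits": gross_salary * benefits_percent // 100,
        "IT equipment": it_equipment * team_size,
        "Seating space and facilities": seating * team_size,
        "Training": training * team_size,
        "Admin": admin * team_size,
        "Hiring": hiring_cost * replacements_per_year,
    }


if __name__ == "__main__":
    items = first_year_people_cost()
    for name, cost in items.items():
        print(f"{name}: {cost:,} EUR")
    print(f"Total for the first year: {sum(items.values()):,} EUR")
```

Keeping every assumption as a named parameter makes it easy to see how sensitive the total is to, for example, a different benefits percentage or team size.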
The cost of training the BI users is included in the preceding table, as the assumption is that the BI team members, especially the BIBAs and BI analysts, will train the BI users. However, the hours spent by BI users in getting trained are not included. If training is conducted by an external agency, then we should include that cost as well.

All people costs discussed up until now have been calculated considering that the BI team is an in-house team with internal team members. When the work is outsourced (outside Germany to lower-cost locations), the people cost is usually between 25% and 50% of the in-house team cost; when freelancers are hired within Germany, it is around 150% to 250% of the employee cost.

To be more accurate, the calculations above should also include the cost of other IT staff such as server administrators, network engineers, DBAs, and IT security. However, as their capacity is not entirely reserved for BI, this needs to be looked at on a case-by-case basis. For a quick and simple calculation, for every 20 BI team members we can add the cost of 1 full-time equivalent (FTE), priced at the cost of a BI administrator, to cover the other IT staff.

Now that we have calculated the people costs for the first year, let’s calculate the people costs for 5 years. For this, we have to make a few assumptions:
1. The BI team builds the solution within 1 year.
2. From the 2nd year onward, on average only 30% of the initial BI team capacity is retained. The rest of the team members are redeployed to other projects; irrespective of whether they are redeployed to another BI project within Walget or even within WalgetABC, they no longer contribute to the people costs of this team.
3. There is an average increment of 2% in people cost per year.
The people cost for the BI team for 5 years is calculated to be 4.08 million euros as shown in Table 6.4:

| Year | People Costs in EUR | Comments |
|---|---|---|
| Year 1 | 1,804,500 | From Table 6.3 |
| Year 2 | 552,177 | 30% of 1840590 (after 2% increment) |
| Year 3 | 563,220.5 | 30% of 1877401.8 |
| Year 4 | 574,485 | 30% of 1914949.836 |
| Year 5 | 585,974.6 | 30% of 1953248.83 |
| Total BI team costs for 5 years | 4,080,357 | Sum of people costs from Year 1 to 5 |
Table 6.4: People cost for a BI team for WalgetABC in Germany for the first 5 years as of 2020
Note: For the sake of avoidance of any confusion, the exact calculation is carried out and presented in the table with decimal places too. However, the focus should be on the process to arrive at the estimated costs, therefore no need to focus on the precise numbers. If you get a rough idea that should be sufficient to follow the topic.
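The process behind Table 6.4 is a simple compounding rule, sketched below under the three assumptions stated earlier: the full-team cost grows by 2% every year, and from the second year onward only 30% of that grown cost is charged to the team. The function name `people_cost_by_year` and its parameters are illustrative.

```python
def people_cost_by_year(first_year_cost, years=5, retained_share=0.30,
                        annual_increment=0.02):
    """Return the people cost per year (EUR) for the given number of years.

    Year 1 is the full first-year cost. For each later year, the full-team
    cost is first increased by the annual increment, and only the retained
    share of it is counted.
    """
    costs = [float(first_year_cost)]
    full_team_cost = float(first_year_cost)
    for _ in range(2, years + 1):
        full_team_cost *= 1 + annual_increment
        costs.append(full_team_cost * retained_share)
    return costs


if __name__ == "__main__":
    yearly = people_cost_by_year(1_804_500)
    for year, cost in enumerate(yearly, start=1):
        print(f"Year {year}: {cost:,.1f} EUR")
    print(f"Total for 5 years: {sum(yearly):,.0f} EUR")
```

Changing `retained_share` shows quickly how much of the 5-year cost depends on how many team members stay on after the solution is built.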
System cost
The investments in BI software and hardware depend on many factors, largely on functional and non-functional requirements, including but not limited to number of data sources or applications, current and forecasted data volume to be handled, number of expected users, performance requirements, availability requirements, local or global usage requirements, response time expectations, number of environments, number of layers of data, data retention policies, etc. Also, there are plenty of options available such as, on-premises or cloud, proprietary software or open and free software, various levels of support offered by hardware and software vendors, pricing models, license options (for example named user, enterprise, etc.), number of tools deployed (for example, there could be multiple data visualization tools used in the same department), and a combination of all these options. The hardware requirements are guided by and based on the sizing requirements or specifications provided by the BI software vendors. To go through each of these combinations is beyond the scope of this book. Calculating system cost for a BI solution without requirement specifications is a complex exercise and also not a useful one, however, as the goal is to understand the process of calculating and not about finding whether the costs are right, we will proceed with an example. So, to keep it simple and to provide a direction for BI system cost calculation, we will have to make several assumptions:
• BI architecture mainly consists of ETL tools (data integration tools), DWH, and BI reporting and analytics platform tools. The rest of the tools, such as documentation tool, scheduling tool, data modelling tool, etc., are not considered in the cost calculation as they are not as significant in comparison to those three tools.
• There are three environments (2 non-production, development and UAT, and 1 production environment). Non-production environments are 50% in sizing compared to production environments. The production environment includes disaster recovery (DR).
• No open source tools are used, meaning all tools are of proprietary software license type and all are deployed on-premises.
• All software licenses are perpetual licenses, with 20% of their initial license cost paid yearly as renewal towards continued support.
• Each of the servers is used specifically for BI and not shared with other applications.
• The total data stored and processed in the data warehouse starts on day zero with 10 TB (initial load with past data) and increases to 20 TB (after 5 years). The daily increase in data is around 3.8 GB per day in the first year and around 6.5 GB per day in the fifth year. This is the basis on which the database cost is estimated.
• The number of BI users remains constant (which is very unlikely, but we are assuming that to keep it simple).
• There are in total 500 BI users, of which 20% are analysts (power users), 10% are developers/creators, and 70% are basic users/viewers. This and the previous point are the basis on which the reporting and analytics platform cost is estimated. Tools in the premium range are considered.
• WalgetABC doesn’t have the option to utilize any of Walget’s existing IT (especially BI) infrastructure because of regulatory and other concerns.
A rough estimation of hardware cost (server cost) for each of the three components of the BI solution is provided in Table 6.5:

| Cost component | Development – Number of servers and cost | UAT – Number of servers and cost | Production – Number of servers and cost | Total cost of servers (hardware) |
|---|---|---|---|---|
| ETL server hardware cost | 1 at 5K EUR | 1 at 5K EUR | 1 + 1 DR, each at 20K EUR | 5K + 5K + 40K = 50K EUR |
| Reporting and analytics platform server hardware cost | 1 at 5K EUR | 1 at 5K EUR | 1 + 1 DR, each at 20K EUR | 5K + 5K + 40K = 50K EUR |
| Database server hardware cost | 1 at 5K EUR | 1 at 5K EUR | 1 + 1 DR, each at 20K EUR | 5K + 5K + 40K = 50K EUR |
Table 6.5: Example of system cost calculation for a BI solution with multiple environments
As we have assumed that all hardware is bought (owned) and that the software is bought with perpetual license, the first-year cost will be considerably high compared to subsequent years. Subsequent years include only support costs until hardware is replaced, usually after 5 years. To calculate the total system cost, we also need to include OS cost and BI tool cost with hardware cost. In Table 6.6, we have calculated system cost for the first year. This is the CapEx part of the system cost.

| Cost component | Hardware Cost | OS Cost | Tool (Software) License Cost | Component Cost (First Year) |
|---|---|---|---|---|
| ETL | 50,000 | 8,000 | 250,000 | 308,000 |
| Database | 50,000 | 8,000 | 400,000 | 408,000 |
| Reporting and analytics platform | 50,000 | 8,000 | 420,000 | 478,000 |
| Total system cost for the first year | | | | 1,194,000 |
Table 6.6: Example of system cost calculation for a BI solution for first year
Now that we have the cost for the 1st year, let’s derive the cost (support and maintenance or renewal cost) for the subsequent years. For hardware and OS, the support and maintenance cost is calculated as 10% of the initial cost. In Table 6.7, the system cost for the first 5 years is calculated based on the assumptions we have already made.

| Cost component | Year 1 | Year 2 | Year 3 | Year 4 | Year 5 | Total |
|---|---|---|---|---|---|---|
| ETL | 308,000 | 55,800 | 55,800 | 55,800 | 55,800 | 531,200 |
| Database | 408,000 | 85,800 | 85,800 | 85,800 | 85,800 | 751,200 |
| Reporting and analytics platform | 478,000 | 89,800 | 89,800 | 89,800 | 89,800 | 837,200 |
| Yearly totals | 1,194,000 | 231,400 | 231,400 | 231,400 | 231,400 | 2,119,600 (total system cost for 5 years) |
Table 6.7: Example of system cost calculation for a BI solution for first 5 years
To avoid any confusion on how the values for the subsequent years are calculated, one example of cost calculation is provided. Let’s take the example of the ETL cost component. The license cost is 250,000 euros. 20% of that cost, paid towards license renewal for continued support, is 50,000 euros. 10% of the hardware and OS cost, paid towards maintenance and support, is 5,000 and 800 euros respectively. When these three costs are added, we arrive at 55,800 euros as provided in Table 6.7.
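The same rule can be applied to all three components. The following is a minimal sketch, assuming the 20% license renewal and 10% hardware/OS maintenance rates stated above; the first-year figures are taken as printed in Table 6.7 rather than recomputed, and the names `COMPONENTS`, `yearly_support_cost`, and `five_year_cost` are hypothetical.

```python
LICENSE_RENEWAL_PERCENT = 20  # yearly renewal as a share of the license cost
HW_OS_SUPPORT_PERCENT = 10    # yearly maintenance as a share of hardware and OS cost
SUPPORT_YEARS = 4             # years 2 to 5

# Costs in EUR; "first_year" is taken as printed in Table 6.7.
COMPONENTS = {
    "ETL": {"first_year": 308_000, "hardware": 50_000, "os": 8_000, "license": 250_000},
    "Database": {"first_year": 408_000, "hardware": 50_000, "os": 8_000, "license": 400_000},
    "Reporting and analytics platform": {
        "first_year": 478_000, "hardware": 50_000, "os": 8_000, "license": 420_000,
    },
}


def yearly_support_cost(component):
    """Return the yearly support and renewal cost for years 2 to 5."""
    renewal = component["license"] * LICENSE_RENEWAL_PERCENT // 100
    maintenance = (component["hardware"] + component["os"]) * HW_OS_SUPPORT_PERCENT // 100
    return renewal + maintenance


def five_year_cost(component):
    """Return the first-year cost plus four years of support."""
    return component["first_year"] + SUPPORT_YEARS * yearly_support_cost(component)


if __name__ == "__main__":
    for name, component in COMPONENTS.items():
        print(f"{name}: {yearly_support_cost(component):,} EUR per year, "
              f"{five_year_cost(component):,} EUR over 5 years")
    total = sum(five_year_cost(c) for c in COMPONENTS.values())
    print(f"Total system cost for 5 years: {total:,} EUR")
```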
Total cost of ownership
Now that we have the people cost and system cost, we can add these two as shown in Table 6.8 to get the total cost of ownership for the BI solution for 5 years.

| Year | People cost in EUR | System cost in EUR | Total cost in EUR |
|---|---|---|---|
| Year 1 | 1.8 million | 1.2 million | 3 million |
| Year 2 | 0.55 million | 0.23 million | 0.78 million |
| Year 3 | 0.56 million | 0.23 million | 0.79 million |
| Year 4 | 0.57 million | 0.23 million | 0.8 million |
| Year 5 | 0.58 million | 0.23 million | 0.81 million |
| 5 Years Total | 4.06 million | 2.12 million | 6.18 million |
Table 6.8: Total cost of a BI solution for 5 years
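As a small sketch, the totals in Table 6.8 can be reproduced by adding the per-year figures exactly as printed, using Python's `decimal` module to avoid floating-point noise. Because the table adds figures that are already rounded, its total of 6.18 million is slightly lower than the sum of the unrounded 5-year totals from Tables 6.4 and 6.7, which comes to about 6.20 million. The names `PEOPLE`, `SYSTEM`, and `total_cost_of_ownership` are illustrative.

```python
from decimal import Decimal

# Per-year costs in millions of EUR, as printed in Table 6.8 (already rounded).
PEOPLE = ["1.8", "0.55", "0.56", "0.57", "0.58"]
SYSTEM = ["1.2", "0.23", "0.23", "0.23", "0.23"]

# Unrounded 5-year totals in EUR from Table 6.4 and Table 6.7.
PEOPLE_TOTAL_EXACT = 4_080_357
SYSTEM_TOTAL_EXACT = 2_119_600


def total_cost_of_ownership(people, system):
    """Return (yearly totals, 5-year total) in millions of EUR."""
    yearly = [Decimal(p) + Decimal(s) for p, s in zip(people, system)]
    return yearly, sum(yearly)


if __name__ == "__main__":
    yearly, total = total_cost_of_ownership(PEOPLE, SYSTEM)
    for year, cost in enumerate(yearly, start=1):
        print(f"Year {year}: {cost} million EUR")
    print(f"TCO from rounded yearly figures: {total} million EUR")
    exact = PEOPLE_TOTAL_EXACT + SYSTEM_TOTAL_EXACT
    print(f"TCO from unrounded totals: {exact:,} EUR")
```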
Note that the cost of 6.18 million euros is arrived at for a specific combination, that is, an in-house local BI team, on-premises servers, proprietary software, and perpetual licenses. Also, bear in mind that we have not considered:
• Hardware costs for network, server machines for other tools such as documentation, ticketing system, project management tools, and FTP / SFTP.
• Network folders that are shared between different applications.
• Some software such as scheduling tools, data modelling tools, project management and product development tools. These costs are not considered as these are shared between multiple teams.
Even for those components whose costs we have calculated, the costs will obviously change if the parameters change, for example:
1) If the team is based in a different location with higher or lower cost, or if the team is staffed with freelancers or is outsourced.
2) If a higher configuration of servers is used, or cloud servers/services are used.
3) If premium software or free and open-source software is used instead.
4) If a different licensing model is used, such as a core-based or subscription-based model instead of a perpetual license model.

Every organization has to understand its specific needs and use the right combination. Let’s now look at some of the options available for team, hardware and software.
Team options
Some options that can impact the costs for a BI team are as follows:
• In-house team with internals (only employees)
• In-house team with internals and externals (freelancers)
• In-house team and outsourcing partner team (IT service provider)
• Only outsourcing partner team
• Part of managed BI service, Business Intelligence as a service (BIaaS)
Hardware options
Some of the options for hardware that have an impact on the costs are listed as follows:
• On-premises servers
• Cloud servers (Public cloud or Private cloud)
• Serverless computing
• BIaaS (No separate hardware costs)
Software options
Some of the options for software that have an impact on the costs are listed as follows:
• Building all required BI software in-house
• Building some BI software and buying some BI software
• Buying all BI software
• Using only open and free software
• Using both open and free software and proprietary software
• Using only proprietary software

Every set of options has its own set of advantages and disadvantages. For example, where free (no cost) software is used, the license cost will obviously be equal to zero; however, the effort (people cost) required to run and maintain it could be higher, and sometimes more than the combined people cost and software cost of building a BI solution with proprietary software. Of course, there are ways to reduce cost and keep expenses to a minimum while deriving maximum benefits. In the next chapter, we will cover ways to reduce the cost of a BI solution.

For now, let’s assume that 6.18 million euros is the amount WalgetABC has to invest to enable 500 BI users. Should WalgetABC go ahead with the BI implementation, spending approximately 6.18 million euros in 5 years? Is it worth the investment? Can WalgetABC avoid it? Or should WalgetABC spend this amount of money on something else? To answer such questions, we need to understand ROI and calculate the ROI for BI.
ROI for BI
Calculating the return on investment (ROI) for BI can be very challenging or very easy depending on how you choose to do it. In this section, we will see why it can be very challenging, and how it becomes very easy when we change the way we look at it. ROI is a measure: it is the ratio of net income to the cost of investment, and to get ROI (%) the ratio is multiplied by 100. In layman’s terms, ROI is how much more you will get on top of getting back your investment. For example, if we were to invest 100 euros and earn 200 euros based on that investment, then the ROI is 1 or ROI (%) is 100%. On the other hand, if we were to earn 150 euros with a base investment of 100 euros, then the ROI is 0.5 or ROI (%) is 50%. When people refer to ROI, they are usually referring to the ROI (%). In this book, when we refer to ROI, it refers to ROI (%). The formula for calculating ROI is as follows:
$$\text{ROI (\%)} = \frac{\text{Net income}}{\text{Cost of investment}} \times 100$$

Where Net income = Gross income from investment – Cost of investment

To ensure that the formula is clear, let’s use the ROI formula for both of the cases mentioned above.
Case 1: ROI = (200-100) X 100 / 100 = 100%
Case 2: ROI = (150-100) X 100 / 100 = 50%
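The definition above translates directly into a one-line calculation. The following is a minimal sketch; the function name `roi_percent` is an illustrative choice, and the two calls reproduce Case 1 and Case 2.

```python
def roi_percent(gross_income, cost_of_investment):
    """Return ROI (%) = (gross income - cost of investment) / cost of investment x 100."""
    if cost_of_investment <= 0:
        raise ValueError("cost_of_investment must be positive")
    net_income = gross_income - cost_of_investment
    return net_income / cost_of_investment * 100


if __name__ == "__main__":
    print(f"Case 1: {roi_percent(200, 100):.0f}%")  # invest 100, earn 200
    print(f"Case 2: {roi_percent(150, 100):.0f}%")  # invest 100, earn 150
```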
ROI for BI – Complex
The formula for ROI looks simple and straightforward; why then is calculating ROI for BI very challenging? Simply because a BI solution for a business is not a product on its own and does not have a revenue of its own; a BI department is not a profit centre but a cost centre. Note that we are not discussing BI vendors or BI solution and service providers; for them, it is their core business to sell BI products and services to other businesses and generate revenue. We are discussing a business investing in building BI capabilities for its own use. Whether it’s a retail chain such as Walget, a bank, or a car manufacturer, they do not generate revenue by selling BI; with BI, they are enabled to carry out their business better, as BI improves business performance and increases profits. As explained in Chapter 2: Why do businesses need BI, even though businesses can exist without BI, BI helps businesses run better and grow profitably.

The cost of BI is very clear; with some assumptions, we could easily calculate the costs. Where it gets challenging is in deriving the gains, especially direct gains, as a result of investment in BI. In Chapter 2: Why do businesses need BI, we listed numerous benefits of BI; now the effort is to convert those benefits into a financial metric that can be used to calculate the ROI. Quantifying the benefits of BI is not a simple task. BI influences the quality of decisions. Decisions based on the information and insights from a BI solution could include simple day-to-day decisions, such as approving or rejecting an employee’s leave request based on the predicted staffing requirement for a specific week, decisions to ensure the right products are stocked in the right quantity, or even strategic decisions such as launching/discontinuing a product or service, entering/exiting a geographical location, etc. How can we realistically and correctly quantify these benefits?
There are so many other factors that contribute to revenue, for example, marketing strategies, customer service, sales competence, product or service unique selling propositions (USP), etc. How can we determine the part of the revenue that was solely based on a BI decision? What if the decision was good but the execution went bad? And what about those other benefits such as finding revenue leakages or fraudulent transactions? How can we predict how much of the revenue leakage we will be able to find with BI? Or how much fraud we may uncover? Retrospectively, we will be able to state and quantify what we were able to achieve using BI, that is, we can calculate the actual ROI; however, how do we calculate the anticipated or expected ROI? The expected ROI is the one that is used for deciding whether we should pursue the BI initiative or not. To explain the challenge better, let’s look at an example (Taxi) that most of us can easily understand. As you go through the Taxi example, try to identify similar situations, questions, and challenges in your line of business and also come up with calculations for your business.

Taxi example: A person is considering the purchase of a car to run it as a taxi. Simplified financials are as provided in the following Table 6.9:
| Description | Amount in EUR | Comments |
|---|---|---|
| Cost of the car | 30000 | (All costs considered) |
| Expected average revenue per day | 350 | Based on 10 trips worth 35 euros each |
| Expected average expenses per day | 50 | Includes fuel, maintenance allocation, etc. |
| Expected income per day | 300 | 350 – 50 |
| Total income per year | 90000 | 300 X 300 (assuming 300 working days in a year) |
| ROI for 1 year | 200% | ((90000 – 30000) X 100) / 30000 |
Table 6.9: ROI for a car as a taxi
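The arithmetic behind Table 6.9 can be reproduced in a few lines. The following is a minimal sketch rather than a prescribed method; the function and variable names are illustrative assumptions, while the figures come from the table. The payback period is derived from the daily income and the 300 working days assumed in the table.

```python
def roi_percent(total_income, cost):
    """ROI as computed in Table 6.9: ((income - cost) x 100) / cost."""
    return (total_income - cost) * 100 / cost


# Figures from Table 6.9
car_cost = 30_000
revenue_per_day = 10 * 35          # 10 trips worth 35 euros each
expenses_per_day = 50              # fuel, maintenance allocation, etc.
working_days = 300                 # assumption stated in the table

income_per_day = revenue_per_day - expenses_per_day
income_per_year = income_per_day * working_days

roi = roi_percent(income_per_year, car_cost)

# Payback period: working days needed to earn back the cost,
# expressed as a fraction of the working year and converted to months.
payback_days = car_cost / income_per_day
payback_months = 12 * payback_days / working_days

print(f"ROI for 1 year: {roi:.0f}%")
print(f"Payback period: {payback_months:.0f} months")
```

Changing `car_cost` to 25000 gives the 260% that appears later for the car without information devices.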
The expected ROI in this case for 1 year is 200%. The invested money is earned back within one-third of the year (approximately 4 months), so the payback period is 4 months. This seems like a good deal, and the person considering the purchase can decide to proceed with the investment based on the ROI. Although the numbers used above may not be fully representative of the taxi industry, the point is that it wasn’t difficult to calculate the ROI in this case; it was quite straightforward.

Now, let’s consider the same situation with a small difference: this time, the car dealer offers the purchaser two options:
• Option 1: A car for 25000 euros, but without any of the information devices (speedometer, tachometer, GPS navigation system, etc.) on the dashboard. For explanation purposes, assume that it is legal to use a car without all of those information systems. The car can still be driven but, as we have already seen in Chapter 2: Why do businesses need BI, the driver won’t know the performance of the car, will not be alerted when fuel is low, will not know what speed the car is running at, and so on.
• Option 2: The same model of car for 30000 euros with all of the information devices on the dashboard, that is, with all those meters, indicators, gauges, and systems.

Without the calculation of ROI, which one would the person intuitively choose? If you were the person, which one would you choose? I’m sure most of us would choose Option 2. We would invest 5000 euros more than is absolutely necessary to drive and use a car. Please note that the prices provided here are just examples, so don’t focus on the 5000 euros; focus on the fact that we are willing to invest more to understand car performance and to be informed, so that we can make informed decisions at the
right time, such as filling up fuel at the right time, conforming to the speed limits, servicing the car at the right time, and so on. If the person had to quantitatively justify spending the extra amount (5000 euros in this case), how should they go about calculating the ROI on the additional investment? For Option 2, the ROI is the same 200% that was already calculated in Table 6.9. Now, let’s calculate the ROI for Option 1:

| Description | Amount in EUR | Assumptions / Comments / Explanation |
| Cost of the car | 25000 | Does not include any of the information devices. |
| Expected average revenue per day | 350 | Based on 10 trips worth 35 euros each. |
| Expected average expenses per day | 50 | Includes fuel, maintenance allocation, etc. |
| Expected income per day | 300 | 350 - 50 |
| Total income per year | 90000 | 300 X 300 (assuming 300 days of working in a year) |
| ROI for 1 year | 260% | ((90000 - 25000) X 100) / 25000 |
Table 6.10: ROI for a car as a taxi without information system – option 1 (incomplete)
As calculated in Table 6.10, the ROI for option 1 is 260%. So, why take option 2? Why not go ahead with option 1? Let’s consider a hypothetical situation in which the person went ahead with option 1 and see what could happen. As there are no information devices, the taxi driver has to keep a mental note of fuel usage and keep guessing how much fuel remains, or write a guesstimate down in a notebook after every trip. Similarly, the driver has to keep a mental note or make a (guessed) entry of other things such as brake oil, engine oil, and the distance covered.

The driver may receive, on average, 2 notices per month from the authorities to pay fines of 100 euros each for over speeding. This is very much possible, as the driver doesn’t know what speed he is driving at. Also, on average, the car runs out of fuel 2 times a month during trips, and the driver has to call the breakdown service, which costs 100 euros each time. Again, this is very much possible, as the fuel level is not correctly calculated. And in the worst-case scenario, the car could run out of fuel on a busy road and meet with an accident, causing damage to the driver, other people, the car itself, and other vehicles or property.

Let’s now recalculate the ROI for option 1 with the additional information (in italics) as shown in Table 6.11. For now, we will not consider the worst-case scenario.
| Description | Amount in EUR | Assumptions / Comments / Explanation |
| Cost of the car | 25000 | Does not include any of the information devices |
| Expected average revenue per day | 350 | Based on 10 trips worth 35 euros each |
| Expected average expenses per day | 50 | Includes fuel, maintenance allocation, etc. |
| Expected income per day | 300 | 350 - 50 |
| *Over speeding fines yearly* | 2400 | 100 X 2 X 12 |
| *Breakdown service fines yearly* | 2400 | 100 X 2 X 12 |
| *Lost revenue because of breakdown yearly* | 1680 | 4 X 35 X 12 (2 trips every month when car ran out of fuel and the next trip for each event lost because of time spent in fixing it) |
| Previously calculated income per year | 90000 | 300 X 300 (assuming 300 days of working in a year) |
| *Income per year based on new information* | 83520 | 90000 - 2400 - 2400 - 1680 |
| ROI for 1 year | 234.08% | ((83520 - 25000) X 100) / 25000 |
Table 6.11: ROI for a car as a taxi without information system – option 1 (incomplete)
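The adjustment in Table 6.11 is simply the earlier yearly income minus three yearly deductions. Here is a small sketch of that recalculation; the variable names are illustrative assumptions, and the figures are those given in the table.

```python
def roi_percent(total_income, cost):
    """ROI as used in the taxi tables: ((income - cost) x 100) / cost."""
    return (total_income - cost) * 100 / cost


car_cost = 25_000                           # option 1, no information devices
base_income_per_year = (350 - 50) * 300     # 90000, as in Table 6.10

# Yearly deductions listed in Table 6.11
speeding_fines = 100 * 2 * 12        # 2 fines of 100 euros per month
breakdown_charges = 100 * 2 * 12     # 2 breakdown call-outs per month
lost_revenue = 4 * 35 * 12           # 4 lost trips of 35 euros per month

adjusted_income = (base_income_per_year
                   - speeding_fines
                   - breakdown_charges
                   - lost_revenue)

naive_roi = roi_percent(base_income_per_year, car_cost)
adjusted_roi = roi_percent(adjusted_income, car_cost)

print(f"ROI ignoring the extra costs: {naive_roi:.2f}%")
print(f"ROI with the extra costs:     {adjusted_roi:.2f}%")
```

Only the items that can be priced appear in this sketch; the points that cannot be priced easily (license suspension, accidents) are exactly what the calculation leaves out.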
Even now, the ROI for option 1 looks better than that of option 2 based on the calculations in Table 6.11. The calculated ROI makes it look as though there is no need for information devices in the car. So, should the person go ahead with option 1? Wait, we are actually missing some more points:
1) The driver has to spend at least 30 minutes more every day with option 1 than with option 2, as notes have to be taken about the fuel level.
2) Over speeding above certain limits can cause suspension or even revocation of the driving license.
3) If the taxi were driven by multiple people, no one would have a clue as to how far the car has run, when the next service is due, and so on.
4) What about the worst-case scenario (accidents)? It’s not that accidents won’t happen with option 2; the point is that with option 1 they are more likely, and the fault (cause of the accident) will most likely lie with the taxi driver.

How can we calculate ROI for all of these points? If we don’t consider all of these points, we get a false sense of assurance based on the positive ROI (234.08% in this
case), but when we consider points such as 2 and 4, there may be no license, no car, and, in the worst-case scenario, the person’s life itself is at risk. If such things happen before the end of the payback period, in the worst case on the first day, then the whole investment is lost and there could be additional liabilities. If we try to find the ROI for each individual device, such as the speedometer or the fuel-level gauge, it becomes even more complex. As you can see, the ROI calculation with this approach gets more and more complex the deeper we go into the details of the scenarios.

In Chapter 2: Why do businesses need BI, it was explained that BI is as important to a business as a car dashboard (all of the information devices) is to a car, or even more so. If we use the same approach as above to calculate the ROI for BI, it will lead to similar problems. Using hypothetical possibilities, we will find that without BI, in the worst-case scenario, the business may close down or suffer a major loss. Clearly, this approach to calculating ROI for BI is not going anywhere. It consumes a lot of time to come up with various hypothetical possibilities, research the financial metrics for each of them, and calculate the probability of each occurring. I hope this has convinced you that the ROI calculation for BI is very challenging using this approach. People who try to use this approach for BI end up stating that estimating ROI for BI is difficult. So, is the answer then to ignore ROI calculation for BI?
Just as we consider information devices to be essentials of a car when we buy one, and don’t calculate ROI for them separately, should we also consider BI an essential capability for a business and not bother about ROI? The answer is Yes and No. Yes, BI is an essential capability for a business. There isn’t an ounce of doubt about the huge benefits of BI (of course, well-implemented BI; see Chapter 7: Ideas for Success with BI). If you think about it, do all businesses that use tools such as Microsoft Outlook, Microsoft Office (Word, PowerPoint, and Excel), and other productivity software calculate the ROI for investing in them? In most businesses, these tools have become essential and are provided to employees on day 1 as standard. BI capability should be similar: employees should have access to information and insights from day 1, so that they can do their jobs better and make better decisions from day 1.

At the same time, ROI calculation for BI should not be ignored. What we need is a simpler approach that we can use to proceed with BI initiatives, so that every business does not have to go through the same long exercise again and again. In my view, if we cannot estimate ROI for a project (or business initiative), then we have not understood either the business or the project well enough. So, we should not ignore ROI calculation, but at the same time we should not take too much time to calculate it.
ROI for BI – Simple
In the introduction of ROI, it was mentioned that ROI calculation for BI can be very challenging or very easy depending on how you approach it. Having seen why it is very challenging, let’s look at it differently to see why it is very easy. BI essentially is the automation of deriving information and insights from data, which enables data-based and data-driven decision making. So, the direct benefit of BI is the time saved. Let’s now calculate the savings based on the typical BI team cost and the system cost that we calculated earlier. In the cost calculation for WalgetABC’s BI implementation, we assumed that there will be 500 BI users. Assuming that 20 of those BI users are purely technical (decision enablers consuming BI user licenses), let’s categorize the remaining 480 BI users as top, middle, and first-level management as provided in Table 6.12:

| Decision makers | No. of employees | Example | Salary per year in EUR per person | Hours saved per year (all employees) | Savings per year in EUR (all employees) |
| Top level | 20 | CEO, COO, CPO, CFO, CAO, SVPs, EVPs, VPs | 300,000 | 4,400 | 750,000 |
| Middle level | 100 | Head of departments | 110,000 | 22,000 | 1,375,000 |
| First level | 360 | Line Managers, Sales Managers, Marketing Managers, etc. | 90,000 | 79,200 | 4,050,000 |
| Total savings for 2nd year in EUR | | | | | 6,175,000 |
Table 6.12: Savings in EUR based on BI usage
The hours saved per year in Table 6.12 are calculated based on the assumption that every employee who uses BI saves at least 1 hour per working day, with the number of working days taken as 220, that is, excluding weekends and a few vacations. By multiplying the hours saved by the hourly rate of each employee, we get the total savings shown in the last column of Table 6.12 (the figures in the table correspond to an hourly rate of the yearly salary divided by 220 days X 8 hours). How does the employee save time? We will see examples in the next section; for now, let’s go with the assumption. In the people cost calculation, we assumed that the number of BI team members remains the same (30% of 1st year) from the second year to the fifth year and that the people cost increases 2% per year. Here too, to keep the calculations simple, let’s assume that the number of BI users remains constant for the first 5 years and that their salaries increase 2% per year.
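The savings in Table 6.12 can be recomputed with a short script. This is a sketch under stated assumptions: the 8-hour working day is not spelled out next to the table but is implied by its figures, and the names used here are illustrative.

```python
WORKING_DAYS = 220        # from the text: excludes weekends and a few vacations
HOURS_PER_DAY = 8         # assumption implied by the figures in Table 6.12
HOURS_SAVED_PER_DAY = 1   # assumption from the text: 1 hour saved per BI user

# (number of employees, salary per year in EUR per person) from Table 6.12
levels = {
    "Top level": (20, 300_000),
    "Middle level": (100, 110_000),
    "First level": (360, 90_000),
}


def yearly_savings(headcount, salary):
    """Return (hours saved per year, savings in EUR) for one level."""
    hourly_rate = salary / (WORKING_DAYS * HOURS_PER_DAY)
    hours_saved = headcount * HOURS_SAVED_PER_DAY * WORKING_DAYS
    return hours_saved, hours_saved * hourly_rate


total = 0.0
for name, (headcount, salary) in levels.items():
    hours, savings = yearly_savings(headcount, salary)
    total += savings
    print(f"{name:<13} {hours:>7,} hours  {savings:>12,.0f} EUR")

print(f"Total savings: {total:,.0f} EUR")
```

Because the hourly rate and the hours saved both scale with the working days, the yearly saving per person reduces to one eighth of the salary under these assumptions.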
As per the calculations, there is a negative ROI in the first year as no time is saved (assumption), but by the end of 5 years, the total savings based on BI will be 25.45 million euros. In Table 6.13, we have calculated WalgetABC’s ROI for the first 5 years:

| Year | Cost in EUR | Savings in EUR | ROI | Comments |
| Year 1 | 3 million | 0 | -100% | Assuming BI solution is built from scratch on-premises and takes at the least 1 year of development, and not investing in embedded BI or OOTB and cloud BI. |
| Year 2 | 0.78 million | 6.17 million | 63.27% | Based on sum of costs up to year 2 and sum of savings up to year 2 |
| Year 3 | 0.79 million | 6.29 million | 172.54% | Based on sum of costs up to year 3 and sum of savings up to year 3 |
| Year 4 | 0.8 million | 6.42 million | 251.09% | Based on sum of costs up to year 4 and sum of savings up to year 4 |
| Year 5 | 0.81 million | 6.55 million | 310.50% | Based on sum of costs up to year 5 and sum of savings up to year 5 |
| 5-year ROI | 6.18 million | 25.45 million | 310.50% | Overall ROI end of 5 years |
Table 6.13: WalgetABC’s ROI calculation for the first 5 years of BI implementation
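The yearly ROI figures in Table 6.13 are cumulative: each uses the sum of costs and the sum of savings up to that year. The sketch below reproduces this from the rounded "million" values printed in the table, so its results differ from the table’s percentages in the decimals (the table was presumably computed from unrounded amounts). Variable names are illustrative assumptions.

```python
from itertools import accumulate

# Rounded figures from Table 6.13, in millions of EUR
costs = [3.00, 0.78, 0.79, 0.80, 0.81]
savings = [0.00, 6.17, 6.29, 6.42, 6.55]

cumulative_costs = list(accumulate(costs))
cumulative_savings = list(accumulate(savings))

# ROI per year based on cumulative savings and cumulative costs
rois = [
    (saved - spent) * 100 / spent
    for saved, spent in zip(cumulative_savings, cumulative_costs)
]

for year, (spent, saved, roi) in enumerate(
        zip(cumulative_costs, cumulative_savings, rois), start=1):
    print(f"Year {year}: costs {spent:.2f}M, savings {saved:.2f}M, "
          f"ROI {roi:.1f}%")
```

The first year comes out at exactly -100% because there are costs but no savings yet; from the second year onward the cumulative savings overtake the cumulative costs.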
In the preceding table, we calculated the ROI based only on the direct (quantitative) benefit of the time saved by the decision makers. With the direct benefit alone, we arrive at an ROI of over 300% in 5 years. Before we get to the explanation of how time is saved, let’s quickly summarize the easy way of calculating the ROI of a BI implementation:
• For the cost calculation, follow the steps in the Cost of BI section, that is, calculate the people, hardware, and software costs. This is straightforward.
• For the earnings/savings part, estimate the number of employees who will become BI users. In other words, estimate the number of potential BI users.
• Categorize them into three groups (first-line managers, mid-level, and top-level decision makers) with their approximate hourly rates. To keep it simple, assume that out of the total BI users (not total employees), 5% are top-level, 20% are middle-level, and 75% are first-line managers.
• Assume 1 hour (this is a conservative estimate) of saving per BI user per working day.
• Multiply the hourly rate by the hours saved for all BI users to get the earnings/savings.
• Apply the values in the ROI formula to get the ROI for BI.

Using this approach, it is quite easy to arrive at an estimated ROI without spending too much time on analysis and research. By now it should be clear why this approach is quite simple. Let’s now apply it to the earlier taxi example. In the case of the taxi, there is 1 decision maker, the taxi driver. Assuming that the driver saves 1 hour per day with information devices, the number of trips per day increases from 10 to 12. The estimated overall ROI as per the calculations in the following Table 6.14 is 260%, which is 60 percentage points (30% in relative terms) more than the ROI of 200% calculated earlier in Table 6.9:

| Description | Amount in EUR | Assumptions / Comments / Explanation |
| Cost of the car | 30000 | Including cost of the information devices |
| Cost of the information devices | 5000 | Already included above |
| Expected average revenue per day | 420 | Based on 12 trips worth 35 euros each (2 additional trips based on the time saved) |
| Increase in revenue per day | 70 | 2 additional trips based on the time saved |
| Expected average expenses per day | 60 | Includes fuel, maintenance allocation, etc. (10 euros increased) |
| Expected average increase in expenses per day | 10 | Assuming more running of the vehicle |
| Expected income per day | 360 | 420 - 60 |
| Increase in income per day due to investment on information devices | 60 | 70 - 10 |
| Calculated total income per year | 108000 | 360 X 300 (assuming 300 days of working in a year) |
| Increase in income per year due to investment on information devices | 18000 | 60 X 300 |
| Overall ROI for 1 year | 260% | ((108000 - 30000) X 100) / 30000 |
| ROI for information devices for 1 year | 260% | ((18000 - 5000) X 100) / 5000 |
Table 6.14: ROI for a car as a taxi with information devices (equivalent of BI for a business)
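Both ROI figures in Table 6.14 follow from the same formula applied to two different investments: the whole car, and the information devices alone. A minimal sketch, with illustrative variable names and the table’s figures:

```python
def roi_percent(total_income, cost):
    """ROI as used in the taxi tables: ((income - cost) x 100) / cost."""
    return (total_income - cost) * 100 / cost


working_days = 300
trip_price = 35

# Overall investment: car including information devices
car_cost = 30_000
income_per_day = 12 * trip_price - 60            # 420 revenue - 60 expenses
overall_roi = roi_percent(income_per_day * working_days, car_cost)

# Additional investment: information devices only
devices_cost = 5_000
extra_income_per_day = 2 * trip_price - 10       # 70 extra revenue - 10 extra expenses
devices_roi = roi_percent(extra_income_per_day * working_days, devices_cost)

print(f"Overall ROI for 1 year:             {overall_roi:.0f}%")
print(f"ROI for information devices (1 yr): {devices_roi:.0f}%")
```

That the two percentages coincide is a property of the example’s numbers, not of the method.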
Note that the 260% ROI for information devices is based only on the direct quantifiable gain. The risks avoided by having the information devices, including the risk to life, have not been considered in the ROI. The interesting thing to note is that, by basing the savings on the time saved, we were able to easily come up with the ROI for both the overall investment (car) and the additional investment (information devices).
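The simple approach summarized earlier in this section (estimate the BI users, split them 5% / 20% / 75%, assume 1 hour saved per working day, multiply by hourly rates, and apply the ROI formula) can be sketched as a single function. The function name, the example user count, the hourly rates, and the cost figure below are hypothetical inputs chosen for illustration only; savings and cost must cover the same period.

```python
def estimate_bi_roi(bi_users, total_cost, hourly_rates,
                    hours_saved_per_day=1.0, working_days=220):
    """Estimate savings and ROI (%) from time saved by BI users.

    hourly_rates: dict with keys "top", "middle", "first" (EUR per hour).
    The 5/20/75 split follows the rule of thumb given in the text.
    """
    shares = {"top": 0.05, "middle": 0.20, "first": 0.75}
    savings = 0.0
    for level, share in shares.items():
        headcount = bi_users * share
        hours_saved = headcount * hours_saved_per_day * working_days
        savings += hours_saved * hourly_rates[level]
    roi = (savings - total_cost) * 100 / total_cost
    return savings, roi


# Hypothetical inputs, for illustration only
rates = {"top": 170.0, "middle": 62.5, "first": 51.0}
savings, roi = estimate_bi_roi(bi_users=200,
                               total_cost=1_000_000,
                               hourly_rates=rates)

print(f"Estimated savings: {savings:,.0f} EUR")
print(f"Estimated ROI:     {roi:.1f}%")
```

Keeping the split and the hours saved as parameters makes it easy to test how sensitive the estimate is to those two assumptions.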
Examples of saving time
In this section, four examples of how BI saves the time of decision makers are provided.

Example 1: In an organization, a press release was to be made, and the CEO ordered the CFO to include the number of worldwide transactions in it. The CFO didn’t have the numbers and passed the request to the VPs. The VPs coordinated with various IT teams to gather the information, and each IT team interpreted transactions differently: one team understood it as all transactions including returns, another as only successful transactions, and another as only in-store (offline) transactions. If the CEO had had access to a BI dashboard showing the total worldwide transactions instantly, the CEO would have spent a minute or less. Without BI, the CEO spent an hour following up with the CFO, the CFO spent 4 hours following up with the VPs, one of the VPs spent 2 days (16 hours) coordinating with various IT teams, the IT teams spent time distributing plans, and each team spent 3 hours getting the information. Assuming 20 different teams, the total hours wasted = 1 + 4 + 16 + (20 X 3) = 81 hours. This is a very conservative estimate; in reality, it can often be more than double that.

Example 2: A Walget store manager receives a leave request from a member of the floor staff for 2 weeks of vacation. The manager has to decide whether to approve or reject it. Without BI, the manager has to find out the staffing requirements for those 2 weeks, check whether there are any special days (shopping week, discount week, national or state holidays) in those 2 weeks, check the leave system to find out who else is on planned leave during that period and who could potentially cover the gap, and ask potential replacements whether they would be willing to do so. It could easily take the Walget store manager multiple hours or even a day to get all of this information.
Instead, with BI, the manager could have information such as, are there any special days in those two weeks, is a replacement required based on past years’ trend and
the current month trend, if a replacement is required, who could potentially replace and who has already made themselves available for those weeks, who has in the past made themselves available but has/has not later shown up, and so on. The Walget store manager can make an instant decision. The floor staff could get an immediate answer instead of waiting for days.

Example 3: In another organization, the head of IT development, responsible for 20+ software development teams (each with about 7 to 8 team members), was unable to efficiently prioritize the development of feature requests. Every month, the head spent around 5 working days sorting priorities, convincing stakeholders, and setting priorities for the teams, but these were most often changed by mid-month, so the head and the teams again spent more time reprioritizing feature requests. After a BI dashboard was implemented that showed which features were used most by customers and which features brought in the most revenue, the information was transparent across the organization and the prioritization was based on the ranking of each feature in the BI dashboard. This saved not only multiple working days for the head but also hours or days for the team members and other stakeholders.

Example 4: As per a report[33], Business Intelligence – Putting information to work, published in July 2006 by the Economist Intelligence Unit and sponsored by SAP and Intel, a micro-lending organization increased the number of loans it processed by 50% in any given timeframe. In this case, instead of 1 hour saved per 8-hour workday, we should understand it as the equivalent of 4 hours saved per 8-hour workday (the same workday now yields 8 X 1.5 = 12 hours’ worth of loans). I leave the ROI calculation to you.

Apart from the time saving, there are a few other interesting points to which I would like to draw your attention:
1. A report published in 2006 talks about how an organization successfully used BI for decision-making not only at the strategic level but also at the operational level by providing access to ground-level staff. Now we are in 2020, and some companies and businesses are still struggling to get to that level of BI adoption; in some cases, not even top-level staff have access to BI. We have already discussed some of the reasons for this in Chapter 4: Challenges in Business Intelligence.
2. An advanced analytics engine that would judge/predict creditworthiness was part of the BI platform. This is important to point out, especially because a lot of people wrongly assume that prediction requires so-called big data and that BI only reports on the past. This 2006 report is clear evidence that it is wrong to assume that BI is only about the past.
3. For predictions, they used statistical methods. Another point to convey is that we don’t always have to use machine learning models to make predictions.
4. Customer profiling and customer segmentation were carried out. Marketing campaigns were specifically designed based on the profile.

Similar to the above four examples, hundreds of examples can be provided where BI has saved time and continues to do so. So, let there be no doubt about the fact that BI saves the time of decision makers, and also of those who are impacted by their decisions.
ROI for BI – Side benefits
In the ROI calculations earlier, we included only the direct benefits. If we include all of the benefits, including the side benefits, the total ROI will surely be more than double the ROI of the direct benefits. There are many aspects we could consider. For example, for every 1 hour saved for a BI user, at least 30 minutes (50%) may be spent on other productive work (which wouldn’t have been possible for lack of time without BI), and this productive work may lead to both additional savings and earnings for the business, thereby taking the ROI even further up. Employees’ motivation in general should be higher with BI, as employees will be better equipped to do their jobs than in a situation without BI. So, again, there should be some gain in revenue because of the overall positive work environment. The possibility to upsell/cross-sell increases, thereby increasing revenue from existing customers. Also, let’s not forget the competitive edge that BI provides.

But the problem remains: how do we quantify these? And more importantly, how do we quantify them in a simple manner without spending too much time? One simple way is to first calculate the ROI for BI based on the direct benefits and then double it to get the overall ROI for BI. This doubling approach is, of course, not the most accurate, but it is a simple way to arrive at an overall ROI without spending too much time. So, for example, if the ROI for BI based on direct benefits (time saved) for a time period is calculated as 300%, then the overall ROI will be 600%. In reality, the overall ROI will be more than double the ROI calculated with only direct benefits, so the doubled ROI would still be a conservative estimate.
Interestingly, even with all of these benefits and such a high ROI for BI, we find some people in leadership positions in some organizations who say, “We don’t have the time to introduce BI because we are busy with daily operations”. This, in my view, is the equivalent of a person running blindfolded in a forest who, when asked to remove the blindfold, responds, “I’m running late, I have no time to remove my blindfold”. That kind of organizational culture is one of the challenges that need to be dealt with during the initiation stage, as seen in Chapter 4: Challenges in Business Intelligence. One of the steps that BI professionals could take is to proactively capture the actual ROI for implemented projects, that is, once the BI solution has been made available to the users, a proactive effort should be made to capture and document as many
examples as possible of specific cases where BI has helped users bring in more revenue to the business, save time, and reduce or cut costs. At least once every year, a quick analysis should be done to find out the actual or achieved ROI. Usually, once the project is completed and a BI solution is available, no one bothers to calculate the ROI, and therefore, whenever the next project comes up, there is no solid evidence. This situation can be avoided by proactively capturing the actual ROI.
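One lightweight way to capture the actual ROI is to keep a simple log of documented benefits and total it against the BI cost for the same period. The sketch below only illustrates that idea: the record structure, the categories, and every figure in it are hypothetical and not taken from any real project.

```python
from dataclasses import dataclass


@dataclass
class BenefitRecord:
    description: str
    category: str        # "revenue", "time_saved", or "cost_reduction"
    amount_eur: float    # documented benefit, converted to EUR


def actual_roi_percent(records, bi_cost_eur):
    """Achieved ROI (%) from documented benefits versus BI cost."""
    total_benefit = sum(record.amount_eur for record in records)
    return (total_benefit - bi_cost_eur) * 100 / bi_cost_eur


# Hypothetical entries collected during one year of BI usage
records = [
    BenefitRecord("Upsell campaign based on dashboard insight", "revenue", 120_000),
    BenefitRecord("Manual monthly reporting replaced", "time_saved", 50_000),
    BenefitRecord("Duplicate supplier payments detected", "cost_reduction", 30_000),
]

achieved = actual_roi_percent(records, bi_cost_eur=80_000)
print(f"Achieved ROI for the year: {achieved:.0f}%")
```

Even a spreadsheet with these three columns would serve the same purpose; the point is that the evidence exists when the next project needs justification.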
Conclusion
In this chapter, we have gone through the details of the financials of BI. Costs (people, hardware, and software) and ROI calculation have been covered extensively. ROI calculations for BI can be very challenging and time consuming, or simple, depending on how we approach them. You should now be in a position to calculate the costs and ROI for BI solutions in your organization. I hope that you were able to apply the learnings to your line of business as we went through the calculations.

As the chapter ends, let me remind you that, as mentioned at the beginning of the chapter, there are articles out there that spread misinformation about the costs of BI by including only the frontend tool cost. With just the frontend tool, a BI user will not be able to progress as far as with an approach where the data is cleaned, transformed, governed, and housed in a well-built data warehouse. When considering the cost of BI, the end-to-end BI solution cost has to be considered.

In this chapter, we did not consider any of the cloud model implementations for the cost and ROI calculations. By using cloud services, we can remove the CapEx cost and include only the OpEx cost, and in general, with cloud models, the costs for a BI solution should reduce drastically, at least in the first year. Some organizations are still not pro cloud; they prefer on-premises deployments. Cloud BI (end-to-end) solutions are definitely worth considering. In the next chapter, as various ideas for successful implementations of BI are explained, we will also touch upon the topic of cloud BI.
Points to remember
Some of the key points to remember are listed here:
• Similar to other IT solutions, a BI solution has both people cost and system (hardware and software) cost.
• People cost should also include the hiring, managing, training, and retention cost of the BI team and the equipment cost.
• System cost of BI is not only the cost of the frontend BI tool but should include the end-to-end solution cost. Considering only frontend tool cost is wrong.
• There are different combinations of teams, hardware, and software available for building a BI solution. Every organization has to decide which one works best for it.
• ROI is the ratio of net income from an investment to the cost of the investment. It is commonly expressed as a percentage.
• The calculation of ROI for BI can be either challenging or easy depending on how we approach it.
• The simplest approach to calculating ROI is to base it on the direct savings (time saved).
• A simple approach to calculating the overall ROI for BI is to double the ROI calculated from the direct savings.
• A BI department is usually not a profit centre but a cost centre. It enables a business to run better and improves the business.
• Payback period is the time period in which the invested money is earned back.
• Investment in BI can lead to huge returns and save businesses from failure.
• Even before 2006, organizations had implemented advanced analytics as part of BI platforms, carried out predictive analytics, and enabled not only strategic-level (top-level) staff but also ground staff to use BI successfully. On the other hand, there are organizations still lagging behind in implementing a basic BI solution.
Multiple choice questions
1. Which of these does not add to the people cost of BI?
   a) Hiring cost
   b) ETL server cost
   c) IT equipment for the team
   d) Salary
2. System cost of BI typically includes
   a) Reporting and analytics platform
   b) Data warehouse, data lake and other data repositories
   c) Data integration or ETL tools
   d) All of the above
3. What do some vendors of BI reporting (or BI reporting and analytics) platforms do to mislead people about ROI for BI?
   a) They exclude cost of the data warehouse
   b) They exclude cost of the ETL or data integration tool
   c) They exclude the people cost of the backend team
   d) All of the above
4. What is the formula for ROI calculation?
   a) (Net income from investment / cost of investment) X 100
   b) ((Net income from investment + cost of investment) / cost of investment) X 100
   c) (Net income from investment / first year’s cost of investment) X 100
   d) ((Net income from investment - cost of investment) / cost of investment) X 100
5. Assuming that the salary of an average BI developer is 100000 euros per year, what would be the salary (approximate) of the Head of BI in that organization?
   a) 110000 euros
   b) 200000 euros
   c) 170000 euros
   d) 130000 euros
6. Assuming that a company in Germany has a development team made up entirely of freelancers from Germany, by what percentage does the people cost increase compared to having an in-house team of internal employees?
   a) Above 500%
   b) 250 to 400%
   c) 150 to 250%
   d) 20%
7. If the cost of investment is 200000 euros, and the revenue generated is 300000 euros in the first year, what is the first-year ROI?
   a) 50%
   b) 100%
   c) 150%
   d) 200%
8. What is the payback period for the previous question?
   a) 6 months
   b) 8 months
   c) 1 Year
   d) 2 Years
Answers
1. b
2. d
3. d
4. a
5. c
6. c
7. a
8. b
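The answers to questions 7 and 8 can be checked with the same formula used throughout the chapter. This is a small sketch that assumes the revenue is earned evenly over the year (an assumption needed to express the payback period in months); the variable names are illustrative.

```python
cost = 200_000       # cost of investment from question 7
revenue = 300_000    # revenue generated in the first year

# Question 7: first-year ROI
first_year_roi = (revenue - cost) * 100 / cost

# Question 8: months needed to earn back the cost,
# assuming revenue is spread evenly across 12 months
payback_months = 12 * cost / revenue

print(f"First-year ROI: {first_year_roi:.0f}%")
print(f"Payback period: {payback_months:.0f} months")
```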
Questions
1. Assuming that a BI developer’s salary is 80K USD, 1500K INR, and 45K GBP in the US, India, and the United Kingdom respectively, how much is the people cost for a typical BI team in each of these cases?
2. How much is the people cost of a typical BI team if all of the team members were freelancers based in Germany?
3. What are the different team options for building a BI solution?
4. Why can calculating ROI for BI be very challenging?
5. Explain, with an example, the simple approach of calculating ROI for BI.
Chapter 7
Ideas for Success with BI

This chapter presents a wide variety of ideas, based on experience and learnings, which can help the BI team, top management, and other BI users achieve success with BI. Some unconventional ideas are also included; they do not claim to be the best or certified ways, but they are ways in which I believe you can achieve success with BI. Previous chapters avoided prescriptions, but this chapter is intentionally prescriptive. After going through this chapter, you will understand what mistakes to avoid, what approaches must be taken, what issues to watch out for, and what and how to prioritize when building BI capabilities and implementing BI solutions. Depending on your role, only some of these may be within your responsibilities; however, it is beneficial to know these ideas, as you can pass them on to the relevant roles. The chapter also provides insights into various real scenarios and shares anecdotal references. At the end, the KABI methodology is also introduced. Experienced BI team members and users will find it easier to understand the ideas than those who are new to BI.
Structure
This chapter is structured as follows:
• Ideas for management
  o Approaches
  o BI team setup
• Ideas for BI teams
  o Approaches
  o Ideas for prioritization
• Ideas for BI users
• Unconventional ideas for BI teams
  o Approaches for development
  o Ideas to deal with data quality
Objectives
Learn and understand several ideas for achieving success with BI. Learn what mistakes should be avoided and what management and BI teams usually ignore but shouldn’t. Also learn some unconventional ideas, especially for dealing with data quality issues, to achieve success with BI. Learn about the KABI methodology.
Ideas for management
These ideas are more relevant to BI management or anyone involved in BI initiatives, programs, or projects. If you are not one of those, approach these ideas with the assumption that you have been given the responsibility to build BI capabilities in an organization. These ideas cut across management decisions, technological decisions, and BI team organization. First, various approaches that can be adopted to ensure success with BI are provided, followed by what mistakes to avoid and what topics shouldn’t be ignored. The section ends with ideas about organizing the BI team.
Approaches
These ideas are a set of generic approaches that management can adopt to achieve success with BI. Not every approach is applicable to every organization, so pick and choose whatever is relevant for yours.
Buy-in from the top
If the goal is to build a corporate or enterprise BI solution, then there should be buy-in from the top management. Sponsorship should come from the top level (C-level or executive level), that is, from those who have authority over all of the departments/business units. It is not enough for one department to take the lead on the assumption that other departments will join in along the way. Without continued sponsorship
from the top management, the planned enterprise BI solution will remain a department-level BI solution, and the organization may end up with multiple and probably redundant BI solutions. To acquire the support of the top management, it is best to showcase the expected ROI or business value of the proposed BI solution. In at least two organizations where I have worked, department-level BI initiatives were progressing simultaneously even though there was an enterprise BI team.
Look for cheaper and smarter alternative solutions
In my experience, some teams/organizations tend to miss cheaper alternative solutions once they have set their focus on expensive ones. For example, one department in an organization had already secured budget approval of almost 1 million euros for the purchase of an in-memory analytical database to address the problem of slow-running queries in the data warehouse. A closer look at the problem found that the existing data warehouse retained over 12 years of data, while users usually required only the previous year’s data and occasionally the last three years’ data. Creating a simple process to archive older data out of the presentation layer of the data warehouse, retaining only the last 3 years’ data, ensured that the queries ran as fast as expected. The point is that the latest technologies and tools can be used to solve problems, but that does not mean organizations should overlook cheaper and smarter solutions that are available in-house.
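The archiving idea above can be sketched in a few lines. This is an illustrative sketch only, not the process used in that organization: the row layout, the `booking_date` field, and the function names are assumptions made for the example, and a real data warehouse would typically do this with partitioning or scheduled SQL jobs rather than in-memory lists.

```python
from datetime import date


def cutoff_date(today: date, retain_years: int) -> date:
    """Return the earliest date that stays in the presentation layer."""
    try:
        return today.replace(year=today.year - retain_years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return today.replace(year=today.year - retain_years, day=28)


def split_for_archive(rows, today, retain_years=3):
    """Split rows into (retained, archived) by their 'booking_date' field."""
    cutoff = cutoff_date(today, retain_years)
    retained = [r for r in rows if r["booking_date"] >= cutoff]
    archived = [r for r in rows if r["booking_date"] < cutoff]
    return retained, archived


# Hypothetical sample rows spanning more than 12 years of history.
rows = [
    {"id": 1, "booking_date": date(2009, 5, 1)},
    {"id": 2, "booking_date": date(2018, 7, 1)},
    {"id": 3, "booking_date": date(2020, 11, 15)},
    {"id": 4, "booking_date": date(2021, 3, 2)},
]

retained, archived = split_for_archive(rows, today=date(2021, 6, 30))
print("retained ids:", [r["id"] for r in retained])
print("archived ids:", [r["id"] for r in archived])
```

The retention window is a parameter so that it can follow what users actually query rather than what the storage can hold.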
Use business-related names
BI projects/programs/initiatives or solutions should have business-related names. A name should readily convey the scope/purpose, and both business users and management should be able to understand it easily and feel connected to it. Technical names such as Data Lake project, DWH, or Data Vault project shouldn’t be used. When technical terms are used, the business perceives the initiative as an IT project in which it has no role to play, and the name doesn’t signify how or where (which department or function) the solution will help the business. Use business-related names such as Call Center BI, Corporate BI, Product Analytics, or Sales Intelligence.
Watch out for duplicate solutions
If you go through the details of an organization’s spending on BI/analytics, you will notice that some organizations have redundant BI solutions, sometimes several of them. Organizations should have processes in place to prevent duplicate BI solutions from being built. Periodic assessments and reviews should be carried out to find out whether there are parallel BI initiatives in different departments or acquired companies. Reviewing the spend on the technology stack throughout the organization, centralizing procurement, documenting existing solutions, and creating awareness about existing solutions are some of the ways of preventing duplicate solutions from being built. One of the companies that I am aware of, which was on an acquisition spree, ended
up with five BI tech stacks. One department had two BI tech stacks, one of them old and still not migrated to the new tech stack after several years. Another department had one tech stack. The other two BI tech stacks came along with the acquisition of two separate companies, and both continued to be retained for years. In the same organization, another department bought a data platform, spending 0.5 million euros on licenses, and never used it because there was already another tool that served the purpose; users therefore had no reason to switch to the new tool.
Build solutions for needs
It is ironic that technology and tool selection for BI, which is mainly used for decision-making, is itself often not based on facts but influenced by other factors such as hype, fear of missing out (FOMO), buzzwords, and vendor promises. Note that if an organization’s entire data can fit within an Excel file, then Excel is your BI tool. It is good enough for that volume of data. If users want to collaborate on that Excel file, then use network-based or cloud-based spreadsheets. Of course, it is no longer practical for most companies to work only with Excel files because of several limitations, but the point is that there is no need to spend millions on BI just because other companies are doing so. BI solutions should be built based on the current and anticipated requirements of the organization and should not be influenced by the aforementioned factors. Purchasing the most expensive technology or tool in the market doesn’t necessarily translate into success. Similarly, capturing and storing a huge volume of data just because there is technology available to do it doesn’t necessarily add business value. For example, huge volumes of customer app usage behavior data from 3-4 years ago may not be relevant to the current situation for various reasons, such as changes to the website/mobile app, the options available in the market, or changes to the population, whereas the purchase/transaction history (low-volume data) is still relevant. It is better to selectively offload unwanted data than to spend on storage and maintenance. Similarly, if a simple rule-based algorithm would do the job, there is no need to deploy machine learning algorithms, which happens a lot these days. An organization needn’t invent or mold a problem to fit a hyped technology or tool, which we also currently see a lot. As we have seen in earlier chapters, there is little value in implementing real-time BI if users are going to use it only on a weekly basis.
Before investing in a technology or tool, management should understand the real needs of the organization and the options already available within the organization.
Try cloud-based solutions
With a variety of reliable cloud-based BI solutions and services available in the market for every layer of a BI solution, businesses can quickly carry out proof of concept (POC) implementations and then decide on the tools and technologies. Cloud-based solutions are not only for POCs; they can also be used as the permanent and main BI solution
thereby minimizing the CapEx. Some companies are still sceptical about using cloud services because of data privacy and security concerns, among others. Organizations should consider using cloud services provided by globally reputed companies, such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure. It is in the interest of these companies to ensure there are no data security issues, in order to safeguard their reputation. Global companies will strive harder than a local data center provider to ensure there are no data security breaches. By using a combination of cloud servers and open-source (and also free) software, BI solutions can be implemented with zero CapEx. Cloud service providers also provide different layers of BI solutions as a service, such as an analytics database/data warehouse database as a service (for example, Google BigQuery and AWS Redshift) or an analytics and reporting platform as a service (for example, Google Data Studio and AWS QuickSight). By using such cloud-native, serverless services, organizations can not only minimize the cost of software and hardware but also reduce the cost of the IT teams that would otherwise be required to maintain the servers and services.
Combine buy and build
Management should be open to both options, that is, buy and build. For some parts of the BI solution, it makes sense (in terms of time to market) to buy what is already available. For other parts, it may make economic sense in the long term to build in-house, although in the short term there can be a delay in going live. All options should be explored. For the reporting and analytics platform, because the expected capabilities are mostly a common set of requirements, it usually makes sense to purchase licenses for one of the many tools available instead of building one from scratch. On the other hand, ETL jobs could require a combination of proprietary ETL tools, open-source ETL tools, and a programming language to deal with a variety of use cases. For example, there could be a requirement to deal with proprietary data formats for which no existing ETL tool has an inbuilt connector. Therefore, management should consider combining the buy and build options, which will result in optimal solutions in most cases.
Avoid getting distracted by buzzwords
BI, in fact any field related to data, has quite a lot of buzzwords. New fancy terms are introduced quite regularly, for example, big data, dark data, data literacy, data swamp, and data artist, and some of them become popular. While it is useful to be aware of the buzzwords and the latest tools and technologies, it should also be clear that not all of the latest tools and technologies are suitable for every business. Just because there is hype about a new tool or technology that is marketed as a “must have” for every organization, organizations shouldn’t begin to consider upgrading technologies or tools unless the existing ones have reached their end of life. By falling into such traps, BI teams would be left migrating from one tool to another and upgrading
to different versions without having the time to focus on important business topics. Avoid investing in the wrong tools and technologies based on hype. Instead, check the ROI, speak with the users (not the IT or BI team) at other companies that have implemented the tool, and understand whether they have actually benefitted from it. Most importantly, always have a clear and concise plan for the specific business process that will be improved by implementing the proposed tool. Without a business contribution, there is no justification for any investment. Consult vendor-neutral BI consultants before investing in BI tools and technologies.
Avoid copying
The BI tech stacks used by tech giants such as Amazon, Facebook, Google, or Microsoft, or by companies belonging to different industries, are not necessarily the best fit for your industry and your specific business. Avoid blindly copying the BI tech stack of other organizations. Having the same tech stack doesn’t guarantee the same results. Also, avoid copying tech roles. Buying tools, investing in technologies, and hiring the right roles should all be in sync, and they should be based on the specific requirements of your organization, not on an imitation of another organization from some other industry.
Capture realized benefits and ROI
As discussed in the previous chapter, the realized benefits and actual ROI for BI are often not actively captured by either the development team or the support team. In most organizations, the development team’s responsibility ends when the solution is handed over to the support team. In large organizations, development teams aren’t even aware of which BI reports/dashboards are actually used by BI users in the production environment, or how and in which decisions they help the business user. The support team, on the other hand, may only be interested in the maintenance and operations of the environment and doesn’t track the realized benefits or actual ROI. Therefore, management should ensure that processes are in place to capture realized benefits and document the actual ROI for BI.
Forethought about data quality is required
When planning BI solutions, management usually either overlooks or ignores the fact that there could be data quality issues, and data quality becomes an afterthought. Bad data quality can negatively impact users’ trust in BI. Management should have forethought about data quality and should establish processes to prevent data quality issues in the first place. For example:
1) Ensure that every software development team (not only BI teams) adheres to mandatory non-functional requirements related to data quality.
2) Include appropriate data quality targets as performance objectives for employees who are responsible for entering data into the system.
3) Ensure data quality is monitored and steps are implemented to prevent data quality issues.
Several ideas on how BI teams can ensure that the management doesn’t ignore data quality issues are shared in later sections of this chapter. However, most of those ideas are a cure rather than prevention. It is well known that prevention is usually cheaper than cure. Therefore, it is best if management gives more importance to data quality and has forethought about it.
BI team setup
The ideas presented here relate to organizing BI teams, team sizes, and leadership.
Team per topic
To ensure clear ownership, fair distribution of responsibilities, visibility of progress, faster implementation, and better utilization of knowledge and experience, it is better to organize dedicated BI teams for specific topics for a certain duration (ideally until completion of the topic) instead of splitting a topic between multiple teams or assigning multiple topics to a team at the same time. Topics could be based on functional areas such as marketing, sales, HR, finance, or call center, on source systems such as a transactional system, CRM, ERP, or mobile app, or on other parameters. The idea is to organize teams that stay focused on a specific topic for as long as it takes to implement it. Once a team starts working on a topic from its roadmap, it should be allowed to complete it and should not be interrupted with different topics unless that is really unavoidable.
Choose team names wisely
It is better to choose team names that do not tie the team down to a specific topic. This point may seem trivial, but it isn’t, as explained below, so don’t ignore it. One approach is to choose team names based on a theme that interests the majority of the team/department members. The idea is to select something that is not related to any of the business topics. For example, BI teams at Walget could be named on a theme of cars, such as team Audi, team Benz, and team BMW, or on a theme of fruits, such as team Apple, team Orange, and team Mango. How does this help? This approach makes it easy both to organize team members and to plan the development roadmap without major structural changes, that is, without involving the HR department. For example, team Audi could be assigned to work on integrating marketing data until completion and then assigned to work on sales data. In this way, an implementation roadmap based on topics can be defined for a team. Ownership of implementing a topic remains with a team; however, as and when required, a team member from one team (for example, team BMW) could be transferred to another (team Audi)
for a specific duration without much administrative work or organizational level changes.
BI team should be self-contained
A BI team should be self-contained. After the BI infrastructure has been set up, with or without the support of other IT team members, a BI team should be able to deliver the BI solution without the involvement of other teams. It should be able to gather requirements and to develop, test, release, and maintain the solution. All roles required for end-to-end development and support of BI solutions should be distributed among the members of a BI team. Dependency on other teams in the process of BI delivery can become a big problem. For example, depending on other teams to schedule ETL jobs in production environments, to carry out BI releases, or to gather BI requirements can become a major bottleneck. Such unnecessary dependencies on other teams should be avoided because they slow down the team and therefore the deliveries.
BI development team size
A small development team of 4 to 5 members dedicated to a particular topic is usually better than a large team that handles multiple topics. When there are more than 5 team members, there should be sub-teams per track of work. For example, if a team of 8 members is required to work simultaneously on two big topics, such as BI on CRM data and BI on transaction data, then it is better to split the work into two tracks (sub-teams), each dedicated to one of the topics, instead of having the entire team work on both. As mentioned earlier, this approach gives the members of each track more responsibility and ownership, and it saves the team from wasting time on context switching. It also makes the work easier to plan: the product owner responsible for the track only has to prioritize the requirements within one larger topic and not across multiple larger topics. An ideal BI development team has one BIBA, three BI developers, and one BI QA engineer.
Full stack BI developers over part-stack
A development team of full stack BI developers, that is, developers who can work on all the layers of a BI solution (data acquisition, data storage, data processing, and information presentation), is a better approach than having part-stack developers, that is, separate developers for different layers. A developer should be able to take up a requirement (user story), for example, a report or a dashboard requirement, and then design, develop, and test (unit testing) features in each of these layers. For the report to be usable, it may require some ETL work to fetch, integrate, and transform data, some data modeling work to arrange the data, and some work to create the necessary objects in the reporting and analytics platform. With full stack BI developers, it becomes easy for everyone (the developer and the team lead/product owner/business analyst) to come up with timelines for deliverables. In case of teams
with part-stack developers (some for frontend, some for backend), there will be dependencies between them and usually frontend developers have to wait for the backend developers to complete their part before they can start their work.
Unit testing and peer code reviews
Management should organize development teams so that developers carry out self-reviews of code, unit testing, and peer code reviews. Peer reviews (one developer reviewing the code of another developer) not only improve code quality but also ensure active knowledge sharing between team members.
Selection of BI team leaders or department heads
BI teams and departments should be led by BI professionals who have team-leading skills. Those in team lead/head positions without BI experience are usually unable to bring out the full value of either the team or the BI solution. Of course, there are exceptions. Even IT professionals whose experience is limited to non-BI projects are not the right candidates to lead BI teams or departments, for a few reasons:
1) Lack of understanding of what is and is not a BI task.
2) Applying the software development principles used in other (non-BI) projects unchanged to BI projects.
3) Application-oriented thinking instead of data-oriented thinking.
Therefore, management should ensure that BI teams/departments are led by BI professionals.
Data analysts should be with business teams
Ideally, data analysts should be part of business teams such as sales, marketing, HR, or product management and not part of an IT/data/BI team within a shared services department. If a BI team is dedicated to a business unit, then it is fine for the data analyst to be part of that BI team. Shared services BI teams should focus on acquiring, storing, and preparing data, on the technical implementation of data governance policies, and on ensuring that data, information, and insights are available to the users. The data analysis responsibility, however, should be with business teams. Business users not only have the domain knowledge but, more importantly, know the intricacies of the business operations, the inter-dependencies involved in business decisions, and the timing of decisions together with the expected timing of the results of earlier decisions. A disconnected IT/BI team usually doesn’t have these details. For example, in Walget, a data analyst working together with a store manager will be able to provide more value because they understand how business decisions may bring about an upward/downward trend in sales, whereas a data analyst who is part of a BI/IT team located away from business teams wouldn’t be in
sync with the day-to-day operations at the store and therefore may not understand the reasons for the trend. Data analysis is more meaningful and effective when it is carried out by the business teams. The important point is that a data analyst should be aware of all the relevant business decisions. If a product is being launched and the data analyst is fully aware of it, then they can already expect a change (increase) in sales. If a product is dropped from the catalogue, then the data analyst expects some of the values to be missing. An offshore/disconnected data analyst, unaware that a supplier was dropped or a product was decommissioned, will think the missing values are a new finding and share it with the store manager, for whom it is not new information.
Business-centric roles should be staffed by internals
When a BI team exists in the organization, business-centric roles such as data analysts and business analysts should ideally be staffed by permanent internal team members (permanent employees) or temporary external staff (freelancers or contractors who are an extension of the team), whereas technical roles such as developers or quality engineers can be outsourced on long-term contracts to outsourcing partners. This approach keeps deep business knowledge within the business, reduces dependency on the outsourcing partner, and avoids lock-in to an outsourcing partner.
Business model and domain knowledge
Compared to other software development and support teams (non-BI teams), most BI team members require a good understanding of the business model and good domain knowledge. For example, the various possible scenarios in a transaction or business event, and the corresponding data flows and transitions, have to be considered to design data transformations and calculate KPIs and metrics correctly. Lack of knowledge of the business model and domain can result in wrong calculations of KPIs and metrics. Therefore, it is important for the management to ensure that the business model is clear and transparent to BI teams, especially the analysts. Management should also ensure that BI teams have sufficient domain expertise. When details of the business model are known, BI teams can come up with ideas and suggestions for metrics and KPIs that in turn help management to run the organization better. For example, in one of the organizations, I was asked by a board member to provide the number of financial transactions. While it seems like a simple question, it actually isn’t, as there are multiple interpretations. As I had some understanding of the domain and was aware of the business model, some of my questions to the board member in response were as follows:
• Should transactions include all debit and credit transactions?
• Should debit transactions include payment transactions (initiated by the cardholder) as well as other transactions, such as recurring fee transactions (for example, inactive usage fee and annual fee)?
• Should we consider the date of transaction or the date of booking for each transaction?
• Should reversed and rejected transactions be included or excluded?
• Should we include or exclude fraudulent transactions?
Based on the answers to these questions, the value derived for the number of transactions was different from what was reported by another team, which had not considered the business scenarios but had based its figure purely on technical events (logs).
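How much the answers to such questions can move a single number can be illustrated with a small sketch. Everything in it is hypothetical: the `Transaction` fields, the flag names, and the sample data are assumptions made for illustration and do not describe the actual system or figures from the anecdote. The question about transaction date versus booking date is left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    direction: str            # "debit" or "credit"
    kind: str                 # "payment" or "fee"
    status: str               # "booked", "reversed" or "rejected"
    fraudulent: bool = False


def count_transactions(txns, *, include_credits=True, include_fees=True,
                       include_reversed_rejected=True, include_fraud=True):
    """Count transactions under one explicit business definition."""
    total = 0
    for t in txns:
        if not include_credits and t.direction == "credit":
            continue
        if not include_fees and t.kind == "fee":
            continue
        if not include_reversed_rejected and t.status in ("reversed", "rejected"):
            continue
        if not include_fraud and t.fraudulent:
            continue
        total += 1
    return total


# Hypothetical sample: six logged events, only one of which is a
# successful, genuine, cardholder-initiated payment.
txns = [
    Transaction("debit", "payment", "booked"),
    Transaction("debit", "payment", "rejected"),
    Transaction("debit", "fee", "booked"),
    Transaction("credit", "payment", "booked"),
    Transaction("debit", "payment", "booked", fraudulent=True),
    Transaction("debit", "payment", "reversed"),
]

# Counting every logged event versus applying a business definition.
technical_count = count_transactions(txns)
business_count = count_transactions(
    txns,
    include_credits=False,
    include_fees=False,
    include_reversed_rejected=False,
    include_fraud=False,
)
print("all logged events:", technical_count)
print("booked, genuine, cardholder payments:", business_count)
```

Making each choice an explicit, named parameter means the definition behind a reported KPI can be written down and agreed with the requester instead of being implied by whatever the logs contain.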
Establish a core BI group
It doesn’t actually matter what this core BI group is called; it could be called a center of excellence (COE), a BI competency center, or a center of expertise. The important point is to establish a core BI group, especially in organizations where there are multiple BI teams. This core BI group is expected to carry out the activities of the BI governance function as explained in Chapter 5: Roles in Business Intelligence. Without a core BI group, BI teams may operate in silos and therefore not share best practices, tools, technologies, learnings, and resources, resulting in redundant BI solutions, overspending, and inconsistencies in implementations. A core BI group can be created by bringing together various roles from different BI teams, such as Principal BI architects, Heads of BI departments, Lead or Principal ETL developers, and Lead or Principal BI frontend developers.
Ideas for BI teams
We will cover different approaches and highlight what should be avoided and what BI teams usually ignore but shouldn’t, to ensure success with BI. These ideas are relevant for BI team members and team leads. If you are not one of these, approach these ideas with the assumption that you are a BI team member. As it is now common practice to build software solutions in a modular, scalable, and portable way, such practices are considered standard. Any common best practice applicable to general software solutions is considered implicitly applicable and therefore will not be discussed. The focus is on approaches that are specific to BI solutions and projects, although they may not be exclusive to BI.
Approaches
When BI teams are in a dilemma about choosing the right approach for development of the BI solution, some of these approaches can help teams navigate those situations
better. Of course, some of these principles can be applied to other solutions too, but here the focus is on BI solutions.
Something is better than nothing
In the case of BI, something is better than nothing, as long as it is clear what that something is, what is available, and what is not. Don’t go for an all-or-nothing approach in BI. It would really be a mistake. Start with something and you will move slowly towards everything. Don’t wait for every requirement to be clear or every constraint to be resolved before beginning the project. This approach of going ahead with something has worked for me every single time. For example, when I recently executed a not-for-profit mini project (covid-stats.de) for a doctor based in Munich using Germany’s COVID-19 data, it started with what was available. The project began with basic trends, patterns, and metrics related to the number of COVID-19 infections, deaths, and recoveries, as these were already available in the source data. After I built the first version and showcased it, the doctor reviewed the data visualizations and provided additional, clearer requirements, which were implemented later on: he asked to include the doubling rate and to show comparisons between selected cities, the corresponding state, and the country (Germany in this case) in the same view. This is how it works in BI: showcase what is available so that BI users can provide better requirements. In most cases, users can provide real requirements only after they have seen the first version. So, start with something, that is, deliver something, and you will move towards everything.
Think like an owner
As a BI business analyst, it is important to think like an owner or the executive manager of the business/business unit. That is, “If I owned this business, which KPIs, metrics, and dashboards would help me to run the business better, see more opportunities, get a quick overview of business performance and business health, and make better decisions?” This way of thinking helps in envisaging useful metrics, KPIs, and dashboards even before BI users (management/owner) have thought about them or requested them. In other applications (for example, an ATM), you usually have to think about how an action will be carried out by the end user (the ATM user in this case), but in the case of BI, you have to think about what data or information will help in decision-making. So, it is important to think from a management or an owner’s point of view.
Gather requirements from owners
BI requirements should be gathered directly from owners/custodians of the profit and loss of the entity, from those who have the highest-level decision-making powers in the business entity. BI teams should be aware that in every organization some people are decision makers at the organizational level and some are implementers. Non-users of BI may not be able to understand the requirements of the decision
makers, and therefore gathering BI requirements through intermediaries may lead to incorrect requirements. Additionally, discussing BI requirements with employees whose jobs could be at risk when BI gets implemented is definitely not going to lead to the intended results. Some employees may even want to delay the implementation of BI as much as possible and keep BI teams away from potential users of BI. Some middle managers may not want BI because it may reveal their fraudulent activities. So, to be successful with BI, always gather requirements from the highest-level decision maker.
Deliver even if it is small
It doesn’t matter which methodology or management techniques you use: start delivering, even if it is a small deliverable, instead of holding back for a big bang delivery. Today’s organizations go through organizational and people changes at a much faster rate than in the past. Such changes could alter (increase or, mostly, reduce) the relevance of recently built solutions or even make them obsolete. Therefore, it is better to deliver as soon as possible and as frequently as possible. Delivering early engages users and keeps them interested while allowing them to provide valuable feedback, which helps to improve the solution. Also, once users appreciate the BI solution and benefit from it, they naturally become promoters of the solution.
Visualize before building
Create data models, designs, architecture diagrams, mock-ups, flowcharts, and data flow diagrams before coding. As there are many components and processes involved in a BI solution, and in most cases multiple tracks (frontend, backend, analysts), it is best to visualize the solution to bring everyone onto the same page before beginning to code. In the case of reports and dashboards, if users have not provided mock-ups or do not have the capacity to provide them, create a mock-up based on your understanding and get it reviewed by the BI user. The flow should be Envision → Specify → Design → Build → Test → Roll out. Most differences in understanding of the requirements are resolved by creating mock-ups, models, and architectures before development starts.
Tools and technologies selection based on specific requirements
When choosing technology and tools, select them based on the specific requirements of the organization, and not based on what seems to be popular or what other companies (sometimes in a totally different industry) have chosen. For example, in one organization, real-time BI could be a must-have requirement, whereas for another organization, it probably isn’t useful at all. If all of your source applications are from a single software vendor, it makes sense to check what options this vendor provides for BI before involving a new software vendor. An existing vendor’s BI tools may have prebuilt integration with the source applications provided by that vendor.
Identify BI champions and promoters
You need BI champions within the team and also need to identify the promoters of BI across business units. Both BI champions and promoters play an important role in increasing BI adoption in the organization. A BI champion is someone who is aware of almost all the possibilities of BI and creates awareness among others. Some of the early adopters may become promoters. These promoters will be in a better position to sell the idea of BI to other potential business users who are sceptical about it. A business user who has benefitted from using BI in another department, or even at another company, could be one of the first promoters. Identify as potential promoters the first set of users who enjoy using the BI solution and are willing to help create awareness about it. Develop more BI champions within the team by encouraging team members to attend relevant BI trainings, conferences, and so on, from which they gain more knowledge about the usage of BI.
Choose the right layer
There are no definite rules on where the calculations (metrics, KPIs) should be done: in the ETL jobs (backend) in the data processing layer, or in the information presentation layer (frontend). If a calculation is complex and takes more time than users can usually wait, it should be part of the transformations in the ETL jobs so that it is already calculated and doesn’t slow down the report refresh. The downside of that approach is that, when a calculation has to be changed, developers are required to change the ETL job. On the other hand, if the calculations are simple and take negligible time during report refresh, they can be done on the fly within the report. This approach allows users to easily change the calculations without involving the BI team. However, to avoid duplicating calculations across different reports, it’s better to keep these calculations in the semantic layer (the layer between data storage and information presentation). Architects and developers will have to consider the specific requirements and choose the right layer in which to place the calculations.
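The idea of keeping a calculation in one shared place can be sketched as follows. This is a minimal illustration only, assuming a hypothetical in-memory metric registry; the names `METRICS`, `compute_metric`, and the sample rows are invented for the example and do not correspond to any particular BI tool.

```python
# Hypothetical semantic layer: each metric is defined once and reused by all reports.
from typing import Callable, Dict, List

Row = Dict[str, float]

# Single place where calculations live; report code never re-implements them.
METRICS: Dict[str, Callable[[List[Row]], float]] = {
    "total_sales": lambda rows: sum(r["amount"] for r in rows),
    "average_transaction_value": lambda rows: (
        sum(r["amount"] for r in rows) / len(rows) if rows else 0.0
    ),
}


def compute_metric(name: str, rows: List[Row]) -> float:
    """Look up a metric definition by name and apply it to the given rows."""
    return METRICS[name](rows)


# Illustrative data; two different "reports" share the same definitions.
sample_rows = [{"amount": 10.0}, {"amount": 30.0}]
sales_report_value = compute_metric("total_sales", sample_rows)
store_report_value = compute_metric("average_transaction_value", sample_rows)
print(sales_report_value, store_report_value)
```

If the definition of a metric changes, only the registry entry changes, and every report that uses it picks up the new calculation.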
Promote BI usage
Heavily promote the usage of BI solutions for BI purposes (decision-making, business performance management, finding business opportunities, and identifying problems) and actively prevent the use of BI solutions for non-BI purposes. While it may seem absurd to some to even state that BI solutions should be used for BI purposes, the fact is that BI solutions get misused. If BI teams don’t prevent non-BI usage of BI solutions, they might end up in a situation where they do not have time to work on BI tasks. For example, in one company, there was a special request from a customer to provide reconciliation files, which has nothing to do with BI. Management agreed to provide the reconciliation files and decided that this was to be done using the BI solution, as other teams were not working with data. By the time this requirement reached the BI team, it was already too late to discuss or reject it. The BI team ended up building the reconciliation files and thereby lost months
of development time that could have been used for BI work. In another instance, what started off as a set of data extracts specific to one customer ended up as the de facto data extracts for all customers, which again led to a loss of development time. Right from the start, ensure that BI team members are engaged only in BI work.
Promote Self-service BI
Promote self-service BI. Let users get insights directly, without the BI team having to come between the information and the users. BI teams should be like a bridge that enables users to connect to information and insights, and not like a wall that blocks users from accessing necessary information and insights. BI teams should work on building the self-service BI platform and ensuring that it is usable by non-technical users too.
Interactive dashboards over static reports
Replace static reports with interactive dashboards, and retain static reports only if there is a real requirement. A well-built dashboard could be the equivalent of tens of static reports. No number of static reports will ever be enough: one or another dimension or metric that a user requires will be missing. On the other hand, including every metric for every dimension will result in a heavy static report that extends to hundreds of pages, which is not desirable. It’s better to create dashboards that contain the basic or commonly required data visualizations and also make it possible to analyze all available metrics across all available dimensions, so that interested users can drill down and get information as and when required.
Provide governed datasets
We cannot expect either the current BI users or BI teams to foresee all the metrics and KPIs that may be required in the future. It’s simply not practical for them, or anyone, to have that foresight. So what BI teams can do is focus on delivering governed datasets on top of which any metrics or KPIs can be created, as and when required, by the BI users themselves. Multiple layers of governed datasets can be provided, such as a raw layer, an integrated layer, and a layer optimized for analytics.
User onboarding
Onboard new users to BI as soon as possible. Create processes that make it easy for them to sign up to become a user. One company bought licenses for a BI reporting and analytics platform worth hundreds of thousands of euros but did not use any of it for over a year because they didn’t manage to find the time to deploy the software. What a waste! On top of that, after deployment, whenever an enthusiastic potential user wanted to become a user, the process was so cumbersome that the potential user would drop out during onboarding. In another organization, around 25% of the users had access to BI but didn’t use it; they didn’t
even remember they had access. Onboard as many users as possible, provide initial and periodic training and user guides, and ensure that the adoption rate is high. It is OK to promote BI to more people than the number of licenses available. If there is more demand than the existing number of licenses, either increase the budget or reallocate some of the unused user accounts.
Common working definitions
As we have seen in earlier chapters, there’s quite a lot of misunderstanding about BI. Usually, we find ourselves in situations where each team member has different views on what BI should provide, how it should be implemented, and what processes should be included/excluded. It is important that a BI team establishes common working definitions and ensures every team member is on the same page. Without common working definitions, a lot of effort could be spent in the wrong direction losing both time and money.
Data granularity
Different levels of data granularity should be maintained in different parts of the BI solution to meet different requirements. When it comes to data granularity, a one-size-fits-all approach should not be used. For example, to answer a question such as “What percentage of customers are new customers?”, a customer’s personal details are not required; summarized data is sufficient. Let’s say the answer to that question is 20%. Now, if the BI users would like to retrieve the names, genders, and email ids of those 20% of customers, then access to the name, gender, and email address data becomes essential. There can be various reasons to require access to customer-level data, such as upsell, cross-sell, targeted campaigns, offers, promotions, birthday wishes, notifications, and reminders. So, one may argue, why not store all of the data at the most granular level and provide access to every BI user, as that would ensure that the BI users have access to everything? This approach wouldn’t be a good solution for two main reasons: 1) Providing all BI users with access to the personal data of customers, including those who don’t require it to do their job, would be a violation of data protection regulations such as GDPR. 2) Report or query performance will be negatively impacted, as more data has to be processed. Storing only summarized data and storing only the most granular data each have their own advantages and disadvantages. Therefore, to satisfy the requirements of different stakeholders, and sometimes even of the same stakeholders at different points in time, a proper data architecture with the right combination of data granularities should be developed. For example, consider that a source system has customer data as provided in Table 7.1:
| Name | Gender | Email Id | First time purchase |
|---|---|---|---|
| Customer 1 | O | [email protected] | 1 |
| Customer 2 | F | [email protected] | 0 |
| Customer 3 | F | [email protected] | 1 |
| Customer 4 | M | [email protected] | 1 |
| Customer n | M/F/O | [email protected] | x |

Table 7.1: Sample of customer data in a source system
Assume that a copy of the customer data table is available in the staging area (data lake) of the data warehouse and that a sample summary table, as provided in Table 7.2, is also available in the data warehouse.

| Gender | Number of customers | First time purchase |
|---|---|---|
| F | 200000 | 10000 |
| O | 100000 | 5000 |
| M | 300000 | 20000 |

Table 7.2: Customer data summary table
To answer the question “how many people fall under the first-time purchase category?”, the BI user doesn’t require access to the personal data (Table 7.1); access to a summary table such as Table 7.2 is sufficient. This approach is better in both respects, that is, data security and performance. Access to personal data can then be provided only to those users who require it for carrying out their job, that is, need-based access instead of blanket access.
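The relationship between the two granularities can be sketched in a few lines. This is only an illustration under stated assumptions: the rows below are made-up values that mirror the columns of Table 7.1, and the function name `summarize_by_gender` is hypothetical. The point is that the summary carries no names or email addresses, yet still answers the first-time purchase question.

```python
from collections import defaultdict

# Made-up customer-level rows with the same columns as Table 7.1.
customers = [
    {"name": "Customer 1", "gender": "O", "email": "c1@example.com", "first_time_purchase": 1},
    {"name": "Customer 2", "gender": "F", "email": "c2@example.com", "first_time_purchase": 0},
    {"name": "Customer 3", "gender": "F", "email": "c3@example.com", "first_time_purchase": 1},
    {"name": "Customer 4", "gender": "M", "email": "c4@example.com", "first_time_purchase": 1},
]


def summarize_by_gender(rows):
    """Aggregate customer-level rows into a Table 7.2-style summary without personal data."""
    summary = defaultdict(lambda: {"number_of_customers": 0, "first_time_purchase": 0})
    for row in rows:
        bucket = summary[row["gender"]]
        bucket["number_of_customers"] += 1
        bucket["first_time_purchase"] += row["first_time_purchase"]
    return dict(summary)


summary = summarize_by_gender(customers)

# The question is answered from the summary alone.
total_first_time = sum(s["first_time_purchase"] for s in summary.values())
print(summary)
print(total_first_time)
```

In a real solution the summary would typically be produced by an ETL job, and access to the detailed rows would be granted separately on a need basis.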
Requirements matrix
For gathering requirements from business users, a dimensions and metrics requirements matrix (DMRM or DM requirements matrix) can be used. A DM requirements matrix can be easily understood by both business users and BI teams. A dimensional model (logical data model) can also be used for gathering the requirements; however, some BI users may not be familiar with data models, or may consider them too technical (even if they are actually not). So, an easy-to-understand matrix in an Excel file or any other spreadsheet, which most users are familiar with, makes
it easy and engaging for the business users. A partial sample DMRM for Walget is provided in Table 7.3:

| Attributes | All time period, Gender, Location, Age, Payment method, … | All time period, Location, Type, … | All time period, Gender, Location, Ethnicity, Age, Department, … | All time period, Product category, Location, … | All time period, Location, Manager, Type, … | All time period, Gender, Location, Age, … |
|---|---|---|---|---|---|---|
| Dimension | Customers | Suppliers | Employees | Products | Stores | … |
| # of | X | X | X | X | X | … |

Further metric rows: Churn rate; Salary; # of on-time deliveries; # of SKUs; Average transaction value; Sales per square foot. Each is marked with an X under the dimensions for which it was requested.

Table 7.3: A partial sample dimensions and metrics requirements matrix
In Table 7.3, the second row is the list of dimensions, which provide the context for analysis of the metrics. The first column is the list of metrics. The first row, above the row of dimensions, contains the attributes of the dimensions, based on which the metrics can be sliced and diced. An X in the table indicates that a particular metric is relevant (as requested by a business user) for a dimension. The matrix is simple for business users to interpret. For example, the first metric, “# of”, means “number of”; it could be the number of customers, number of employees, number of stores, number of products, and so on. In this particular case, as “# of” is applicable to each of the dimensions, the intersection is marked with an X under every dimension. Now let’s look at how the attributes are used for slicing and dicing. For explanation purposes, let’s consider the first metric (# of) and the first dimension (customers), as shown in Table 7.4:
| Attributes | All time period, Gender, Location, Age, Payment method, … |
|---|---|
| Dimension | Customers |
| # of | X |

Table 7.4: One metric with one dimension
From Table 7.4, it’s easy to understand that BI users require the number of customers metric. Listing an attribute, for example gender, means that the number of customers per gender is also required by the BI users; similarly, the number of customers per age group, per location, and so on are all required. You will notice that “all time period” is mentioned across all of the dimensions. This is usually the case with BI solutions: most metrics are required for various time periods, such as daily, weekly, monthly, quarterly, half-yearly, and yearly. Some of the metrics may be required at an even more granular level, such as hourly. Users are interested in knowing, for example, the number of customers in a day, week, and month. An X for a metric across multiple dimensions means that the metric is also valid for a combination of dimensions. For example, consider Table 7.5, with the two dimensions customers and stores and one metric, “# of”:

| Attributes | All time period, Gender, Location, Age, Payment method, … | All time period, Location, Manager, Type, … |
|---|---|---|
| Dimension | Customers | Stores |
| # of | X | X |

Table 7.5: One metric and two dimensions of a DMRM
It means BI users have a requirement to know not only the overall number of customers but also the number of customers per store. As explained earlier, for every dimension, all of its attributes are used for slicing and dicing, for example, it should be possible to get answers from a BI solution for questions such as how many male
customers shopped in a store? How many female customers shopped at all stores in a particular location? How many male customers shopped at all stores of a particular type or format? As you can see, the requirements matrix is very useful, as it helps business users to provide requirements and helps BI teams to capture and understand them.
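To make the reading of the matrix concrete, the following sketch encodes the content of Table 7.5 as plain Python data and expands it into the individual slices it implies. The structure and the names `DIMENSION_ATTRIBUTES`, `REQUIREMENTS`, and `expand_requirements` are assumptions made for this illustration; a DMRM itself is just a spreadsheet.

```python
# Attributes per dimension, as listed in Table 7.5 (the trailing "…" is omitted).
DIMENSION_ATTRIBUTES = {
    "Customers": ["All time period", "Gender", "Location", "Age", "Payment method"],
    "Stores": ["All time period", "Location", "Manager", "Type"],
}

# Each X in the matrix: a metric and the dimensions it is requested for.
REQUIREMENTS = {"# of": ["Customers", "Stores"]}


def expand_requirements(requirements, attributes):
    """List every (metric, dimension, attribute) slice implied by the matrix."""
    slices = []
    for metric, dimensions in requirements.items():
        for dimension in dimensions:
            for attribute in attributes[dimension]:
                slices.append((metric, dimension, attribute))
    return slices


slices = expand_requirements(REQUIREMENTS, DIMENSION_ATTRIBUTES)
for metric, dimension, attribute in slices:
    print(f"{metric} {dimension} by {attribute}")
```

One X under a dimension therefore stands for as many slicing requirements as the dimension has attributes, which is why the matrix stays compact even when the underlying requirements are numerous.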
Avoid complex visualizations
Currently, it has become a trend to create fancy and complex data visualizations. The fancier and more complex they look, the more they are appreciated on social media. A simple chart that does its job perfectly, such as a pie chart, is ridiculed by some self-proclaimed experts. Some data visualizations are created to showcase the coding and software development skills of a developer rather than to help the decision maker decide. These fancy data visualizations are like showpieces: they may attract some people (not necessarily those for whom the data visualization should have been relevant in the first place), and they may be useful at a speaking event, but they are not necessarily usable in a business setting. The whole purpose of data visualizations is to make it simple for business users (already pressed for time) to comprehend information that is derived from large amounts of data. If business users cannot use a data visualization easily and cannot comprehend it quickly, then, however artistic it may be, it is not useful for the decision maker and is not serving its intended purpose. Avoid vanity data visualizations that make information even more complex to understand. A visualization may look fancy and cool, but if it takes an executive more time to understand than a simple chart would, simply avoid it. There’s no point in providing two-page guidelines or a one-day course to understand a chart.
Avoid non-BI work
Unfortunately, BI teams are sometimes used (misused) for functions they are not supposed to be used for. BI teams end up being misused for multiple reasons, such as the skillset of the team, the technologies and tools they work with, and the knowledge they possess about the data of the business. Sometimes BI teams get punished for their own good work. For example, in one company, it was common to hear statements like, “Team A can do this work only after 6 months and it will take them 2 months to do it, but the BI team will take only 2 weeks to build a temporary workaround solution, so let the BI team do the work”. Some BI teams or a few team members may willingly take up non-BI work to be in the thick of the action, and others may end up with no other option. If BI teams don’t avoid non-BI work, it will end up taking most of their capacity. BI work is highly important and strategic, whereas some of the non-BI work could be “urgent” operational work caused by poor management. So, how do we differentiate between BI work and non-BI work? It’s quite easy. If the work is part of the process in which information and insights are derived from data, that’s BI work. If it’s simply moving data for operational purposes with
no intelligence derived out of the data, that’s non-BI work. Some examples of non-BI work that end up with BI teams are listed here:
• Internal data transfers
  o Regular moving of data between applications
  o Regular moving of data between departments
• External data transfers
  o Regulatory and compliance data dumps/extracts
  o Data extracts to partners, suppliers, vendors, customers, etc.
• Data migration
  o One-off move of data from one application to another. This could be because of replacement or decommissioning of an application, mergers and acquisitions, or an upgrade of software or hardware.
• Ownership of data lake
  o When the data lake is used for other non-BI purposes
As you can see, the work listed above is by no means less important or less urgent; it is, in fact, urgent and essential work for the operations of the business. But that still doesn’t make it qualify as BI work.
So, what’s the way out? One way is to create a separate team, such as an operational reporting team, which is separate from the BI team but could use some of the same tools. This operational reporting team can take up ownership of all operational data transfers. As and when data migration work comes up, a sub-team can be formed by taking some members from the operational reporting team and filling the resulting gap with externals. Also, it’s important to educate stakeholders about the difference between BI teams and operational reporting teams.
Some people assume that, as BI teams use the data lake, BI teams should maintain it too. If BI teams are the only users of the data lake, then yes. But if the data lake is used as a staging area for BI and is not used exclusively for BI purposes, then it could be maintained independently by a non-BI team. The BI team would be one of the users of the data lake. There are other use cases for the data lake, as it stores data in raw format too. For example, data extracts, that is, detailed data at the lowest granularity in formats such as CSV or XML, required by downstream applications or partner companies, can be provided from the data lake. Providing such data extracts is not the responsibility of the BI team. Other use cases of the data lake are not in the realm of a BI team’s responsibility.
Avoid one-size fits all approach
Avoid the “one-size fits all” approach for BI as it is a highly limiting approach. Different sets of BI users need different ways of presentation, different levels of
data granularity as we have seen earlier, different data freshness, different types of reports and dashboards, and different delivery methods. Acknowledge these different sets of requirements and avoid the one-size-fits-all approach. For example, for one user, having access to the BI dashboard is a “must have”, whereas another user at the same level of the hierarchy in the same organization may consider receiving reports via email a “must have” and may not care about access to an interactive dashboard. The BI team has to ensure that the requirements of different types of users are met in the BI solution.
Avoid fequirement
Here, “fequirement” means a fake requirement that appears to be a genuine requirement. It’s easy to lose time and effort dealing with fequirements and thereby end up without enough time to deal with real requirements. Business analysts should understand the real need behind every requirement before progressing the development activities related to it. Business analysts play a key role here in prioritizing the right kind of requirements. For example, in one of the organizations I worked for, a requirement to build a report based on which billing would be carried out for a specific customer turned out to be a fequirement and was directly rejected. The cost (circa EUR 10K) of building the report was far too high compared to the forecasted billing (EUR 2 per month) it would generate. Apart from the development cost of the report, the combined cost of billing, collecting payments, and reconciliation (circa EUR 500 per month) would by itself be more than the monthly billing amount of 2 euros. So, there was no point in going ahead with that fequirement at that time.
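The arithmetic behind that rejection can be written down as a small check. The three figures are the approximate ones quoted above; the function names `monthly_net` and `payback_months` are hypothetical helpers introduced only for this sketch, and a real business case would of course weigh more factors than these.

```python
# Approximate figures quoted in the text (EUR).
development_cost = 10_000    # one-off cost to build the report
monthly_billing = 2          # forecasted billing per month
monthly_running_cost = 500   # billing, collecting payments, and reconciliation per month


def monthly_net(revenue, running_cost):
    """Net value generated per month once the report exists."""
    return revenue - running_cost


def payback_months(build_cost, net_per_month):
    """Months needed to recover the build cost; None if the net monthly value is not positive."""
    if net_per_month <= 0:
        return None
    return build_cost / net_per_month


net = monthly_net(monthly_billing, monthly_running_cost)
print(net)                                    # -498
print(payback_months(development_cost, net))  # None
```

A negative monthly net means the development cost is never recovered, which is one simple signal that a requirement may be a fequirement.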
Do not ignore data modeling
Data modeling is one of the most important phases of building a BI solution. A data model establishes the relationships between the data entities. To build a data model, detailed knowledge of the business is required, and analysis should be carried out entity by entity and field by field. In the process of building a data model, a lot of questions are clarified. To come up with a good data model, several stakeholders will have to collaborate. In multiple projects, it has been noticed that development was not progressing in the right direction, and the primary reason was that there was no data model. The data model is one of the most important pieces of documentation. It acts as an agreement between frontend development team members and backend development team members. Developing ETL jobs and reports without an agreed data model is similar to constructing a building without an architect’s plan. You will not see much marketing about data models, definitely not to the extent of other parts of a BI solution. That’s because a data model is not something that is usually sold by a BI vendor; it’s not a commercial-off-the-shelf (COTS) product. Data models are usually custom built based on the specifics of the business and the specifics of the source applications
(source data). To build it properly, it requires collaboration between BI development teams, business stakeholders, source application teams, business process experts and data governance teams. A sample data model, one of the two data models based on which the PublicBI EUProc solution was created, is provided in Figure 7.1:
Figure 7.1: Example of a dimensional data model
The data model in Figure 7.1 is a type of data model known as a dimensional data model, which is different from other data models such as the Entity-Relationship model created for OLTP applications. In the center of the dimensional model there is a fact table (F_EU_PROC_CA), which stores the metrics for all combinations of the dimensions. The surrounding D tables are the dimension tables. Fields in the dimension tables are the basis on which the metrics are analyzed. One of the best and simplest explanations of the techniques and terminologies in dimensional modeling is provided in the Kimball Dimensional Modeling Techniques[35] document. It is definitely worth going through that document, as it provides an overview of the core concepts of dimensional modeling. For a detailed explanation, you can refer to the book The Data Warehouse Toolkit - The Definitive Guide to Dimensional Modeling[11]. To avoid repetition, dimensional modeling concepts will not be covered in this book. I would like to emphasize that dimensional modeling is very much relevant even today; other modeling techniques have not replaced it and have only complemented it.
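The fact-and-dimension structure described above can be illustrated with a tiny star schema. This is a sketch under stated assumptions, not the EUProc model: the tables `f_contract_awards`, `d_country`, and `d_year`, their columns, and all values are hypothetical, and SQLite (through Python’s standard `sqlite3` module) is used only because it needs no setup.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE d_country (country_id INTEGER PRIMARY KEY, country_name TEXT);
CREATE TABLE d_year (year_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE f_contract_awards (
    country_id INTEGER REFERENCES d_country(country_id),
    year_id INTEGER REFERENCES d_year(year_id),
    award_value REAL
);
INSERT INTO d_country VALUES (1, 'Country A'), (2, 'Country B');
INSERT INTO d_year VALUES (1, 2017), (2, 2018);
INSERT INTO f_contract_awards VALUES (1, 1, 100.0), (1, 2, 150.0), (2, 2, 50.0);
""")

# The metric (award_value) lives in the fact table; a dimension field
# (country_name) is the basis on which it is analyzed.
rows = conn.execute("""
SELECT c.country_name, SUM(f.award_value)
FROM f_contract_awards AS f
JOIN d_country AS c ON c.country_id = f.country_id
GROUP BY c.country_name
ORDER BY c.country_name
""").fetchall()
print(rows)
```

Swapping `d_country` for `d_year` in the join and grouping would slice the same metric by a different dimension without touching the fact table.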
Do not ignore training of users
The importance of training users cannot be stressed enough, yet we see that BI user training is ignored. BI users end up depending on the BI team to get answers that they could have found themselves in less time, if only they had been given the necessary training. In multiple organizations, it has been noticed that only a small percentage of users (less than 10%) make use of more than 50% of the features available in BI solutions; all other users use less than 20% of the features. A BI team’s job is not complete if they have only managed to build the BI solution but have not enabled the BI users to make use of it. As usual in businesses, everyone, including the BI users, is pressed for time, and therefore even if training is organized, attendance by the BI users is not guaranteed. Organizations have to find ways to ensure that BI training is not ignored. For example, include it as an objective for both BI team members and BI users, organize regular training workshops, provide training videos and other materials as part of onboarding a user, and come up with creative ideas such as data puzzles (puzzles for which the answers, that is, metrics and KPIs, can be found only by using the BI solution).
Ideas for prioritization
In organizations, there are objectives set at every level (enterprise, department, team). In general, these objectives guide the prioritization of topics to be implemented in BI. The ideas for prioritization discussed in this section should be considered within the framework of the wider objectives of the organization. Generic approaches, such as addressing low-hanging-fruit opportunities or problems and delivering a minimum usable solution, are well known. The point is, within BI, how do we find that low-hanging fruit so that it can be prioritized accordingly? This is the question that we will try to address.
Primary data first
Focus on primary data first and then consider secondary data. If an organization does not have the capacity to derive value from primary data, then it shouldn’t bother about secondary data. Once an organization has allocated enough capacity to derive value from primary data, it makes sense to allocate capacity to derive value from secondary data. Primary data is the data that businesses have to store anyway for the operations of the business; without it, businesses cannot operate[36]. The point to highlight here is that there is no additional cost to design data repositories for primary data or to store the primary data in its raw form. All other data that is not primary but could be useful for deriving information, for example, social media data in some businesses, is considered secondary data. In the case of secondary data, there is an additional cost to design data repositories and store the data. By building BI solutions quickly with primary data first, BI teams can build confidence among the users. On the other hand, a solution that focuses first on secondary data may not provide holistic information, because the primary data is missing. Also, it could take longer, and users may
lose trust in the team and begin to wonder if the team will deliver anything at all. Users may even look for alternate solutions. Therefore, it is ideal to focus on primary data first. After that point while dealing with secondary data, again prioritize and select the data based on its relevance to the business and based on its expected value instead of selecting based on hype or buzzwords.
KPIs first
KPIs first, all other metrics later. As we have seen in Chapter 2: Why do businesses need BI, not all metrics are KPIs, and therefore it is not necessary to create all the metrics. It is the quality of the metrics, not the quantity, that will help a business. In pursuit of building KPIs, some metrics will get built anyway, and that is fine; that is how it should be. The focus has to be on developing KPIs. If a BI solution is able to provide KPIs, it implicitly means that multiple metrics are already available, but the reverse is not true: you may have a lot of metrics and still not be able to calculate a particular KPI. For example, let’s say one of the KPIs is fraud rate. To calculate this, you would need the total number of transactions and the list of transactions that are marked as fraudulent. So, if you have calculated the fraud rate, it means you have the other metrics (number of transactions and number of fraudulent transactions). But if you only have transactions data, and fraudulent transactions are not marked, then you cannot calculate the fraud rate. So, first focus on building the KPIs, and bring in the metrics and dimensions necessary to derive them.
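The fraud rate example can be sketched as follows. The definition used here, number of fraudulent transactions divided by total number of transactions, is an assumption for illustration (a business might define the KPI by value instead), and the field name `is_fraud`, the function `fraud_rate`, and the sample rows are all hypothetical.

```python
# Hypothetical transactions; "is_fraud" is the marking the KPI depends on.
transactions = [
    {"id": 1, "is_fraud": False},
    {"id": 2, "is_fraud": True},
    {"id": 3, "is_fraud": False},
    {"id": 4, "is_fraud": False},
]


def fraud_rate(rows):
    """Return fraudulent / total transactions, or None when it cannot be computed."""
    if not rows or any("is_fraud" not in row for row in rows):
        return None  # transactions exist but are not marked, so the KPI is unavailable
    number_of_transactions = len(rows)
    number_of_fraud_transactions = sum(1 for row in rows if row["is_fraud"])
    return number_of_fraud_transactions / number_of_transactions


rate = fraud_rate(transactions)
unmarked_rate = fraud_rate([{"id": 1}, {"id": 2}])
print(rate)           # 0.25
print(unmarked_rate)  # None
```

Computing the KPI yields both underlying metrics as a by-product, whereas having only the transaction count (the second call) is not enough to derive the KPI.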
Descriptive analytics first
To be able to use predictive BI and prescriptive BI capabilities, descriptive BI should first be in place; without it, the predictions and prescriptions are little more than guesswork and can lead to unintended negative results. Therefore, first focus on building a solid data foundation on which descriptive BI is built, and then extend the solution with predictive and prescriptive capabilities.
Apply the 80/20 rule
The well-known 80/20 rule applies to data too. Usually, we can get 80% of the benefits from probably 20% of the data, or in other words, 80% of the questions can be answered using 20% of the data. So, the approach should be to first identify that 20% of the data that is highly relevant for business improvement which can answer 80% of the questions.
Ideas for BI users
In this section, the ideas or points covered are intended to support BI users in building a better BI solution and also helping them get the best out of the BI solution.
Ask what you need
Ask what you need and not what the BI team has. No, the previous statement is not a mistake; it’s exactly what I intend to say. BI users shouldn’t cut down their requirements based on what is currently available in the BI solution. Instead, they should specify what they actually require from BI to help them run their business unit better, make better decisions, and take decisions in time. BI users have to put some effort into thinking about what (metrics, KPIs, dashboards, etc.) they require and then let the BI team figure out how to fulfil that requirement. Too often we notice that business users are using manual workaround solutions instead of obtaining a permanent solution from the BI team. For instance, in one organization, it was found that two employees were assigned to create quarterly summary reports and send them to the management of that business unit. This report could have been implemented in the BI solution if the BI team had been made aware of the requirement. Years earlier, a process had been set up for data extracts to be sent to that department on a quarterly basis. The department continued to use those data extracts to create summary reports manually instead of getting them automated through the BI solution. BI users may assume that they are helping the BI team by finding a workaround, but it’s almost certain that once the workaround is in place, it stays as the main solution, and every new user then has to learn it. If users stick to conveying what they actually require instead of finding workarounds, every BI user, including future users, benefits and doesn’t have to live with the workaround solution.
Create a mock-up
One of the best things that a BI user can do is to provide a mock-up of the requirement to the BI team. Creating a mock-up of the required dashboards/reports clarifies a lot of questions, both for the user and for the BI team. To those who are not experienced in BI, it may seem that I am attributing more importance to mock-ups than they deserve. Trust me, mock-ups are very useful. On several occasions, the initial requirements that users provided in text format either changed drastically or turned out to be invalid once a mock-up was requested. To create a mock-up, users have to think through the requirement in more detail and in a more structured way than they do for requirements in text format. There is no need for sophisticated mock-ups; a simple Excel file with a table and, optionally, charts is good enough. It can be as simple as the one shown in Table 7.6. Use dummy numbers, but make them realistic based on your knowledge of the business.
| Month | Number of products sold | Sales volume in millions of EUR | Number of employees |
| --- | --- | --- | --- |
| Jan-2020 | 1050 | 8.06 | 12010 |
| Feb-2020 | 1000 | 8.05 | 12000 |
| Mar-2020 | 1100 | 9.04 | 12020 |
| Apr-2020 | 1050 | 8.05 | 12030 |
| May-2020 | 1200 | 10.5 | 12045 |
Table 7.6: A very simple mock-up of a report
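For users who prefer to produce such a mock-up as a file, the following minimal sketch writes the dummy values of Table 7.6 as CSV text, which Excel can open. The function and constant names are illustrative only; any spreadsheet typed by hand serves the same purpose.

```python
import csv
import io

HEADER = (
    "Month",
    "Number of products sold",
    "Sales volume in millions of EUR",
    "Number of employees",
)

# Realistic dummy numbers taken from Table 7.6.
MOCKUP_ROWS = [
    ("Jan-2020", 1050, 8.06, 12010),
    ("Feb-2020", 1000, 8.05, 12000),
    ("Mar-2020", 1100, 9.04, 12020),
    ("Apr-2020", 1050, 8.05, 12030),
    ("May-2020", 1200, 10.5, 12045),
]


def mockup_as_csv(rows=MOCKUP_ROWS):
    """Return the mock-up as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


print(mockup_as_csv())
```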
Instead of providing the requirements only as a statement such as “I need the sales volume, number of products sold and number of employees on a monthly basis”, provide a mock-up; it helps you understand your own requirements better. You may think that after providing such a mock-up, all requirements should be clear. Not necessarily. Many more questions will have to be answered, such as:
• Is the information required at a store level or at a regional or country level?
• What does the number of employees mean? The number of employees could have changed between the start and the end of the month. Which number should be considered?
• By when should this report be available every month?
• How should products that were returned to the store in the same month be dealt with? Should they be included in the sold category?
• What happens to products that were taken out of the product list during the month? Should they be included or excluded?
And many more such questions. Mock-ups get us to the important questions sooner. That’s why mock-ups of reports or dashboards are very important. A simple mock-up with dummy values can provide more clarity about a requirement than pages of description. Users should create mock-ups based on the expected output. When mock-ups are available, there is less chance of going wrong with the implementation.
Provide your time
In most software development projects, the users of the software are customers. For example, in the case of a taxi aggregator application, the users are passengers and drivers, who are not involved in the day-to-day development of the software. They get to use the software only after it is launched. Development of a BI solution, however, is not a typical software development project. In the case of BI, the users are the management (decision makers), so management has to invest time, as the BI development activities should be carried out in full collaboration with the decision makers. As a BI user, you should allocate some time to provide timely requirements and feedback to the development team. The time invested during development will surely save you time and money in the future, as you will be able to make better decisions using BI.
Unconventional ideas for BI teams
The ideas provided in this section are again for BI teams. They may appear at first glance to be an incorrect way of doing things; however, with some experience and careful consideration, readers should be able to understand why these ideas can be used to achieve success with BI. Caution: apply these ideas only after thinking through various scenarios, and only if you are able to apply them meticulously.
Approaches for development
Some unconventional approaches for development are explained in this section.
Authoring instead of development
If we look at it from a fresh perspective, creating BI reports is similar to creating an MS Word document or a PowerPoint presentation. The similarity lies not in the functionalities or access rights but in the impact of the user’s actions on the underlying software. For example, as a user of MS Word you are not changing or editing the software itself. Once the software has been installed, you use it to create documents. Your usage of MS Word, no matter how good or bad a document you create, does not break the MS Word software. Similarly, when you as a BI user use BI reporting and dashboarding capabilities, you are not changing the BI software (the reporting and analytics platform) but creating documents (reports and dashboards) and other objects such as metrics, attributes, filters, etc. No matter how good or bad the reports you create are, they will not break the underlying BI software. So, the important question is: why should report development be treated like software development? Why not create the reports directly in the BI production environment? Why should reports be created in a development environment, packaged into a release, promoted to a higher environment (for example, UAT) and then promoted to the production environment? Why not create the report in the BI production environment and set its status to draft/work in progress/to be verified? You don’t create an MS Word document in one environment and then package it and release it in another environment, do you? So, why not apply the same approach to those parts of the BI solution where it can be applied? To make this approach work, set the right limits for various parameters such as query run time or the number of records fetched in a report.
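The limits and report statuses mentioned above can be pictured with a small sketch. Everything here is hypothetical: the class names, the default thresholds and the status values are illustrative, and real BI platforms expose such settings through their own administration options rather than through code like this.

```python
from dataclasses import dataclass
from enum import Enum


class ReportStatus(Enum):
    """Statuses a report authored directly in production might carry."""
    DRAFT = "draft"
    TO_BE_VERIFIED = "to be verified"
    VERIFIED = "verified"


@dataclass(frozen=True)
class ReportLimits:
    max_runtime_seconds: int = 120   # hypothetical default
    max_rows_fetched: int = 100_000  # hypothetical default


def check_report_run(runtime_seconds, rows_fetched, limits=ReportLimits()):
    """Return the list of limits a report run violated (empty if none)."""
    violations = []
    if runtime_seconds > limits.max_runtime_seconds:
        violations.append("query run time exceeded")
    if rows_fetched > limits.max_rows_fetched:
        violations.append("too many records fetched")
    return violations


print(ReportStatus.DRAFT.value, check_report_run(300, 500))
```

The point of the sketch is only that guardrails, not a release cycle, protect the production platform from a badly written draft report.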
Currently, report development in many organizations is treated like any other software development project. In other software development projects (for example, ATM software), the software itself goes through changes when functionalities are added, and most likely the user of the application is an end consumer (a bank account holder). In BI projects, by contrast, software such as MicroStrategy, Tableau, or Qlik does not get modified; it only gets used, much as MS Word is used, and the users of the application are most likely internal management or the management team of the customer (in the case of a B2B2C business). So, the users can always be informed about the current status of a report and about the purposes for which it can and cannot be used. There isn’t a real need to go through the development cycle.
Start with OLTP
Development of a data warehouse-backed BI solution takes time. It takes time to understand the source systems (OLTP systems) and to design and develop the ETL jobs that fetch data from the OLTP systems, transform it and load it into a data warehouse. It may even take several months before the first report can be generated from the production data warehouse. Where possible, for a temporary period until data is available in the data warehouse, create reports directly on the OLTP databases or on snapshots of them, and schedule the reports so that they run during the period of least impact on the OLTP database. In this way, while the BA/ETL developers are figuring out the data transformation rules and designing data models, the frontend developers can already create reports/dashboards, showcase them to stakeholders and incorporate their feedback. This approach can be summarized as follows:
• Build mock-ups based on realistic dummy data.
• Get the mock-up reviewed by the BI user and update it based on the review comments.
• Based on the mock-up, develop the report pointing to the OLTP database or to already available copies of the OLTP database.
• Simultaneously, build ETL processes to fetch data from OLTP, transform the data as per requirements and load it into a data warehouse or any other analytical database.
• Point the report to the analytical database once it is ready.
This approach has several advantages: stakeholders remain involved in the project from the very beginning, they can see continuous progress, they get used to the frontend tool early, and the requirements for the reports become much clearer.
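The last two steps, building the report against an OLTP snapshot and later repointing it, can be sketched as follows. This is a toy illustration using in-memory SQLite databases as stand-ins for both sources; the table name, the source names and the assumption that the analytical database exposes the same `orders` structure (for example, through a view) are all hypothetical.

```python
import sqlite3

# The report definition stays the same regardless of where the data lives.
REPORT_SQL = "SELECT month, SUM(amount) FROM orders GROUP BY month ORDER BY month"


def open_source(rows):
    """Create an in-memory database standing in for a real data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (month TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def run_report(conn):
    """Run the monthly sales report against whichever source is passed in."""
    return conn.execute(REPORT_SQL).fetchall()


ROWS = [("2020-01", 100.0), ("2020-01", 50.0), ("2020-02", 80.0)]
SOURCES = {
    "oltp_snapshot": open_source(ROWS),  # available from day one
    "analytical_db": open_source(ROWS),  # filled later by the ETL jobs
}

ACTIVE_SOURCE = "oltp_snapshot"  # change to "analytical_db" once it is ready
print(run_report(SOURCES[ACTIVE_SOURCE]))
```

Keeping the source behind a single switch means the report that stakeholders have already reviewed does not need to be rebuilt when the analytical database goes live.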
Adopt Agile KABI
If you are at the starting stage of implementing a BI solution, adopt the agile KABI methodology to speed up the implementation process. For those who are new to KABI, an introduction is provided in the next paragraphs. Agile KABI, or KABI[37], is a highly effective, iterative agile development methodology, based on continuous feedback, for implementing BI solutions from scratch. KABI has very few processes and a minimum number of meetings, is minimally restrictive, and frees the team from non-productive work. It creates an environment for optimal team productivity throughout the project by enabling every team member to work to their full potential even when they face several unknowns, resource constraints, dependencies and unpredictable workload. The building blocks of KABI are depicted in Figure 7.2:
Figure 7.2: Building blocks of KABI
KABI’s foundation, as depicted in Figure 7.2, is based on the agile manifesto. Peer inspiration and mutual trust are the two core values. Peer inspiration is about each team member working in such an exemplary manner that it inspires every other team member to give their best. Peer inspiration should not be confused with peer pressure, which is exactly what KABI avoids. The three guiding principles of KABI are:
1. Prefer an on-time healthy breakfast over a delicious heavy breakfast that is ready only in the evening. That is, deliver when it’s relevant. There is no point in a breakfast, no matter how delicious, that is ready only by evening, because it’s too late for today and stale for tomorrow. Teams are expected to deliver working solutions on time instead of over-engineering them and delivering when they are no longer relevant.
2. Prefer the menu card approach over mama’s kitchen. That is, prioritize building standard products/deliverables over customizations. The team is expected to come up with a catalogue of standard deliverables, build those first and make them available to users, and only then work on customizations. In other words, the team first builds the reports, dashboards, metrics, KPIs, etc., that are required by the majority of stakeholders instead of starting with the custom requirements of a few.
3. Prefer common sense and facts over theories, books, models and everything else. That is, the team is expected to choose the approaches, methods, models, etc., that work best for its requirements rather than cite the supposedly traditional way of implementing something, which may not be practical in the current situation.
Daily demo and sync-up (DDAS) is an effective practice introduced in KABI: an intra-team demo and sync-up held on a daily basis. DDAS is a better replacement for daily stand-ups. A maximum of 30 minutes is allocated for DDAS, in which team members, one by one, not only talk about the work that was carried out but also actually show it, and then talk about the work planned for the day. Because in-progress work is shown daily and viewed by all team members, deviations are more likely to be detected quickly. It also ensures that the entire team is in sync and avoids redundant or conflicting work. DDAS not only makes progress transparent to all team members but also ensures continuous knowledge sharing within the team. As mentioned, DDAS is mainly for the BI team; guests such as BI users and other stakeholders are welcome, but they must remain silent listeners.
No additional preparation (for example, presentation slides) is expected for DDAS other than keeping the original work ready to be shown. Each team member should show the as-is status of the work to seek feedback from the rest of the team. The following rules should be followed to ensure that DDAS is effective:
• All team members should make themselves available for DDAS and prioritize DDAS over other work. To save the team’s time, they should be ready with what (reports, ETL jobs, scripts, test cases, user stories, etc.) they would like to show.
• During DDAS, only quick clarifications are allowed; all other discussions are arranged outside of DDAS.
• No repetitions are allowed; only the incremental work done by that team member on the previous working day needs to be shown.
• Every team member has the right to question any other team member about what was accomplished on the previous working day if that team member shows nothing.
Initially, DDAS may seem like an overhead, but let me assure you that applying it meticulously brings great benefits. As for the practical aspects of organizing DDAS, each team can choose what works for it. For example, if the team is co-located in the same room, DDAS can be conducted there, either with each team member sharing their screen on a big monitor or with the team moving from one desk to another as each member shows their work. If DDAS is done online, any conferencing tool that includes screen sharing can be used.
Roles in KABI: In KABI there is no scrum master role; there is an agile product owner and the technical team that implements and maintains the BI solution. Every team member has equal responsibility for ensuring that the team follows the agreed processes. A KABI team is a self-organizing team. The technical team handles both development and operations.
Working process of KABI: Although development in KABI is iterative, as in Scrum, it is not time-boxed as in Scrum. Roadmap planning (high-level planning) and releases are ad hoc, on a need basis. Providing feedback and making improvements are continuous and part of the daily routine; there are no special days or meetings for that. The product owner notifies the stakeholders (users, management, support team, etc.) as and when there is a new product increment. Stakeholder demos take place once every few months (large audience) or on a need basis (deliverable-specific audience). The product owner writes all requirements as user stories such that implementing a user story results in the smallest concrete deliverable, independent of the size or complexity of that deliverable. As there is no time boxing, there is no need to break up a user story based on the time required to implement it. The complexity and effort estimate of a first-time task are determined only after work on that task has started. For everything that the team must do as part of its work, there is a user story: whether it is developing a new BI dashboard, acquiring hardware, or hiring or onboarding a new team member, everything is a user story. This approach helps in planning and reporting. User stories are prioritized only by the product owner.
User stories are tracked on a customized Kanban board (the high-level board or KABI board). On the high-level board, only two statuses of the user stories (to-do and in progress) are tracked; once a user story is completed, it automatically moves off the board. Every team member has access to the KABI board, but only the product owner can change the priorities of user stories. Sub-tasks within the user stories are tracked on a detailed-level customized Kanban board, which helps the technical team track each sub-task to closure. These boards can be set up in an application such as Atlassian Jira.
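To make the behaviour of the high-level board concrete, here is a minimal sketch of a board with only the two tracked statuses, where completed stories simply leave the board and developers take the highest-priority to-do story. The class and method names are hypothetical and are not part of KABI or of any tool such as Jira.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    TO_DO = "to-do"
    IN_PROGRESS = "in progress"


@dataclass
class UserStory:
    title: str
    priority: int  # 1 = highest; assumed to be set only by the product owner
    status: Status = Status.TO_DO


class KabiBoard:
    """High-level board: tracks to-do and in-progress stories only."""

    def __init__(self):
        self._stories = []

    def add(self, story):
        self._stories.append(story)

    def pick_next(self):
        """Move the highest-priority to-do story to in progress and return it."""
        todo = [s for s in self._stories if s.status is Status.TO_DO]
        if not todo:
            return None
        story = min(todo, key=lambda s: s.priority)
        story.status = Status.IN_PROGRESS
        return story

    def complete(self, story):
        """Completed stories leave the board; there is no 'done' column."""
        self._stories.remove(story)

    def visible(self):
        return [(s.title, s.status.value) for s in self._stories]


board = KabiBoard()
board.add(UserStory("Onboard new developer", priority=2))
board.add(UserStory("Sales dashboard", priority=1))
print(board.pick_next().title, board.visible())
```

Note that onboarding a team member sits on the same board as a dashboard, reflecting the rule that everything the team does is a user story.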
Developers always pick up the highest-priority user story from the to-do list, analyze it, and create as many sub-tasks as required to complete it. Each sub-task is sized so that, in that team member’s estimation, it can be completed in one working day. Team members implement the user stories based on the available capacity. Releases happen as and when a product increment/deliverable is ready. If it makes sense to consolidate some of the deliverables before release, that is done. One of the responsibilities of the product owner is to ensure that the technical team is not distracted by users or other stakeholders, for example, by users providing requirements directly to the developers or changing priorities. Developers should get continuous, uninterrupted time for development activities and should not be distracted, especially by meetings that are not relevant to them. It is a common observation that most stakeholders claim that their requirement is the most important and most urgent. Only the product owner has full knowledge of all the requirements from all stakeholders and is therefore in the best position to prioritize them.
Weekly status report (WSR): On the last working day of the week, a weekly status report is sent out to management and other stakeholders. The WSR consists of:
• Which of the planned activities were completed this week?
• What unplanned activities were carried out this week?
• What was delivered this week?
• What is planned for next week?
• Any blockers for next week (meaning that management support is required)?
• Upcoming deliverables in the short term (next 3-4 weeks) with exact dates and current status
• Any additional information, such as team members going on vacation, training, etc.
Benefits of KABI: The main benefit of KABI is that BI implementations can be faster than with other methodologies such as Scrum, and the team doesn’t burn out to achieve the faster implementation.
This benefit is achieved for the following reasons:
• The team spends very little time on non-productive tasks and more time on developing the product.
• The completion status of tasks is transparent to all team members on a daily basis, which avoids last-minute surprises. Defects are detected earlier because there is a daily demo/show, which reduces rework at a later stage.
• The team remains highly flexible (no sprint commitments) and can therefore accommodate unplanned work at any time with no or minimal overhead.
• The workload is distributed fairly, and no team member is overloaded.
A comparison of Scrum and KABI is provided in Table 7.7:
|  | Scrum | KABI |
| --- | --- | --- |
| Roles | Scrum master, product owner, and development team (developers and quality engineers) | Product owner and technical team (developers, quality engineers, and admins) |
| Daily meetings | Daily stand-up, usually 5 to 15 minutes. In some teams this has become just a formality to ensure team members come on time, and no useful info is shared. | DDAS, usually 20 to 30 minutes. Work progress is actually seen. |
| Sprint planning | Story points are assigned to user stories. A set of user stories is planned for a sprint (usually 2 weeks). | No sprints and therefore no sprint planning. The team picks up a user story as and when there is capacity to pick one up. |
| Delivery | At the end of the sprint. | As soon as the user story is completed. |
| Feedback process | At the end of the sprint as part of the sprint retrospective. | Daily routine. |
| Communication | Sprint commitments are shared with stakeholders every sprint cycle. | Weekly status report irrespective of deliverable status. |
| User story | User stories should be broken down to less than the sprint duration. Extra effort and time are spent in breaking down the user stories. | As there is no time boxing, user stories are deliverable based. For example, irrespective of whether one dashboard takes 1 week or 1 month, it is still implemented in one user story. |
| Demo | Usually the demo is at the end of the sprint for all, including team members. | On a daily basis, mainly for the team; guests are also welcome. Need-based demo for other stakeholders and a demo every 2 months for a larger audience. |
| Unplanned higher priority task | An unplanned higher priority task would break the sprint; effort is required to rearrange the sprint. | Unplanned higher priority tasks are simply executed by placing other activities on hold. |
| Handling impediments | Impediments are handled by the scrum master of the team. | Impediments are handled by any team member (usually the team member facing the impediment). They are escalated to management when necessary through the WSR. |
Table 7.7: Comparison of KABI with scrum
KABI was created based on practical challenges that were faced while building a BI solution from scratch. The following are some of the realities that other methodologies usually overlook and that KABI takes into consideration:
• In a new project to build a BI solution, the team is still forming; that is, there may be one or two members at the start, with more members joining at different points in time.
• Some team members leave, and some of those positions are filled during the project, possibly with gaps or overlaps; that is, the number of team members is not constant. Even if the number of team members is constant, team capacity is not, due to vacations, unplanned leave, training, etc.
• Not all team members have the same level of experience and expertise, and not all of them may be skilled in all of the tools used in the project. Some user stories can be implemented only by some of the team members. While this is not desirable, it is a practical challenge to be aware of.
• At the beginning, the infrastructure is not yet set up; it needs to be set up/organized as part of the project. The BI team may not have any control over the timelines for infrastructure setup, and its deliverables could be delayed as a result.
• Business users/stakeholders usually do not have the time to provide continuous (for example, daily) feedback. The product owner has to provide feedback and answers on behalf of the stakeholders and should therefore have domain knowledge and a very good understanding of the real requirements of the BI users.
• For a first-time task, the effort (person days), complexity and time (schedule) required for implementation are not known until the work has actually started.
To account for such challenges and realities, KABI’s processes are simple and flexible. For example, unlike Kanban, KABI sets no maximum number of WIP (work in progress) user stories/items. A team member can pick up a user story and, if stuck because of a dependency on another team, is free to pick up another user story from the prioritized list while the other team resolves the dependency.
History of KABI: KABI has been in practice since 2016. The name KABI is derived from Kanban and Business Intelligence. To learn more about KABI, please refer to www.akvkbi.com. Each of the processes established in KABI is based on my own experience. Here, I will briefly explain how two of them (DDAS and the weekly status report) came about.
DDAS is very useful. In fact, I would say that DDAS is the most useful practice of all within KABI. The idea for DDAS was derived from the “Show and Tell” approach that was followed in one of my earlier projects in the UK in 2010-2011. In the “Show and Tell” approach, one of my responsibilities as an agile BA/developer was to sync with the stakeholder (client) on an almost daily basis, show (demo) the progress (whatever had been developed, be it a report or a dashboard) and tell (explain) what had been done. With this approach there was daily feedback on the work, which saved us from unpleasant last-minute surprises and rework, and we thereby met or exceeded client expectations every time. Based on that experience, I thought: why not use the same approach for the internal team sync? Why should we just talk (stand-up meeting) about it? Why not actually show what we have done? That’s how DDAS was born. It works great both for co-located teams and for distributed teams (with audio/video conferencing and screen sharing). I recommend DDAS not only for BI teams but also for any other team. It has long been known that visuals are an effective way of communicating; sayings such as “seeing is believing” and “a picture is worth a thousand words” emphasize this, yet in practice we are not using visuals (for example, a report, mock-up or ETL flow) where we could.
The weekly status report is a simple report created and sent on the last working day of the week. It captures, at a high level, what was done in the current week, which of those activities or tasks were planned and which were unplanned, what will be done in the next week, and any blockers that should be addressed by management. This report helps because it proactively communicates to all stakeholders what will be progressed next week and clearly communicates what was achieved in the current week. It sets a direction for the team. It doesn’t contain delivery dates unless it’s absolutely clear that something can be delivered. The idea of the weekly status report was based on a fortnightly report that I had come up with in one of my earlier projects for a customer in Germany. Until the fortnightly report was introduced at the client site, the work carried out by us (my team) was not transparent to the client managers, and the trust level had taken a hit. The fortnightly report gave us an opportunity to highlight the achievements of the past two weeks and the work planned for the next two. This simple report increased transparency, and the trust level improved too. We went on to win and execute multiple projects for that customer. When
KABI was launched at Wirecard, replacing Scrum, we got rid of sprint commitments (which we could never deliver in any case) and replaced them with the weekly status report. If you plan to adopt the weekly status report, I highly recommend that it be created and published by each of the BI team members on a rotational basis instead of by a single person/role all the time. When a team member creates this report by taking inputs from each of the other team members, there are several advantages: for example, they get a better understanding of the overall project, they get more involved, some knowledge sharing happens, and the team collaborates better. Similarly, each of the processes in KABI is based on experience.
After the article on KABI was published on my blog, people from different parts of the world contacted me to say that they are implementing the KABI methodology in their projects. I would guess that more people are using it than have contacted me. A consultant from El Salvador published an article[34] about KABI in Spanish in 2018, which has helped create awareness of KABI in Central America and South America too. I would like to emphasize that KABI played an important role in ensuring that my former team was able to focus on the development aspects (productive tasks) of the BI solution instead of wasting time on administrative (non-productive) topics. Therefore, I recommend checking out the KABI development methodology for your new BI implementation and adopting it in part or in full, as necessary, in your organization. If a BI solution is already implemented and in the maintenance/enhancement phase, KABI’s benefit may not seem significant compared to Scrum, but if you are starting to build a BI solution from scratch, it will definitely help you and your team to implement it quicker.
Ideas to deal with data quality
The points in this section may seem contradictory to what is generally believed to be the right approach to data quality issues, and some of them may look as if I’m asking BI teams to shy away from the responsibility of dealing with such issues. But anyone who has been, or is, hands-on in building BI solutions and leading BI teams will surely understand why these points make sense. As we have seen in Chapter 4: Challenges in Business Intelligence, data quality is one of the biggest challenges in BI. Achieving success with BI requires some specific approaches to data quality challenges.
Status of data quality should be transparent
Ensure that the status of data quality is transparent and avoid trying to cover up the data quality issues. If data is of bad quality, it is even more important to make it
transparent that it is of bad quality. Understand that it is not your (the BI team’s) fault that the data is of bad quality. Continue to build BI solutions alongside the existing data quality issues, mark them as under development, label them with the % of data quality and then continue to improve the data quality, not the other way around. Even though fixing data quality issues before building the BI solution looks like the more correct approach, it is not. Avoid the temptation to first fix all data quality issues before building the BI solution; you will end up spending all your (team’s) time trying to fix never-ending data quality issues and will not be in a position to showcase any BI deliverables. The data quality issues that you are dealing with may not even be known to management. So, make them known; otherwise, the stakeholders will find it difficult to understand why it takes so much time to build the BI solution. Keep relevant stakeholders updated about the % of data quality. Let them request or demand that it be improved. Unless stakeholders are able to see the impact of data quality issues, progress will not be visible and no budget will be allocated to fix them.
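One simple way to arrive at a “% of data quality” figure is to count the share of records that pass a set of agreed checks and to put that figure in the report title. The sketch below assumes this definition; the checks, field names and label wording are hypothetical, and each organization will need its own definition of what counts as a good record.

```python
def quality_percentage(records, checks):
    """Return the share of records (0-100) that pass every check."""
    if not records:
        raise ValueError("no records to assess")
    passed = sum(1 for record in records if all(check(record) for check in checks))
    return 100.0 * passed / len(records)


def quality_label(report_name, pct):
    """Build a report title that makes the data quality status transparent."""
    return f"{report_name} [under development, data quality: {pct:.0f}%]"


# Hypothetical records and checks, for illustration only.
records = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": None, "amount": 15.0},  # missing customer reference
    {"customer_id": 3, "amount": 5.0},
    {"customer_id": 4, "amount": 12.5},
]
checks = [
    lambda r: r["customer_id"] is not None,
    lambda r: r["amount"] >= 0,
]

pct = quality_percentage(records, checks)
print(quality_label("Monthly sales", pct))
```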
Development and data quality improvement in parallel
Related to the previous point, development of the BI solution and measures to improve data quality should happen in parallel. Build the analytics data mart with whatever data quality exists. Let users begin to use it, fully aware of the inherent data quality issues. In parallel, continuously improve the quality of the analytics data mart to the point where the data can be considered fit for board reporting.
Data quality issues should be fixed at source
Data quality issues should be fixed at source, and BI teams should strive for management backing to get them fixed there. Otherwise, every downstream application will have to fix the issues, spending a lot more time and money than if they were fixed once at source. Showcase the savings from improved data quality. Avoid taking ownership of data quality issues over which you don’t have any control; identify them and raise them with the right teams/departments. Data quality issues are most often thrown over the fence and land with the BI team. If BI teams take ownership of data quality without the necessary control, other teams will usually ignore it and continue to introduce more data quality issues, as they don’t have to deal with them. For example, in one company, the transaction type field in the source system was misused. It was filled with values that were not transaction types, in order to save development effort in the source application. While the application worked fine, the billing team couldn’t bill properly because of the errors in the transaction type field, and the responsibility for fixing the data errors was thrown over the fence to the BI team. Therefore, BI teams should suggest process and technical changes to the relevant teams at appropriate times to prevent data quality issues. The development teams of source applications don’t usually understand how data is transformed to information
because this is not their area of focus. They focus on the application and not on the data. For example, a field in an online order form that is unnecessarily left as a free-text field instead of being restricted to a list of predefined values could lead to multiple data quality issues, as users may enter incorrect values. The fix could be as simple as changing the free-text field to a select list. So, in general, a change is required in the way we go about fixing data quality issues. Processes should be updated to ensure that issues are fixed at the source.
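The effect of replacing a free-text field with a list of predefined values can be sketched in code as a source-side validation. The example reuses the transaction type field from the earlier case; the three allowed values are hypothetical and only serve to show the principle.

```python
from enum import Enum


class TransactionType(Enum):
    """Predefined values, as a select list in the source application would offer."""
    PURCHASE = "purchase"  # hypothetical value
    REFUND = "refund"      # hypothetical value
    TRANSFER = "transfer"  # hypothetical value


def parse_transaction_type(raw):
    """Accept only predefined values and reject anything else at the source."""
    try:
        return TransactionType(raw)
    except ValueError:
        raise ValueError(f"unknown transaction type: {raw!r}") from None


print(parse_transaction_type("refund"))
```

With such a check in the source application, a misused value is rejected when it is entered instead of surfacing months later as a billing or reporting error.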
Categorize datasets based on the level of data quality
Categorize data based on the level of data quality. For example, categorize data as:
1. As-is operational data: This is the raw data that can be made available to data scientists and data analysts.
2. Fit for analytics data: Data after some level of transformation which can be made available to data scientists, data analysts, business analysts, and managers.
3. Fit for board reporting data: Data which has been accurately transformed and verified to ensure high data quality which can be used by top management for strategic decision-making purposes.
4. Fit for external reporting data: Audited data which not only can be used internally by management but also can be shared outside, for example, in quarterly or annual company reports.
The specific point here is that, just because data is not fit for external reporting doesn’t mean it should not be used internally. And the general point is that the businesses may categorize the datasets based on data quality so that they can choose what they can do and shouldn’t do with those datasets.
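One possible way to make such a categorization usable is to record the level next to each dataset and check it before a dataset is put to a given use. The sketch below is illustrative only: the four levels come from the list above, whereas the `MINIMUM_LEVEL` mapping of uses to levels is a hypothetical assumption.

```python
from enum import IntEnum


class QualityLevel(IntEnum):
    """The four example categories, ordered from lowest to highest quality."""
    AS_IS_OPERATIONAL = 1
    FIT_FOR_ANALYTICS = 2
    FIT_FOR_BOARD_REPORTING = 3
    FIT_FOR_EXTERNAL_REPORTING = 4


# Hypothetical mapping of a use to the minimum quality level it needs.
MINIMUM_LEVEL = {
    "data science exploration": QualityLevel.AS_IS_OPERATIONAL,
    "management dashboard": QualityLevel.FIT_FOR_ANALYTICS,
    "board report": QualityLevel.FIT_FOR_BOARD_REPORTING,
    "annual company report": QualityLevel.FIT_FOR_EXTERNAL_REPORTING,
}


def allowed_uses(dataset_level: QualityLevel) -> list[str]:
    """List the uses a dataset of the given quality level is good enough for."""
    return [use for use, needed in MINIMUM_LEVEL.items() if dataset_level >= needed]
```

A dataset tagged `FIT_FOR_ANALYTICS` would then be open for exploration and dashboards but not for board or external reports, which mirrors the point that data unfit for external reporting can still be used internally.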
100% clean data requirement should not be a blocker
In general, decision makers are comfortable making decisions based on information and insights from data that is not 100% clean but is considered reliable (good enough data quality). If they are aware that some of the data may be erroneous, for example 5 to 10% of it, then where possible they will exclude the erroneous data and decide based on the rest (90% or more), or decide while keeping the known errors in mind. In several cases users just want a sense of what is happening in the business to be able to decide, and do not need the information to be accurate to 3 or more decimal places. Let’s consider a scenario in which Walget’s store manager has to select a customer target group for a marketing promotion. The requirement is to target the age group that makes the fewest purchases. It is found that the age group of 18 to 25 years accounts for less than 10% of purchases whereas every other group accounts for more than 15%. In this case it doesn’t actually matter whether the true share for the 18 to 25 age group is 10%, 12%, or 8%; the decision can still be made and this group can be chosen as the target group. On the other hand, if reports are sent out to third parties such as customers, suppliers, or partners, and these reports (say, billing reports that summarize invoices) are the basis on which the third parties make payments, then of course the data is expected to be 100% accurate. If such reports are expected out of a BI solution, the 100% data quality requirement for the billing report (which is not a BI usage) shouldn’t be allowed to become a blocker for BI users using the solution for BI purposes.
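The age group scenario can be checked with a small sketch. The shares below are hypothetical numbers chosen only to match the pattern described (18 to 25 below 10%, every other group above 15%), and the `margin` parameter is an assumed error allowance in percentage points.

```python
# Hypothetical shares of purchases (in %) per age group.
SHARES = {"18-25": 9.0, "26-35": 25.0, "36-45": 28.0, "46-60": 22.0, "60+": 16.0}


def least_purchasing_group(shares: dict[str, float]) -> str:
    """Return the age group with the smallest share of purchases."""
    return min(shares, key=shares.get)


def decision_is_stable(shares: dict[str, float], margin: float) -> bool:
    """Check whether the lowest group stays lowest if every share may be off by `margin`.

    Worst case: the target group's true share is `margin` higher and the
    closest other group's true share is `margin` lower.
    """
    target = least_purchasing_group(shares)
    worst_case_target = shares[target] + margin
    best_case_others = min(v for g, v in shares.items() if g != target) - margin
    return worst_case_target < best_case_others
```

With these numbers, an error of 2 percentage points in either direction does not change the choice of target group, whereas an error of 4 points could. This is the sense in which good enough data quality can be sufficient for a decision.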
Visualize data to showcase data quality
It is well known that good quality data is required for data visualization, which is why we first clean and transform the data before we visualize it. We can also use it the other way around, that is, use the power of data visualization to find data quality issues. To do this, visualize the data at whatever level of quality it is in, without fixing the issues, and showcase results that highlight the state of data quality. Poor data quality is easily visible when data is visualized. For example, see a sample chart based on EU public procurement data in Figure 7.3:
Figure 7.3: Poor data quality gets highlighted when data is visualized – source publicbi.com
It is easy to notice that the year 2006 in Figure 7.3 has a much higher value than any of the years from 2007 onwards. To a person with knowledge of this business it will be clear that this cannot be true in a usual scenario unless the data is of poor quality. Let me provide one more real example to drive home this point. In one company, the Jira tickets were not maintained properly and had too many data quality issues, including wrong ticket types, wrong assignees, incorrect statuses, incorrect prioritizations, and several tickets under the “others” category. Dealing with these issues was a pain for months. A simple initiative to visualize the Jira ticket data prompted a full clean-up of the tickets within a few days, because there was now a visual of how bad the quality was, which had previously been missing. That’s the kind of impact data visualization has. So, use the power of data visualization to highlight data quality issues.
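As a rough sketch of this idea, the snippet below draws the uncleaned yearly totals as a plain text bar chart and flags years that tower over the typical value. The numbers are invented for illustration and are not the values behind Figure 7.3; the threshold `ratio` is likewise an assumption.

```python
from statistics import median

# Invented yearly totals; the first year is deliberately implausible.
YEARLY_TOTALS = {2006: 900.0, 2007: 40.0, 2008: 45.0, 2009: 50.0, 2010: 48.0}


def text_bar_chart(totals: dict[int, float], width: int = 40) -> list[str]:
    """Render one bar per year, scaled so that the largest value fills `width`."""
    peak = max(totals.values())
    lines = []
    for year, value in sorted(totals.items()):
        bar = "#" * max(1, round(width * value / peak))
        lines.append(f"{year} {bar}")
    return lines


def flag_suspicious_years(totals: dict[int, float], ratio: float = 5.0) -> list[int]:
    """Return years whose value exceeds `ratio` times the median of all years."""
    typical = median(totals.values())
    return [year for year, value in sorted(totals.items()) if value > ratio * typical]


if __name__ == "__main__":
    print("\n".join(text_bar_chart(YEARLY_TOTALS)))
    print("Check the data for:", flag_suspicious_years(YEARLY_TOTALS))
```

Even in this crude form, one bar dwarfs all others, which prompts the question of whether the data for that year is right before any cleaning has been done.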
Conclusion
Several ideas, including some unconventional ones, that can help you achieve success with BI have been presented in this chapter. Some ideas require time and effort to implement, and some are as simple as choosing the right name for the BI solution or project. It is for each individual, team, and organization to pick and choose the ones that are relevant for their specific situation. Based on the learnings from this chapter you should now be able to make a list of dos and don’ts specific to your BI project or BI initiative. If you currently don’t have experience either in implementing a BI solution or as a BI user, revisit this chapter after you have acquired some BI experience, as that experience will help you understand this chapter better. The focus of this chapter was on how organizations can achieve success with BI. In the next chapter you will be briefly introduced to Individual Business Intelligence (IBI), where the focus is on how each individual can learn more about themselves and improve.
Points to remember
Some of the key points to remember are listed as follows:
• Select ideas that best suit your team or organization. Some ideas can be used without any change; customize the others to your specific requirements.
• Processes should be in place to capture realized benefits and actual ROI.
• Processes should be in place to prevent data quality issues.
• Avoid buzzwords and hype; consult vendor-neutral consultants before making investments in BI.
• It is better to have team names that do not tie a team down to a specific topic, but at the same time it is better to have teams focus on specific topics for a certain duration.
• Domain knowledge and knowledge of the business model are very important for BI teams and a critical requirement for analysts.
• BI requirements should be gathered directly from decision makers.
• BI teams and management should strive to prevent non-BI usage of BI solutions, promote BI usage, and promote self-service BI.
• Where possible, replace static reports with interactive dashboards.
• Onboard users as soon as possible and as many as possible.
• Maintain different levels of data granularity.
• Make use of the dimensions and metrics requirements matrix (DMRM) for requirements gathering and documentation; it’s quite useful.
• It is very important to first focus on primary data, and within primary data to focus on KPIs first.
• A complex data visualization is of less value to a business user than a simple data visualization. The whole point of data visualization is to make it easy for the business user to comprehend the presented information and insight.
• BI teams should avoid taking up non-BI work.
• In BI, a one-size-fits-all approach is highly limiting.
• One of the most important but most ignored activities in building a BI solution is data modeling.
• Give importance to training of BI users.
• A simple mock-up of a report or dashboard can provide more clarity on requirements than several pages of descriptions. Insist on mock-ups.
• When possible, author reports instead of applying a software development process.
• DDAS is an excellent practice when done in the right way. Make use of it.
• Agile KABI can be used as a methodology for implementation of BI solutions.
• Don’t try to cover up data quality issues; let them be highlighted. Fix data quality issues at source. Use the power of data visualization to highlight data quality issues.
Multiple choice questions
1. One of the departments (for example marketing) would like to build an Enterprise BI solution that will be useful for all departments. Is there a need for C-level backing for this initiative?
a) No, as marketing is a powerful department
b) No, the initiative is good for all departments
c) Yes, without C-level support BI solution will not be built
d) Yes, without C-level support BI solution will not be an Enterprise BI solution
2. Which of these names is the best name for a BI solution?
a) Finance Analytics and Reporting
b) Archive Reporting Database
c) Information System
d) Data Lake Solution
3. A code review checklist of a software development team consists of a mandatory checkpoint “Is master data used for existing data entity?”, why should master data be used for existing data entity?
a) Ensures the code runs faster
b) Takes less time to code
c) Reduces chances of data quality issues
d) No benefit
4. A subsidiary of Walget in India has 3 BI teams, which of these name sets are better choice for the 3 BI team names?
a) Sales, Marketing, and HR
b) CRM, ERP, and Billing
c) Honda, Suzuki, and Yamaha
d) Cloud, Enterprise, and Global
5. What does DMRM stand for in the context of requirements gathering in BI?
a) Data management and reporting management
b) Dimensions and metrics requirements matrix
c) Data management requirements matrix
d) Data measures and reporting management
6. What is the order in which Walget should prioritize integrating the data sources into BI solution?
a) Transaction or purchase data, customer service data, competitor data, social media data
b) Customer service data, social media data, transaction or purchase data, competitor data
c) Social media data, customer service data, transaction or purchase data, competitor data
d) Competitor data, social media data, customer service data, transaction or purchase data
7. What is fequirement?
a) A real requirement that should be dealt on first priority
b) A fake requirement that appears as a real requirement
c) It is a functional requirement
d) It is an important non-functional requirement
8. Which of these points below are usually ignored but ideally shouldn’t be ignored?
a) Data modeling
b) Training users
c) Documenting actual ROI
d) All of the above
9. DDAS stands for?
a) Data, dimensions and storage
b) Derived data and storage
c) Daily demo and sync-up
d) Daily demo and support
10. Agile KABI methodology can be useful for?
a) BI projects
b) Non-BI projects
c) Quicker implementation
d) All of the above
Answers
1. d
2. a
3. c
4. c
5. b
6. a
7. b
8. d
9. c
10. d
Questions
1. Why do organizations end up with redundant BI solutions?
2. Why should management be open to both the options of buying some software and building some software when it comes to building a BI solution?
3. Why should BI requirements be discussed with the decision makers and not with intermediaries?
4. Come up with a requirements matrix for a burger chain.
5. Why is the data model considered one of the most important pieces of documentation?
6. On what basis was the name KABI derived?
7. Based on which approach was DDAS created? What is the main difference between the earlier approach and DDAS?
8. Why is it recommended to deal with data quality issues using unconventional ideas?
9. Why should BI teams venture to fix all data quality issues before building the BI solution?
10. Why should development of KPIs be prioritized over other metrics?
Chapter 8
Introduction to IBI
All of the previous chapters, including Chapter 7: Ideas for Success with BI, which covered various ideas to achieve success with BI, have focused on BI in the context of a commercial establishment, institution, or organization. In this chapter, we will make an intentional short diversion to introduce Individual Business Intelligence (Individual BI or IBI), that is, how BI can be applied at an individual level and used for self-improvement. This diversion is warranted by the importance of IBI. I would like all of us to succeed in all aspects of life and not only at work. Since 2017, I have shared a lot of information about IBI in my BI blog (https://www.akvkbi.com/) [38]. Therefore, only an introduction and a few steps to help you get started with IBI are covered in this chapter.
Structure
This chapter is structured as follows:
• What is IBI?
  o Points and connections
  o Trigger for IBI
• How to start with IBI?
  o Generic steps
  o Specific steps
    - Learnings
Objectives
This chapter introduces Individual Business Intelligence (IBI), aims to trigger your interest in it, and explains what triggered the idea of IBI. We will also learn how to get started with IBI and understand the data capture process involved in IBI.
What is IBI?
As defined in the IBI blog [38], Individual Business Intelligence is self-reflection with consciously self-captured data about oneself: getting to know oneself better using one’s own data, or discovering oneself factually. In other words, you get to know yourself better with your own consciously self-captured data. In the same way that we can understand an organization better and improve it by capturing and analyzing data about its activities and processes, we can learn more about ourselves, and make plans to improve ourselves, by capturing and analyzing data about our activities, thoughts, and ideas. IBI is based on the premise that if organizations can improve using BI then so can individuals. Note that the word business in Individual Business Intelligence refers to something that involves someone personally, in the same sense as in “Mind your own business” or “It’s none of your business”; we don’t mean business as a commercial establishment but as one’s area of concern. Data here refers to both quantitative data and qualitative data. The IBI logo, in use since 2017, is provided in Figure 8.1:
Figure 8.1: Logo of IBI
Wouldn’t it be great to identify a trend or a pattern in your own life? For example, how many days of the week do you dream? On which days of the week do you dream? How many days in a week are you happy? On which day of the week or in which month of the year do you often fall sick? On how many days in a year do you get ideas? Do the moon phases have an actual impact on you? Once we know the trends or patterns in our lives, it becomes our choice to continue them or to change them at our own discretion. The point is, we can use the knowledge we gain from our own data to our advantage. But how can we identify those trends and patterns if we don’t have the necessary data organized in a way that it can be analyzed? That’s where IBI comes in. It helps you capture your own data and organize it in a way that you can analyze. IBI has personally helped me in learning more about myself. I wouldn’t have realized some aspects about myself if I hadn’t used IBI; for example, see the two charts provided in Figure 8.2 and Figure 8.3:
Figure 8.2: Daily number of hours sleeping and sick leave pattern
Figure 8.2 is one of the many charts available in the IBI template that I use. This chart shows my daily sleep hours from 19th Feb 2017 until the end of October 2019 together with the sick leave information. By visualizing the sick leave data, it was easy to notice a pattern: every March (2017 to 2019) I had taken sick leave. Deliberating about the cause, I realized that March is when the season changes from winter to spring and that the cause could be a pollen allergy. Based on this knowledge, to keep the explanation short, I did my best to avoid falling sick in March 2020. The point is that IBI makes it possible to learn more about oneself and to change for the better. Now, let’s take a look at the dream pattern in Figure 8.3:
Figure 8.3: Day wise dream pattern
The day-wise dream chart provides information about the days on which I have a higher chance of dreaming. As can be seen from the chart, I am more likely to dream on Tuesdays. Similarly, I have learnt more patterns about myself related to air travel, role changes, sleep hours, etc. One may wonder whether this kind of information is useful and what its purpose is. As of now I don’t know how the dream-related information and a few other data points that I capture could be useful, and I haven’t put some of it to any use yet. However, my assumption is that a lot of things in life are connected and we don’t realize the connections until we dig deeper. I have collected over 100 such data points daily without fail since 19th Feb 2017, and most of the data points are already visualized. The number of data points to capture depends on each individual and on which data they would like to capture and accumulate. Without collecting data, it is very difficult, if not impossible, to find the connections between different life events and to identify trends and patterns. This is better explained with an analogy of points and connections.
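A day-wise pattern like the one described above boils down to grouping daily records by weekday and counting. The following is a minimal sketch with invented records; the record layout (a date plus a yes/no flag) is an assumption and not the layout of the actual IBI template.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Invented sample: (day, did I dream?)
RECORDS = [
    (date(2017, 2, 19), False),
    (date(2017, 2, 21), True),
    (date(2017, 2, 23), True),
    (date(2017, 2, 28), True),
    (date(2017, 3, 1), False),
]


def dreams_by_weekday(records: list[tuple[date, bool]]) -> Counter:
    """Count the days with a dream, grouped by the name of the weekday."""
    counts = Counter()
    for day, dreamed in records:
        if dreamed:
            counts[WEEKDAYS[day.weekday()]] += 1
    return counts
```

Calling `dreams_by_weekday(RECORDS).most_common(1)` returns the weekday with the most dreams in the sample. With years of daily records instead of five rows, the same grouping is what a day-wise chart visualizes.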
Points and connections
Let’s try out a mind exercise – please don’t use a pen and paper. Imagine there are 4 dots (points) and straight lines connecting each of these points are drawn as shown on the left in Figure 8.4:
Figure 8.4: Points and connections
With 4 points, drawing lines connecting every point to every other point should be quite simple. Now try the same mind exercise with 9 points, that is, mentally draw lines that connect each of the 9 points to each other. This gets very difficult, doesn’t it? The number of connections is n(n-1)/2, where n is the number of points, so 4 points have 6 connections whereas 9 points have 36.
However, when we put those points on paper and draw the connections as shown on the right, it’s not that difficult. The connections are visible. As the number of points increases, the level of difficulty in establishing connections mentally increases, and more connections may remain invisible. In the same way, when there are many data points and those data points are only in the mind, it’s very difficult to see all of the connections, and we may miss several of them. But when data points are captured and visualized, the connections (relations) between the data points come to the surface. This is why it’s very important to capture data in a way that it can be analyzed. With advanced BI tools that provide automated insights, it’s even simpler to get insights from data. Hopefully, this has triggered your interest to learn more. In the next section we will take a look at what actually triggered the creation of IBI.
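The count of connections between n points, n(n-1)/2, can be verified by listing every pair of points. The short sketch below does both; the function names are chosen only for illustration.

```python
from itertools import combinations


def connection_count(n: int) -> int:
    """Number of straight lines needed to connect each of n points to every other point."""
    return n * (n - 1) // 2


def connections(points: list[str]) -> list[tuple[str, str]]:
    """List every pair of points, that is, every connection."""
    return list(combinations(points, 2))


if __name__ == "__main__":
    for n in (4, 9, 100):
        print(n, "points ->", connection_count(n), "connections")
```

For 100 data points, roughly what is captured daily in the IBI sheet described earlier, there are 4950 possible pairwise connections, far more than anyone can hold in mind.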
Trigger for IBI
On 26th January 2017, the day of my flight from Germany to Iceland, a notification from Google (Gmail) triggered the thought process about IBI (which was not yet named at that point). The notification, if I remember correctly, was about how many hours in advance I should leave home to be able to catch the flight. I had not made any specific calendar entries, so the notification must have been based on the flight booking information available in Gmail. My thought was “If a company is able to guide me based on my own data, why can’t I guide myself with my own data? After all, I am the master source of all my data; companies, no matter how hard they try, can only get a portion of my data”. Throughout the journey and the Iceland trip I wrote down several thoughts and questions about individual data, companies using and misusing data, and individuals not using their own data and not even being aware that they can. The essence of these notes was that almost every company values data as an asset, including data about us. Companies use data to improve their services and products; however, we as individuals don’t value our own data as an asset. If companies can improve using BI, can’t individuals do the same? With these thoughts, since 19th Feb 2017, I have been strictly capturing my own data on a daily basis and analyzing it regularly. I have regularly added new data visualizations and now use Google Data Studio dashboards for monthly and ad hoc reviews.
How to start with IBI?
As explained in the previous chapters, BI is a concept and a process; similarly, IBI is also a concept and a process. IBI can be considered a subtype of BI. No technologies or tools are prescribed or ruled out for implementing IBI. It’s entirely up to the user to decide how they would like to implement it. As there is no need to share your data, information, or insights with anyone else, there is no need to agree with anyone else on process or data format. An individual can gather and accumulate as much data as practically possible about themselves using whatever means are available and organize it in such a way that it can be visualized and that information and insights can be derived from it. The individual can use these insights to make changes to their life for self-improvement. This process is captured in the IBI process diagram in Figure 8.5 for better understanding:
Figure 8.5: IBI process
For those who need some help to get started with IBI, a generic set of simple steps is provided in the next section, followed by the specific set of steps that I follow, which can be used as a reference.
Generic steps
The simple generic steps to get started with IBI are as follows:
1. Think and identify: Think through and identify the data points that you would like to capture. These are data points that matter to you, irrespective of whether they make sense to others.
2. Set up processes for data capture: Set up processes to capture each of the data points identified in Step 1. It could be a manual process, an automated process, or a combination of both. For example, you may decide to capture the timings of your sleep manually or use a sleep tracker watch that tracks it with an app. Keep the process simple.
3. Set up processes for information and insights generation: Set up processes to generate information and insights from the captured data. Again, this could be manual, automated, or a combination of both. For example, you may decide to visualize the data and look for information and insights, use an automated insight generation tool, or combine both. Automate it to the extent possible.
Once the above steps are completed, you are ready to start with IBI. The frequency of data capture (hourly, daily, or weekly) is your choice; however, my recommendation is to capture data at least daily. The frequency of insights generation is also your choice.
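The three generic steps above can be sketched as a tiny program: choose the data points, capture one record per day, and derive simple insights from the accumulated records. The data points (`sleep_hours`, `happy`, `ideas`) and all names below are hypothetical examples; a spreadsheet serves the same purpose.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class DayRecord:
    """Step 1: the data points chosen by the individual (hypothetical examples)."""
    day: date
    sleep_hours: float
    happy: bool
    ideas: int


def capture(log: list[DayRecord], record: DayRecord) -> None:
    """Step 2: capture one record per day; reject a second entry for the same day."""
    if any(existing.day == record.day for existing in log):
        raise ValueError(f"{record.day} is already captured")
    log.append(record)


def insights(log: list[DayRecord]) -> dict[str, float]:
    """Step 3: derive simple information from the captured data."""
    return {
        "days_captured": len(log),
        "average_sleep_hours": mean(r.sleep_hours for r in log),
        "happy_days": sum(r.happy for r in log),
        "ideas": sum(r.ideas for r in log),
    }
```

A usage example: after capturing `DayRecord(date(2020, 5, 29), 7.0, True, 2)` and `DayRecord(date(2020, 5, 30), 6.0, False, 0)` into an empty list, `insights` reports 2 days captured, 6.5 average sleep hours, 1 happy day, and 2 ideas.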
Specific steps
The purpose of sharing the steps that I follow is to help those who would like to know or follow a tried and tested approach for getting started with IBI. Over the years I have made multiple changes, and the steps below are the ones that I follow now. Some of the steps are daily, some monthly, and some less frequent, as indicated in each point. To follow these steps, download the generic IBI template from https://www.akvkbi.com/2019/12/ibi-newtemplate-excel-version.html and customize it as per your needs.
1. Daily data capture: Update the IBI data capture sheet (Google Sheet) every morning, investing around 7 to 10 minutes to fill in the data. I fill in the details of sleep and dreams for the current day and all other details (for example, ideas, time spent on various work topics, movies watched, physical exercises, financial transactions, etc.) for the previous day. For example, if today is 30-May-2020, I fill the row for 30-May-2020 with dream and sleep data and then fill the row for 29-May-2020 with the details of the previous day except for the dream and sleep details, as I would have already filled those in on the previous day. A partial screenshot of my IBI data capture sheet for 29th and 30th May 2020 is shown in Figure 8.6.
Figure 8.6: IBI data capture sheet. Note: Partial screenshot of two days – some of the columns are intentionally hidden
2. Monthly review: Review of the data for the last month is done on the 1st of every month. As part of the review, I go through both the charts in Google Sheets and the dashboards in Google Data Studio. If there are any data entry errors (sometimes the wrong column is updated by mistake), these are corrected. For example, if the dream details are described in the “connected” column instead of the “dream details” column, this is fixed.
3. Monthly backup: Save a copy of the IBI data capture sheet for backup purposes. A copy is saved in cloud and a copy is saved locally. The oldest copy is deleted.
4. Ad hoc changes: Include new columns as and when required in the IBI data capture template. Hide columns which are no longer relevant. I don’t delete the columns.
5. Milestone analysis: A detailed data analysis is carried out whenever I reach a milestone, for example, completion of 1 year, 500 days, 2 years, 1000 days, and so on. The next milestone will be when I complete 4 years of data capture on 19th Feb 2021.
6. Understanding and decisions: As and when some trend or pattern is noticed there is more understanding of oneself. Decisions are made and changes are made as required for improvement.
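Two of the date rules in the steps above are easy to get wrong by hand, so here is a small sketch of them: which rows are filled on a given morning (Step 1), and when a yearly milestone falls (Step 5). The start date is the one stated in the text; the function names are hypothetical.

```python
from datetime import date, timedelta

START = date(2017, 2, 19)  # first day of data capture, as stated in the text


def rows_to_fill(today: date) -> dict[date, str]:
    """Step 1: today's row gets sleep and dreams, yesterday's row gets the rest."""
    return {
        today: "sleep and dream details",
        today - timedelta(days=1): "all other details",
    }


def year_milestone(years: int) -> date:
    """Step 5: the date on which `years` full years of capture are complete."""
    return START.replace(year=START.year + years)
```

For example, `rows_to_fill(date(2020, 5, 30))` points to the rows for 30-May-2020 and 29-May-2020, and `year_milestone(4)` gives 19th Feb 2021, matching the milestone mentioned in Step 5.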
Learnings
Some of my learnings based on the experience of data capture are as follows:
• The data capture activity should be done every morning (or whenever your day starts – based on your shifts), else there are chances of forgetting relevant details of the previous day as the events of the current day take over.
• Those ~10 minutes of data capture should be done without a break. All possible distractions should be avoided, as this helps in recollecting all of the events of the previous day.
• The data capture activity in itself is very useful. As we try to recollect the events of the previous day, it often reminds us of tasks or commitments which otherwise might have been forgotten.
• The previous day’s events are better registered in memory and retained for a longer time, as we consciously try to recollect them.
• Filling in data on a daily basis is a motivation to do better each day. For example, if I fill a positive data point (for example, did I learn something?) with zero for not having learnt anything on a particular day, it reminds and motivates me to learn something on the current day.
• I was able to not only capture but also take to completion a lot of ideas which I would have otherwise forgotten. Capturing ideas can often lead to coming up with new ones. Capturing (writing down) ideas also motivates one to pursue them.
• For some of the data points we may have to depend on devices. For example, for data points such as number of hours of deep sleep or number of steps walked, I take the data from the smartwatch, whereas for data points such as number of mistakes made in a day or happy news, there are no trackers and the only option is manual input into the spreadsheet.
• It needs discipline to do it every day and patience to wait for the results.
• Resist the temptation to enter incorrect data to make the data look good. There’s no point in making the data look good with false data. The whole point of IBI is to improve ourselves, so data must be captured to reflect the as-is status without any thought about how it will turn out in the chart.
I hope that the steps together with the learnings help you in your journey with IBI.
Conclusion
In this short chapter you have been introduced to IBI. As the aim was only to trigger an interest in IBI, a detailed explanation was intentionally avoided. For more details, refer to my blog, which also links to videos that showcase the process of daily data capture and data visualization. IBI has many benefits, not just for individuals but for the larger community. If each of us uses IBI, we will know ourselves better and improve individually and as a community. It’s time (in fact it’s already late) that we value our own data as an asset and make use of it. I have come across several interesting findings from my own data, but as not everything can be shared publicly, they haven’t been shared here. I strongly recommend that you try it out and realize the benefits. I hope this chapter has triggered your interest in learning more about IBI and using it. In the next chapter we will return to the main BI topic and deal with BI architectures.
Points to remember
Some of the key points to remember are listed as follows:
• IBI is self-reflection with consciously captured own data.
• The word business in Individual Business Intelligence does not refer to a commercial establishment but one’s area of concern.
• IBI is a concept and a process; any tool, technology, or process can be chosen to implement IBI.
Multiple choice questions
1. Which of these data points can be collected as part of IBI?
a) Quantitative data like money spent, earned, number of hours slept, etc.
b) Qualitative data such as details of dream, reason for happy or sad, etc.
c) Either qualitative or quantitative
d) Both qualitative and quantitative
2. The data points captured and accumulated by an individual should make sense to whom?
a) To a community
b) To a business or organization
c) To the individual
d) Friends and family
3. To implement IBI
a) We need to use paid software applications.
b) We need to spend more than 1 hour per day.
c) We can use free to use software applications.
d) We need expensive tracking devices.
4. How frequently should data be analyzed as part of IBI?
a) Depends on the individual
b) Daily
c) Weekly
d) Monthly
5. What is true about the data captured as part of IBI?
a) All data captured is useful immediately
b) Some data could be useful now and some in the future
c) A pattern or a trend has to be there, else there is no use
d) All data is useful only in the future
6. How many data points should be captured and accumulated as part of IBI?
a) Has to be less than 10
b) Should be more than 100
c) Between 10 to 100
d) Depends on the individual
7. Data is an asset, this is applicable for?
a) Both organizations and individuals
b) Organizations
c) Individuals
d) Only corporates
Answers
1. d
2. c
3. c
4. a
5. b
6. d
7. a
Questions
1. What does business in the context of individual business intelligence mean?
2. What are some of the data points that you would like to capture? Note: This is only for your consideration and not to be shared with others.
3. Some people have the habit of writing a diary (hard notebook) on daily basis, can it be used by the individual to understand trends and patterns? What are the challenges?
4. What is the point in collecting data that may provide interesting but not necessarily currently useful information?
Chapter 9
BI Architectures
After a short diversion from the main BI topic, we will now focus on BI architectures. There are some misconceptions, such as that BI architectures are complex and that BI architectures should always include a data warehouse, data marts, or other analytical data repositories. We will demystify these points in this chapter. You will be introduced to multiple examples of BI architectures, starting from a simple BI architecture with very few components, to which we will gradually add more components. The pros and cons of the architectures are also covered. We will focus on showcasing various BI architectures to convey that a BI architecture doesn’t have to be complex and doesn’t always have to include a data warehouse. After going through this chapter, you should be able to understand which architecture is used in your organization if there is already a BI solution in place. It will also help you understand which architecture you should choose for your BI solution.
Structure
This chapter is structured as follows:
• BI architecture - Explained
  o Examples of BI architectures
    ▪ Data-in-place BI architectures
    ▪ Data repositories-based BI architectures
  o Sample BI architectures
Objectives
Learning what a BI architecture is and taking a look at various examples of BI architectures. Understanding the pros and cons of various BI architectures.
BI architecture - Explained
BI architecture is the proper arrangement of all the components of a BI solution to fulfil the BI requirements of a business unit or an organization. An architectural diagram provides a visual overview of these components and the relationships between them. Architectural diagrams could be at a macro level (specific tools are not mentioned), at a micro level (specific tools are mentioned), or at an in-between level where, depending on the target audience, some components are abstracted or grouped while others are shown separately and in more detail.
As every organization’s BI requirements are different, the BI architectures they deploy are also different. Some organizations may start with the most basic BI deployment and gradually add more components as their requirements grow. BI components are selected primarily for four main logical layers: the data acquisition layer, the data storage layer, the data processing layer, and the information presentation layer. The components for these layers are selected based on requirements, budget, and tool availability.
Technological approaches such as ETL, ELT, data virtualization, and data repositories are selected based on requirements and constraints. For example, if there is no need for historical data (point-in-time data or versioned data) and the data source is able to handle the analytical queries, then there is probably no need to introduce an additional data repository such as a data warehouse. Therefore, there is no fixed number of components, nor a specific component, that must be part of every BI architecture.
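The rule of thumb about historical data and source capacity can be written down as a small decision helper. This is only an illustrative sketch: the function name `suggest_architecture_group` and its two boolean inputs are assumptions made for this example, and a real decision would also weigh budget, tool availability, and other constraints.

```python
def suggest_architecture_group(needs_history: bool, source_handles_analytics: bool) -> str:
    """Suggest a BI architecture group from the two criteria named in the text.

    Hypothetical helper for illustration; it deliberately ignores budget,
    tooling, and every other constraint a BI architect would consider.
    """
    # No historical (point-in-time or versioned) data needed, and the source
    # can serve analytical queries: an extra repository is probably not needed.
    if not needs_history and source_handles_analytics:
        return "data-in-place"
    # Otherwise an additional repository such as a data warehouse is worth considering.
    return "data repository-based"
```

For example, `suggest_architecture_group(needs_history=False, source_handles_analytics=True)` yields the data-in-place group, while requiring history points towards a data repository regardless of how capable the source is.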
Examples of BI architecture
The architectural diagrams provided under this section are not to be considered as reference architectures or best practices. These diagrams are not meant to be comprehensive and are simply examples of BI architectures to showcase that a BI architecture doesn’t always have to be complex. We will cover BI architecture in order of increasing level of complexity and increasing number of components. These architectures can be grouped under two headings—data-in-place BI architectures and data repository-based BI architectures.
Data-in-place BI architecture
In this category of BI architectures, data resides only at the source. The data stays where it is initially generated or stored, and there is no other copy of the data specifically for BI purposes. Pros and cons of data-in-place BI architecture when compared to data repository-based architecture are as follows:
Pros:
• Takes less time to implement.
• No redundant data.
• No additional storage requirement.
• Fewer components.
Cons:
• Negative performance impact on the source application in case of a live connection.
• Data in the source is accessed as many times as the reports, dashboards, etc., are refreshed.
• Data has to be cleaned and transformed on the fly every time.
• Usually, data changes are not tracked and historical data is not available.
• Not suitable for analytical operations because data is not modelled (organized/prepared) specifically for analytical operations.
Let’s go through a few examples of data-in-place BI architecture and the pros and cons specific to each example.
Architecture 1:
In some of the basic BI solution architectures, as shown in Figure 9.1, static reports such as PDF or HTML files are created directly from the data source and shared with the users.
Figure 9.1: BI Architecture 1 – most basic
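A static report of the kind used in Architecture 1 can be sketched with Python's built-in `sqlite3` module standing in for the data source. The `work_hours` table, the sample rows, and the function name `build_static_report` are all hypothetical and serve only to show that the report is a point-in-time copy: the source is queried once, at creation time.

```python
import sqlite3
from datetime import datetime
from html import escape


def build_static_report(conn: sqlite3.Connection, generated_at: datetime) -> str:
    """Query the source once and return a self-contained HTML report."""
    rows = conn.execute(
        "SELECT employee, SUM(hours) FROM work_hours "
        "GROUP BY employee ORDER BY employee"
    ).fetchall()
    body = "".join(
        f"<tr><td>{escape(name)}</td><td>{hours:g}</td></tr>" for name, hours in rows
    )
    return (
        f"<h1>Work hours as of {generated_at:%Y-%m-%d %H:%M}</h1>"
        f"<table><tr><th>Employee</th><th>Hours</th></tr>{body}</table>"
    )


# Hypothetical stand-in for the source application's database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE work_hours (employee TEXT, hours REAL)")
source.executemany(
    "INSERT INTO work_hours VALUES (?, ?)",
    [("Asha", 8.0), ("Ben", 7.5), ("Asha", 4.0)],
)

# The report is created once, for example by a scheduled job at 9 AM.
report = build_static_report(source, datetime(2021, 1, 4, 9, 0))

# A change made in the source afterwards is not reflected in `report`;
# it only shows up when the next report is created.
source.execute("INSERT INTO work_hours VALUES ('Ben', 2.0)")
```

Readers of `report` never touch the source, which is why this setup has no performance impact on the source while the report is being used.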
Static reports, as the name indicates, do not reflect live data but only point-in-time data. For example, if reports are created at 9 AM on a daily basis, then the reports reflect the status of the data only up until 9 AM of that day. Any changes made between 9 AM of that day and 9 AM of the next day are not reflected until the next static report is created at 9 AM the next day.
This architecture could be an interim solution. For example, let’s assume that Walget introduced a new employee work hour tracking system. As it would take some time (weeks/months) before the new system’s data can be integrated into the data warehouse, reports are created directly on the data source (the work hour tracking system’s database) every night (when the time tracking system is least used), and the reports are shared with the BI users (managers in this case) on a daily basis.
The pros and cons of Architecture 1 are as follows:
Pros:
• No performance impact on the data source when BI users are actively using BI reports, as the reports do not maintain a live connection to the data source.
• Useful as an interim solution as long as the source application’s performance is not impacted during report creation.
Cons:
• Data in the report may not be in sync with the data in the source.
• There could be performance issues on the source system during the report creation process, especially when data grows in size.
Architecture 2a:
As shown in Figure 9.2, users may be provided with an interactive data visualization tool instead of static reports.
Figure 9.2: BI Architecture 2
The interactive data visualization tool (DVT) could either be an out-of-the-box tool or an in-house developed tool. In this setup, a live connection is maintained between the data visualization tool and the data source, and therefore, there can be a performance hit on the data source (and on the source application that writes data to it). Continuing with the previous example, in this case it would mean that the analysts at Walget have been provided with an interactive data visualization tool which is connected to the database of the employee work time tracking system. The pros and cons of Architecture 2a are as follows:
Pros:
• Data in the report is in sync with the data in the source.
• Supports interactive analysis.
Cons:
• Negative performance impact on the data source when BI users are actively using the BI reports, as the data visualization tool maintains a live connection to the data source.
Architecture 2b:
Architecture 2b is a combination of both Architecture 1 and Architecture 2a as shown in Figure 9.3:
Figure 9.3: BI Architecture 2b
Architecture 2b could be useful in some cases, for example, when external users (customers) should have access only to the static reports, whereas the internal users (employees) can have access to both the frontend tools (static reports portal and the interactive tool). Another use case is when internal users such as analysts prefer using an interactive tool, while some of the management users prefer either receiving a daily static report over email or viewing static reports on a portal. The pros and cons of Architecture 2b are the following:
Pros:
• Satisfies the requirements of different sets of users, for example, management and analysts or internal users and external users.
Cons:
• Output from the two tools may not be in sync with each other.
• Live connection from the data visualization tool could impact source application performance negatively.
Architecture 3:
Architecture 3 includes a reporting and analytics platform (RAP). A RAP supports various types of output (static reports, pixel-perfect reporting, ad hoc data analysis, and data visualizations), multiple delivery methods (for example, email, FTP/SFTP, and portal-based delivery), and other features such as role-based access and different sets of frontend tools for developers and users. A representation of Architecture 3 can be seen in Figure 9.4:
Figure 9.4: BI Architecture 3
The pros and cons of Architecture 3 are as follows:
Pros:
• Satisfies the requirements of different sets of users using a single platform; user rights and permissions are stored in one platform.
• Usually only a web browser is required to access the services (creation of reports, dashboards, etc.).
Cons:
• Live connection can impact source application performance negatively.
Architecture 4:
In each of the preceding architectures, we have considered only one data source. In Architecture 4, there are multiple data sources, but data from the different data sources is not integrated. Different sets of users may access different data sources through the same frontend tool. Whenever data has to be combined, users will have to combine the data themselves. Architecture 4 is similar to Architecture 3 but has multiple data sources, as shown in Figure 9.5:
Figure 9.5: BI Architecture 4
The pros and cons of Architecture 4 are as follows:
Pros:
• Single platform to connect to different data sources.
Cons:
• Data is not integrated. Therefore, users are exposed to several data sources, and each data source may need a different type of connection. Users need to know the technical details of various sources.
Architecture 5:
In this architecture, data is virtually integrated and users are provided with an integrated data model, as shown in Figure 9.6.
Figure 9.6: BI Architecture 5
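Virtual integration can be sketched as a view function that reads every source at request time and combines the results without storing them. Everything in this example is an assumption made for illustration: the two in-memory SQLite databases stand in for independent source systems, and `clean_id` and `customer_view` are hypothetical names, not part of any data virtualization product.

```python
import sqlite3

# Two hypothetical, independent source systems.
billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE transactions (customer_id TEXT, amount REAL)")
billing.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("c1", 40.0), (" C1 ", 10.0), ("c2", 25.0)],
)

loyalty = sqlite3.connect(":memory:")
loyalty.execute("CREATE TABLE cashback (customer_id TEXT, cashback REAL)")
loyalty.execute("INSERT INTO cashback VALUES ('C1', 5.0)")


def clean_id(raw: str) -> str:
    """Cleansing that has to be repeated on the fly for every request."""
    return raw.strip().upper()


def customer_view() -> dict[str, dict[str, float]]:
    """Unified (virtual) view: both sources are queried live, nothing is copied or kept."""
    view: dict[str, dict[str, float]] = {}
    for raw_id, amount in billing.execute("SELECT customer_id, amount FROM transactions"):
        row = view.setdefault(clean_id(raw_id), {"spend": 0.0, "cashback": 0.0})
        row["spend"] += amount
    for raw_id, cashback in loyalty.execute("SELECT customer_id, cashback FROM cashback"):
        row = view.setdefault(clean_id(raw_id), {"spend": 0.0, "cashback": 0.0})
        row["cashback"] += cashback
    return view
```

Callers see one common model (`spend` and `cashback` per customer) and never deal with the two connections. The sketch also makes the trade-offs visible: the cleansing runs on every call, and once a source row is updated, the earlier value can no longer be obtained from the view.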
The pros and cons of Architecture 5 are as follows:
Pros:
• Users don’t have to know the technical details of the underlying data sources.
• A common data model is exposed.
Cons:
• Data cleansing and transformation takes place on the fly every time, and therefore usually more time is required for report refresh.
• Lack of historical status of data. Only the current state of data is available.
Data repository-based BI architecture
In this group of BI architectures, apart from the data residing in the original data source, one or more copies of the data exist in one or more data repositories such as data warehouses, data marts, data lakes, data vaults, ODS, and other reporting databases. The usual pros and cons of data repository-based BI architectures are as follows:
Pros:
• Least impact on the source application, as data in the source application is accessed only once to extract the data.
• Data cleansing and transformation is carried out only once.
• Data changes are tracked and historical status of data is available.
• Optimized for analytical operations.
• Data can be modelled for specific requirements with no impact on source applications.
• Auditability. As snapshots of data (point-in-time data) are stored, information published in the past can be exactly reconstructed and verified at any point.
Cons:
• Takes more time to implement.
• Consumes more time in maintenance.
• Multiple copies of data, therefore increased storage requirement.
• Increased number of components.
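The points about historical status and auditability can be illustrated with a snapshot table. This is a minimal sketch using Python's `sqlite3` module; the `customer_snapshot` table, its columns, and the helper names are hypothetical, and real repositories typically track changes in more elaborate ways.

```python
import sqlite3

repo = sqlite3.connect(":memory:")
repo.execute(
    "CREATE TABLE customer_snapshot (snapshot_date TEXT, customer_id TEXT, status TEXT)"
)


def load_snapshot(snapshot_date: str, source_rows: list[tuple[str, str]]) -> None:
    """Append the source's current state, tagged with the extraction date."""
    repo.executemany(
        "INSERT INTO customer_snapshot VALUES (?, ?, ?)",
        [(snapshot_date, cid, status) for cid, status in source_rows],
    )


def active_customers(as_of: str) -> int:
    """Reconstruct a figure exactly as it would have been reported on `as_of`."""
    (count,) = repo.execute(
        "SELECT COUNT(*) FROM customer_snapshot "
        "WHERE snapshot_date = ? AND status = 'active'",
        (as_of,),
    ).fetchone()
    return count


# Day 1: both customers are active. Day 2: the source has overwritten c2's status.
load_snapshot("2021-01-01", [("c1", "active"), ("c2", "active")])
load_snapshot("2021-01-02", [("c1", "active"), ("c2", "closed")])
```

Even though the source now only knows that `c2` is closed, `active_customers("2021-01-01")` can still reproduce the number published on the first day. The cost is visible too: every snapshot is another copy of the data.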
Let’s take a look at a few example data repository-based BI architectures and the pros and cons of each.
Architecture 6:
In this architecture, there is an additional data storage layer (integrated data layer). Data is extracted from different data sources and loaded into the integrated data layer, where data from the various sources is integrated. For example, the transaction data of Walget’s customers available in the billing system is combined (integrated) with cashback or coupon data. Architecture 6 is depicted in Figure 9.7:
Figure 9.7: BI Architecture 6
The extraction and loading of data may happen in near real-time or in batch mode (for example, hourly or daily). Integrated data doesn’t necessarily mean that data is prepared (modelled) to suit reporting and analytics. The pros and cons of Architecture 6 are as follows:
Pros:
• Fewer data layers to maintain compared to multi-layered architectures.
• Lower storage requirement compared to multi-layered architectures.
Cons:
• Raw data is available only in the source, and users do not have access to the source; therefore, there is no option to troubleshoot data issues or carry out data mining on raw data.
• Data is not modelled (for example, as a dimensional model) to make it fit for reporting and analytics.
Architecture 7a:
In this architecture, a 2-layered architecture is used. Data from multiple sources is extracted and stored as-is in the raw data layer (landing zone), and then, after cleaning, transforming, and integrating, the data is stored in a prepared data layer fit for reporting and analytics. Access to the 2 layers is provided based on roles. Some users get access to both layers, and others only to one of the 2 layers. Architecture 7a is depicted in Figure 9.8:
Figure 9.8: BI Architecture 7a
Depending on the requirements, raw detailed data may be retained in the staging/raw data layer in one of three ways:
1) Data is deleted immediately or before the next run, that is, after the required data is propagated to the prepared data layer.
2) Data is deleted periodically, for example, every month or every year.
3) Data is stored permanently, that is, the data in the raw data layer is never deleted.
The pros and cons of Architecture 7a are as follows:
Pros:
• Availability of raw data for troubleshooting and data mining purposes.
Cons:
• More data layers to maintain.
• Higher storage requirement.
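The two-layer flow of Architecture 7a can be sketched with two tables in a single SQLite database. The table names `raw_sales` and `prepared_sales`, the sample rows, and the cleaning rules are assumptions made for this example; raw rows are kept permanently here, which corresponds to the third retention option.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE raw_sales (store TEXT, amount TEXT)")      # raw data layer
dw.execute("CREATE TABLE prepared_sales (store TEXT, total REAL)")  # prepared data layer


def land_raw(rows: list[tuple[str, str]]) -> None:
    """Store extracted rows as-is, including dirty values."""
    dw.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)


def build_prepared() -> None:
    """Clean, transform, and aggregate the raw rows into the prepared layer."""
    dw.execute("DELETE FROM prepared_sales")  # full rebuild, for simplicity
    dw.execute(
        """
        INSERT INTO prepared_sales
        SELECT UPPER(TRIM(store)), SUM(CAST(amount AS REAL))
        FROM raw_sales
        WHERE TRIM(amount) <> ''          -- drop rows without an amount
        GROUP BY UPPER(TRIM(store))
        """
    )


land_raw([("munich ", "12.5"), ("Munich", "7.5"), ("berlin", "")])
build_prepared()
```

Report users would query only `prepared_sales`, while someone troubleshooting a data issue can still inspect the untouched rows in `raw_sales`, for example to find out why no Berlin total appears.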
Architecture 7b:
Similar to Architecture 7a, data in Architecture 7b is stored in multiple layers—3 layers in this case. Data extracted/streamed from different sources is stored as-is in the raw data layer. Data is cleaned and stored in the cleaned data layer before it is integrated and prepared for storage in the prepared data layer for reporting and analytical purposes. Architecture 7b is depicted in Figure 9.9:
Figure 9.9: BI Architecture 7b
The pros and cons of Architecture 7b are as follows:
Pros:
• Separate storage layers for raw data, cleaned data, and integrated data in the raw data layer, cleaned data layer, and integrated data layer respectively. Easier for troubleshooting data issues. Clear logical separation too.
• Tools that do not need cleaned and integrated (prepared) data but need only clean data can be given access to the cleaned data layer.
Cons:
• With each additional layer of data storage, apart from the higher storage requirement, more time is required for maintenance.
Architecture 8:
When there are multiple BI solutions at one level of the corporate hierarchy and a BI solution is required at a higher level, there are three architectural approaches that can be considered. This is easier to explain with an example. Assume that each department of a corporation has its own BI solution and now a BI solution is required at the corporate level. There are three approaches to building the corporate-level BI solution:
1) Consider the departmental analytical data repositories as the sources and build a consolidated data warehouse on top of those departmental data repositories.
2) Build an independent corporate data warehouse using the same data sources as the department-level data warehouses.
3) Data virtualization: data remains in the departmental data warehouses and is combined virtually on the fly during report execution.
As we have seen in earlier chapters, a corporation could end up with multiple BI solutions for reasons such as mergers and acquisitions, multiple departmental BI initiatives, or organizational politics. Here, we will consider a dependent consolidated BI architecture, that is, the first approach. Architecture 8 is depicted in Figure 9.10:
Figure 9.10: BI Architecture 8
The pros and cons of Architecture 8 are as follows:
Pros:
• Corporate or global information is available through one BI solution.
Cons:
• Dependency on departmental BI solutions.
Sample BI Architecture
In Chapter 1: What is Business Intelligence, a sample architecture for a contemporary BI solution was provided. The same architecture is used here for explanation purposes and is shown in Figure 9.11:
Figure 9.11: Sample BI architecture
The sample BI architecture attempts to fulfil the requirements of different groups of users such as management, analysts, consultants, or domain experts by providing access to the prepared data layer, the raw data layer, and the data sources directly, based on roles and needs. This architecture combines the best of both worlds, that is, it combines data-in-place architecture (for example, App Analytics) with data repository-based architecture. Data from different sources is extracted and loaded as-is into the data lake (raw data layer) on a daily or more frequent basis, or in near real-time. Data is then cleaned, transformed, integrated, and loaded into analytical databases (prepared data layer) fit for reporting and analytics purposes. A semantic layer, which is part of the reporting and analytics platform, maps technical fields (for example, column names in database tables) to business-centric metrics, attributes, filters, derived metrics, custom groups, etc. Information is provided through multiple delivery channels and accessed by different tools.
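The role of the semantic layer can be sketched as a mapping from business-facing names to technical expressions, plus a function that turns a business request into SQL. The model below, including the table `fact_sales`, its column names, and the function `build_query`, is entirely hypothetical; reporting and analytics platforms implement this idea in their own, far richer ways.

```python
# Hypothetical semantic model: business-facing names -> technical SQL expressions.
SEMANTIC_MODEL = {
    "table": "fact_sales",
    "attributes": {"Store": "store_nm", "Sales Month": "sales_mth"},
    "metrics": {
        "Revenue": "SUM(net_amt)",
        # A derived metric built from two technical columns.
        "Average Basket": "SUM(net_amt) / COUNT(DISTINCT txn_id)",
    },
}


def build_query(attributes: list[str], metrics: list[str]) -> str:
    """Translate a request phrased in business terms into a SQL statement."""
    model = SEMANTIC_MODEL
    group_cols = [model["attributes"][name] for name in attributes]
    select_parts = [f'{col} AS "{name}"' for name, col in zip(attributes, group_cols)]
    select_parts += [f'{model["metrics"][name]} AS "{name}"' for name in metrics]
    sql = f'SELECT {", ".join(select_parts)} FROM {model["table"]}'
    if group_cols:
        # Attributes become grouping columns; metrics are aggregated per group.
        sql += f' GROUP BY {", ".join(group_cols)}'
    return sql
```

A business user asking for Revenue by Store never sees `store_nm` or `net_amt`; `build_query(["Store"], ["Revenue"])` produces the technical query on their behalf.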
Conclusion
In this chapter, you have been introduced to BI architectures and exposed to a set of example BI architectures. Having gone through these, it should now be clear that a data warehouse (or any additional data repository other than the data source itself) is not a mandatory component of every BI architecture and that not all BI architectures need to be complex. The number of components included in the architecture is driven by requirements. It is one of the responsibilities of a BI architect to determine the architecture that best meets the requirements, taking into account the organization’s needs, budget, limitations, and constraints. In the next chapter, we will clarify some of the misconceptions about BI technologies, tools, and concepts.
Note: For details about BI architecture terminologies such as hub-and-spoke, corporate information factory (CIF), or enterprise bus architecture, refer to these books.[11][13][39]
Points to remember
Some key points to remember are as follows:
• There is no fixed number of components that should be included as part of all BI architectures.
• Based on whether data is stored in additional data repositories or not, BI architectures can be grouped into data repository-based architectures or data-in-place architectures.
Multiple choice questions
1. Which of these statement(s) is/are true?
a) Every BI architecture should include a data warehouse
b) Every BI architecture should include a data mart
c) Every BI architecture should include a data lake
d) BI architecture may or may not include a data warehouse, data mart or a data lake
2. BI architecture diagrams include specific tools:
a) Yes, always
b) No, never
c) Yes, when a tool is a fixed component
d) Yes, when a tool could be a potential component
3. Data-in-place BI architecture
a) Usually takes more time than data repository-based architecture
b) Usually takes same time as data repository-based architecture
c) Usually takes less time than data repository-based architecture
d) There is no comparison
4. Data virtualization is a technique that can be grouped under
a) Data-in-place architecture
b) Data repository-based architecture
c) Both of the above
d) None of the above
5. Which architecture is in general better suited for reporting and analytics?
a) Data-in-place BI architecture
b) Data repository-based BI architecture
c) Both
d) Data-in-place for reporting and data repository-based for analytics
Answers
1. d
2. c
3. c
4. a
5. b
Questions
1. What is BI architecture? Draw an architectural diagram for a BI solution at a corporate level; assume that this corporation already has multiple BI solutions at each line of business and is interested in pursuing data virtualization.
2. Justify why it is not mandatory for every BI architecture to include a data warehouse or any other data repository.
3. What are the benefits of data-in-place architecture over data repository-based architecture?
4. What are the pros and cons of virtual data integration over integration based on data repository (making copies of data)?
Chapter 10
Demystify Tech, Tools and Concepts in BI
As stated in Chapter 1: What is Business Intelligence, BI is not limited by technologies or tools. On the contrary, the more technologies and tools there are, the better, as this offers more choices when implementing BI solutions. Quite a lot of misinformation has been spread about which technologies, tools, or concepts are used in BI and which are not. For example, the notion that machine learning is not used in BI is not correct. This chapter intends to clear up such wrong notions related to technologies, tools, and concepts used in BI. The number of technologies, tools, and concepts used in BI solutions is so large that it would require an entire book or two to go through each one of them in detail. So, to keep this chapter within the scope of this book, that is, to demystify, the focus is limited to explaining only those topics that have been misunderstood or are confusing, while the rest are only briefly described. Topics covered include the difference between a data visualization tool and a reporting and analytics platform, the long-standing confusion about the concepts of data marts and data warehouses, and the myths about data lakes. The aim is to provide simple and clear explanations so that even a business owner or manager who is not necessarily familiar with BI can understand these topics and have meaningful conversations about them with stakeholders such as BI vendors, business analysts, and BI/data architects.
Structure
This chapter covers the following topics:
• Technologies and tools
  o Technologies commonly used in BI
  o Tools commonly used in BI
  o Why so many technologies and tools used in BI?
  o Where’s the boundary of BI?
  o DV tool vs BI RAP
    ▪ Data visualization tool
    ▪ BI reporting and analytics platform
  o ETL vs ETL tool
    ▪ Pros of ETL tool over hand coding
    ▪ Cons of ETL tool over hand coding
• Concepts
  o Concepts commonly used in BI
  o Data mart and Data warehouse
    ▪ Data mart
    ▪ Data warehouse
  o Data lake
    ▪ What exactly is a data lake?
    ▪ Claimed use cases of data lake
    ▪ Myths about data lake
  o Machine learning usage in BI
Objectives
Understanding the terms technology and tools in the context of BI and introducing some technologies and tools used in BI. Learning how to determine if a technology or tool is in scope of BI or not. Understanding various concepts used in BI and clearing out misconceptions related to these technologies, tools, and concepts.
Technologies and tools
The words technology and tool are intertwined and are often used interchangeably, though this is not always correct. Consider the example of one of the most widely used tools in the world, Microsoft Excel. Microsoft Excel is software and a tool; it is also an implementation of spreadsheet technology, which is built on top of computer technology. When an organization that was using paper-based notebooks for record keeping moves that work to Microsoft Excel, it will claim “we have upgraded to a technology-enabled solution for record keeping”. While Excel is a tool, it has been built on multiple technologies, so it is both a tool and a technology product. That is why people end up using the terms tool and technology interchangeably. Let’s now look at some of the technologies and tools in the context of BI.
Technologies commonly used in BI
Technology is the implementation of a technical concept to solve a specific challenge or a specific practical problem. Let’s consider an example in BI: to solve the problems associated with moving and storing large amounts of data for data integration purposes, people came up with data virtualization technology. Similarly, every technology in BI has tried to solve one or more problems. Some of the technologies commonly used in BI are listed along with a short description in Table 10.1:

Technology | Short Description
Spreadsheet | Software that allows users to input, store, edit, organize, compute, and visualize data in two-dimensional grids. MS Excel and Google Sheets are examples of spreadsheet technology.
Database | Database technologies organize large amounts of data in an electronic format. They provide governed ways to easily access, create, and modify data in a database. There are various types of databases such as relational, NoSQL, and distributed databases.
NoSQL database | A NoSQL database supports storage and access of data in a non-tabular or non-relational format. Supports flexible data models. The data model is updated as per the needs of the application. Useful for high volume and low data latency scenarios. In the context of BI, data latency is the same as data freshness, as explained in Chapter 3: Types of Business Intelligence.
Distributed database | Supports storage of interrelated data at multiple sites (multiple machines), where it is processed locally at each site but appears as one integrated unit to the application that uses the distributed database.
Columnar storage | Stores data by columns instead of rows. The amount of data that is fetched per analytical query is usually less than what would be fetched from a row-based database. Therefore, the query performance is expected to be better. In a row-based database, columns that are not required for an analytical query are also fetched, whereas in columnar storage, only those fields that are required are fetched.
In-memory databases | Takes advantage of the memory (RAM) capacity. The entire database is loaded into the memory for real-time applications. Useful for real-time BI.
OLAP | Stands for online analytical processing. Instead of storing data in a two-dimensional structure, OLAP stores data in a multidimensional way. This technology enables querying of data with respect to multiple dimensions (perspectives), both individually and combined.
Translytical data stores | Supports both transaction processing and analytical workloads in a single unified database, and therefore data is made available for analytics purposes in real-time as there is no data latency.
Data warehouse appliance | A package optimized for data warehousing, which includes all of the required software and hardware, including the storage, integrated as a self-contained unit that is ready for deployment.
Data virtualization | Supports creation of logical data layers that virtually integrate data from multiple data sources. Source data is not copied/moved and stays where it is. Users get a unified view of the data sourced from different applications and can access data in a uniform way instead of having to access each data source in different ways.
Data visualization | Visualizing data graphically in objects such as tables, graphs, charts, maps, and scorecards. Presenting related information in dashboards, reports, infographics, etc.
Automated insights | Discovering valuable insights (trends, patterns, and correlations) automatically from input data without user input (questions). The Explore feature in Google Sheets is an example of this technology. It uses machine learning on the input data to come up with possible interesting metrics and charts.
Voice-enabled BI | Also known as voice-controlled BI and conversational BI, this is a recent addition to the list of technologies used in BI. Instead of clicks and drag-and-drop, voice is used to pull up a report/dashboard, navigate, apply filters, create metrics and dimensions, add or modify charts, etc. Note: Not to be confused with call analytics, which is analytics carried out on call (call center) data.
Cloud in general and Cloud BI technologies (BIaaS, DWaaS, and so on) | DWaaS stands for data warehouse as a service. BIaaS and other cloud options have been explained in Chapter 3: Types of Business Intelligence.

Table 10.1: Some of the commonly used technologies in BI

Other commonly used technologies, such as web technologies, are used in BI too.
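The description of columnar storage in Table 10.1 can be made concrete with a toy comparison of the two layouts. The records, field names, and function names below are invented for illustration, and plain Python lists and dictionaries are used instead of a real storage engine; the counters simply show how many values each layout has to fetch to answer the same analytical question.

```python
# The same three records in two hypothetical in-memory layouts.
row_store = [
    {"order_id": 1, "customer": "c1", "country": "DE", "amount": 40.0},
    {"order_id": 2, "customer": "c2", "country": "IN", "amount": 25.0},
    {"order_id": 3, "customer": "c1", "country": "DE", "amount": 10.0},
]
# Column layout: one list of values per field.
column_store = {
    name: [record[name] for record in row_store] for name in row_store[0]
}


def total_amount_row_store() -> tuple[float, int]:
    """Sum `amount`; every field of every record is fetched along the way."""
    values_fetched = sum(len(record) for record in row_store)
    return sum(record["amount"] for record in row_store), values_fetched


def total_amount_column_store() -> tuple[float, int]:
    """Sum `amount`; only the one required column is fetched."""
    amounts = column_store["amount"]
    return sum(amounts), len(amounts)
```

Both functions return the same total, but the row layout touches all four fields of each record while the column layout reads only the `amount` values, which is the reason analytical queries are expected to perform better on columnar storage.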
Tools commonly used in BI
From the point of view of business users, the whole BI solution is itself a tool: users derive information and insights from it, and decisions are made based on these to improve the business. But from the viewpoint of BI developers, ETL software, a reporting and analytics platform, or data modeling software are referred to as tools, as they help them in developing the BI solution. Tools in general help us in our work. The BI solution as a tool helps business users improve the business, whereas the software used by BI team members and others helps them in developing the BI solution. So, it should be clear that there are different levels of interpretation for the word tool. In this chapter, a tool is a piece of software that either forms the frontend part of the BI solution which the BI users access, or assists in the building of the BI solution. Some of the tools used in BI are listed in Table 10.2:

Tools | Examples
Spreadsheet software | Microsoft Excel, Google Sheets, LibreOffice Calc
NoSQL databases | MongoDB, Apache CouchDB, Cassandra
Relational databases | Oracle DB, SQL Server, PostgreSQL
Data modeling tools | Oracle Data Modeler, Toad Data Modeler
ETL/Data integration tools | Informatica, IBM InfoSphere DataStage, SAP Data Services (BODS), Ab Initio, Pentaho Data Integration, Talend, Microsoft SSIS
Data virtualization tools | Denodo, Data Virtuality
Data visualization/Dashboarding tools | MicroStrategy Desktop, Tableau Desktop, Power BI Desktop
Reporting and analytics platforms | MicroStrategy, SAP BusinessObjects Business Intelligence, IBM Cognos Analytics
Scheduling tools | Control-M, Apache Airflow
Data profiling tools | Ataccama DQ Analyzer, Trifacta
Data mining tools | Knime, RapidMiner
Data catalog tools | Collibra, Alation
IDE (Integrated development environment) tools | Oracle SQL Developer, PL/SQL Developer, pgAdmin
Code version control tools | Git, Subversion, Mercurial
Collaboration tools | Atlassian Confluence, MS SharePoint
Data warehouse automation tools | Astera, WhereScape
Project management tools | JIRA software, MS Project, MS Excel
Testing tools | QuerySurge
Data preparation tools | OpenRefine

Table 10.2: Some of the commonly used tools in BI
After going through both lists, you should now have a better idea of the technologies and tools used in BI and the distinction between them.
Notes:
• Languages such as SQL and Python are used as tools across multiple tool categories. For example, we can use SQL for data profiling, and we can also use SQL for testing purposes.
• ETL tools can also be used for testing and data validation outside of BI.
• The lists in both Tables 10.1 and 10.2 are not comprehensive. There could be many other technologies and tools too.
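The first note can be illustrated with a single SQL statement that serves both purposes. The sketch below runs it through Python's `sqlite3` module; the `customers` table and its contents are made up for the example, and the same query could be run in any SQL client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("c1", "a@example.com"), ("c2", None), ("c2", "b@example.com")],
)

# Profiling: basic facts about the data before it is used for reporting.
PROFILE_SQL = """
SELECT COUNT(*)                    AS row_count,
       COUNT(DISTINCT customer_id) AS distinct_ids,
       SUM(email IS NULL)          AS missing_emails
FROM customers
"""

row_count, distinct_ids, missing_emails = conn.execute(PROFILE_SQL).fetchone()

# Testing: the same numbers double as a check, for example to stop a load
# when a column that is expected to be a key turns out not to be unique.
key_is_unique = row_count == distinct_ids
```

Here the profile reveals a duplicated `customer_id` and a missing email, so `key_is_unique` ends up false and a load relying on that key could be halted.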
At this point, you might wonder, why so many different technologies and tools are used in BI? The answer to this question is provided in the next section.
Why so many technologies and tools used in BI?
BI has gone through several technological changes over the years: from the old times of IT/BI teams creating a set of static reports based on predetermined requirements, to self-service BI, to governed self-service BI enabling business users to create reports and dashboards, analyze data on demand, and derive insights on time, to automated insights; and from daily batch loads populating a centralized data warehouse to streaming data into the data lake in near real-time. But all the while the core concept of BI has remained the same: to enable business users to make better decisions. For business users to make good decisions, they need reliable, consistent, consolidated, concise, and timely information and insights. The fundamental challenge in BI is to transform siloed, dirty, and unreliable data into reliable, consistent, consolidated, and concise information and to make it available in a timely manner, without negatively impacting the performance of the source applications and at an affordable cost. But when the number of data sources increases, the time and effort required to consolidate data increases. With a greater number of data sources, there is more scope for data quality issues to occur, and when the number of data quality issues increases, the time and effort required to clean and transform the data to get reliable information increases. When the volume of data increases, the storage capacity requirements, the processing power requirements, and the time and effort required to process it all increase. Any of these factors, either on their own or combined, could increase the waiting time to get to the insights, and also the cost. At the same time, there is pressure from the BI users, who want to get to insights as quickly as possible, and pressure from management to keep the costs of BI low. The BI team faces pressure from all sides, as depicted in Figure 10.1:
Figure 10.1: BI team experiencing pressure from all sides
The technological evolutions and advancements in BI technologies and tools that we have seen, and continue to witness every day, try to address these fundamental challenges. Any technological advancement or upgraded tool that relieves some of the pressure on a BI team, enables faster generation of information and insight from data, saves business users time, or reduces cost is gladly welcomed, tried, and adopted. In addition, with the volumes of data processed in businesses exploding, new technologies and tools keep entering the market to solve data challenges and to exploit the potential hidden in the data, which increases the number of technologies and tools that can be used in the BI process. Because there are so many of them, selecting the right set of BI tools and technologies is one of the difficult tasks for an organization in its BI journey. For example, without a good understanding of the different categories of BI tools, you may select the best tool in a particular category while the category itself is wrong, and so end up with the wrong tool for the job.
Where’s the boundary of BI?
With so many different technologies and tools usable in BI, who or what defines which is a BI tool or BI technology and which is not? Where exactly is the boundary of BI? As stated earlier, there is no limit to which technologies and tools can be used in BI. Businesses choose the options that make the most sense to them at that point in time. To drive home the point: if, hypothetically, it becomes cheaper in the future to employ robots instead of humans to clean data before loading it into a data warehouse, businesses may engage robots. That would mean robotics is used in BI. It is still BI regardless of which underlying technology is used; it is the purpose that dictates whether a technology or tool falls within the scope of BI. The easiest way to determine whether something is in the scope of BI is to check whether the technology or tool is useful in the process of deriving information and insight from data to enable decision makers to make better decisions and improve the business. If yes, then it is in the scope of BI.
DV versus RAP
There are multiple BI frontend tools, such as the data visualization tool (DV tool), the reporting and analytics platform (RAP), the data profiling tool, the data preparation tool, and so on. Not many people are aware of the differences between a data visualization tool and a reporting and analytics platform; they usually consider the two to be the same and use the terms interchangeably. It is therefore necessary to make clear why DV tools and RAPs look similar but are not the same.
Data visualization tool
Any tool that can be used for visualization of data, that is, for creating charts, tables, or maps, can be grouped under this category. It could be a commercial proprietary tool such as MicroStrategy Desktop, Tableau Desktop, or Power BI Desktop, which come with an easy-to-use GUI, or an open-source data visualization library such as D3.js or Chart.js. The commercial proprietary DV tools are mainly meant for direct use by end users such as data analysts or BI analysts, without requiring any support from BI developers, whereas data visualization libraries are meant for use by developers/engineers to set up data visualizations in web pages or applications, as this requires some level of technical skill. The output, for example interactive charts built with a data visualization library and published on a website, is then used by an end user.
BI reporting and analytics platform
BI reporting and analytics platforms, on the other hand, are more than data visualization tools. Of course, RAPs have a data visualization component too, and it is mainly used for data visualization purposes. But they also include a whole list of other features that make them more than a data visualization tool. Examples of RAP tools include MicroStrategy, SAP BusinessObjects Business Intelligence, IBM Cognos Analytics, Panorama Necto, Pentaho Business Analytics, etc. The features usually available in RAP tools and not available in data visualization tools are listed below. Note that some of the data visualization tools are catching up with the RAP tools, so the lines are slowly blurring.
• Web and mobile interface.
• Features to collaborate and work on the same reports/dashboards and share them with other users.
• Authentication and authorization features.
• Row and column-level security features.
• Organization of created reports/dashboards per user and in other folders (for example, per department).
• Semantic layer or technical-to-business mapping layer: a metadata layer in which all technical fields are mapped to business language and tables are correctly joined, such that users don’t have to bother about joins between tables. The semantic layer is what enables metadata-driven reporting.
• Data federation: data from various sources can be combined to appear as one database.
• Version history tracking of all the objects (reports/dashboards, metrics, dimensions, filters, groups).
• Distribution of reports through various channels such as email, network folders, SFTP, and FTP.
• Report bursting feature: reports are automatically created per value of a selected field (the report bursting field). For example, a report run for Walget results in one report per Walget store if the selected field for report bursting is the Walget store field.
• Pixel-perfect reporting.
• Alerts and notifications.
• Tools to promote objects from one environment to other environments, for example, from the dev environment to the test and prod environments.
• Data lineage: the ability to visually trace the transformations and source of each of the metrics, dimensions, and other objects.
• Various APIs, for example APIs for embedding results (data visualizations).
• Scheduling capabilities.
• Tool usage data storage and reporting: for example, every report run, answers to prompts, report refresh times, performance of the tool, etc., are all logged in an internal database. This metadata and the pre-created reports on top of it can be used for monitoring, troubleshooting, auditing, and other purposes.
• Administrative features such as creation and maintenance of users, roles, groups, backups, license updates, and so on.
• SDK support for custom branding.
• Developer tools: tools that are usually used by developers to develop reports, dashboards, attributes, metrics, filters, prompts, the semantic layer, and cubes.
In a RAP, simple reports/dashboards are created by business users directly in the production environment, whereas complex reports/dashboards are developed by BI frontend developers in development environments and then promoted to the production environment. A DV tool is used directly by the end user.
Some of the common features available in both RAP and DV tools are listed as follows:
• Self-service capability: an easy-to-use GUI with drag and drop features and no coding to analyze large amounts of data.
• Interactive data visualizations with a wide range of charts.
• Reporting and dashboarding.
• Download to various formats such as PDF, Excel, HTML, etc.
• Data preparation.
• Data blending: joining multiple data sources.
• Automated and augmented insights.
• Object search.
• Support for additional chart libraries.
• Works with various data sources, both on-premises and cloud.
• Prebuilt connectors for various data sources and formats.
After going through the list, it should be clear why a RAP is more than a DV tool and that it is not correct to use these terms interchangeably.
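To make the report bursting feature listed above more concrete, here is a minimal sketch in Python. It is not how any particular RAP implements bursting; the rows, the field names, and the `burst_report` helper are hypothetical and only illustrate the idea of producing one report per value of the bursting field.

```python
from collections import defaultdict

# Hypothetical sales rows for the fictional retailer Walget; the field names
# are illustrative and not taken from any specific BI tool.
sales_rows = [
    {"store": "Munich", "product": "Milk", "revenue": 120.0},
    {"store": "Berlin", "product": "Milk", "revenue": 95.5},
    {"store": "Munich", "product": "Bread", "revenue": 60.0},
]


def burst_report(rows, bursting_field):
    """Split one result set into one report per distinct value of bursting_field."""
    reports = defaultdict(list)
    for row in rows:
        reports[row[bursting_field]].append(row)
    return dict(reports)


reports_per_store = burst_report(sales_rows, "store")
for store, rows in sorted(reports_per_store.items()):
    total = sum(row["revenue"] for row in rows)
    print(f"Report for store {store}: {len(rows)} rows, revenue {total:.2f}")
```

In a real platform each of these per-store result sets would then be rendered and distributed to the matching recipients; that part is left out of the sketch.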
ETL versus ETL Tool
Many do not clearly understand the difference between ETL and an ETL tool. We will demystify this. Extract, Transform and Load (ETL) is a concept and a sequence of processes, and is one of the ways of data integration, whereas an ETL tool is mainly a developer tool that helps developers design and build ETL jobs. Let’s now get into the details.
ETL jobs (also called pipelines or flows) implement the process of extracting data from various sources, transforming it according to the requirements, and loading the transformed data into target areas (tables, files, etc.). ETL jobs can be developed by hand coding in programming and data processing languages such as SQL, PL/SQL, Python, and Java, or by using ETL tools such as Informatica, Ab Initio, IBM DataStage, Talend, and Pentaho. ETL jobs are used not only in BI projects but also in other projects, such as data migration and data integration projects.
In almost all of the BI projects that I have worked on for customers, one ETL tool or another was used. In most projects all transformations were carried out in the ETL tool; in very few cases the ETL tool was mainly used for orchestration while the entire ETL logic was maintained in scripts. There are both pros and cons of using an ETL tool over ETL scripting.
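As a rough illustration of what hand coding an ETL job means, the following sketch implements the three steps in plain Python with an in-memory SQLite database as the target. The source data, the table name `orders`, and the cleansing rules are assumptions made for this example only; a production-grade job would also need logging, error handling, and scheduling.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would be a file or a source database.
SOURCE_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20.00
2,BOB,20.00
3,carol,not_a_number
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: standardize names, drop unparsable amounts, deduplicate by order_id."""
    seen, clean = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows with an invalid amount
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # skip duplicate orders
        seen.add(order_id)
        clean.append((order_id, row["customer"].strip().title(), amount))
    return clean


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

An ETL tool would typically generate or hide the equivalent of `extract`, `transform`, and `load` behind a graphical design, which is the trade-off discussed next.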
Pros of ETL tool over hand coding
Some of the pros of an ETL tool over hand coding are as follows:
• ETL tools come with a graphical user interface with many features, including drag and drop and point and click, with which ETL developers can easily design, develop, debug, schedule, and test ETL jobs.
• Technical complexity is abstracted by the ETL tool. Developers can focus on the “what” rather than on the “how”. That is, developers can focus on the business rules that need to be implemented in the ETL process and don’t have to focus on how to write code for the technical aspects, for example, to connect to various data sources, to carry out mathematical calculations, or to deduplicate.
• Good ETL tools create optimized and performant code based on the ETL design developed by the developer.
• ETL tools have built-in connectors for almost all of the known types of data sources, and some ETL tools provide features to create custom connectors for proprietary data sources.
• It is easier and takes less time to train and ramp up developers in ETL tools than in programming languages.
• ETL tools document the ETL jobs automatically, and the mappings can be viewed and used for impact analysis and for data lineage.
• It is easier to maintain and enhance ETL jobs using an ETL tool.
• ETL tools are supported by vendors, and through the vendors customers can have access to best practices and lessons learnt from various organizations.
• In BI projects, around 60% of the effort is spent on developing ETL jobs. By using a good ETL tool, the time to implement the solution can be drastically reduced.
Cons of ETL tool over hand coding
Some of the cons of an ETL tool over hand coding are as follows:
• Most of the good commercial ETL tools are known to be quite expensive. Getting the pricing of these ETL tools is in itself a challenge, as pricing is not transparent.
• In general, higher skilled team members are required to be able to hand code production-grade ETL jobs.
• Free-to-use ETL tools have only limited features and of course no vendor support.
• For basic flows, such as extracting data from a table or flat file, transforming it, and loading it into a table or flat file, ETL tools are sufficient; however, when it comes to dealing with complex files such as highly nested XML files, ETL tools have limited features.
• Vendor lock-in. Once ETL jobs are developed using one of the commercial tools, migrating to another is not simple and is usually a project on its own.
Whether to use an ETL tool, ETL hand coding, or a combination of both is a choice best left to the BI architect of the project, as many factors (budget, schedule, number of data sources, types of data sources, BI team capabilities, existing tool and tech stack, etc.) need to be considered before arriving at a decision. In general, in my view, it is good to have a combination of both.
With that we come to the end of the topic of demystifying technologies and tools used in BI. Let’s move on to the topic of concepts.
Concepts
There are several concepts used in BI. And as mentioned in Chapter 1: What is Business Intelligence, BI is also a concept. Concepts from various disciplines/fields are used in BI. Whether to use a concept or not simply depends on the requirements and the available resources. Under this section, after going through the list of some of the commonly used concepts in BI, we will go through a couple of misconceptions and demystify them. These misconceptions are selected based on the general trend that is noticed in the online Q&A forums, social media, and other articles. As there are many concepts used in BI, there are misconceptions too. For example, the notion that data lake will replace a data warehouse is actually a misconception as we will see in detail later.
Concepts commonly used in BI
There are several concepts used in BI. Let’s take a look at some of the commonly used concepts in BI as covered in Table 10.3:
Concepts | Short description
Data warehouse | A data repository in which data from various data sources is integrated, versioned, and arranged in a specialized way such that it is easy for reporting and analysis purposes. For more details refer to the data warehouse section in the following pages.
Data mart | Depending on the approach used for building the data warehouse, a data mart could be a building block of a data warehouse, or a subset derived from a data warehouse. For a detailed explanation refer to the section on data mart in the following pages.
Data vault | A type of data model specifically architected for creating a data warehouse.
Data lake | Data lake is a data repository expected to store a wide variety of data, mostly in its raw or original form, and provide means to authorized users to access the data for various purposes. For a detailed explanation refer to the section on data lake.
Data lineage | Documentation (text or graphical or both) of the transitions of the data. Helps in understanding and tracking the source of the data and the transformations that have been applied on the data. For example, a metric or dimension is traced back to its sources through data lineage.
Data wrangling | Transforming or cleansing data to bring it to a useful form such that data can be analyzed, summarized, visualized, etc.
Data purging | Systematically deleting or archiving the data. As part of data purging the statistics about the deleted/purged data are maintained.
ODS | ODS stands for operational data store and is useful for operational reporting purposes. Similar to a data warehouse, an ODS stores integrated and subject-oriented data but does not store historic data; the data in an ODS is current.
Data staging layer | It is the layer between the data sources and the prepared data layer (data mart layer of data warehouse). The layer in which data is first loaded, combined, and cleansed. Data is either permanently stored or deleted periodically based on requirements. The data staging layer stores any type of data and is not limited to tabular data.
Information presentation layer | The layer which BI users use to access, create, visualize, and modify reports, dashboards, and so on.
Semantic layer | A semantic layer is part of the reporting and analytics platform; it maps technical fields (for example, column names in database tables) to business-centric metrics, attributes, filters, derived metrics, custom groups, etc. The semantic layer is usually developed by the BI development team. The tables are correctly joined such that users don’t have to bother about joins between tables. The semantic layer is what enables metadata-driven reporting. The reporting and dashboarding tools present the business side of the semantic layer to the business users through the information presentation layer.
Star schema | A type of dimensional structure/data model created as part of the dimensional modeling process. In a star schema, a fact table is connected to multiple dimension tables. The primary keys in dimension tables become foreign keys in the fact table. As there is one fact table in the center surrounded by dimension tables, it appears as a star. Star schema is the most common type of model used across all domains.
Snowflake schema | Similar to star schema, this is also a type of dimensional model. When all of the dimension tables of a star schema are normalized, a multi-level structure appears, with a base dimension table connected to the fact table and a chain of dimension tables connected to the base dimension table, therefore appearing as a snowflake. As storage costs have drastically reduced compared to when the snowflake model was introduced, the benefits of star schema outweigh any savings of using snowflake over star. Therefore, it is best to avoid snowflake schema.
Hybrid schema | Combination of star schema and snowflake schema. That is, not all of the dimension tables are snowflaked; only a few of them are. Useful in cases where, at the base level of a dimension table there are 100s of attributes, and at the next lower level (child-level) there are only a few attributes but hundreds of records. In such cases, if it is not snowflaked, the hundreds of attributes of the base level dimension will have to be repeated for hundreds of records at the child-level. Assume dimensional modelling for a procurement process. There is one procurement record with 100s of attributes, and for one procurement record there are hundreds of lot-level records, with each record containing only a few attributes.
ETL | ETL or extract, transform and load is the process of extracting data from various data sources (files, databases, websites, etc.), transforming data as per the business and technical rules, and loading data into the target datastore. This concept is not limited to data warehousing or business intelligence. It is also used for data migration and data integration.
ELT | ELT or extract, load and transform is the process of extracting data from various data sources, loading it first into the target datastore, and then transforming the data as per the requirements. In the ELT approach, the target datastore’s capability is used for the transformations.
Data integration | See the “Data is integrated” topic under the data warehouse section.
Machine learning | See the details under the Machine learning header.
Metadata | Metadata is information about data. It provides context to data. There are different types of metadata such as technical metadata, business metadata, process metadata, and operational metadata.
Data mining | Explained in detail in Chapter 2: Why do businesses need BI.
Data latency | In the context of BI, data latency is the same as the data freshness explained in Chapter 3: Types of Business Intelligence.
Data, Information, Insight | Explained in detail in Chapter 1: What is Business Intelligence.
Operational BI, Tactical BI and Strategic BI | Explained in detail in Chapter 3: Types of Business Intelligence.
Descriptive, Predictive, Prescriptive analytics | Explained in detail in Chapter 2: Why do businesses need BI.
Table 10.3: Some of the commonly used concepts in BI
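The ELT entry in Table 10.3 says that the target datastore’s capability is used for the transformations. A minimal sketch of that idea follows, with an in-memory SQLite database standing in for the target datastore. The table names `stg_orders` and `orders` and the cleansing rules are hypothetical; the point is only that the raw data is loaded first and the transformation is then expressed in the target’s own SQL.

```python
import sqlite3

# The target datastore is simulated with an in-memory SQLite database.
target = sqlite3.connect(":memory:")

# Extract + Load: raw source rows land unchanged in a staging table (hypothetical names).
raw_rows = [("1", " alice ", "10.50"), ("2", "BOB", "20.00"), ("3", "carol", "5.25")]
target.execute("CREATE TABLE stg_orders (order_id TEXT, customer TEXT, amount TEXT)")
target.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", raw_rows)

# Transform: the target's own SQL engine does the cleansing and type casting.
target.execute(
    """
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           UPPER(TRIM(customer))     AS customer,
           CAST(amount AS REAL)      AS amount
    FROM stg_orders
    """
)
elt_result = target.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(elt_result)
```

In an ETL flow the same cleansing would happen outside the target, before the load step.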
There are more concepts used in BI; some were covered in the previous chapters, and others are too detailed and beyond the scope of this book. Now let’s look at three concepts in detail (1. Data mart and data warehouse, 2. Data lake, and 3. Machine learning in BI), as these have been misunderstood and need to be demystified.
Data mart and data warehouse
For efficient reporting and analytics, data is stored and maintained differently in reporting and analytics data repositories (RADR) than in online transaction processing applications. Data marts, data warehouses, data lakes, and data vaults are examples of RADRs. The best possible way to organize the data for reporting and analytics purposes is chosen based on the organization’s requirements, budget, limitations, constraints, etc. Even after a couple of decades of data marts and data warehouses being in use, there is still confusion about the meaning of the two terms, and you will find that most people use them interchangeably or incorrectly. This section attempts to clear up the confusion. Simply put, the confusion arises because different people use the same words to mean similar but different things. Let’s see the details.
Data mart
One of the reasons why people get confused about data marts is a lack of awareness of the different types of data marts. The type of data mart depends on the approach used for building it. Let’s first look at the types of data marts.
Dependent data mart: Some data experts refer to a data mart as an analytical data repository containing mainly summarized data, created to meet the specific needs of a department. In this approach data marts are fed from a centralized data warehouse which contains granular data, as shown in Figure 10.2:
Figure 10.2: Data marts fed from a data warehouse
As seen in Figure 10.2, in this approach data marts fall outside the boundary of the data warehouse. Data marts depend on the data warehouse, and each data mart contains a subset of the data in the data warehouse. This type of data mart is called a dependent data mart. This approach of first building a data warehouse and then building data marts is known as the top-down approach.
Independent data mart: As building a data warehouse and then building a data mart takes considerably longer and is more complex, several developers/architects resort to building disconnected data marts in which the data is sourced directly from the original data sources and not from a data warehouse, as shown in Figure 10.3.
Figure 10.3: Independent data marts
A collection of these disintegrated and independent data marts is wrongly referred to as a data warehouse by some groups. Bill Inmon (father of data warehousing) and Ralph Kimball (father of dimensional modeling) disagree on several other points related to data warehouses, but they agree on this one: a mere collection of independent data marts cannot be considered a data warehouse. However, some organizations continue to create independent data marts because of the perceived short-term advantage. The short-term advantage of an independent data mart over a dependent data mart is that it takes less time and lower cost to build, and the architecture remains less complex.
Conformed data mart: While there is a short-term advantage in choosing the independent data mart approach over the dependent data mart approach, there are multiple disadvantages too, such as duplication of transformations, duplication of data storage, lack of data integration, and too many data quality issues related to inconsistent data summaries. In the long run the disadvantages overshadow any advantage. These are the reasons why data architects have always discouraged the independent data mart approach and instead recommended the conformed data mart approach. In the conformed data mart approach, dimensions (physical or logical) are shared across multiple star schemas, as shown in Figure 10.4.
Figure 10.4: Conformed dimensions approach
Using the same dimension tables for the same domain/subject area ensures integration of facts/metrics across star schemas. For example, by using the same location dimension table, date dimension table, and customer dimension table across all star schemas, we will be able to report facts/metrics from multiple fact tables at the same level. A collection of these star schemas integrated by conformed dimensions is referred to as a data warehouse in the bottom-up (Kimball) approach. The data mart layer is also referred to as the presentation layer of the data warehouse. The point to emphasize here is that in this approach a data mart is within the boundary of what is termed a data warehouse; essentially, the data marts are the building blocks of the presentation layer of the data warehouse, as shown in Figure 10.5.
Figure 10.5: In bottom-up approach data mart is part of the data warehouse
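The benefit of conformed dimensions described above, reporting metrics from multiple fact tables at the same level, can be sketched with two tiny star schemas that share one location dimension. The table and column names (`dim_location`, `fact_sales`, `fact_returns`) and the data are invented for this illustration, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- One conformed dimension shared by both star schemas (hypothetical names).
    CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales   (location_key INTEGER, revenue REAL);
    CREATE TABLE fact_returns (location_key INTEGER, refund REAL);

    INSERT INTO dim_location VALUES (1, 'Munich'), (2, 'Berlin');
    INSERT INTO fact_sales   VALUES (1, 100.0), (1, 50.0), (2, 80.0);
    INSERT INTO fact_returns VALUES (1, 20.0), (2, 5.0), (2, 10.0);
    """
)

# Aggregate each fact table to the shared dimension level first, then combine,
# so that rows from one fact table do not multiply rows from the other.
drill_across = db.execute(
    """
    SELECT d.city, s.revenue, r.refund
    FROM dim_location AS d
    JOIN (SELECT location_key, SUM(revenue) AS revenue
          FROM fact_sales GROUP BY location_key) AS s
      ON s.location_key = d.location_key
    JOIN (SELECT location_key, SUM(refund) AS refund
          FROM fact_returns GROUP BY location_key) AS r
      ON r.location_key = d.location_key
    ORDER BY d.city
    """
).fetchall()
print(drill_across)
```

Because both fact tables use the same `location_key`, the two metrics line up per city. With independent data marts, each mart would carry its own version of the location data and this alignment would not be guaranteed.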
In the bottom-up approach, the data marts contain not only summarized data but also transactional level (most granular) data. A data mart may contain a subset of the data from the operational or transactional database and still contain transactional level data. For example, let’s say that a transaction record in the transactional database has 200 columns. For reporting and analytics purposes we may be interested in only 50 of those columns right now, and in another 30 of them in the future. So we would consider storing only 80 of those 200 columns in the data mart. Even though we have only a vertical subset of the data, we still have the transactional level data in the data mart.
As can be seen, there are multiple approaches by which data marts can be created, and data can be stored at different levels of aggregation in data marts. Even in the top-down approach discussed earlier, nothing prevents anyone from loading the data mart with the most granular (atomic or transactional) level data. A variety of combinations are possible, but not many are aware of this fact, and this is why most people get confused about the concept of a data mart.
So, in general, there are two ways of understanding a data mart. Depending on how the data repository is built, it can be considered either a building block of a data warehouse (in the conformed dimensions approach) or an extension and subset of the data warehouse when fed from a data warehouse, as in the top-down approach. A data mart may contain the most granular data or summarized data depending on the approach.
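The 200-column example above can be reduced to a few lines of Python to show what a vertical subset means. The records and column names below are hypothetical, with three columns standing in for the 200; the data mart keeps fewer columns but the same number of rows, so the grain stays at the transaction level.

```python
# Hypothetical transactional records; the source table in the text has 200
# columns, three stand in for them here.
source_transactions = [
    {"txn_id": 1, "amount": 9.99, "internal_flag": "x"},
    {"txn_id": 2, "amount": 4.50, "internal_flag": "y"},
]

# Columns needed for reporting now or expected to be needed later (assumed).
mart_columns = ["txn_id", "amount"]

# Vertical subset: fewer columns per record, but still one row per transaction.
mart_rows = [{col: txn[col] for col in mart_columns} for txn in source_transactions]
print(mart_rows)
```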
Data warehouse
Now that the concept of data mart is clear, let’s delve into the concept of data warehouse and clear up some of the prevailing misconceptions. According to Bill Inmon, “A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of management’s decisions. A data warehouse contains granular corporate data”. Ralph Kimball and Joe Caserta, a data warehousing veteran, defined data warehouse in the book The Data Warehouse ETL Toolkit as “A data warehouse is a system that extracts, cleans, conforms, and delivers source data into a dimensional data store and then supports and implements querying and analysis for the purpose of decision making”.
As we can see from these two definitions given by the gurus of data warehousing, there are clear differences in what is called a data warehouse. In practice too, if you were to go through the architecture diagrams of data warehouse implementations across organizations, you would notice several variations in how organizations have implemented data warehouses. No wonder people are confused about what exactly should be called a data warehouse. But one thing is common: a data warehouse is a data repository that supports decision making. In simple words, we could define a data warehouse as a data repository in which granular data from various data sources is integrated, versioned (historized), and arranged (presented) in a specialized way such that it is easy and performant to use for reporting and analytics purposes to support decision making. The entire process of designing and building a data warehouse, developing the ETL processes (sources to staging, and staging to integration layer and/or presentation layer), and data modeling of multiple layers including the presentation layer (usually dimensionally modeled) is referred to as data warehousing.
Some people use the terms business intelligence and data warehousing interchangeably. This is not fully correct. While a data warehouse certainly brings several advantages, as previously explained, not all BI solutions need to have a data warehouse. Now let’s look at what most organizations try to achieve with a data warehouse, as this will help you get a better understanding of data warehouses.
1. Data is centralized: As shown in Figure 10.2 and Figure 10.5, the organization’s data for reporting and analytics purposes is centralized. Centralized does not mean that all of the data has to be physically located in the same place; it only refers to logical centralization. How does centralization help? Essentially, users don’t have to know where exactly the data is coming from and where it is physically stored, and they don’t have to think twice about where to look for data for any of their data needs; they can access the data warehouse and make use of the available and permitted data. For example, the marketing department of Walget doesn’t have to know which system actually stores the in-shop purchases and which system stores the data of online purchases. They can access the data warehouse and get the necessary information.
Without data centralization users would have to run from pillar to post searching for the required data, thereby losing time and delaying the decision-making process. Centralization of data also helps in applying the data security policies and governing the data better.
2. Historical data is preserved: The transactional (operational) systems usually don’t have a requirement to store historical data. Additionally, to ensure that the performance of the operational systems is not degraded, they are required to store only the minimum data. However, for reporting and analytics purposes historical data is highly valuable. Therefore, preserving the historical data in the data warehouse instead of the operational systems has dual benefits: 1) data retention for legal, regulatory, and analytics purposes, and 2) avoidance of performance degradation of operational systems. For example, at Walget, ex-employees’ data is not required in the Employee Management System for current managers; however, for legal, regulatory, and analytical purposes ex-employees’ data needs to be retained. Ex-employees’ data can be offloaded from the Employee Management System, transformed, and loaded into the data warehouse.
3. Data is integrated: The transactional systems (source systems) store data in a way that works best for the applications. Data in the transactional systems is application-oriented. For example, at Walget, the purchases at physical stores are captured in one database, online purchases are captured in another database, promotions or campaign data is stored in another database, employee data is captured in yet another database, and so on. To answer questions that require data from more than one database, such as how many employees are also customers of Walget, or how many customers shop both online and offline and take part in promotions, the data has to be integrated. This data integration is achieved in a data warehouse.
Data integration is essential to enable users to get a consolidated view of the business. Just storing data from multiple data sources in one single database is not data integration; records need to be accurately connected for data integration. Different source applications may retain data for different lengths of time (for example, application A has data for the last 5 years whereas application B has data for the last 10 years), update records at different frequencies (for example, application A updates data on a transactional basis whereas application B updates data on a daily basis), store data based on different specification versions (for example, application A stores data conforming to specification version 1.0 whereas application B stores data conforming to specification version 2.0), and store data at different levels of granularity. As part of data integration in the data warehouse, data inconsistencies are removed, and the necessary steps are implemented to make data coming from different applications compatible, such that users are able to carry out meaningful analysis.
4. Enables analysis on large sets of data: In transactional systems, data is stored so as to enable fast processing of operations (inserts, updates, and deletes) at the level of individual records or rows. For reporting and analytics, by contrast, queries and other programs run over large sets of data in the data warehouse to derive information and insights. For this purpose, the data is modeled accordingly, typically dimensionally.

5. Frees up transactional systems: Transactional systems (source systems) are freed from all analytics and reporting loads, so their performance is not degraded. Maintenance of the transactional systems also becomes easier, as they can remain lightweight and are not burdened with non-transactional (reporting and analytics) requirements.

To call a RADR a data warehouse or a data mart, we must first understand the approach used to build it and the scope of the data it contains. A data warehouse may exist without a data mart, as in the starting phase of the top-down approach. But whenever a data mart exists, whether in the top-down or the bottom-up approach, a data warehouse exists. In the bottom-up approach, the first data mart is itself the data warehouse; in the top-down approach, the data warehouse is available before the first data mart is created.

Having gone through the details of the data mart and the data warehouse, let’s look again at the misconception discussed in the last chapter: is a data warehouse mandatory for a BI solution? Even with all the characteristics and benefits mentioned above, we cannot conclude that a data warehouse is mandatory for a BI solution. But we can surely say that a data warehouse is a very important component within a BI solution, and that there are several advantages to having a data warehouse in the background of the BI solution.
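The contrast drawn in benefit 4, between row-level operations and analysis over large sets of data, can be illustrated with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database; the table, columns, and sample values are assumptions for illustration only, and a real data warehouse would hold far more data in a dimensionally modeled schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "sale_id INTEGER PRIMARY KEY, store TEXT, sale_year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (store, sale_year, amount) VALUES (?, ?, ?)",
    [
        ("Munich", 2020, 120.0),
        ("Munich", 2021, 80.0),
        ("Berlin", 2020, 200.0),
        ("Berlin", 2021, 50.0),
        ("Berlin", 2021, 25.0),
    ],
)

# Transactional style: touch exactly one row, identified by its key.
conn.execute("UPDATE sales SET amount = 90.0 WHERE sale_id = 2")

# Analytical style: scan many rows and aggregate them into information.
totals = conn.execute(
    "SELECT store, sale_year, SUM(amount) FROM sales "
    "GROUP BY store, sale_year ORDER BY store, sale_year"
).fetchall()

for store, sale_year, total in totals:
    print(store, sale_year, total)
conn.close()
```

The update is cheap because it addresses a single row by key, whereas the aggregate has to read every row, which is why the two workloads are usually kept on separate systems.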
Data lake
At the time of writing, the data lake is one of the most popular topics, and most organizations are either initiating data lake projects or already in the process of building one. As with anything else related to data and BI, the term data lake has several interpretations, causes a lot of confusion, and is surrounded by quite a lot of misinformation. For example, when data lakes were becoming popular, there were many claims that data lakes would replace data warehouses; many years later, data warehouses still exist and many more are being built. Also, to showcase the data lake as the solver of all problems, quite a lot of “non-existing” limitations were attributed to the data warehouse. So, clearly, there is a need to demystify, in an unbiased way, what exactly a data lake is and to clarify how it does not replace a data warehouse.
What exactly is data lake?
Simply put, a data lake is a data repository that is expected to store a wide variety and a large amount of data, mostly in its raw or original form, with the means to authorize user access for various purposes. Now, let’s take a closer look and understand it without prejudice. What we find is that the data lake is actually not a new concept. It is simply a new name for an old concept or, at most, an extension of an existing concept. It is based on a combination of the existing concepts of the staging area and the active archive area of a data warehouse. As shown in Figure 10.6, the staging area/landing zone and the archive area were earlier within the boundary of the data warehouse.
Figure 10.6: Landing zone / Staging area and Archive area are part of data warehouse
They are now included as part of the data lake, as shown in Figure 10.7.
Figure 10.7: Landing zone / Staging area and Archive area are part of data lake
Therefore, the boundary of the data warehouse has shrunk and is now limited to the presentation or prepared data layer. When we look at the use cases of the data lake in the following paragraphs, it will become even more evident that, in most cases, data lake is a new name for the staging area in a data warehouse environment.
Use cases of data lake
The use cases of a data lake are provided below. As most, if not all, of these use cases were already handled by a combination of the staging area and the archive area of a data warehouse, each use case is accompanied by an explanation of how it was already handled.

1. Data lake as a landing zone: Data from various data sources is first landed in the data lake for further processing. Compare that to the staging area in a data warehouse environment: a staging area fulfils multiple use cases, one of which is the landing zone use case. In all architectures that have a staging area, it is an intermediate layer between the data sources and the data warehouse presentation layer and is used as a landing zone.

2. Data lake as a centralized data repository: The data lake is used as a (logically) centralized data repository. Data from various sources is stored, or retained, in its raw or original form. Compare that to the DWH environment: data from various data sources (files, databases, message queues, and so on) is first loaded into the staging area of the data warehouse and is then retained there, so the staging area also acts as a (logically) centralized data repository.

3. Data lake stores the data forever: The data lake is meant to store data forever unless there is a requirement to delete it. Compare that to the DWH environment: exactly how long data is retained in the staging area depends on the requirements and is a design decision. In some designs, data in the staging area is kept only until the end of the current run or the beginning of the next run of the ETL jobs. In some designs, it is retained for a temporary period, for example, a month or so. In other designs, data is never deleted from the staging area unless there is a requirement to do so. So, the last of these staging area designs already handled this specific use case of the data lake.

4. Data lake for future data needs too: The data lake is used for storing not only data that is required now but also data that may be required in the future. That is, there may be no clear and specific requirement for some of the data as of now, but it is expected that requirements needing that data may come up later. Let’s compare that to the DWH environment. If you have worked on a data warehouse project, you will know that this is similar to how we have been taking advantage of the staging area of a data warehouse. Once a data source has been identified, for example, a database schema, all or most of the tables from the schema are extracted and loaded into the staging area, whereas only those tables and columns that are relevant for reporting and analytics are brought to the presentation layer (data mart) of the data warehouse.
Selective extraction from the staging area to the presentation layer is done to minimize storage requirements and to avoid degrading report refresh time or query performance with currently irrelevant data. When new requirements come up that need data not currently in the presentation layer but available in the staging area, that data is modeled accordingly and added to the presentation layer.

5. Data lake is accessible to users for analysis: Users are permitted to carry out analysis directly on the raw data available in the data lake, and therefore don’t have to wait until it is transformed and/or cleaned and made available in a data mart. Compare that to the DWH environment: it is true that business users were usually not given access to the staging area, mainly because of data quality issues such as unreliability and inconsistency. However, if users want to access the data in the staging area while being fully aware of the data quality issues they may face, access can be permitted. There is no reason why policies would allow access when the store is called a data lake and refuse it when the store is called a staging area, when both hold raw data. In fact, in a couple of projects, some users (not only data analysts but also business users) were permitted to access the staging area because they had a good understanding of both the business and the technical side. Basically, the point is that there is no hard rule that the staging area shouldn’t be exposed to users who know how to use it. The staging area can also be made available to users for analysis purposes.

6. Older data from the data warehouse is offloaded to the data lake: One of the use cases of the data lake is its usage as an active archive. That is, data older than the relevant time period is moved out of the presentation layer and maintained in the data lake, where it is kept for any ad hoc analysis needs that may come up. This use case, too, is already handled in a data warehouse environment: any data that has to be removed from the presentation layer of the data warehouse is either saved in a separate table, called an archive table, in the same layer or moved to another storage layer within the data warehouse infrastructure.

Based on the above points, we can conclude that data lake is a new name for an old concept and that its use cases are already addressed by the combination of the staging area and the active archive area in a data warehouse environment, as long as the data warehouse is not limited to a tabular or relational analytical data store. However, as the term data lake has already gained enough popularity and acceptance, we cannot simply ignore it, so it is important to have a good understanding of it and to be aware of the realities. It is also important to be aware of some of the myths about the data lake and the data warehouse, as we will see in the next section.
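The three staging-area retention designs described under use case 3 can be sketched as a small policy function. The policy names, the 30-day window, and the batch layout below are assumptions made for illustration; they do not come from any particular tool.

```python
from datetime import date, timedelta

# Hypothetical policy names for the three designs described in use case 3.
KEEP_CURRENT_RUN = "keep_current_run"  # cleared at the start of each ETL run
KEEP_FOR_WINDOW = "keep_for_window"    # retained for a fixed number of days
KEEP_FOREVER = "keep_forever"          # never deleted unless required


def batches_to_keep(batches, policy, today, window_days=30):
    """Return the staged batches that survive under the given policy."""
    if policy == KEEP_FOREVER:
        return list(batches)
    if policy == KEEP_FOR_WINDOW:
        cutoff = today - timedelta(days=window_days)
        return [b for b in batches if b["loaded_on"] >= cutoff]
    if policy == KEEP_CURRENT_RUN:
        return [b for b in batches if b["loaded_on"] == today]
    raise ValueError(f"unknown policy: {policy}")


today = date(2021, 6, 30)
staged = [
    {"batch": "sales_2021_01_15", "loaded_on": date(2021, 1, 15)},
    {"batch": "sales_2021_06_10", "loaded_on": date(2021, 6, 10)},
    {"batch": "sales_2021_06_30", "loaded_on": date(2021, 6, 30)},
]

kept_by_policy = {}
for policy in (KEEP_CURRENT_RUN, KEEP_FOR_WINDOW, KEEP_FOREVER):
    kept_by_policy[policy] = [
        b["batch"] for b in batches_to_keep(staged, policy, today)
    ]
    print(policy, "->", kept_by_policy[policy])
```

Under the last policy, the staging area behaves like the "store forever" repository that use case 3 attributes to the data lake.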
Myths about data lake and data warehouse
The six myths noticed about data lake are listed below, and then each one is explained in detail.
1. Data lake contains most granular data whereas data warehouse contains only summarized data.
2. Data lake is a data repository that stores structured, semi-structured, and unstructured data whereas data warehouse stores only structured data.
3. Data lake is always cloud-based.
4. Data lake can be used to store entire organization’s data for future purposes without any pre-processing.
5. No vendor lock-in in case of data lake but data warehouse has vendor lock-in.
6. Data lake replaces data warehouse.
Myth 1: Data lake contains most granular data whereas data warehouse contains only summarized data.
This is not true. Both are logical structures. What organizations store within these structures depends on requirements. Moreover, it depends on where you draw the boundary of a data warehouse. As seen earlier in Figure 10.2 and Figure 10.5, either the staging area (in the Kimball approach) or the staging and integration area (in the Bill Inmon approach) contains the most granular data, and these areas are part of the data warehouse architecture. If the boundary of the data warehouse is limited to the presentation layer, the statement may sometimes be correct, because the presentation area may or may not contain the most granular data; but a well-designed data warehouse is not only a presentation layer. So, it is incorrect to state that a data warehouse contains only summarized data and that only a data lake contains the most granular data. A data warehouse can contain the most granular data too.
Myth 2: Data lake is a data repository that stores structured, semi-structured, and unstructured data whereas data warehouse stores only structured data.
Firstly, terms such as semi-structured and unstructured data are misnomers. Data in formats such as XML files, JSON files, emails, audio, video, images, tweets, and sensor data is not unstructured; each of these has a structure to which the data has to conform. For example, an email has a subject field, a sender email ID, one or more recipient IDs, a message, a delivery time, and so on. Similarly, there is a specification to which any tweet has to adhere; for example, currently a tweet cannot be longer than 280 characters. So there definitely is a structure; it is just not a relational or tabular (rows and columns) structure. It would have been better to use terms such as tabular data and non-tabular data instead of structured and unstructured data, respectively.
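The point that non-tabular data still has a structure can be shown with a short sketch using only Python's standard library. The sample email and the JSON order document are made up for illustration; the field names in the JSON are assumptions.

```python
import json
from email import message_from_string

raw_email = (
    "From: asha@example.com\n"
    "To: ben@example.com\n"
    "Subject: Weekly sales figures\n"
    "Date: Mon, 05 Jul 2021 09:30:00 +0200\n"
    "\n"
    "Please find the figures attached.\n"
)
raw_json = (
    '{"order_id": 42, "items": ['
    '{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}'
)

# The email is not tabular, yet its fields can be read by name.
msg = message_from_string(raw_email)
email_fields = {
    "sender": msg["From"],
    "recipient": msg["To"],
    "subject": msg["Subject"],
    "body": msg.get_payload().strip(),
}

# The JSON document is nested rather than tabular, but equally well defined.
order = json.loads(raw_json)
total_qty = sum(item["qty"] for item in order["items"])

print(email_fields)
print("order", order["order_id"], "total quantity", total_qty)
```

Neither input has rows and columns, but both can be parsed reliably because they conform to a known structure.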
Now, coming to the original point: back in the days when the only type of data used for analysis was tabular, the data warehouse environment consisted of a relational database that stored the data in tables of rows and columns. Now that data is available in a wide variety of formats, a storage layer is required that can consume all varieties of data and support data analysis. Moreover, with the speed at which data is generated by various data sources, and its increasing volume, there is a need to store more data in less time and, of course, at a lower cost. So, while the tools and technologies change, the concept remains. A data warehouse, especially its staging layer, can contain all varieties of data. There is no restriction. Many people have wrongly understood a data warehouse to be a physical, fixed structure and a relational database, and therefore arrive at the myth that a data warehouse can store only tabular data. Once we understand that a data warehouse is actually a system and a logical structure, it becomes clear that it can store all varieties of data and is not in any way limited to tabular data.
Myth 3: Data lake is always cloud-based.
Some definitions of data lake offered by vendors of cloud-based data lake solutions mislead by implying that a data lake has to be cloud-based. Whether a data lake should be built in the cloud or on-premises depends on the requirements of the organization, considered together with the estimated costs of the initiative. The costs depend on how the organization plans to use the data lake. It is more of a financial decision than a technology specification. It is absolutely incorrect to state that a data lake always has to be cloud-based.
Myth 4: Data lake can be used to store an entire organization’s data for future purposes without any pre-processing.
With regulations such as the General Data Protection Regulation (GDPR) in the EU, and similar ones in other locations, already in force, it is almost impossible not to pre-process data (for example, filter, mask, pseudonymize, or anonymize it) before it is stored in a data lake. So, the notion that all data from all of the applications in an organization can be stored as-is (in raw form) in a data lake is good only in theory and is not legally practical. For example, where the processing of user data relies on consent, GDPR requires that the consent be actively sought for a specified purpose. Let’s say Walget wants to store customer data for analytical purposes on that basis: it must then actively seek consent from its customers for that specific purpose, and if some customers do not consent to the use of their data for analytics, Walget has no choice but to filter out those customers’ data before it is loaded into the data lake from the operational/transactional systems. This is just one of many examples of why we cannot store all of the data in the data lake without any processing. Some processing is always required before data can be used for purposes other than the original one.
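The kind of pre-processing described for the Walget example can be sketched as follows. This is an illustration only, not legal guidance: the field names, the `analytics_consent` flag, and the use of a keyed hash (HMAC-SHA256) as the pseudonymization technique are all assumptions introduced here.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be managed outside the code
# and kept separate from the data lake.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value, key=PSEUDONYM_KEY):
    """Replace an identifier with a keyed hash: linkable, but not readable."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_for_data_lake(customers):
    """Drop customers without analytics consent and pseudonymize the rest."""
    prepared = []
    for customer in customers:
        if not customer["analytics_consent"]:
            continue  # filtered out before loading, as in the Walget example
        prepared.append(
            {
                "customer_key": pseudonymize(customer["email"]),
                "city": customer["city"],
                "total_spend": customer["total_spend"],
            }
        )
    return prepared


customers = [
    {"email": "asha@example.com", "city": "Munich",
     "total_spend": 310.5, "analytics_consent": True},
    {"email": "ben@example.com", "city": "Berlin",
     "total_spend": 99.0, "analytics_consent": False},
    {"email": "carla@example.com", "city": "Munich",
     "total_spend": 45.0, "analytics_consent": True},
]

lake_records = prepare_for_data_lake(customers)
print(len(lake_records), "of", len(customers), "records loaded")
```

Even this small pipeline shows that the data reaching the data lake is no longer the raw source data: one record is missing and the identifying field has been replaced.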
Myth 5: No vendor lock-in in case of data lake but data warehouse has vendor lock-in.
The statement that a data warehouse comes with vendor lock-in and a data lake does not is, again, false. Both the data lake and the data warehouse are logical structures/concepts. How organizations implement them determines whether there will be vendor lock-in. For example, if a data warehouse is built on open-source software such as PostgreSQL or Greenplum Database, how would that result in vendor lock-in? On the other hand, if a data warehouse appliance or only proprietary software is chosen, then yes, there is vendor lock-in. Whether to use proprietary or open-source software is a choice, and it applies equally to the data lake and the data warehouse.
Myth 6: Data lake replaces data warehouse.
No, the data lake doesn’t replace the data warehouse. As should be clear from the explanations of the data warehouse and the data lake, the data lake is a replacement for what was earlier called the staging area and archive area within the data warehouse architecture. It is not a new concept; it’s a new name for an old concept. A data lake complements a data warehouse, or it can be considered a component within the overall data warehouse architecture. Important decisions cannot be made on the basis of poor-quality (raw) data, so for the many use cases where cleaned, reliable data is required, a data warehouse is still very much needed.
Machine learning usage in BI
Machine learning (ML) is a branch of Artificial Intelligence (AI). The concept of machine learning is that machines/systems can learn, that is, identify trends, patterns, and relationships in data, independently and without human intervention once data is provided for processing. ML models, simply put, are relationships or mathematical equations between data entities. ML algorithms create models based on existing/known sets of data. Once a model is created, it can be used to predict results for future/unknown sets of data. As ML models are created from existing data, when a new set of data is added, the models adapt accordingly, and predictions are then based on the new model.

There are several applications of ML, and its usage in the BI process is one of them. Most people are still not aware that ML capabilities can be used in BI and that they are actually already in use. In fact, many vendors have already added ML capabilities to their tools, both data integration tools and BI RAP tools. Based on the trends, we can assume that soon all BI tools will include ML capabilities. With ML capabilities incorporated into these GUI-based tools, BI users don’t need to know how machine learning models work or how to build them. Instead, they can leverage these capabilities to discover hidden insights and focus their efforts on analyzing the insights rather than on generating them, simply by using the tool’s easy-to-use drag-and-drop and point-and-click features. As mentioned earlier, automated insight generation is one such ML-based feature in BI. I hope this clarifies that you don’t need to be a data scientist or a developer to use ML capabilities, and that ML capabilities are very much in use within BI too.
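The idea that a model is a mathematical relationship learned from known data, used to predict unknown data, and updated when new data arrives can be sketched with ordinary least squares on a single variable. The spend and sales figures are invented, and treating "adapting" as simply refitting on the enlarged data set is an assumption made for this illustration; BI tools may use very different algorithms.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned relationship to a new input value."""
    slope, intercept = model
    return slope * x + intercept


# Known data: hypothetical advertising spend and resulting sales.
spend = [1.0, 2.0, 3.0, 4.0]
sales = [12.0, 14.0, 16.0, 18.0]

model = fit_line(spend, sales)
first_prediction = predict(model, 5.0)
print("model:", model, "prediction for 5.0:", first_prediction)

# New data arrives; refitting yields an adapted model and a new prediction.
spend.append(5.0)
sales.append(25.0)
updated_model = fit_line(spend, sales)
second_prediction = predict(updated_model, 6.0)
print("updated model:", updated_model, "prediction for 6.0:", second_prediction)
```

A BI user working with a GUI-based tool would never see code like this; the sketch only shows what "creating a model from existing data" amounts to in the simplest case.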
Conclusion
BI has become a standard and essential tool for every organization. Businesses of all sizes, from large companies to SMEs, depend on BI for their information needs, for discovering insights, and for supporting decision-making. BI will continue to exist as long as businesses exist, because decision makers need information and insights. Regarding technologies and tools, there are several possibilities within the BI framework. It’s best for organizations to look into their own real needs and limitations and choose the best-fit technologies and tools rather than copying whatever is most hyped or marketed. I am aware of companies that claim to possess advanced analytics and machine learning capabilities, to be data-driven, and so on, but in reality don’t even have a basic data infrastructure (not even a proper data warehouse) or basic descriptive analytics capabilities, let alone predictive and prescriptive analytics. I hope that this chapter, together with all the previous chapters, has given you a new perspective on how to go about your BI journey, given you the clarity that was missing earlier, cleared up your confusion about BI, and increased your knowledge of BI to the extent that you can now make the right choices on anything related to BI.
Points to remember
Some key points to remember are listed as follows:
• There are plenty of options with respect to technologies and tools that can be used in BI solutions.
• The core concept of BI is to enable the business to take better decisions.
• There is pressure on BI teams from all sides, such as management, users, source system support teams, and so on.
• There are multiple types of data marts and different schools of thought on the topic, and hence there is confusion.
• Data warehouses and data marts are logical structures.
• A RAP tool is much more than a DV tool.
• Data lake doesn’t replace data warehouse.
Multiple choice questions
1. What is the core concept of BI?
a) To collect and organize data
b) To execute changes in processes
c) To enable business users to take better decisions
d) To improve the quality of data

2. Which one of the following is a technology?
a) Data warehouse appliance
b) Data profiler
c) Data modeler
d) Data generator

3. Which one of the following is a tool?
a) Machine learning
b) Data mining
c) RAP
d) Semantic layer

4. Which of these is not a known type of data mart?
a) Independent data mart
b) Dependent data mart
c) Conformed data mart
d) Ready to use data mart

5. What is true about a dependent data mart?
a) Data from the data warehouse is loaded into the dependent data mart
b) Data from the staging area is loaded into the dependent data mart
c) Data from source systems is loaded into the dependent data mart
d) All are correct

6. Which one of these is true?
a) Data mart has only summarized data
b) Data mart is not part of a data warehouse
c) Data mart can contain both summarized and most granular data
d) Data mart cannot be built without building a data warehouse

7. Which of these is not correct?
a) Data warehouse is only for structured data
b) Data lakes have several use cases in common with the staging area of a data warehouse
c) Data lake can be built on-premises or in the cloud
d) Data lake is used as a landing zone

8. Which of these is a RAP tool?
a) Panorama Necto
b) MicroStrategy Desktop
c) Power BI Desktop
d) MS Excel

Answers
1. c
2. a
3. c
4. d
5. a
6. c
7. a
8. a
Questions
1. Why is selecting the right tools and technologies in BI a difficult task?
2. What is the fundamental challenge in BI?
3. Why are BI teams under pressure from all sides?
4. What is the point in collecting data that may provide interesting but not necessarily currently useful information?
5. Why is there still confusion about data mart, data warehouse, and the relation between the two?
6. What is the conformed dimension approach? Explain with an example.
7. Is a data warehouse mandatory for a BI solution? Explain.
Abbreviations
There are only a few new abbreviations used in the book; the rest are already in common use. However, all of them are documented here to ensure that there is no confusion and to explicitly list the specific full form or expanded form that I have considered.
ATM: Automated teller machine
B2B: Business-to-business
B2C: Business-to-consumer
B2B2C: Business-to-business-to-consumer
BA: Business analyst
BI: Business intelligence
BIaaS: Business intelligence as a service
BIBA: Business intelligence business analyst
BICC: Business intelligence competency center
BO: Business Objects
CAPEX/CapEx: Capital expenditure
CBIO: Chief business intelligence officer
CAO: Chief analytics officer
CDAO: Chief data and analytics officer
CDO: Chief data officer
CEO: Chief executive officer
CFO: Chief financial officer
CGCE: Centralized governance and centralized execution
CGDE: Centralized governance and decentralized execution
CIO: Chief information officer
COE: Center of excellence or center of expertise
CPO: Chief product officer
CXO: Any of the C-level executives such as CEO, CIO, etc., not Chief experience officer
CRM: Customer relationship management
CTO: Chief technology officer
DEV or Dev: Development (referring to development environment or dev team)
DMRM: Dimensions and metrics requirements matrix
DWBI or DW/BI: Data warehousing and Business intelligence
DWH: Data warehousing / Data warehouse
DBA: Database administrator
DR: Disaster recovery
DSS: Decision support system
DV: Data visualization
DVT: Data visualization tool
EIS: Executive information system
ETL: Extract, transform, and load
ERP: Enterprise resource planning
EU: The European Union
FTE: Full-time equivalent
FTP: File transfer protocol
GDPR: The General Data Protection Regulation
KPI: Key performance indicator
HR: Human resources
IT: Information technology
MDM: Master data management
MDS: Management decision system
MIS: Management information system
MUS: Minimum usable solution
NGDE: No governance and decentralized execution
ODS: Operational data store
OKR: Objectives and key results
OLAP: Online analytical processing
OLTP: Online transactional processing
OOTB: Out-of-the-box, refers to the vanilla solution
OPEX/OpEx: Operating expense
PI: Performance indicators
Prod: Production, usually refers to the production environment
POC: Proof of concept
POS: Point of sale
RADR: Reporting and analytics data repositories
RAM: Random-access memory
RAP: Reporting and analytics platform
RDBMS: Relational database management system
ROI: Return on investment
SaaS: Software as a service
SFTP: Secure file transfer protocol
SGDE: Some governance and decentralized execution
SLA: Service level agreement
SME: Small and medium enterprises
SSBI: Self-service business intelligence
TCO: Total cost of ownership
UAT: User acceptance testing
USP: Unique selling proposition
VP: Vice president
References
1. Power, Daniel J., "Decision Support Systems: Concepts and Resources for Managers" (2002). Faculty Book Gallery. 67. https://scholarworks.uni.edu/facbook/67
2. Power, D.J. A Brief History of Decision Support Systems. DSSResources.COM, World Wide Web, http://DSSResources.COM/history/dsshistory.html, version 4.0, March 10, 2007
3. https://business-intelligence.financesonline.com/#history
4. https://www.cio.com/article/3290407/history-of-business-intelligence.html
5. Cyclopaedia of Commercial and Business Anecdotes https://play.google.com/books/reader?id=vqBDAAAAIAAJ&hl=en&pg=GBS.PA210
6. The Banker’s Magazine and Statistical Register, 1850 https://play.google.com/books/reader?id=9S9AAQAAMAAJ&hl=en&pg=GBS.PA547
7. A Business Intelligence System by H.P Luhn https://web.archive.org/web/20080913121526/http:/www.research.ibm.com/journal/rd/024/ibmrd0204H.pdf
8. https://hexaware.com/blogs/the-business-intelligence-chasm/
9. http://wseas.us/e-library/conferences/2012/Porto/AEBD/AEBD-18.pdf
10. The Data Warehouse Toolkit, Second Edition, The Complete Guide to Dimensional Modeling by Ralph Kimball and Margy Ross
11. The Data Warehouse Toolkit - The Definitive Guide to Dimensional Modeling, Third Edition, by Ralph Kimball and Margy Ross
12. https://www.kimballgroup.com/1997/08/a-dimensional-modeling-manifesto/
13. Building the Data Warehouse, Third Edition by W.H. Inmon
14. Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals by Paulraj Ponniah
15. https://en.wikipedia.org/wiki/Blind_men_and_an_elephant
16. https://www.forrester.com/report/Topic+Overview+Business+Intelligence/-/E-RES39218 last accessed on 30.03.2020
17. https://www.akvkbi.com/2016/10/all-about-business-intelligence.html last accessed on 30.03.2020
18. https://www.gartner.com/en/information-technology/glossary/business-intelligence-bi last accessed on 30.03.2020
19. https://en.wikipedia.org/wiki/Business_intelligence last accessed on 30.03.2020
20. https://www.quora.com/unanswered/Are-there-any-companies-that-have-over-100-million-US-dollar-yearly-revenue-that-does-not-use-business-intelligence-data-analytics-business-analytics-data-science last accessed on 03.04.2020
21. https://www.linkedin.com/posts/anoopkumarvk_do-you-know-of-a-company-in-this-generation-activity-6650771654681124864-vYtc last accessed on 03.04.2020
22. https://www.akvkbi.com/p/bi.html
23. https://www.sisense.com/glossary/prescriptive-analytics/ last accessed on 17.05.2020
24. https://support.office.com/en-us/article/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3
25. https://www.publicbi.com/euproc last accessed on 7.06.2020
26. https://www.bbc.com/news/technology-52808177 last accessed on 10.06.2020
27. https://www.gartner.com/en/newsroom/press-releases/2018-12-06-gartner-data-shows-87-percent-of-organizations-have-low-bi-and-analytics-maturity
28. https://fadv.com/solutions/analytics-and-reporting/ last accessed on 28.06.2020
29. https://hbr.org/2006/01/competing-on-analytics last accessed on 03.04.2020
30. https://hbr.org/2007/03/my-big-bet-on-analytics last accessed on 03.04.2020
31. https://hbr.org/video/2386816175001/business-analytics-defined last accessed on 03.04.2020
32. https://corporate.target.com/article/2014/04/meet-the-target-analytics-network last accessed on 08.07.2020
33. http://graphics.eiu.com/files/ad_pdfs/eiu_intel_sap_Business_intelligence_.pdf
34. https://blog.zanzivar.com/2018/05/29/buscando-una-metodologia-agil-para-proyectos-de-bi-kabi-una-excelente-opcion/ last accessed on 08.07.2020
35. http://www.kimballgroup.com/wp-content/uploads/2013/08/2013.09-Kimball-Dimensional-Modeling-Techniques11.pdf last accessed on 08.07.2020
36. https://www.akvkbi.com/2020/05/businesses-should-classify-data-not-it.html last accessed on 08.07.2020
37. https://www.akvkbi.com/2016/09/kabi-new-agile-methodology-for-bi.html last accessed on 08.07.2020
38. https://www.akvkbi.com/search/label/IBI last accessed on 08.07.2020
39. Inmon, W.H., Claudia Imhoff, and Ryan Sousa. Corporate Information Factory: Third Edition. New York: John Wiley & Sons. 2000.
Index
A
Agile BI 79
Amazon Web Services (AWS) 177
analytics
  about 38
  versus BI reporting 41, 43
  versus data mining 41, 43
Analytics and Business Intelligence (ABI) 6
analytics platform 255, 256
approach, BI management
  alternative solutions 175
  benefits 178
  build solutions 176
  business-related names 175
  buy and build 177
  buy-in from 174
  buzzwords, avoiding 177, 178
  cloud-based solutions 176
  copy, avoiding 178
  data quality 178
  duplicate solutions 175, 176
  ROI 178
Artificial Intelligence (AI) 275
B
BI architecture
  about 232
  examples 232
BI architecture 1
  about 233, 234
  cons 234
  pros 234
BI architecture 2a
  about 234
  cons 235
  pros 235
BI architecture 2b
  about 235
  cons 236
  pros 236
BI architecture 3
  about 236
  cons 237
  pros 236
BI architecture 4
  about 237
  cons 237
  pros 237
BI architecture 5
  about 238
  cons 238
  pros 238
BI architecture 6
  about 239
  cons 240
  pros 240
BI architecture 7a
  about 240
  cons 241
  pros 241
BI architecture 7b
  about 241
  cons 242
  pros 241
BI architecture 8
  about 242
  cons 243
  pros 243
BI as a service (BIaaS) 14, 68
BI boundary 254
BI concept
  about 259
  usage 259-262
BI cost
  about 148, 149
  people cost 149-152
  system cost 152-155
  total cost of ownership 155
BI exclusions
  about 141
  data migration engineer 141
  data steward 141
  project manager 141
  suffixes and prefixes 142
BI implementation
  Agile BI 79
  out-of-the-box BI (OOTB BI) 80
  self-service BI 82
  varieties 79
BI management
  about 174
  approach 174
  prioritization 196, 197
  solution approach 183-196
  team setup 179-183
  teams ideas 183
BI organizational model
  about 120, 121
  centralized governance and centralized execution (CGCE) 123
  centralized governance and decentralized execution (CGDE) 124
  no governance and decentralized execution (NGDE) 121
  some governance and decentralized execution (SGDE) 122
BI reporting
  about 36, 255, 256
  versus analytics 41, 43
  versus data mining 41, 43
  versus operational data transfer 37, 38
BI ROI
  about 157
  complex challenge 158-162
  side benefits 168, 169
  simple challenge 163-166
  time saving, example 166, 167
BI roles and responsibilities
  about 125
  management roles, in BI 137
  technical roles 126
  techno-functional roles 133
BI teams
  about 200
  data quality 209-213
  development approach 200-205
BI technologies
  about 249, 251
  usage 249-254
BI tools
  about 249
  usage 251-254
BI users
  implementing 197-199
bug fixes 98
business analyst (BA) 3
Business Intelligence Administrator 126-129
Business Intelligence Analyst
  about 133-135
  responsibilities 134
  versus Business Intelligence Business Analyst (BIBA) 136, 137
Business Intelligence Architect 127
Business Intelligence as a service (BIaaS) 156
Business Intelligence (BI)
  about 2, 3, 54
  benefits 47, 48
  building 20
  business performance management 34
  coinage, defining 20, 21
  concept 13, 14
  decision making 32
  decision making, example 33
  defining 7
  evolution 53-56
  example 3, 4
  key inputs 15
  need for 44-46
  opportunities, finding 35
  problem solving 14
  problems, identifying 35
  realities 13
  reasons 4-7
  sector 50, 51
  side benefits 48-50
  solutions 16, 17
  usage 31, 32
  user purpose 51-53
  users, defining 17-19
  working 12, 13
Business Intelligence (BI), inputs
  business strategy 15
  experience 15
  intuition 15
Business Intelligence (BI), key inputs 16
Business Intelligence (BI), key terms
  business 10
  data 8, 9
  data-driven 10
  defining 7
  efficient 11
  improve 10, 11
  information 9
  insight 9
  intelligence 10
  on-going 11
  process 7
  scalable 11
Business Intelligence (BI), phases
  about 93, 94
  implementation phase 96
  initiation phase 94, 95
  live phase 97
Business Intelligence (BI), processes
  about 36
  analytics 38, 39
  analytics, versus BI reporting 41, 43
  analytics, versus data mining 43
  BI reporting 36
  BI reporting, versus analytics 41, 43
  BI reporting, versus data mining 41, 43
  data mining 40
  data mining, versus analytics 43
  data mining, versus BI reporting 41, 43
Business Intelligence (BI) types
  analytics 63-65
  BI integration approach 77-79
  data freshness 74
  decisions 65, 66
  departments 77
  parameters 62
  sectors 76, 77
  software license 70
  solution hosting 66-68
  solution ownership 68, 69
Business Intelligence Business Analyst (BIBA)
  about 134-136
  versus Business Intelligence Analyst 136, 137
Business Intelligence Competency Center (BICC) 121
Business Intelligence Developer 129-131
Business Intelligence Quality Assurance Engineer 132
Business Intelligence Team Lead 140
Business Objects (BO) 3
business-to-business (B2B) 33, 51
business-to-business-to-consumers (B2B2C) 51
business-to-consumer (B2C) 51

C

capital expenditure (CapEx) 20, 46, 148
Center of Excellence (COE) 121, 183
Center of Expertise (COE) 121
centralized governance and centralized execution (CGCE)
  about 120, 123
  characteristics 123, 124
centralized governance and decentralized execution (CGDE)
  about 120, 124, 125
  characteristics 124
C-Level role 139
commercial-off-the-shelf (COTS) 194
conformed data mart 264, 265
context
  setting 118, 119
custom-built BI
  versus out-of-the-box BI (OOTB BI) 80
customer relationship management (CRM) 97

D

daily demo and sync-up (DDAS) 203, 208, 209
data and information (DI) challenges
  about 103
  data acquisition challenges (DI, TE) 104
  data governance challenges (DI, PE, PR) 104
  data privacy and security (DI, PE, PR, TE) 104
  data quality issues (DI, PE, PR) 103
  lack of business knowledge (DI, PE, PR) 105
  lack of information (DI, PE, PR) 105
  people challenges 105
database administrator (DBA) 3, 119
data-in-place BI architecture
  about 233
  cons 233
  pros 233
data lake
  about 268-270
  myths 272-274
  use cases 270-272
data mart 262
data mart, types
  conformed data mart 264, 265
  dependent data mart 262, 263
  independent data mart 263
data migration engineer 141
data mining
  about 40
  versus analytics 41, 43
  versus BI reporting 41, 43
data repository-based BI architecture
  about 238
  cons 239
  pros 238
data steward 141
data visualization tool (DV tool)
  about 254, 255
  features 256
  versus reporting and analytics platform (RAP) 254
data warehouse
  about 262, 266
  data, centralizing 266
  data, integrating 267, 268
  historical data, preserving 267
  myths 272-274
Data Warehousing (DW) 36, 54
Decision Support System (DSS) 54
dependent data mart 262, 263

E

embedded BI
  versus out-of-the-box BI (OOTB BI) 81
ETL tool
  about 257
  cons 258, 259
  pros 257, 258
Executive Information System (EIS) 54
extract, load, and transform (ELT) approach 67
extract, transform, and load (ETL) approach 67, 257
G

Google Cloud Platform (GCP) 177

H

hardware options 156
Head of Business Intelligence 140

I

implementation phase, challenges
  about 102, 103
  data and information (DI) challenges 103
  process challenges 106
  technology challenges 107
independent data mart 263, 264
Individual Business Intelligence (IBI)
  about 220-222
  connections 222, 223
  generic step 225
  initiating 224
  learnings 226, 227
  points 222, 223
  specific step 225, 226
  triggering 223, 224
infrastructure as a service (IaaS) 67
initiation phase, Business Intelligence (BI)
  triggering 95, 96
initiation phase, challenges
  about 99
  BI initiative, resistance 99-101
  business case, building for BI 101
  business prioritize 102
  sponsors and promoters, acquiring 101

J

job application system (JAS) 97

K

KABI
  benefits 205-207
  history 208

L

live phase, Business Intelligence (BI)
  development 97
  enhancement 97
  incident 98
  maintenance and support 97
  migration 98
live phase, challenges
  about 108
  by BI technical team 109
  by BI users 108, 109
  by technical team 110
  technical team 111, 112

M

machine learning (ML)
  about 275
  usage, in BI 275
Management Decision System (MDS) 54
Management Information System (MIS) 3, 54
management roles, in BI
  about 137, 138
  Business Intelligence Team Lead 140
  C-Level role 139
  Head of Business Intelligence 140
minimum usable solution (MUS) 97
mixed-source BI 71
N

no governance and decentralized execution (NGDE)
  about 120, 121
  characteristics 121, 122
non-profit organizations (NPOs) 51

O

online analytical processing 250
Online Transaction Processing (OLTP) 141
open-source BI (OSBI)
  about 70
  advantages 72, 73
  disadvantages 72, 73
operational data transfer
  versus BI reporting 37, 38
Operational expenditure (OpEx) 148
out-of-the-box BI (OOTB BI)
  about 80
  versus custom-built BI 80
  versus embedded BI 81
out of the box (OOTB) 149

P

patches 98
people challenges
  about 105
  executive sponsor continuity (PE) 105
  inexperienced leadership (PE) 105
  petty politics (PE, PR) 105
  team challenges (PE) 106
people cost 149-152
platform as a service (PaaS) 67
point of sale (POS) 97
prefixes 142
prescriptive analytics 40
process challenges
  about 106
  coordinated delivery (PR) 107
  lack of funding (PR) 106
  prioritization challenges (PR) 107
  project complexity (PE, PR) 107
  project over product approach (PE, PR) 107
  wrong processes applied (PE, PR) 106
project manager 141
proof of concept (POC) 176
proprietary BI
  about 71
  advantages 72, 73
  disadvantages 72, 73

R

real-time BI
  concept 75, 76
Real Time Reports (RTR) 75
reporting and analytics data repositories (RADR) 262
reporting and analytics platform (RAP)
  about 6, 236
  features 256
  versus data visualization tool (DV tool) 254
request for information (RFI) 95
request for proposal (RFP) 95
request for quote (RFQ) 95
return on investment (ROI) 40, 157

S

sample BI architecture 243, 244
self-service BI
  about 82
  concept 83, 84
  remark 84, 85
small and medium enterprise (SME) 46, 119
software as a service (SaaS) 33
software options 156
some governance and decentralized execution (SGDE)
  about 120, 122
  characteristics 122, 123
suffixes 142
system cost 152-155

T

technical roles, in BI
  about 126
  Business Intelligence Administrator 126-129
  Business Intelligence Architect 127
  Business Intelligence Developer 129-131
  Business Intelligence Quality Assurance Engineer 132, 133
techno-functional roles, in BI
  about 133
  Business Intelligence Analyst 133-135
  Business Intelligence Analyst, versus Business Intelligence Business Analyst (BIBA) 136
  Business Intelligence Business Analyst (BIBA) 135, 136
technology challenges
  about 107
  dependencies (PR, TE) 108
  performance challenges (TE) 108
  technologies and tools distractions (PE, PR, TE) 108
  timelines for deliverable (TE, PE) 108
  variety of requirements (TE) 108
total cost of ownership (TCO)
  about 148-156
  hardware options 156
  software options 156, 157
  team options 156
typical BI team structure
  about 119, 120
  BI organizational model 120

U

unique selling propositions (USP) 158
user acceptance testing (UAT) 126

W

Walget 28-30