Francesca Lazzeri · Alexei Robsky
Machine Learning Governance for Managers
Francesca Lazzeri Microsoft Corporation Newton, MA, USA
Alexei Robsky Google Sammamish, WA, USA
ISBN 978-3-031-31804-7    ISBN 978-3-031-31805-4 (eBook)
https://doi.org/10.1007/978-3-031-31805-4

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Paper in this product is recyclable.
Introduction
In today’s digital age, machine learning has become a game-changer for businesses looking to gain a competitive advantage. From predictive analytics to chatbots and robotic process automation, machine learning algorithms are being applied to various domains, including finance, healthcare, e-commerce, and customer service. By automating routine tasks and analyzing vast amounts of data, machine learning can help businesses optimize their operations, increase efficiency, and provide better customer experiences.

However, as machine learning becomes more prevalent, the need for effective governance of these systems becomes more critical. The consequences of poorly governed machine learning can be severe, from biased decision-making to privacy violations. Without proper governance, machine learning can also learn the wrong outcomes, leading to unintended consequences that can undermine the trust and reputation of your organization and hurt business success.

Most organizations have access to more data than ever before. However, the sheer volume and complexity of data can be overwhelming, and many organizations struggle to unlock insights from the data. Even when insights are identified, many organizations struggle to operationalize them and put them into action. This is where machine learning can come in. By leveraging algorithms and automation, machine learning can help organizations make sense of their data, identify patterns and trends, and make more informed decisions.

However, implementing machine learning is not without its challenges. One of the biggest challenges is operationalizing machine learning models and incorporating them into business operations and decision-making processes. Many organizations struggle with this step because it requires not only technical expertise but also a deep understanding of the business context and goals, and the ability to communicate insights effectively to stakeholders.
Furthermore, even if insights from machine learning models are successfully operationalized, organizations still need to leverage them effectively to improve customer experience, optimize their services, automate their operational processes, and realize bottom-line gains. This requires a framework, which is described in this book, to leverage AI solutions, which includes not only technical considerations but also business strategy, organizational culture, and change management.

In particular, 87% of organizations struggle to maintain a sustainable machine learning model lifecycle. This includes everything from model selection to training, validation, and testing, as well as ongoing monitoring and maintenance. Moreover, deploying machine learning models can be a time-consuming process; for 64% of organizations, it takes a month or longer to deploy a single model (Algorithmia Survey Report 2021) and build consumable and scalable AI applications on top. This can be attributed to a variety of factors, including the need for extensive testing and validation, the complexity of the model architecture, and the lack of a standardized deployment process. Furthermore, once a model has been deployed, organizations need to ensure that it is scalable and can be integrated into existing systems and processes.

To address these challenges, organizations need to think more organically about the end-to-end data flow and architecture that will support their data science solutions. Organizations need to ensure their machine learning solutions are aligned with their business goals, they have the right talent and resources to support them, and they have a clear understanding of the regulatory considerations of using AI.

As organizations increasingly rely on data and AI to drive their business operations and outcomes, the role of data science or business managers has become more critical. Managers in that field are responsible for overseeing the entire machine learning process, from identifying business requirements to model deployment and management. However, managing the end-to-end machine learning process can be challenging, particularly when it comes to ensuring that machine learning solutions are scalable, sustainable, and aligned with existing IT and privacy policies.

Our framework provides a portfolio of methodologies, technologies, and resources that will assist managers in scaling their machine learning initiatives and becoming more data and AI driven. This includes tools for data generation and acquisition, model selection and development, and model deployment and management. Our framework emphasizes the importance of testing all models, creating the right documentation, and monitoring models and their results and causal business impact.
Versioning models is essential for maintaining a historical record of all changes made to the model, which helps to ensure reproducibility and traceability. Creating the right documentation, including model architecture and design, can help managers to communicate effectively with stakeholders and ensure that models are understandable and transparent. Lastly, monitoring models and their results can help to identify potential issues and ensure that models are delivering the expected business impact.

Effective AI/ML model governance is essential for organizations that want to maximize the benefits of their AI investments. Organizations that effectively implement all components of AI/ML model governance can achieve a fine-grained level of control and visibility into how models operate in production while unlocking operational efficiencies that help them scale and achieve higher ROI with their AI investments. By tracking, documenting, monitoring, versioning, and controlling access to all models, these organizations can closely control model inputs and understand all the variables that might affect their results.

What is our framework about? By using this framework (Fig. 1), managers will learn:
Fig. 1 ML governance framework
• How to formulate business objectives and translate them into measurable outcomes
• How to establish performance metrics that are linkable to business objectives
• How to leverage machine learning open-source frameworks and toolkits to accelerate the model lifecycle
• How to design end-to-end machine learning solutions by making ML technologies, programming languages, or frameworks compatible and integrated into one architecture
• How to implement machine learning model governance to control access, implement policy, and track activity for models
• How to unify the organization’s machine learning vision. Successful AI initiatives require organizational alignment across multiple decision-makers and business functions.
Chapter 1—Understanding Business Goals

Business goals are an essential part of establishing priorities and setting your company up for success over a set period. Taking the time to set goals for your business and create individual objectives to help you reach each goal can greatly increase your ability to achieve those goals. In this chapter, we explore the different ways to understand and act upon business goals.

One might say that each business knows its own goals and has a solid strategy to achieve those goals. Setting goals without a clear plan can result in wasted resources and ineffective efforts. One common issue that companies face is the tendency to target everything and nothing all at once. This can happen when there are too many competing priorities or when teams try to tackle too many objectives at once. This approach can lead to diluted efforts, which ultimately hinder progress toward achieving the overall business goals.

It is not always easy to establish the right goals. Leaders may have different opinions on what the priorities should be, or external factors may change, requiring a shift in the goals. However, this is not to discourage leaders from establishing goals but rather to provide a different perspective on measuring what is right. It is crucial to continuously evaluate the goals and adjust them as necessary to remain aligned with the company’s overall strategy.

Successful companies have a clear understanding of their purpose and direction. Goals and objectives play a crucial role in providing that clarity. Goals are general statements of desired achievement, while objectives are the specific steps or actions that a company takes to reach their goal.
Table 1 Dimensional overview of goals and objectives

| Dimension            | Goal                                           | Objective                               |
|----------------------|------------------------------------------------|-----------------------------------------|
| Scope                | End result                                     | A means to an end                       |
| Foundational process | Vision                                         | Data and facts                          |
| Magnitude            | Large                                          | Many metrics to support the larger goal |
| Evaluation           | Goals are usually inspirational and intangible | Specific data and measurable outcomes   |
In other words, goals establish the direction and vision of the company, while objectives provide the necessary action plan to achieve those goals.

Both goals and objectives should be specific and measurable. It is essential to establish goals and objectives that can be quantified and tracked through data and data science solutions. By doing so, a company can establish a clear baseline for their progress toward achieving their goals and objectives, identify areas of improvement, and adjust their strategy accordingly. Table 1 summarizes different dimensions for which goals can be established. The dimensions can range from profitability and growth to customer service and employee satisfaction. For example, a company may establish a goal to increase profitability. The objective to achieve that goal may include increasing sales revenue or reducing costs. Goals and objectives can be established at different levels of the organization. For instance, a company may have a company-wide goal to increase revenue, with department-level objectives to achieve that goal, such as increasing product sales or improving marketing campaigns.
Chapter 2—Measuring What Is Relevant

As discussed in the previous chapter, it is essential for a business to understand its goals, tie them to measurable outcomes, and establish the right monitoring, such as OKRs, to evaluate data science and machine learning efforts. In this chapter, we go deeper into performance measurements and metrics and emphasize understanding second-level metrics.

Measurement is the process of associating numbers with physical quantities and phenomena. In the context of data science and machine learning, measurement is fundamental to evaluating the accuracy and usefulness of models and algorithms. Measuring the performance of models and algorithms requires identifying the appropriate metrics to use.
Metrics are quantitative measures used to evaluate performance. Choosing the right metrics is critical to accurately assessing the performance of data science and machine learning efforts. While many organizations focus on primary metrics such as revenue, customer acquisition, or cost savings, it is equally important to understand second-level metrics. Second-level metrics are those that are directly linked to primary metrics and provide insights into how well the business is performing in a particular area. For example, if a business’s primary metric is revenue, the second-level metrics could be average order value, customer lifetime value, or conversion rate. By understanding these second-level metrics, businesses can identify the factors that contribute to the primary metric and make data-driven decisions to improve performance (a short sketch after the following list makes this concrete).

The relevance of data is essential for any business or organization that aims to leverage its data and make informed decisions. It can be summarized in three dimensions, as discussed before:

1. Appropriateness of business concepts and their measurement through data. Data scientists need to ensure that the data they use aligns with the business concepts that the organization aims to measure. In other words, they need to ensure that they are measuring the right things and that the measures are relevant to the business context. This requires continuous review and revision of concepts and measures as problems change, and as analysis reveals weaknesses in current measures and suggests alternate measures that might better capture information of use to constituents.
2. Ability to link data. Data scientists need to ensure that they are using data that can be linked to other data sources, both internal and external. The ability to link data collected through various instruments and to external data sources increases the breadth and depth of data, and thereby, the ability of data scientists to use them to address current issues. This means that data scientists need to ensure that the data they use is structured in a way that enables easy linkage with other data sources.
3. Data currency. This refers to whether data reflect current conditions, such as seasonality, trends, accuracy, and speed of the events under study. Data scientists need to ensure that the data they use is up-to-date and accurately reflects the current conditions that they are studying. This requires regular monitoring and updating of data to ensure that it remains current and relevant.
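To make the idea of second-level metrics more tangible, here is a minimal Python sketch that derives average order value, orders per customer, and conversion rate from the same order data that feeds a primary revenue figure. The file name, column names, and visitor count are illustrative assumptions, not data from this book.

```python
import pandas as pd

# Hypothetical order-level data; file name and columns are assumptions.
orders = pd.read_csv("orders.csv")        # columns: user_id, order_value
visitors = 120_000                        # assumed number of site visitors in the period

# Primary metric: total revenue for the period.
revenue = orders["order_value"].sum()

# Second-level metrics that help explain movements in revenue.
average_order_value = orders["order_value"].mean()
orders_per_customer = orders.groupby("user_id").size().mean()
conversion_rate = orders["user_id"].nunique() / visitors

print(f"Revenue: {revenue:,.0f}")
print(f"Average order value: {average_order_value:,.2f}")
print(f"Orders per customer: {orders_per_customer:.2f}")
print(f"Conversion rate: {conversion_rate:.1%}")
```

Tracking these second-level figures alongside revenue makes it easier to see which lever (basket size, purchase frequency, or conversion) is actually moving the primary metric.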
Chapter 3—Searching for the Right Algorithms

Machine learning is a powerful tool that can help organizations transform their business operations, improve decision-making, and create new revenue streams. In this chapter, we will learn what machine learning is and how being a machine learning organization implies embedding machine learning teams to fully engage with the business and adapting the operational support of the company. Most importantly, we will learn the following four dimensions that companies can leverage to become machine learning-driven:

• Understanding Algorithms and the Business Questions that Algorithms can answer. We discuss the importance of understanding algorithms and how they can be used to solve specific business questions. Machine learning algorithms are designed to learn from data and make predictions or decisions based on that data. To leverage the full potential of machine learning, companies need to understand the various types of algorithms and their applications to different business problems.
• Defining Business Metrics and Business Impact. We focus on defining business metrics and measuring the impact of machine learning on business outcomes. Business metrics are key performance indicators (KPIs) that can be used to measure the success of machine learning initiatives. It is important for companies to define these metrics and track them over time to understand the impact of machine learning on their business.
• Establishing Machine Learning Performance Metrics. We will delve into establishing machine learning performance metrics. These metrics are used to evaluate the performance of machine learning models and ensure they are meeting the required accuracy, reliability, and efficiency standards. Companies need to establish clear performance metrics and continuously monitor and optimize their machine learning models to ensure they are delivering the expected results.
• Architecting the End-to-End Machine Learning Solution. We discuss the importance of architecting an end-to-end machine learning solution. A successful machine learning implementation requires a holistic approach that considers all stages of the machine learning lifecycle, including data acquisition, model development, deployment, and monitoring. By considering the end-to-end process, companies can ensure that their machine learning solution is scalable, efficient, and can deliver the expected results.
Moreover, in this chapter, you will learn how to navigate the data science and machine learning tool landscape. The data science and machine learning tool landscape is constantly evolving, and there are several important aspects of tools to consider when building a machine learning organization.

One is the tool category. There are two important distinctions among data science tools we highlight: open-source versus proprietary code and no/low code versus code. Many of the most popular data science tools are open source, meaning that they are free to use and the underlying code is available for anyone to view and modify. However, there are providers that use proprietary code. Even providers that rely on proprietary software open-source some parts of their solution or launch initiatives to attract data scientists who would like to work on an open-source platform.

Another important distinction to consider is whether a tool requires coding skills or can be used by individuals with no or low coding experience. No/low code tools often provide a user-friendly interface that allows users to conduct data analysis, experimentation, and machine learning without needing to write complex code. This can be beneficial for individuals with less technical expertise or those who need to quickly iterate on experiments and projects.

We discuss some data science tools that data scientists use to conduct data analysis, experimentation, and machine learning, and we will review the main features of these tools, their benefits, and how different data science tools compare.
Chapter 4—Operationalizing Your Machine Learning Solution

Machine learning and data science are tools that offer companies the possibility to transform their operations: from applications able to predict and schedule equipment maintenance, to intelligent R&D systems able to estimate the cost of new drug development, to AI-powered HR tools able to enhance the hiring process and employee retention strategy. However, to be able to leverage this opportunity, companies must learn how to successfully build, train, test, and push hundreds of machine learning models in production, and to move models from development to their production environment in ways that are robust, fast, and repeatable. In this chapter, we explore some common challenges of machine learning model deployment and discuss the following points in order to enable you to tackle some of those challenges:
• Why successful model deployment is fundamental for data-driven companies. Deploying a model successfully means ensuring that the model produces accurate and reliable results in production.
• Why companies struggle with model deployment, such as difficulties in scaling the models, ensuring reliability, and dealing with technical debt.
• How to select the right tools to succeed with model deployment. The right tools help companies streamline the deployment process, minimize human error, and automate repetitive tasks. We examine the features that companies should look for when selecting tools for model deployment, such as support for different programming languages, scalability, ease of use, and integration with other tools.

A few key aspects that companies need to solve to overcome the barriers in terms of ML operationalization are:

1. Data accessibility—a crucial factor for companies to succeed in today’s data-driven world. A well-defined data and machine learning strategy can help companies to define where business data is stored, in what format, in which quantity, and how to get access to it in an automated environment in production. A clear data strategy can provide a roadmap for accessing data in an automated environment, thereby increasing productivity and efficiency.

   Third-party data is an excellent source of information for companies looking to geo-enrich and improve their machine learning models’ accuracy. Socio-demographics and segmentation data, for example, can be used to create more precise customer profiles, which can then be used to personalize marketing efforts. Similarly, property data can help companies understand the value of real estate and the potential for expansion, while social expenditure data can be used to determine consumer spending habits. Traffic and weather data can also be incredibly valuable, particularly for companies in the transportation, logistics, and supply chain industries. By incorporating third-party data into their machine learning models, companies can gain a competitive edge and achieve better results. However, it is essential to ensure that the data being used is accurate, reliable, and obtained legally. Companies should also invest in robust data security measures to protect sensitive information and ensure data privacy. With a well-crafted data and machine learning strategy and the right data sources, companies can unlock new insights and make informed decisions that drive business success.

2. Data security and governance—major concerns for organizations in today’s digital age. Due to the nature of their work, data scientists used to have access to all data without restrictions.
   However, today’s organizations are more cautious about providing access to sensitive data to protect their customers’ privacy and comply with regulations. To ensure that data scientists have access to the data they need while safeguarding privacy and security, companies must establish clear guidelines for data access. This may include creating policies around who can access data, when they can access it, and how it should be handled. Non-disclosure and non-dissemination contracts can help to protect sensitive information and ensure that data scientists only use data for its intended purpose. Under which circumstances can they have access to data? Which non-dissemination and non-disclosure contracts still allow them to feed machine learning models? The process of establishing these guidelines can be complicated, requiring communication and collaboration between different departments. It may take weeks or even months to develop a data governance framework that balances the needs of data scientists with the need to protect customer data.

   Once all barriers concerning data accessibility and governance have fallen, the structure of the data itself and its quality come into play. Having the required data is one thing, but it is totally useless if its quality is questionable. Data science managers should keep asking themselves: Are the datasets complete? Are there representative samples? Are there missing records?

3. Data processing capabilities—an important aspect in the development and production stages of machine learning models. In the development stage, it is possible to optimize computing power and processing to minimize the execution time and maximize efficiency. Data scientists can use tools such as parallel computing and distributed computing to speed up the training process of machine learning models. This allows data scientists to quickly experiment with different models and hyperparameters and optimize their accuracy before deploying them in production.

   In production, with large amounts of data, even multi-core servers are not enough to retrain a model in a reasonable time frame. As a result, companies need to leverage cloud computing and big data technologies to enable big data APIs mixed with machine learning algorithms. Cloud environments such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP) provide scalable and flexible infrastructure to process large amounts of data and run machine learning models in production. Companies can use big data technologies such as Apache Hadoop, Apache Spark, and Apache Flink to process and analyze large volumes of data in real time. These tools enable distributed processing of large data sets across clusters of computers, allowing data scientists to quickly process, transform, and analyze data (a minimal sketch follows this list).
Data scientists can leverage these big data tools to train machine learning models on large datasets and make predictions in real time.
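To illustrate the kind of distributed processing described above, here is a minimal PySpark sketch that aggregates a large event log into per-user features before model training. The storage path and column names are assumptions for illustration; in practice they would come from your own data platform.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Local session for experimentation; in production this would point to a cluster.
spark = SparkSession.builder.appName("feature-prep").getOrCreate()

# Hypothetical event log stored as Parquet (path and columns are assumptions).
events = spark.read.parquet("s3://example-bucket/events/")

# Distributed aggregation: per-user features computed across the cluster.
features = (
    events.groupBy("user_id")
    .agg(
        F.count("*").alias("event_count"),
        F.countDistinct("session_id").alias("session_count"),
        F.avg("session_duration_sec").alias("avg_session_duration"),
    )
)

# Persist the feature table for downstream model training.
features.write.mode("overwrite").parquet("s3://example-bucket/features/")
```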
Chapter 5—Unifying Organizations’ Machine Learning Vision

Successful data initiatives require organizational alignment across multiple decision-makers and business functions. Establishing a vision is perhaps the most important step in implementing a new technology. It is not any different for machine learning. Business and IT must work together to establish a vision and define clear objectives for an ML implementation. The objectives could be as simple as improving the accuracy of the fraud detection system all the way to improving overall operational efficiency—but it needs business and IT alignment and the agreement to work toward a common goal.

To achieve organizational alignment, it is important to involve decision-makers and representatives from various business functions early in the process. This ensures that everyone has a say in the vision and objectives, and that everyone is on the same page. By involving decision-makers from different functions, the company can ensure that the ML implementation aligns with the overall business strategy and goals. The vision for the ML implementation should be clearly defined and communicated to everyone in the organization. This can include details such as the expected benefits, the timeline for implementation, and the resources required to achieve the objectives. Having a clear vision and objectives can help keep the team focused and motivated throughout the implementation process.

A few key questions that managers should ask themselves are:

• What problem do we solve for customers with data and machine learning?
• What are the data and actionable insights that we need to drive our customer success?
• How can we use data and machine learning to learn from the past and innovate in the future?
• Can we predict with our data where our industry will be in the next 12 months?

Good data science founding visions are about solving real-world problems, rather than proposing pre-determined solutions that may become outdated as markets and technologies evolve.
In other words, it is important to focus on the underlying business problem or opportunity that data science can help address, rather than starting with a specific solution in mind. By focusing on the problem, organizations can leverage data science to generate insights and solutions that are both relevant and flexible enough to adapt to changing circumstances. Finally, managers can’t assume that their company visions will be communicated from the top down. They must instill the vision in all employees from the bottom up (Fig. 2).

Fig. 2 Bottom-up approach for data science managers
Reference

Algorithmia (2021) 2021 enterprise ML trends. https://www.datarobot.com/algorithmia/
Contents
1 Understanding Business Goals  1
  1.1 Different Types of Goals  2
  1.2 Translating a Goal to a Measurable Outcome  9
  1.3 Building the Relevant Objectives and Key Results (OKRs) Based on the Business Goals and Outcomes  12
    1.3.1 General Business Metrics  15
    1.3.2 Marketing Metrics  15
    1.3.3 Customer Success Metrics  16
    1.3.4 Sales Metrics  16
    1.3.5 Developer Metrics  17
    1.3.6 Human Resource Metrics  17
  1.4 DevOps for Data Science Project and Metric Enhancement  20
  1.5 Summary  23
  References  24
2 Measuring What Is Relevant  25
  2.1 Performance Metrics  25
    2.1.1 Model Metrics  26
    2.1.2 Business Metrics  33
    2.1.3 ML Operational Metrics  35
  2.2 Causal vs. Correlated Metrics  37
  2.3 Summary  39
  References  40
3 Searching for the Right Algorithms  41
  3.1 Understanding Algorithms and the Business Questions Algorithms Can Answer  45
  3.2 Generative AI Models  56
  3.3 Defining Business Metrics and Objectives  58
  3.4 Establishing Machine Learning Performance Metrics  59
    3.4.1 Decide What to Measure  60
    3.4.2 Decide How to Measure It  62
    3.4.3 Define Success Metrics  63
  3.5 Architecting the End-to-End Machine Learning Solution  64
  3.6 Summary  70
  References  70
4 Operationalizing Your Machine Learning Solution  73
  4.1 What Is AI  74
  4.2 Why Successful Model Deployment Is Fundamental for AI-Driven Companies  76
  4.3 How to Select the Right Tools to Succeed with Model Deployment and AI Adoption  80
  4.4 Why MLOps Is Critical for Successful Maintenance of AI Applications  83
  4.5 Summary  92
  References  92
5 Unifying Organizations’ Machine Learning Vision  93
  5.1 The Challenges of Working in Data  93
    5.1.1 Scalability  93
    5.1.2 Development Environment for Data Scientists  94
    5.1.3 Getting the Right Talent  96
    5.1.4 Privacy and Legal Considerations  97
  5.2 Managing ML/AI Projects Globally and Remotely  98
    5.2.1 Remote Talent  98
    5.2.2 Strong Infrastructure with Footprint in Multiple Regions  98
  5.3 A Guide to Data Team Structures with Examples  99
    5.3.1 Applied Data Science Team  99
    5.3.2 ML Research Team  100
    5.3.3 MLOps Team  101
    5.3.4 BI Team for Reporting and Dashboarding  101
    5.3.5 Program Management Team (Inbound and Outbound)  102
    5.3.6 Data Engineer  102
  5.4 Breaking Communication Barriers with a Universal Language  103
    5.4.1 Strive for Clarity  103
    5.4.2 Communicate Often  103
    5.4.3 Encourage Active Listening  104
    5.4.4 Promote Transparency  104
    5.4.5 Allow for Emotions  104
    5.4.6 Insist on Face-to-Face  105
    5.4.7 Understand Diversity  105
  5.5 How Data Storytelling Can Make Your Insights More Effective  105
    5.5.1 Making Sure There Is an End-to-End Story  105
    5.5.2 Data Visualization  107
  5.6 Summary  107
1 Understanding Business Goals
In this chapter, we explore the different ways to understand and act upon business goals. One might say that each business knows its own goals and has a solid strategy to achieve those goals. However, it is very easy to drown in the weeds of metrics and goals and target everything and nothing all at once. Now, this is not to discourage leaders or accuse them of establishing the wrong goals, but rather to provide a different view of measuring what is right.

Let’s consider an example of a social media company that wants more users to use their product or, in other words, increase engagement. Engagement in many cases is a revenue-generating signal through ads or subscription conversion models; however, one must ask the question, “What stage is this company at?” Is the company just starting its journey, is it in its initial growth phase, or is it an established platform with millions of users, where the goal is more about identifying new users or revenue opportunities?

To achieve the business goal of increasing engagement, data-driven metrics and analysis are essential. These metrics will help to measure the effectiveness of different strategies and inform decision-making. For example, metrics such as customer acquisition cost (CAC), customer lifetime value (CLV), churn rate, and engagement rate can help to track the progress toward the goal and identify areas for improvement.

There are a lot of elements to strategy definition for companies in various stages, and we are not going to go deep into them in this book, but rather focus on the metrics that help form the strategy and definitions. In the next few paragraphs, we will take a closer look at the following strategic definitions to outline metrics that are relevant and accurate for your business growth:
• Different types of goals: For instance, there are outcome goals and performance goals. Outcome goals refer to the end result a company is aiming to achieve, such as increasing revenue or expanding the customer base. Performance goals, on the other hand, refer to the process of achieving those outcomes, such as increasing the number of sales calls made by the sales team.
• Translating a goal to a measurable outcome: This means defining specific metrics that can be tracked to determine progress toward the goal. For instance, if the goal is to increase revenue, the measurable outcome could be the revenue growth rate or the average revenue per user (a small numeric sketch follows this list).
• Building the relevant objectives and key results (OKRs) based on business goals and outcomes: OKRs are a goal-setting framework that helps companies define and measure progress toward specific objectives. Each objective should have clear and measurable key results that are directly linked to the overall business goal.
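As a small, self-contained illustration of turning a goal into measurable outcomes, the Python sketch below computes two of the metrics mentioned in the list above, revenue growth rate and average revenue per user, from monthly figures. The numbers are invented for illustration only.

```python
# Invented monthly figures for illustration only.
revenue_by_month = {"2023-01": 1_200_000, "2023-02": 1_260_000, "2023-03": 1_340_000}
active_users_by_month = {"2023-01": 80_000, "2023-02": 82_500, "2023-03": 86_000}

months = sorted(revenue_by_month)

for prev, curr in zip(months, months[1:]):
    # Month-over-month revenue growth rate: a measurable outcome for a revenue goal.
    growth_rate = (revenue_by_month[curr] - revenue_by_month[prev]) / revenue_by_month[prev]
    # Average revenue per user (ARPU) for the current month.
    arpu = revenue_by_month[curr] / active_users_by_month[curr]
    print(f"{curr}: revenue growth {growth_rate:.1%}, ARPU {arpu:.2f}")
```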
1.1 Different Types of Goals

As mentioned before, each business can be in different development stages. There are several definitions of growth stages. According to The Five Stages of Small Business Growth (Churchill and Lewis 1983), the five stages are existence, survival, success, take-off, and resource maturity. As a side note, there are also other definitions that contain seven stages, but they are mostly based on the same concepts (see Fig. 1.1).

Existence Stage

In the first stage of business growth, a company is a true startup in every sense of the word. In this stage, the owner is usually the primary driver of the company’s operations, and the company’s success relies heavily on their abilities. All businesses in the existence stage face a few common challenges, which include:

• Determining whether their product or service will be accepted and/or desired by enough customers to remain viable. This involves conducting market research, identifying customer needs, and developing a product or service that meets those needs.
• Finding out whether the company will be able to create strong enough processes to deliver their products or services to an acceptable standard. This involves developing standard operating procedures and quality control measures to ensure that the company is delivering a consistent product or service to customers.
Fig. 1.1 Seven business stages
• Determining whether the company will be scalable and able to meet the customer demand as it increases. This involves developing a business plan that outlines the company’s growth strategy and identifies the resources needed to support that growth.

In this stage, the business owner is responsible for just about everything, from creating the product or service to marketing and sales. Even if they can hire a few employees to assist with these early processes, the owner is the driving force behind the company’s growth. In this stage, the business faces a few common challenges, such as determining whether their product or service will be accepted and/or desired by enough customers to remain viable. The business also needs to find out whether it will be able to create sound enough processes to deliver its products or services to an acceptable standard. Additionally, the company needs to determine whether it will be scalable and able to meet the customer demand as it increases.

The goal during the existence stage is simply to survive because many do not. In this initial phase, it’s easy to fail because there are so many reasons why brand-new businesses do not survive past the startup phase. The business might never gain enough traction with customers, the business could run out of operating capital, or the owner might simply wear out under the immense demands on finances, time, and energy in addition to the pressure of attempting to run a startup. Therefore, during this stage, the business owner must be very careful and focus on building a strong foundation for the business, laying a solid groundwork that will enable the company to move on to the next stage of growth.
This includes developing a sound business plan, establishing a clear brand identity, and building a strong team.

Survival Stage

Businesses that survive the existence stage move into phase two, survival. During the survival stage, a business has already demonstrated that its products or services are viable, that customers want them, and that customers return. However, the company has not yet demonstrated the ability to balance revenue with expenses in a successful manner. During the survival stage, your business is tested by determining, first, whether you are able to break even and, second, whether you are able to generate profits sufficient for reinvesting and growing the business.

In addition to financial stability, businesses in the survival stage must also focus on streamlining their operations and improving their processes. This may involve reducing waste, improving efficiency, and finding ways to increase productivity. It is also important for the business to continue to refine its product or service offering, meet the evolving needs of its customers, and stay ahead of competitors. In the survival stage, it is critical to establish a strong customer base and build relationships with them. This means providing excellent customer service, staying in touch with customers, and continually soliciting feedback to improve the product or service offering. At the same time, the business must also maintain focus on its financial goals and avoid overspending or overextending itself.

Ultimately, the goal of the survival stage is to reach a point where the business is generating sustainable profits and is able to reinvest those profits back into the business for growth. Once the business has achieved this level of financial stability, it can begin to think about expanding into new markets or launching new products or services.

Success Stage

When a company survives the survival stage, it moves into the third stage, success. In the success stage, a business has demonstrated its ability to generate enough revenue to cover its expenses and produce a profit. The business owner is no longer struggling to survive and can now focus on expanding the company or pursuing other interests. This stage is marked by strong financial performance and stability, and the company is likely a recognized leader in its industry. The challenge of a business owner with a successful company is deciding what they want to do with their success. Business owners basically have two options:
(A) Use the business as a platform for growth and continue reinvesting, expanding, and even using the business’s success and assets to finance and fund additional growth. This could involve launching new products or services, expanding into new markets, or acquiring other companies to integrate into the existing business.
(B) Maintain the status quo, keep the company operational, and use the profits to fund other pursuits or interests.

Regardless of the path chosen, the business owner must carefully consider the risks and benefits associated with each option and make a strategic decision that aligns with their long-term goals and objectives.

Take-Off Stage

The take-off stage marks a critical moment for businesses, where rapid growth requires a focus on scaling processes, optimizing resources, and adapting to changing market conditions. As revenue and profits increase, businesses must also deal with expanding expenses, including costs associated with scaling operations, marketing, and hiring new talent. In order to manage these costs, businesses must focus on improving operational efficiency through the use of automation, outsourcing, and other strategies. This can involve investing in new technology, reengineering workflows, and streamlining production processes to ensure that the business is operating at peak efficiency. Additionally, businesses must be able to effectively manage personnel, delegate tasks and responsibilities to ensure that resources are used effectively, and ensure employees are able to focus on high-value tasks that contribute to overall growth. Free cash flow is also an important concern during this stage, as businesses may need to invest in new infrastructure or technologies to support continued growth and must be able to manage liabilities and debt to avoid overextending themselves financially. Ultimately, businesses that successfully navigate the take-off stage are able to achieve sustained growth and profitability and establish themselves as leaders in their respective industries.

Resource Maturity Stage

A small business has already survived the take-off stage and has established a place in the market. At this point, the focus is on strategic planning and resource management to improve the company and ensure its continued growth. Business owners in this stage should work on streamlining operations by setting up a budgeting and strategy system that allows them to make informed decisions about resource allocation. By focusing on resource allocation, they can minimize waste and increase efficiency. Additionally, it’s important for business owners to maintain their entrepreneurial spirit and continue setting short-term and long-term goals to prevent the company from becoming stagnant.
This can involve identifying new opportunities for growth and innovation, investing in employee development, and improving processes and systems to ensure continued success. In order to effectively navigate this stage, business owners need to be flexible, adaptable, and willing to take calculated risks in order to keep their business relevant and competitive.

In each stage, the goals could be the same or different. Let’s explore a few common metrics for each stage and further discuss in the next section how to translate the goal to a measurable outcome.

Existence Stage

As this is a very critical initial step for a startup, there are, as mentioned before, many moving parts around acquiring and maintaining customers and cash flow. One of the most important metrics to consider during this stage is the instrumentation of data within the business, product, and customers. This involves collecting, analyzing, and utilizing data to make informed decisions about the business. In order to do this effectively, it’s crucial to establish a solid foundation for data quality. Metrics such as data coverage, availability, and latency are important to track during this stage. Data coverage refers to the percentage of data that is collected and analyzed, while data availability refers to how easily accessible the data is. Data latency refers to how quickly data can be processed and analyzed. By focusing on these metrics, business owners can ensure that they have a solid data foundation that will be essential for future stages of growth. Additionally, business owners must also focus on other metrics such as customer acquisition, retention, and revenue growth to ensure that the business can survive and thrive in the existence stage. By carefully tracking and analyzing these metrics, business owners can make informed decisions about the future of the business.

Survival Stage

Now that a business has proved its product-market fit (Bussgang 2013), it is critical to measure the continuous flow of revenue and balance the expenses. In every business, revenue can be derived from different sources, namely subscriptions, services payments, ads, interests, etc. It is important to establish a metric that can demonstrate leading indicators of revenue. For example, one effective way to measure leading indicators of revenue is to track customer acquisition, which is a measure of the number of new customers who are signing up for the product or service. This metric can be used to forecast future revenue growth potential. Another useful metric to track during this stage is customer retention rate, which indicates how many customers are returning to the product or service. A high retention rate can help balance the customers coming in and those returning and provide a better sense of customer satisfaction
and loyalty. By tracking these metrics, a business can identify areas where improvements can be made to optimize revenue and maintain profitability, which are crucial for sustaining the business during this stage.

Success Stage

As the business succeeds and metrics look greener, one must not overlook customers and their evolving needs and requirements. The company has established itself as a viable business with a good customer base and reliable revenue streams. However, this is not a time to rest on one’s laurels, as competition is always present and customer preferences can change rapidly. To stay ahead of the competition and keep customers engaged, it is important to measure how users interact with the company’s products or services. This can be done by analyzing customer feedback, tracking usage patterns, and monitoring customer behavior on the company’s website or other digital platforms. By understanding how customers are engaging with the company’s offerings, a business can identify areas for improvement and potential growth opportunities. For example, if customers are struggling to navigate the website or find what they are looking for, this could be a signal to improve the user experience or offer more personalized recommendations. By pivoting the business to meet evolving customer needs, a company can stay relevant and avoid being overtaken by more agile and adaptable competitors.

In addition to measuring user engagement, it is still crucial to continue measuring leading indicators for revenue and retention. Customer acquisition and retention rates can still be a key metric for growth and success, but they should be complemented with a deeper understanding of how users are engaging with the company’s products or services. This deeper understanding can be a valuable asset in driving growth and maintaining a successful business over the long term.

Take-Off Stage

At this growing stage, as mentioned before, expenses naturally grow, and there is a call for efficiency in operations. In order to do this effectively, it is essential to establish metrics that track operating costs and help identify areas where expenses can be cut without sacrificing user experience. Metrics such as operating margin, cost per acquisition, and lifetime value of a customer can be useful in determining which expenses are providing the most value and which are not. It is also important to consider the impact of cost-cutting measures on the company’s ability to compete in the marketplace. For example, decreasing computing and storage capacity might result in slower processing times, leading to customer frustration and increased churn. Therefore, it is crucial to test against the success and survival stages to ensure that any cost-cutting measures do not harm the customer experience or lead to long-term negative consequences.
Another key challenge during this stage is to improve operational efficiency to keep up with the growth in demand. Metrics such as task completion time, cycle time, and defect rate can be used to identify bottlenecks in the business process and improve efficiency. Additionally, it is important to track metrics related to staffing, such as employee turnover rate and employee satisfaction, to ensure that the company has the necessary human resources to support its growth. Ultimately, the take-off stage requires a careful balancing act between managing expenses, improving operational efficiency, and maintaining a high level of customer satisfaction. By establishing the right metrics and testing against previous stages, a company can successfully navigate this critical phase of growth and position itself for long-term success.

Resource Maturity Stage

After completing the previous stage of cost efficiency, there is an opportunity for innovation in resource allocation. This stage is characterized by a company’s ability to allocate resources efficiently and effectively while continuing to develop new products, services, or revenue streams. One key metric to track in this stage is the return on investment (ROI) of new initiatives. By measuring the effectiveness of new investments, a company can decide where to allocate resources in the future. Additionally, it’s important to measure customer satisfaction and engagement as the company introduces new products or services. Customer feedback can help identify areas for improvement or future opportunities for growth. As a company continues to grow and expand, it’s crucial to maintain the entrepreneurial spirit that got it to where it is today. This may involve taking calculated risks and being open to new ideas. In this stage, companies have the opportunity to create mini-startups within the business, growing them into additional revenue centers. With the right metrics in place, a company can continue to evolve and innovate, staying ahead of the competition and maintaining its success.
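To give a rough sense of the cost and retention metrics mentioned in the take-off stage discussion above, the following Python sketch computes monthly churn rate, cost per acquisition, and a simple lifetime value estimate. The figures are invented, and the lifetime value formula is one common simplification rather than a recommendation from this book.

```python
# Invented monthly figures for illustration only.
customers_start = 10_000      # customers at the start of the month
customers_lost = 400          # customers who cancelled during the month
new_customers = 900           # customers acquired during the month
marketing_spend = 45_000      # total acquisition spend for the month
avg_monthly_revenue = 30.0    # average revenue per customer per month
gross_margin = 0.70           # fraction of revenue retained after direct costs

churn_rate = customers_lost / customers_start
cost_per_acquisition = marketing_spend / new_customers

# Simplified lifetime value: monthly margin per customer divided by monthly churn.
lifetime_value = (avg_monthly_revenue * gross_margin) / churn_rate

print(f"Monthly churn rate: {churn_rate:.1%}")
print(f"Cost per acquisition: {cost_per_acquisition:,.2f}")
print(f"Approximate customer lifetime value: {lifetime_value:,.2f}")
print(f"LTV / CAC ratio: {lifetime_value / cost_per_acquisition:.1f}")
```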
1.2 Translating a Goal to a Measurable Outcome

A common metric used for counting users and their retention is daily active users (DAU) or monthly active users (MAU). Depending on the frequency of the usage of the product, one could select DAU or MAU. For example, in a social media company, where there is a desire for users to come back every day, DAU works better. And in another case where there is a hardware device that doesn’t need to be used every day, MAU might be a better metric. To elaborate further, when a business is counting other units than users, such as devices, the metric can be changed to daily active devices (DAD) or monthly active devices (MAD).

In our previous example, the social media company is well established and is looking for new user growth opportunities. However, they look at all-up DAU as their north-star metric. While this metric looks intuitive for user counts, it has an inherent bias toward heavy-usage users who continue to come back to the platform, and it hides any effect from initiatives aimed toward light or new users. To address this issue, the business should also track other metrics, such as new user acquisition rate or user engagement rate, to get a more complete picture of user behavior and identify areas for improvement. Furthermore, the business should establish a balanced set of metrics that can provide a holistic view of its performance and ensure that it is on track toward achieving its long-term goals.

For instance, if the DAU for the social media company from before is 100 M and 70 M (70%) of those are heavy users who come back to the platform multiple times per day (see Fig. 1.2), it could be very hard to have an impact on DAU since the majority are already coming back to the platform multiple times per day, so having them come back more often won’t change the metric. Therefore, the company should focus its efforts on acquiring and retaining light or new users who do not use the platform as frequently. This would have a greater impact on the DAU metric and would help the company achieve its growth goals. To do so, the company might need to adjust its product strategy, marketing campaigns, or user acquisition channels to attract and retain these types of users.

This can be solved by looking at different segments or cohorts of the users and doing multiple deep dives. The issue here is the near-to-endless analytics work for identifying users and their usage patterns in each stage. See Fig. 1.3 for an illustration of the heavy users and their daily return breakdown. Thus, it is very important to understand from the beginning the business goals and how to measure them to prevent unnecessary analytics costs at later stages.
Fig. 1.2 DAU breakdown
Fig. 1.3 Heavy users’ daily return breakdown
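To make the DAU and MAU discussion above concrete, here is a minimal pandas sketch that computes both metrics, plus a simple DAU/MAU "stickiness" ratio, from a raw event log. The events.csv file and its columns are illustrative assumptions.

```python
import pandas as pd

# Hypothetical event log with one row per user action (file and columns are assumptions).
events = pd.read_csv("events.csv", parse_dates=["timestamp"])  # columns: user_id, timestamp

events["date"] = events["timestamp"].dt.date
events["month"] = events["timestamp"].dt.to_period("M")

# DAU: distinct users per calendar day; MAU: distinct users per calendar month.
dau = events.groupby(["month", "date"])["user_id"].nunique()
mau = events.groupby("month")["user_id"].nunique()

# Stickiness: average DAU within a month divided by that month's MAU --
# a rough indicator of how often the monthly audience returns.
stickiness = dau.groupby("month").mean() / mau

print(mau)
print(stickiness.round(2))
```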
In some cases, and in our example, it might be more appropriate to use other metrics besides DAU to measure the success of a business. For example, conversion rates could be used to measure how many users are converting from free to paid users, indicating the success of a business’s monetization strategy. Acquisition rates could also be a useful metric, measuring the rate at which new users are being acquired, which is important for growth. Additionally, measuring the amount of time each segment of users spends using the platform in a meaningful way could provide valuable insights into user engagement and satisfaction.
Choosing the right metrics is crucial for ensuring that a business is measuring the right things at the right time. It’s important to understand what stage the business is in and what its goals are, and then choose metrics that align with those goals. For example, if a business is in the survival stage, it might prioritize metrics related to revenue and customer retention. If it’s in the success stage, it might prioritize metrics related to user engagement and satisfaction. Ultimately, the right metrics can help a business make informed decisions, drive growth, and achieve its goals.

Setting up realistic and measurable outcomes is a crucial step for any business to ensure that they are on track to achieve their objectives. However, it is not enough to simply set goals such as increasing retention, DAU, or revenue. It is important to also define how those goals will be achieved and to establish realistic and measurable outcomes that can be used to track progress. One common mistake that businesses make is investing in projects that may not have a significant impact on the metrics they are trying to improve. For example, if a business is trying to increase DAU, they might invest heavily in new features or marketing campaigns, only to find that these efforts do not have a significant impact on the desired outcome. This is why it is important to set up realistic and measurable outcomes that can be used to track progress and ensure that efforts are focused on activities that will have a meaningful impact on the business.

One way to establish realistic and measurable outcomes is to use the specific, measurable, achievable, relevant, and time-bound (SMART) criteria. By setting goals that meet these criteria, businesses can ensure that they are realistic and measurable and that they align with the broader objectives of the company. Additionally, it is important to regularly review and adjust these outcomes as needed to ensure that they remain relevant and achievable given changing market conditions and business priorities.

As discussed in the example in the first part of this chapter, if 70% of DAU are heavily engaged users who won’t “contribute” to more DAU, then a measurable outcome should be defined differently, for example, an increase of engagement per user or revenue per user. When a business has a goal of growth and increased customer acquisition of a new service, tracking new sign-ups broken down by segments might shed light on what outcome and where is the largest opportunity to capture. In this case, the sign-ups by segments are realistic and measurable. Another example is retention, which has become a very popular subject in many companies. The definition of retention can also vary from company to company and from product to product. The most popular definition is users who come back to the platform every x days. We can always measure this by DAU, MAU, etc., or actually by how many users were active in a particular month (i.e., a specific cohort) and track them over 3 months, for instance, and see how many are still active.
As before, in this case, retention is a huge buzzword that got broken down into a measurable and realistic outcome.

Lastly, a company might want to monetize specific users who use a specific service. So the goal is monetization, and the measurable outcome can be an increase of paid users (if there are free and paid services) or even just usage in a service with a price increase. Again, tangible outcomes can be tied to the goal. In the previous example, we understand that most customers are already coming back every day, and the goal might be to provide them with more value for each visit or increase revenue per user/session. If a user comes five times per day but only stays for 10 s each time, is that better or worse than a user who comes five times and stays 2 min each time? These are the types of questions that would generate more suitable business goals and thus metrics.

We have explored retention metrics such as DAU and MAU as examples; however, there are many more metrics to consider for goals. They can be as generic as DAU or very specific such as “customers pressing the blue button.” It all depends on the business, strategy, and the goals.
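The cohort-style retention measurement described above can be sketched in a few lines of pandas. The activity.csv file, its columns, and the chosen cohort month are assumptions for illustration.

```python
import pandas as pd

# Hypothetical activity table, one row per user per active day (file and columns assumed).
activity = pd.read_csv("activity.csv", parse_dates=["date"])  # columns: user_id, date
activity["month"] = activity["date"].dt.to_period("M")

# Cohort: everyone who was active in January 2023 (an assumed starting month).
cohort_month = pd.Period("2023-01", freq="M")
cohort = set(activity.loc[activity["month"] == cohort_month, "user_id"])

# For each of the following three months, what share of the cohort is still active?
for offset in range(1, 4):
    month = cohort_month + offset
    active = set(activity.loc[activity["month"] == month, "user_id"])
    retained = len(cohort & active) / len(cohort)
    print(f"Month +{offset} ({month}): {retained:.1%} of the cohort still active")
```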
1.3 Building the Relevant Objectives and Key Results (OKRs) Based on the Business Goals and Outcomes

As you go through this book, you will understand the importance of establishing a good framework for machine learning operations (MLOps) in an organization. As we saw at the beginning of this chapter, it is very important to set aside time and investment to establish goals and measurable outcomes. Those outcomes could take the form of a key performance indicator (KPI) or an objective and key result (OKR) (Doerr 2018). For example, increasing sign-ups from customers in the United Kingdom can be a good measurable outcome. However, there is still a gap between achieving the outcome and the efforts that led to that success. Data science and machine learning play a significant role in the following processes (Fig. 1.4):

• Identification of opportunities through data mining and user research: This involves collecting and analyzing large amounts of data to uncover patterns, trends, and insights that can inform business decisions. Machine
learning algorithms can be used to identify correlations and relationships that might not be immediately apparent to human analysts.
• Definition of success metrics: These metrics are critical in determining whether or not a particular initiative or investment is yielding the desired results. Data science can be used to identify and track the key performance indicators that are most relevant to the business goal. For example, if the goal is to increase user engagement, the relevant metrics might include the number of sessions per user, the length of time spent on the platform, or the number of shares or likes.
• Hypothesis on what could help bring more value to users: Data science can be used to identify patterns in user behavior, preferences, and needs, which can then be used to generate ideas for new features or improvements to existing ones. Machine learning algorithms can be used to predict which features are most likely to be successful and which are not.
• Building a minimum viable product (MVP) or proof of concept (POC) using Data Science (DS)/Machine Learning (ML)/Artificial Intelligence (AI) solutions: This involves developing a prototype that demonstrates the viability of the idea and shows how it can add value to the user. Data science and machine learning can be used to develop algorithms that support new features or improvements, such as recommendation engines or predictive models.
• Testing the outcome with experimentation when applicable or with observational analysis: This involves running experiments to determine whether or not the new feature or improvement is effective and whether it is generating the desired results.
Fig. 1.4 Utilization of data science and machine learning on several metrics and business impact measurement scenarios
Data science and machine learning can be used to analyze the data generated by these experiments and to make informed decisions about the next steps.
• Deploying solutions to production environments and setting up the right monitoring processes: This involves ensuring that the new feature or improvement is stable, secure, and scalable. Data science and machine learning can be used to monitor user behavior and track performance metrics to ensure that the new feature or improvement meets business goals and delivers value to users.

In many of these steps, there is a likelihood of failure and of pivoting to a new or changed effort. And in many of those efforts, the effort itself is what matters. In many cases, it is very hard or almost impossible to attribute DS/ML projects to high-level goals. Employees might feel discouraged working on projects with no clear measurable impact on metrics, but efforts accumulate. Google's CEO, Sundar Pichai (Luna 2022), and other leaders are now pivoting to rewarding efforts and not just outcomes for exactly that reason.

Let's look at each of these phases in more detail.

Identification of opportunities through data mining and user research: Data mining involves exploring and analyzing large blocks of information to glean meaningful patterns and trends. It can be used in a variety of ways, such as database marketing, credit risk management, fraud detection, spam email filtering, or even discerning the sentiment or opinion of users. The data mining process breaks down into five steps. First, organizations collect data and load it into their data warehouses. Next, they store and manage the data, either on in-house servers or in the cloud. Business analysts, management teams, and information technology professionals then access the data and determine how they want to organize it. Then, the application software sorts the data based on the user's results, and finally, the end user presents the data in an easy-to-share format, such as a graph or table.

Definition of success metrics: A business success metric is a quantifiable measurement that business leaders track to see if their strategies are working effectively. Success metrics are also known as key performance indicators (KPIs). There is no one-size-fits-all success metric; most teams use several different metrics to determine success. When the right metrics are properly tracked, leaders can use them as a benchmark for how well the business is performing. It's important to set the metrics before initiatives start in order to see progress from beginning to end.
Each team in your business is there to achieve a different goal, so it only makes sense for different teams to have different success metrics. Here are a few examples of success metrics by company function:
1.3.1 General Business Metrics

• Gross profit margin: Gross profit margin is calculated by subtracting the cost of goods sold from the company's net sales and dividing the result by net sales.
• Return on investment (ROI): The ratio between the income generated and the investment made. ROI is commonly used to decide whether or not an initiative is worth investing time or money into. When used as a business metric, it often tracks how well an investment is performing (see the short example after this list).
• Productivity: This is the measurement of how efficiently your company is producing goods or services. You can calculate it by dividing the total output by the total input.
• Total number of customers: A simple but effective metric to track. The more paying customers, the more money earned for the business.
• Recurring revenue: Commonly used by SaaS companies, this is the amount of revenue generated by all of your current active subscribers during a specific period. It is commonly measured either monthly or annually.
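As a minimal illustration of the first two metrics above (all figures are hypothetical, and this uses one common formulation of ROI as net gain divided by investment):

```python
# Hypothetical figures, purely for illustration.
net_sales = 1_200_000
cost_of_goods_sold = 780_000
investment = 50_000            # amount spent on an initiative
income_from_initiative = 65_000

gross_profit_margin = (net_sales - cost_of_goods_sold) / net_sales
roi = (income_from_initiative - investment) / investment

print(f"Gross profit margin: {gross_profit_margin:.1%}")  # 35.0%
print(f"ROI: {roi:.1%}")                                  # 30.0%
```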
1.3.2 Marketing Metrics

• Daily web traffic users: This is the number of users who visit your website daily.
• New web traffic users: This is the number of users who visit your website and who have never visited your website before. More broadly, this can be a set of metrics from each user interface funnel step, such as the landing page, sign-up page, account page, and payment page.
• Email open rates: This metric is particularly important for email marketing teams. Email open rates measure the percentage of your audience who has opened your marketing email.
• Number of leads generated: Particularly good for marketing teams that work cross-functionally with sales, this metric measures the number of qualified leads that the marketing team generates and passes over to the sales team. Note that the definition of a qualified lead can vary depending on your team's goals.
1.3.3 Customer Success Metrics

• Net promoter score (NPS): This metric is one of the most common measurements of customer loyalty and satisfaction and is sometimes referred to as a customer satisfaction score. It's a numerical value in response to the question, "How likely is it that you would recommend [your product or service]?" You can calculate NPS by subtracting the percentage of individuals who voted between 0 and 6 from the percentage of individuals who voted 9–10 (see the short example after this list).
• Customer retention rate: This metric measures how many of your customers remain customers over a set period of time. It's up to your team to determine what timeframe makes sense for your business and industry.
• Customer churn rate: This is the opposite of the retention rate. Customer churn rate measures how often your customers stop doing business with your company. Again, it's up to your team to determine what period of time makes the most sense for your business and industry.
• Customer feedback: While not a quantitative measure, anecdotal customer feedback can be extremely valuable to your company and can be used for testimonials and marketing strategy. Your customer experience is something that your team can curate, and the better the experience customers have, the longer they stay as customers. In more advanced and high-volume scenarios, feedback can be converted to a quantitative measure by extracting a sentiment score (−1 is bad, +1 is good) and trending it over time to capture customers' feelings. Also, similar to NPS, in many cases a customer receives a survey about satisfaction with the service or product; usually it's a rating between 1 and 5, aggregated to an average. This is called the customer satisfaction (CSAT) score.
• Average customer lifetime: This is the average length of time a customer remains a customer. This metric is used to calculate customer lifetime value.
• Customer lifetime value or lifetime value (CLTV or LTV): This is the amount of profit a company expects to earn from a specific customer over the average lifetime of a customer relationship.
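Here is a short sketch of the NPS and CSAT calculations described above; the survey responses are made up for illustration:

```python
# Hypothetical survey responses: 0-10 scale for NPS, 1-5 scale for CSAT.
nps_scores = [10, 9, 9, 8, 7, 6, 3, 10, 2, 9]
csat_scores = [5, 4, 4, 3, 5, 2, 4]

promoters = sum(1 for s in nps_scores if s >= 9)
detractors = sum(1 for s in nps_scores if s <= 6)
nps = (promoters - detractors) / len(nps_scores) * 100   # % promoters minus % detractors

csat = sum(csat_scores) / len(csat_scores)                # average 1-5 rating

print(f"NPS: {nps:.0f}")     # 5 promoters, 3 detractors out of 10 -> NPS = 20
print(f"CSAT: {csat:.2f}")   # 3.86
```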
1.3.4 Sales Metrics

• Qualified leads: A qualified lead is an individual who exhibits all of the characteristics that your team identifies as the ideal individual to sell to. This could include demographics, role, company size, or any other important qualities.
• Lead to customer conversion rates: This is a good metric to track because it can give both your sales and marketing teams some insight into the audience you're targeting. If the conversion rate is high, you're targeting the right audience, and your team is focusing on the right priorities. Low conversion rates indicate that potential customers are leaving somewhere in the pipeline.
• Customer acquisition cost (CAC): This is how much your team spends on both marketing and sales strategies to convert a lead into a customer. Ideally, you want this number to be as low as possible.
• Total new customers: Tracking this metric can give you an indicator of how quickly your customer base is growing.
1.3.5 Developer Metrics

• Product uptime: This metric measures the time that your software is working properly over a given period of time.
• Bug response time: This is how long your team takes to identify a bug, find a patch solution, and push the fix into production. Issues can range from quick, 5-minute fixes to full-fledged projects.
• Daily active users: This is the number of users who use your software daily. It can help you understand how many of your customers actually use and value your software. If there is a large gap between the number of customers and the number of daily active users, then your customers may not be finding value in your product.
• Cycle time: The time taken for a specific project to go from the very beginning to implementing the strategy in production. This is good to measure because it can help project managers get a sense of how long certain projects will take.
• Throughput: The measure of total work output a specific team delivers. This includes anything that is ready for quality assurance (QA) or pushed into production.
1.3.6 Human Resource Metrics

• Employee satisfaction: Similar to a net promoter score and customer satisfaction score, an employee satisfaction score indicates how likely your employees would be to recommend your company as an employer to a friend or colleague. This is an important metric for human resource (HR) teams because it can surface issues with company culture and policies that can then be resolved.
• Employee retention rate: Similar to a customer retention rate, the employee retention rate measures how many of your employees stay with your company over a determined period of time. This is often measured annually.
• Employee feedback: Anecdotal employee feedback is just as valuable as customer feedback, if not more so. Employee feedback gives your team the opportunity to offer suggestions to help your company become a better employer and, in turn, increase the employee retention rate.

Hypothesis on what could help bring more value to users: A value hypothesis proposes an assumption about how a product is valuable to potential customers, even though your "potential customers" could be an assumption as well. A value hypothesis is grounded in the market: it states the exact value you would provide to potential clients. At this point, you have to think about how to offer something truly valuable that solves a client's problems and desires and that they can actually use and buy. It is related to the traditional market study that you make at the beginning of the development of your idea, but in this case, you make suppositions based on studies and the reach of your business. These suppositions can give you an estimate of the conversion rate you will have in the short and medium term, and the best part is that you can test them by running exercises and experiments with people.

Building a minimum viable product (MVP) or proof of concept (POC) using DS/ML/AI solutions: A proof of concept, or proof of principle, is typically an internal project that helps you verify that your theory has the potential for real-world applications. However, we've also taken proofs of concept to market in a controlled test environment for clients. In testing your proof of concept, you'll be able to determine two important things: (a) whether people need your product and (b) whether you have the capabilities to build it. A minimum viable product, on the other hand, helps you learn how to build that product in the most sustainable way, with the least amount of effort (Ries 2011). Through marketplace testing, you'll learn how people react to different iterations of your product and its potential features. The MVP process enables you to determine precisely what it is your customers want, so you can add only those features needed to make it marketable.

Testing the outcome with experimentation when applicable or with observational analysis: Running experiments has become a growing, popular trend and a necessity for developing high-quality features and products. Such experiments are key in helping you uncover usage patterns and giving you insight into how your users interact with your products. Therefore, experiments are a great way, particularly for product managers and product teams, to validate product quality and to ensure that a product aligns with business objectives.
To measure the outcome of your experiments, metrics can be used to help gauge how your customers are reacting to the new feature and whether it meets their expectations. This means that experiments help you build and optimize your products so you can make sure that you're releasing products that can guarantee customer satisfaction. There is also a way to capture long-term outcomes by establishing a long-term holdback, which means that a portion of customers always stays on a very old version of the product, demonstrating the "what if" all the new features did not exist. Experiments are also a great way to learn and prioritize resources so that product teams can focus on the most impactful areas for further iteration. Experiments can come in different forms, including tests such as A/B testing and multiarmed bandits (Kohavi et al. 2020).

The goal of running experiments is to improve your product for your customers. The results gathered should give you sufficient data to enable you to make informed decisions to optimize your products. To be able to obtain relevant data, you will need to have a specific goal or objective that will lead you to create a viable hypothesis that you can prove (or disprove). This is why having a roadmap, as mentioned previously, will be important to allow you to focus your tests so you can get the right data with statistically significant results. Also, remember that it's not always possible to test everything. This means you will need to channel your testing energy into running experiments that are relevant to your goals and objectives. Additionally, some companies may not have a high enough volume of traffic or users to be able to test everything. This is especially true for feature experiments: a feature needs to receive enough traffic when running A/B tests on it in order to generate reliable results. In sum, good tests or experiments should be focused enough that they give you relevant results and data to improve your products and ultimately ensure customer satisfaction.

Deploying solutions to production environments and setting up the right monitoring processes: A deployment pipeline typically follows three main steps (though you may also have more): build, test, and deploy. This is the pipeline that supports your ability to automate the deployment process and ensures that code moves from being committed to being deployed quickly.
• Build: A developer commits code to a software repository. Code changes should be integrated into environments that match the production environment.
• Test: A deployment automation tool, such as Jenkins or Ansible, will see the new code and trigger a series of tests. Once a build has passed all of the
tests, it can be released to production. Without a deployment automation process, this step happens manually.
• Deploy: In this stage, the application is deployed to production and available to users.
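To make the idea of statistically significant experiment results concrete, the following is a minimal sketch of a two-proportion z-test for a hypothetical A/B experiment on a conversion metric. The counts, the 5% threshold, and the use of statsmodels are illustrative assumptions, not a prescribed setup:

```python
from statsmodels.stats.proportion import proportions_ztest

conversions = [420, 480]         # control, treatment (hypothetical counts)
exposed_users = [10_000, 10_000]

z_stat, p_value = proportions_ztest(count=conversions, nobs=exposed_users)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("The difference is statistically significant at the 5% level.")
else:
    print("No statistically significant difference was detected.")
```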
1.4 DevOps for Data Science Projects and Metric Enhancement

Central to DevOps is the idea of continuous delivery: shipping code as fast, small, frequent deployments. These smaller deployments make it easier to test and release code. By adopting DevOps, companies aim to maintain or increase their rate of deployment over time. Deployment frequency correlates with continuous delivery and the comprehensive use of version control, meaning it offers insight into the efficacy of DevOps practices within a company. Organizations can measure deployment frequency to compare their deployment speed over an extended period, helping to map a company's velocity and growth. By identifying specific periods where code deployment is delayed, teams can determine whether there are problems in the workflow that are causing delays. Delays could result from a process that includes unnecessary steps or from a team not using the right tools, for example. Ideally, by monitoring your metrics, you will see an increase in deployments, faster deployments, reduced failures after deployment, and quicker recovery from failures.

DevOps is both a philosophy and a set of practices, including:
• Automate everything you can.
• Get feedback on new ideas fast.
• Reduce manual handoffs in your workflow.

In a typical data science project, we can see some applications:
• Automate everything you can. Automate parts of your data processing, model training, and model testing that are repetitive and predictable.
• Get feedback on new ideas fast. When your data, code, or software environment changes, test it immediately in a production-like environment (meaning a machine with the dependencies and constraints you anticipate having in production).
• Reduce manual handoffs in your workflow. Find opportunities for data scientists to test their own models as much as possible. Don't wait until a developer is available to see how the model will behave in a production-like environment (O'Brien 2020).

In general (not only for data science and machine learning), DevOps is a process that is important for the following reasons:

1. Dramatically reduces the recovery time and chances of failure
A primary reason behind team failures is programming defects. With a short development cycle, DevOps promotes frequent, small code releases, which makes it much simpler to identify defective code. The team can then use its time to lower the chances of implementation failure by applying robust programming principles. Recovery time matters most while working on a project.
2. Better cooperation and effective communication

It has been observed that DevOps helps develop a better culture. In such an organization, the culture doesn't focus on personal goals; it focuses on overall performance. When there is a maximum level of trust in a team, all the team members can learn and develop quite effectively.

3. Reduced development cycle and rapid innovation

When you get biased responses from operations and development teams, it can be quite confusing to tell whether a particular application is functional or not. When developers submit a request, this extends the cycle times. With the help of DevOps, such issues can be prevented, and the process runs as smoothly as possible. This is a major benefit of DevOps: the team can work to bring products to market faster. DevOps also makes the entire process transparent, which in turn motivates the staff to work toward a common goal. That's why there is a need for DevOps in every organization that deals with cloud services (Dharmalingam 2019).

If the DevOps benefits and processes are applied to data science projects and machine learning applications, the result is machine learning operations (MLOps): MLOps synchronizes cadences between the application and model pipelines and enhances the lifecycle of a traditional machine learning model, including evaluation and retraining. MLOps is not just the routine deployment of machine learning models but also the continuous retraining, automated updating, and synchronized development and deployment of more complex machine learning models and metrics.
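As a small, hypothetical sketch of what an automated quality gate in an MLOps pipeline can look like (the threshold, synthetic data, and model choice are illustrative and not tied to any specific product):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

MIN_ACCURACY = 0.85  # promotion threshold agreed with the business (hypothetical)

# Synthetic data standing in for the real training/evaluation sets.
X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

candidate = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
accuracy = accuracy_score(y_test, candidate.predict(X_test))

if accuracy >= MIN_ACCURACY:
    print(f"Accuracy {accuracy:.3f} >= {MIN_ACCURACY}: promote the retrained model.")
else:
    print(f"Accuracy {accuracy:.3f} < {MIN_ACCURACY}: keep the current model and alert the team.")
```

A gate like this can run on every retraining cycle, so that model updates ship only when they meet the agreed bar.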
Let's consider an example of a typical data science project that aims to size the opportunity of potential engineering investments. A significant amount of resource time has already been spent on preparing the proposal, building a quick POC, and sitting through hours of meetings. Now, a potential outcome of the data science work is that the opportunity is not big enough to justify the work. In this case, the outcome is actually a sunk cost (Johnson 2016) and is not going to produce revenue, even though it would save money in the future. The project and the employees who worked on it should still be rewarded for the insights and learning acquired.

In another example, let's consider a machine learning (ML) model that predicts the satisfaction level of a customer. This model runs in the product and scores each customer on their satisfaction level. The output is fed to the sales team, and their action on the recommendation brings up the aggregated satisfaction score. In this scenario, if we see an increase in satisfaction at the aggregate level, is it because of the ML model? It's hard to say. There are many confounding factors that might affect satisfaction: the personality of the customer, the personality of the salesperson, the experience of the customer with the product, whether the customer had their coffee this morning, and so on. So we won't be able to attribute all the impact to the model; however, the model created new insights that the sellers didn't have and might have helped them operate better.

In both of these examples, the efforts led to actions being taken on the insights or learnings. Also, in both cases, we see a lag effect of the impact. In other words, the impact of many projects could be seen and realized months, and sometimes years, after the project is done and delivered. Leaders should not be discouraged from enabling those types of projects, as long-term gain and impact could be as important as short-term impact.

Therefore, a good OKR to establish for DS, ML, AI, business analysts, and other teams is whether actions have been taken on their recommendations. If a recommendation is not actionable, then the stakeholders are not going to act, which encourages the science teams to adapt and be flexible in their solutions to make them actionable. By getting feedback and working in collaboration with the stakeholders, it is almost guaranteed that this approach will work. Of course, teams need to be incentivized to collaborate, and that's a different story. There is going to be a feedback loop between the stakeholders and action takers on one side and the science teams and builders on the other, until they reach a good state of actionability. By creating OKRs around the "number of actions the stakeholders took upon your recommendations," we are creating a green field for collaboration, feedback, and outcome-based rewards that facilitates learning and growth. See Fig. 1.5 for an illustration of the ML/DS building process.
1.5 Summary

• Understanding your business goals by setting aside some time to define them clearly is a foundational step that should not be overlooked. Part of that exercise is to map the business to a growth stage: existence, survival, success, take-off, and resource maturity. For each stage, define the goals and the metrics to track them.
• Tying realistic and measurable outcomes to the goals is key to prioritizing the right investments to achieve the goals from step 1. Measurable outcomes could take the form of DAU or MAU in some cases, or paid usage; it depends on the stage the company is at and the goals to achieve.
• Data science and machine learning are nowadays key contributors to the success of a product, team, or company. Establishing the right OKRs that empower teams to work on both long-term and short-term efforts leads to great accomplishments. OKRs such as profits and seats can be tied to a measurable outcome and goal; however, some OKRs around the operation and overhead of the company are also important, such as productivity, development throughput, employee retention, and employee feedback.
Fig. 1.5 ML/DS project building cycle
References

C. Johnson, How Understanding Sunk Costs Can Help Your Everyday Decision-Making Processes, https://online.hbs.edu/blog/post/how-understanding-sunk-costs-can-help-your-everyday-decision-making-processes, HBS, 2016
Dharmalingam, What is the Significance of DevOps in Data Science?, https://www.whizlabs.com/blog/significance-of-devops-in-data-science/, 2019
E. Ries, The Lean Startup, Currency, 2011
J. Bussgang, You Found Your Product-Market Fit. Now What?, https://hbr.org/2013/07/three-ways-to-scale-b2b-sales, HBR, 2013
J. Doerr, Measure What Matters, Portfolio, 2018
J. Luna, Sundar Pichai: "Reward Effort, Not Outcomes," https://www.gsb.stanford.edu/insights/sundar-pichai-reward-effort-not-outcomes, Stanford GSB, 2022
N. C. Churchill and V. L. Lewis, The Five Stages of Small Business Growth, https://hbr.org/1983/05/the-five-stages-of-small-business-growth, HBR, 1983
E. O'Brien, What Data Scientists Need to Know About DevOps, https://iterative.ai/blog/devops-for-data-scientists/, 2020
R. Kohavi, D. Tang, and Y. Xu, Trustworthy Online Controlled Experiments, Cambridge University Press, 2020
2 Measuring What Is Relevant
In the previous chapter, we saw how important it is for a business to understand its goals, tie them to measurable outcomes, and establish the right monitoring, such as objectives and key results (OKRs), on data science and machine learning efforts. In this chapter, we will go deeper into performance measurements and metrics and emphasize understanding second-level metrics.
2.1 Performance Metrics

Performance can be measured in many ways: impact, revenue, accuracy, surveys, feedback, etc. It really depends on what we are trying to achieve and measure, as we discussed in the previous chapter. In this section, we explore three categories of metrics that are important to the development of machine learning (ML) at scale.

Performance metrics, as shown in Fig. 2.1, refer to a set of quantifiable measurements used to gauge a company's overall long-term performance. These metrics specifically help determine a company's strategic, financial, and operational achievements, especially compared to those of other businesses within the same sector. Business metrics can be financial, including net profit (the bottom line, revenues minus certain expenses), gross profit margin, or the current ratio (liquidity and cash availability); they can also center on per-customer efficiency, customer satisfaction, and customer retention. Operational metrics aim to measure and monitor operational performance across the organization.
Fig. 2.1 Performance metrics categories
2.1.1 Model Metrics

When designing and building ML models, there are model-specific metrics to capture their performance and success. In most supervised models, we set aside a known data set with true labels and try to predict those labels. This gives us an understanding of how well the model predicts on known data. In other (unsupervised) scenarios where we don't have true labels, we look at other metrics such as model convergence (in Bayesian models), variability of the predictions, and stabilization of the loss function. We will not go deeper here into the various models and ways to measure them; rather, we will explore the differences between model performance metrics, business-related metrics, and operational metrics. The top five most popular ML models are:
• Linear regression
• Logistic regression
• Support vector machine
• Decision trees
• Random forest
Those models might be solving regression problems, where the outcome is continuous (e.g., the price of a house), or classification problems (e.g., whether a picture contains a cat or not). In each case, the performance of the model is measured differently.
Linear Regression

Linear regression is commonly used for regression problems, such as predicting the price of a house. Once we predict on a hold-out test set, comparing the distance between the predicted and the actual values gives good visibility into how the model performs. For example, the model might predict that a house would cost $1 M when the house's price is actually $950 K; that might not be a bad prediction from a high level. However, many factors go into evaluating a regression model like that, for example, the variance of the residuals (how much the difference between predicted and actual values varies from example to example). Root-mean-squared error (RMSE) is a good calculation of the average distance between the model's predictions and the actual values. Essentially, the linear regression model is trying to fit a linear relationship line (Fig. 2.2) between the predictors and the outcome. R^2 (R-squared) or adjusted R^2 can provide a single number from 0 to 1 describing how good the fit was, where 1 is a perfect fit and 0 is a random model.

Logistic Regression

Without going into the definition of the model, logistic regression is similar to linear regression but performs a binary classification using the logit function (hence the name logistic regression). In Fig. 2.3, you can see that instead of a straight line (Fig. 2.2), we get a curved logit line. A binary classification model can be described with the following example: given a picture, does it contain a cat (positive label) or not (negative label)?
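A small sketch of evaluating a regression model on a hold-out set with the metrics just described, using scikit-learn on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=20.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, predictions))  # average error, in target units
r2 = r2_score(y_test, predictions)                       # 1 is a perfect fit

print(f"RMSE: {rmse:.2f}")
print(f"R^2: {r2:.3f}")
```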
Fig. 2.2 Linear regression line
Fig. 2.3 Logistic regression plot
You can think of this as whether the model was able to hit a target or not. So simply enough we can calculate how many times the model hit the target vs. attempts, and how many times it hit the target vs. missed. You can already see that there is a problem with this approach; it depends on the size of the target. If the target is big, then it’s easier to hit it; if it’s small, it’s much harder. The size of the target can represent the total examples of positive labels we have in our data. If we have 95% of pictures without a cat and 5% of pictures with a cat, we can build a simple model that will always say there is no cat and be 95% correct. However, we are also 0% correct in identifying the cats. A good example of a model performance metric for these classification scenarios is area under the curve (AUC) for binary classifiers. A binary classifier is a model that predicts one out of two outcomes: True/False, Yes/No, 1/0, etc. There could be several ways to measure the performance of this classifier. For instance, looking at the false positive rate (when the model said “True” on an instance, but the real label is “False”). Obviously, we want to minimize the false positive rate; however, if we have a model that predicts in all instances “False” (Table 2.1), we get a model with 0 false positives, since the model never predicted positives. On the other hand, the false negative rate (saying that it’s false when it’s actually true) is probably going to be very high. In Table 2.2, we can see a more balanced prediction where the prediction is both on true and false labels.
Table 2.1 Confusion matrix for a model that predicts only false labels

                     Predicted TRUE   Predicted FALSE
True label TRUE                  0                55
True label FALSE                 0              1234
Table 2.2 Confusion matrix for a model with a more balanced prediction

                     Predicted TRUE   Predicted FALSE
True label TRUE                 45                10
True label FALSE               200              1034
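The confusion matrices above can be computed directly from a model's predictions. A minimal sketch with scikit-learn on synthetic, imbalanced data (95% negative, 5% positive):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, weights=[0.95, 0.05], random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7, stratify=y
)

clf = LogisticRegression(max_iter=1_000).fit(X_train, y_train)

# scikit-learn orders the matrix as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test)).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print(f"False positive rate: {fp / (fp + tn):.3f}")
print(f"False negative rate: {fn / (fn + tp):.3f}")
```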
Therefore, there is a metric that balances false negatives, false positives, true negatives, and true positives into a single performance metric called AUC. If you were to draw a graph of the true positive rate against the false positive rate for each prediction threshold (the probability threshold above which we call a prediction positive and below which we call it negative), you would get a graph similar to the one in Fig. 2.4. The AUC calculates the area under that curve and can tell you whether the classification model is doing well. The higher the AUC, the better the model performance. Perfect performance is when the AUC equals 1; an AUC of 0.5 means that the model is predicting random results regardless of the inputs.

There are many considerations about which metric to use and how to balance it with an appropriate guardrail metric. For example, suppose we have a model that predicts fraud accounts and automatically blocks them. The cost of blocking a legitimate account (false positive) is in most cases higher than the cost of allowing a fraudulent account into the system (false negative). In this case, it is important to understand what cost or other guardrail metrics need to be established.

For other models such as support vector machines, decision trees, and random forests, the performance metrics can be the same as those we saw before. Essentially, you can consider a model as a black box and evaluate whether the black box did the job right or not. In many machine learning operations (MLOps) scenarios, you might want to constantly evaluate different types of models against each other, so having a consistent performance framework across the models is key for selecting the best model and operationalizing it. Techniques such as automated machine learning (AutoML) do exactly that: they have a set of predefined models, apply them to a problem, and see which got the best R^2 or AUC. Incorporating AutoML into an MLOps environment might be a good solution if handled with extra caution, since the risk of fitting a wrong model can be very costly to the business.
Fig. 2.4 AUC calculation curve
Automated Machine Learning

What if a developer or data scientist could access an automated service that identifies the best machine learning pipelines for their labelled data? Automated machine learning (AutoML) is a capability that does exactly that. It empowers customers, with or without data science expertise, to identify an end-to-end machine learning pipeline for different problems (from classification to regression, time series forecasting, and many other scenarios). The approach combines ideas from collaborative filtering and Bayesian optimization; it's essentially a recommender system for machine learning pipelines. Most importantly, AutoML is designed not to look at the customer's data. Customer data and execution of the machine learning pipeline both live in the customer's cloud subscription (or their local machine), which they have complete control of. Only the results of each pipeline run are sent back to the machine learning service, which then makes a probabilistic choice of which pipelines should be tried next. More specifically, AutoML automates the selection, composition, and parameterization of machine learning models. Automating the machine learning process makes it more user-friendly and often provides faster, more accurate outputs than hand-coded algorithms.

How does the AutoML process work? AutoML is typically a platform or open-source library that simplifies each step in the machine learning process, from handling a raw data set to
deploying a practical ML model. In traditional machine learning, models are developed by hand, and each step in the process must be handled separately. More specifically, here are some steps in the machine learning process that AutoML can automate, in the order they occur (He et al. 2021):
• Raw data processing
• Feature engineering and feature selection
• Model selection
• Hyperparameter optimization and parameter optimization
• Deployment with consideration for business and technology constraints
• Evaluation metric selection
• Monitoring and problem checking
• Analysis of results
A complete AutoML system can make a dynamic combination of various techniques to form an easy-to-use end-to-end ML pipeline system (Fig. 2.5). As Fig. 2.5 shows, the AutoML pipeline consists of several processes: data preparation, feature engineering, model generation, and model evaluation. Model generation can be further divided into search space and optimization methods. The search space defines the design principles of ML models, which can be divided into two categories: traditional ML models and neural architectures. The optimization methods are classified into hyperparameter optimization and architecture optimization, in which the former indicates the training-related parameters and the latter indicates the model-related parameters (He et al. 2021). We use AutoML where we want to train and deploy a model based on the target metric we specify. This is used in various scenarios such as:
Fig. 2.5 An overview of AutoML pipeline
• Implement ML solutions without extensive programming knowledge
• Save time and resources
• Leverage data science best practices
• Provide agile problem-solving
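This is not a real AutoML service, but a minimal sketch of the underlying idea described earlier: apply a set of predefined models to the same problem and keep the one with the best score on a chosen metric (the data and candidate list are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3_000, n_features=15, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1_000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Score each candidate on the same metric and keep the best one.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
print(scores)
print(f"Best model by AUC: {best}")
```

Real AutoML platforms add search-space design, hyperparameter optimization, and deployment on top of this basic loop.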
Currently, a few cloud providers offer AutoML tools. The steps to design and run AutoML are very consistent across multiple platforms:
1. Identify which algorithm best suits the underlying problem the data scientist needs to solve.
2. Choose the tool that data scientists will need to use for deploying a model (Python, Spark, or user-friendly tools on the cloud).
3. Specify the source and format of the training data.
4. Configure the compute targets for model training.
5. Configure the AutoML parameters. This involves all the preprocessing, featurization, and the number of iterations over different models.
6. Submit the trained model.
7. Review and analyze the score.

With AutoML, the necessary steps for creating ML models are performed automatically for an extremely wide range of ML algorithms. Subsequently, the tool automatically determines which algorithm (ML model) most reliably recognizes relevant machine states or process criteria. Finally, the application expert is provided with the best-performing models for implementation.

Support Vector Machine

Support vector machine (SVM) is a machine learning algorithm that is used mainly for classification and also for regression. The main idea behind SVM is to separate the training examples with a hyperplane; see the illustration for two dimensions in Fig. 2.6. With multiple dimensions, meaning multiple variables or predictors, it can be challenging to find a linear hyperplane that separates the data well. In that case, there is an option to transform the data points, for example by taking a logarithm or applying a power, and compute the hyperplane in the transformed space. In some cases, this helps separate the data better.

The performance metrics for this model and algorithm are the same as for regression and classification. In addition, there are model-specific metrics, such as the distance between the points and the separating hyperplane; however, when training the model, these come into effect as the model tries to maximize that distance.
Fig. 2.6 Two-dimensional separation hyperplane
Decision Trees and Random Forest

Decision trees and random forests are examples of nonlinear, tree-based models. The model works by generating a decision tree such that, for each input, you follow the tree until you reach a leaf. In Fig. 2.7, you can see an example of a simple tree for two variables (X1 and X2). Random forest is an extension of decision trees in which the data is split across several decision trees (hence the "forest") and the results are evaluated as an ensemble of trees. In some cases, the evaluation is the top vote: for example, if five trees resulted in a "Yes" classification prediction and three in "No," the top vote is "Yes." In other cases, a mean of the results is taken, or other methods are used to extract a final result from the forest.
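As a small sketch of the "top vote" idea, the snippet below inspects the individual trees inside a fitted random forest. Note that scikit-learn actually aggregates by averaging the trees' predicted probabilities rather than counting hard votes, so this is only an approximation of the mechanism, run on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1_000, n_features=8, random_state=1)
forest = RandomForestClassifier(n_estimators=9, random_state=1).fit(X, y)

example = X[:1]  # a single input
votes = [int(tree.predict(example)[0]) for tree in forest.estimators_]
print("Individual tree votes:", votes)
print("Forest prediction:    ", forest.predict(example)[0])
```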
2.1.2 Business Metrics

Each business has its own metrics to measure its success, as described in the previous chapter. As noted, it is very important to tie business goals to metrics. Below are a few metrics that a typical business should track; however, they might vary from business to business.
• Revenue or profit
• Number of users
• Retention
• Satisfaction
Fig. 2.7 Decision tree example
Business metrics can sometimes be tied directly to the model metrics from the previous section, but not always. For instance, it is hard to see the effect of an AUC improvement on revenue; further analysis, and potentially experimentation, can show the causal impact. However, in the fraud example given previously, it is clear that fraudulent accounts might have a significant impact on profits if not managed properly, so an increase in AUC for fraud detection would result in higher profits (by potentially lowering the cost of fraud). The same goes for the number of users and retention. Those business metrics are important, and tying them to ML models is the key to a successful ML operations environment.

Another example is improving the acquisition flow of new users. Doing this the right way could increase the retention of new users and increase the overall user count. You can think about different nurturing models, such as an ML model that predicts whether a trial customer will convert to a paying customer (in a trial-to-paid business model). This model could also be a binary classifier that predicts 1/0, with some probability threshold, on whether users are going to convert. Based on those predictions, nurturing actions can be put in place: those with a high propensity to convert could receive a special incentive to convert, while those with a low propensity could receive something that makes them happy and nudges them toward converting. As you can see, this ML model has multiple applications, and having the best prediction is key to increasing retention. In this example, we would also want to optimize for higher positive recall, which measures, out of those who actually converted in the test set, how many the model got right.

Lastly, the customer satisfaction score (CSAT) is in many cases the only way to get the customer's voice, by getting survey results on the question, "How
satisfied are you with the service you received?" It is true that the survey response rate is rarely 100% and in many cases could be ~1%. This metric might therefore be biased toward those who answer the survey, and there are several ways to mitigate that bias. However, by looking at the trend of survey responses and an aggregated score (mean, median, 75th percentile, etc.), we can get a sense of how customers feel about the service, even when the sample is biased. Survey results received directly from customers can also serve as positive labels, and an ML model (Robsky 2021) can be trained to extrapolate what the rest of the population thinks and to proactively prevent customers from receiving bad service. Such an ML model can increase the satisfaction of our customers and potentially improve other business metrics such as revenue. Thus, ML model performance must be tied to the business metric, which must be derived from the strategy and goals.
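To illustrate the recall idea from the trial-to-paid example above (the labels below are hypothetical hard predictions on a small test set):

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # 1 = the trial user actually converted
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]   # model predictions

print(f"Recall:    {recall_score(y_true, y_pred):.2f}")     # 3 of 5 converters caught -> 0.60
print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # 3 of 4 flagged users converted -> 0.75
```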
2.1.3 ML Operational Metrics

Once ML models are in the production environment and running on a cadence (hourly, daily, monthly), it is very important to establish operational metrics to make sure that the environment is healthy and that the models' service level agreements (SLAs) will still be met if models go offline. Below is a list of top metrics to take into consideration when deploying an ML model:
• Reliability/quality of the data availability
• Compliance, data governance, data usage, and privacy
• System uptime
• Latency
• Retraining speed
Reliability/Quality of the Data Availability

In recent years, data has become the most valuable currency of companies. Whoever has more data on users can monetize it (ads on a platform based on user actions), use it for predictions (recommending products to a customer based on past purchases), and gain insights (understanding why people churn after 3 months). There are more scenarios that can be achieved with data, and the main point here is that if the data is not reliable, all the insights and actions could cause more harm than benefit. For instance, if a company analyzes the root causes of churn on data that is unreliable or incorrect, the company could introduce a feature that, according to the data, should reduce the churn rate but actually does the opposite.
Reliability can be defined in many ways in each scenario. Some good questions to ask about the data of the business:
• Do we actually see all the users who are using the product or feature? Gaps here can point to system or telemetry bugs. One way to test this is with a known user base that flows through the system, plus unit tests that compare what the system captures with the expected, known base. The same approach can be applied to features and any telemetry captured by the system.
• Do numerical values make sense? In many cases, data is converted to integers and other formats. This can cause dates to become invalid or large usage numbers to be truncated. Having a map of all the calculations and transformations in the data pipelines is important for understanding data changes. In addition, applying sanity checks on common-sense columns such as dates, usage, and tenure, and visualizing them on a dashboard with monitoring alerts, can save a lot of debugging time when finding an issue.
• Are there any gaps in the user journey? For example, if the product has a sign-up funnel, can we see each step and understand what's happening, or are there gaps that lead to missing information?

Compliance, Data Governance, Data Usage, and Privacy

With the growing collection of user data, many compliance requirements and regulations have been established to make sure data is secured and does not violate user terms or country-specific regulations. A big change in recent years was the establishment of the General Data Protection Regulation (GDPR), which was created to protect the privacy of European Union (EU) users. However, since companies' investment in complying with this regulation is large, global users benefited from the change as well. Failing to meet compliance requirements can result in severe fines and penalties. Therefore, it is very important to establish the right monitoring and metrics to capture this.

One important metric to establish is around data annotations and tracking. In other words, data is created and moved around many times, so there must be a process in which each column or table is annotated with the kind of data it holds. For example, a customer's phone number is personally identifiable information (PII). This data should be stored in a secure location, and access should be carefully managed. However, in a scenario where the data is moved to another table and converted from a string to a numerical value, one could lose track of the data. It is important for the table creator to annotate whether the column is still PII or not (as in the case of an encrypted phone number). Having a metric
that quantifies the annotations and missed annotations is key for mitigating risk in any audit under EU or US regulations.
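A minimal sketch of operational checks of the kind described above: sanity checks on "common sense" columns and a simple coverage metric for PII annotations. The table, column names, and annotation dictionary are all hypothetical:

```python
import pandas as pd

users = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "signup_date": pd.to_datetime(
        ["2023-01-02", "2023-02-30", "2024-03-01", "2023-05-10"], errors="coerce"
    ),
    "monthly_usage_hours": [12.5, -3.0, 40.0, 7.2],
    "phone_number": ["555-0100", "555-0101", None, "555-0103"],
})

# Sanity checks: invalid dates become NaT, and usage should never be negative.
invalid_dates = int(users["signup_date"].isna().sum())
negative_usage = int((users["monthly_usage_hours"] < 0).sum())
print(f"Invalid signup dates: {invalid_dates}, negative usage rows: {negative_usage}")

# Annotation coverage: every column should be labelled as PII or not.
annotations = {"user_id": "not_pii", "signup_date": "not_pii", "phone_number": "pii"}
unannotated = [c for c in users.columns if c not in annotations]
coverage = (len(users.columns) - len(unannotated)) / len(users.columns)
print(f"Annotation coverage: {coverage:.0%}, unannotated columns: {unannotated}")
```

Checks like these can run on a schedule and feed a dashboard with alerts, so data issues surface before they reach the models.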
2.2 Causal vs. Correlated Metrics

Understanding that some metrics can be "just" correlations and do not necessarily cause an effect is important in every aspect of building a product, business, or feature. The definition of correlation is a statistical measure of the strength of a linear relationship between two variables. In layman's terms, if one metric goes up while the other also goes up, they are correlated. A very simple example is sales of cold drinks on hot summer days: when it's hotter outside, the sales of cold drinks increase as well. In this case, the sales and the temperature are correlated. However, we cannot say that the increase in sales causes the temperature to increase; it's probably the opposite.

A common question in every product development effort is understanding what makes customers come back (retain) or leave (churn). There are metrics or indicators that might look like causes but are in fact only correlations; for example, a decline in the usage of the service that ends with churn. When building a model to predict churn, the decline in usage might be a top influencing input, but it does not cause churn; in fact, the decline in usage might essentially be the churn itself, and whatever happened before might be the cause.

This leads us to the distinction between leading and lagging indicators. Leading indicators are metrics or events that lead to a certain event, such as churn. Lagging indicators are things that happen after the event; for example, an increase in customer support calls might be a lagging indicator of issues in the product. Since we have discussed business goals and tied them to measurable outcomes, tying them to leading indicators, so that we can act upon changes early enough, is crucial. As mentioned before, leading indicators could be domain-specific metrics, such as customer support volume and usage declines, or outcomes from causal inference analysis that, with some level of certainty, demonstrate causal leading indicators for an outcome or goal.

Causality can be measured and observed with many techniques. The most common one is a controlled A/B experiment. We are not going to go deeper into how to set up an A/B test, but the main idea is to compare a group of users who received a "treatment" with a comparable group without that "treatment." If the observed metrics, under certain conditions, are higher
or lower in the treatment group, then we can conclude that the treatment is causing the metric to change. In many scenarios, a controlled experiment is difficult to establish or might take too long to run in order to conclude causality. Other techniques under the umbrella of causal inference analysis take a historical view of the data and try to simulate a controlled experiment and observe the results. These techniques can be expensive as well.

One might ask, "Would correlation be enough to drive some decisions?" The answer lies in the business's acceptable risk tolerance for pursuing correlated metrics vs. causal ones. For example, a company can build an ML model to detect which customers are having a bad customer experience with the support service. A few of the top features are very intuitive and relate to the duration of the support agent's responses to the customer and the number of emails sent to the customer. The model shows that the longer the customer waits for a response and the more emails are sent, the higher the likelihood of a bad experience. However, are those metrics correlated with the experience or causing the bad experience? Without proper causal analysis via a controlled experiment or causal inference analysis, it is hard to know. The nature of the support business might make it difficult to adopt experiments. The business might still set goals to decrease the customers' wait time and aim to scope the problem into fewer emails. Even though there is no proof that these metrics are causal or leading indicators, it intuitively seems important to reduce wait time and emails.

Let's explore a causal framework for managers who manage ML or DS engineers and need to understand causality vs. correlation. Below is a simplified framework to help understand the different questions a manager needs to ask when dealing with tasks regarding causality:

Did we conclude causality? If yes, we are done. If not, we need to check whether a controlled experiment can be run. There might be constraints on privacy or data availability, the experiment length might be too long, or the lack of an experimentation platform might prevent the experiment from running.

Can we run a controlled experiment? If yes, there are plenty of manuals and documentation on best practices for controlled experiments (Kohavi et al. 2020). If not, one simple approach is to find two similar groups who differ only by the feature or treatment we are trying to determine is causal. This uses historical data "as if" an
experiment was running. Achieving this might be very hard, as there might be multiple confounding variables to consider. For example, in social media, we need the groups to be similar in age, gender, location, maybe socioeconomic status, maybe political inclination, etc. Finding two groups that are similar across all of those variables might even be impossible, and failing to take important variables into consideration might result in biased and misleading results.

Can we separate the data into two similar groups? If yes, we can run the appropriate statistical tests, such as Student's t-test, on the difference between the metrics we want to observe and conclude whether the results are statistically significant. If not, without going deep into the technical details, there are multiple great packages that help in establishing a causal inference framework, such as DoWhy by Microsoft.
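Once two comparable groups have been found, the statistical test itself is straightforward. A small sketch using SciPy with hypothetical metric values for the two groups (Welch's variant of the t-test, which does not assume equal variances):

```python
from scipy import stats

treated = [4.1, 3.8, 4.5, 4.2, 3.9, 4.4, 4.0, 4.3]   # metric for the group with the treatment
control = [3.6, 3.9, 3.7, 3.5, 3.8, 3.6, 3.9, 3.7]   # metric for the comparable group

t_stat, p_value = stats.ttest_ind(treated, control, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("The difference is statistically significant at the 5% level.")
else:
    print("No statistically significant difference was detected.")
```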
2.3 Summary

There are many different metrics to track at different levels of the business stack:
• Model performance metrics, which can be the determining factor in whether the models are performing at a satisfactory level. These metrics help businesses understand whether there is more room for improvement in the models and whether the results can be trusted.
• Business metrics that help explain the mechanics of the business, such as cash flow and users or customers. These metrics are an integral part of any business, whether technical or not, in achieving sustainability or growth.
• ML operational metrics that help with the day-to-day operations of ML models, such as data availability, compliance, system uptime, latency, and more. As with the model performance metrics, these are crucial to having trustworthy results that can be acted upon.

Some metrics can be correlated with each other and can have causal effects as well; the framework outlined in this chapter helps build the right skills and techniques for driving toward causation, which helps ensure we are measuring the right things and acting upon reliable results.
References

X. He, K. Zhao, and X. Chu, AutoML: A Survey of the State-of-the-Art, Knowledge-Based Systems, Volume 212, 5 January 2021, 106622
A. Robsky, ML and Customer Support (Part 1): Using Machine Learning to Enable World-Class Customer Support, 2021
R. Kohavi, D. Tang, and Y. Xu, Trustworthy Online Controlled Experiments, Cambridge University Press, 2020
GDPR, https://gdpr-info.eu/
https://www.wired.co.uk/article/amazon-gdpr-fine
DoWhy, https://github.com/py-why/dowhy
3 Searching for the Right Algorithms
In the last few decades, data from different sources have become more accessible and consumable, and companies have started looking for ways to use machine learning (ML) techniques to optimize business metrics, pursue new opportunities, and grow revenues (Lazzeri 2019). Not only has data become more available, but there has also been an explosion of machine learning and artificial intelligence applications that enable companies to build sophisticated, intelligent, and data-driven solutions. Machine learning, a term that encompasses a range of algorithmic approaches from statistical methods such as regressions to neural networks, has rapidly advanced to the forefront of analytics (Tambe 2012). Machine learning is a powerful tool that can help companies make data-driven decisions and gain insights that can drive business value. It involves the use of algorithms that learn from data, identify patterns, and make predictions based on that learning. Essentially, machine learning enables computers to learn from data and improve their performance over time without being explicitly programmed. This means that companies can use machine learning to automate analytical model building and generate predictive results that can inform business decisions (Lazzeri 2019). By adopting machine learning, companies can improve processes, optimize resources, reduce costs, and create new business opportunities. For example, machine learning can be used to analyze customer behavior, predict demand, optimize production processes, and detect fraud. However, it is important for companies to understand the limitations and challenges of machine learning, such as bias and the need for high-quality data. Companies must also ensure that they have the right infrastructure, tools, and expertise to effectively adopt and leverage machine
learning for their business. It is a branch of artificial intelligence based on the concept that systems can learn from data, identify patterns, and make decisions with minimal human intervention (Tambe 2012). But what does it really mean to adopt machine learning for data-driven decisions, and how can companies take advantage of it to improve processes and add value to their business? In this chapter, you will gain an understanding of what machine learning entails and the benefits it can bring to an organization. Being a machine learning organization means more than just incorporating machine learning algorithms into your workflow; it requires embedding machine learning teams within your company to fully engage with the business units and adapting the operational support of the company to facilitate machine learning (techniques, processes, infrastructures, culture) (Lazzeri 2019). This involves defining clear goals for machine learning initiatives, building appropriate teams and infrastructure, fostering a data-driven culture, and leveraging emerging technologies and practices to drive innovation and growth. Companies that successfully adopt machine learning can achieve significant improvements in efficiency, accuracy, and customer experience and gain a competitive advantage in their respective industries (Russel 2010). In the next few paragraphs, you will learn about the following four dimensions that companies can leverage to become machine learning driven (Fig. 3.1):

1. Understanding algorithms and the business questions that algorithms can answer
2. Defining business metrics and business impact
3. Establishing machine learning performance metrics
4. Architecting the end-to-end machine learning solution

Machine learning models have become increasingly prevalent in virtually all fields of business and research in the past decade. With all the research that has been done on the training and evaluation of machine learning models, the difficulty for most companies and practitioners now is not finding new algorithms and training optimizations, but rather how to actually deploy models to production in order to deliver tangible business value (Bijamov et al. 2011). Therefore, the biggest hurdle that companies face today is deploying ML models in production, ensuring their accuracy, and maintaining them effectively. Most companies are still in the very early stages of incorporating machine learning into their business processes (Lushan et al. 2019). The majority of companies are experimenting with machine learning models, but they face challenges in scaling up their implementation and building a robust
Fig. 3.1 Machine learning adoption process by companies
production pipeline. Therefore, in this context, it is important for companies to understand the challenges of deploying ML models to production and to develop strategies to overcome those challenges and gain a competitive advantage. The field of software engineering for machine learning systems is still a young and immature knowledge area. One sign of this in industry is the continuous change in machine learning and data science job titles at established tech companies: data scientist, product analyst, data analyst, quantitative researcher, machine learning engineer, software engineer, and more. Traditional software systems are largely deterministic, computing-driven systems whose behavior is purely code dependent. Machine learning models have an additional data dependency, in the sense that their behavior is learned from data, and they have even been characterized as nondeterministic (Dhankhad et al. 2018). The additional data dependency is one of the factors contributing to the fact that machine learning systems require a great amount of supporting infrastructure (Herath et al. 2018). Additionally, the nondeterministic behavior of machine learning models poses significant challenges in designing and implementing software systems that rely on them. To successfully deploy machine learning models, companies need to adopt software engineering best practices that take into account the data-driven nature of these systems and the supporting infrastructure they require.
The concept of DevOps (development and operations) for machine learning, known as machine learning operations (MLOps), is a subset of machine learning and an extension of DevOps (Mishu and Rafiuddin 2017), focusing on adopting DevOps practices when developing and operating machine learning systems (Robles-Durazno et al. 2018). MLOps teams are responsible for managing the entire life cycle of machine learning models, from development to production deployment and beyond. Moh et al. (2016) state that "MLOps is a cross-functional, collaborative, continuous process that focuses on operationalizing data science by managing statistical, data science, and machine learning models as reusable, highly available software artifacts, via a repeatable deployment process." This means that MLOps teams must work closely with data scientists and software engineers to ensure that machine learning models are built with production deployment in mind and can be maintained and updated over time. By adopting MLOps practices, companies can ensure that their machine learning systems are built and deployed in a repeatable, scalable, and reliable way, leading to more efficient and effective use of these technologies to achieve business objectives. In recent years, many innovations have been created using machine learning and MLOps systems: autonomous vehicles, data mining, biometrics, and healthcare, among other solutions. As a result, the demand for intelligent systems has grown considerably in both the market and the scientific field (Lushan et al. 2019). These systems use algorithms for pattern detection that involve various other disciplines. According to Moh et al. (2016), machine learning occurs when the computer learns by improving its performance on a class of tasks, measured statistically. The learning process involves three steps:

1. The first step is about defining the labels that the computer will learn from. This is typically done in "supervised" learning, where the computer is trained on a dataset with known labels. For example, in image recognition, the labels might be different types of objects that the computer needs to learn to identify. In other cases, such as unsupervised learning, where the data does not have known labels, this step may be skipped.

2. The second step is defining the measurement that will be used to identify whether there was an improvement in the performance of the model. This could be something like accuracy, precision, recall, or F1 score, depending on the problem being solved.

3. The third step is about training, evaluating, and deploying the model. Training involves feeding the algorithm a dataset and allowing it to learn
from that data. Evaluation involves testing the model on a separate dataset to see how well it performs. Finally, deploying the model involves integrating it into a larger system so that it can be used to make predictions or decisions in real-world situations. Thus, MLOps systems play a critical role in these three steps by providing a way to manage and automate the development and deployment of machine learning models (Ajgaonkar 2021). The implementation of machine learning involves a set of tasks, performance measurement of those tasks, and training to gain experience in performing them (Herath et al. 2018). The software engineering used in the development of information systems has brought many benefits to companies (Robles-Durazno et al. 2018). However, only a few areas of software engineering have benefited from using machine learning approaches in their tools and processes (Robles-Durazno et al. 2018).
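As a rough illustration of these three steps, the sketch below trains and evaluates a simple classifier with scikit-learn; the dataset, model, and accuracy metric are illustrative choices, not recommendations tied to any particular governance process.

```python
# A minimal sketch of the three learning steps above, using scikit-learn.
# The dataset, model, and metric are illustrative choices, not prescriptions.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Step 1: labeled data -- features X and known labels y (supervised learning).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 2: define how improvement is measured (here, accuracy on held-out data).
# Step 3: train, evaluate, and -- if the measurement is satisfactory -- deploy.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```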
3.1 Understanding Algorithms and the Business Questions Algorithms Can Answer

Today, with the rise of big data, Internet of things (IoT), and ubiquitous computing, machine learning has become essential for solving problems across numerous areas (Brynjolfsson et al. 2011), such as:

• Computational finance, e.g., machine learning models are able to identify patterns in financial data that can help banks and other financial institutions make more accurate predictions about creditworthiness and market trends.
• Computer vision, e.g., machine learning models can recognize and classify objects in images and video streams, making facial recognition and object detection possible.
• Computational biology, e.g., machine learning is used to sequence DNA and discover new drugs.
• Automotive, aerospace, and manufacturing, e.g., machine learning models can be used for predictive maintenance, detecting when a machine is likely to fail so that it can be repaired before it causes downtime.
• Natural language processing (NLP), e.g., machine learning models are used for speech recognition and topic modeling, allowing computers to understand human language and generate insights from unstructured text data.
Machine learning is a field of computer science that was born from pattern recognition and the theory that computers can learn without being programmed to perform specific tasks; computer science researchers interested in machine learning wanted to see if computers could learn from data and generate predictions based on that data. The iterative aspect of machine learning is important because, as models are exposed to new data, they can adapt and improve their predictions (Davenport and Patil 2012). This ability to learn from previous computations and improve over time makes machine learning an incredibly powerful tool for solving complex problems. As the volume of data continues to grow and the number of applications of machine learning increases, it is clear that this field will play an increasingly important role in the future of computing and problem-solving. While many machine learning algorithms have been around for a long time, the ability to automatically apply mathematical functions to big data is a recent development (Brynjolfsson et al. 2011). Data scientists usually divide the learning and automatic logic of a machine learning algorithm into three main parts, as listed below:

• A decision process: In general, machine learning algorithms are used to make a prediction or classification. Based on some input data, which can be labeled or unlabeled, your algorithm will produce an estimate about a pattern in the data.
• An error function: An error function evaluates the prediction of the model. In some cases, it is called a loss function. If there are known examples, an error function can make a comparison to assess the accuracy of the model.
• A model optimization process: If the model can better fit the data points in the training set, then weights are adjusted to reduce the discrepancy between the known example and the model estimate. The algorithm will repeat this "evaluate and optimize" process, updating weights autonomously until a threshold of accuracy has been met (Brynjolfsson et al. 2011).

From a data point of view, the machine learning process is usually structured in a few standard phases:

1. The machine learning algorithms are fitted on a training dataset to create a model. In this phase, the algorithm analyzes the training data, finds patterns, and builds a model that can be used to make predictions.
2. As a new testing input dataset is introduced to the trained machine learning algorithm, it uses the developed model to make a prediction.
3. The prediction is then checked for accuracy. In this phase, the predictions are compared to the actual results to determine the accuracy of the model.
4. Based on its accuracy, the machine learning algorithm is either deployed or trained repeatedly with an augmented training dataset until the desired accuracy is achieved.

The machine learning process can be an iterative one that requires continuous improvement and refinement of the model to ensure its accuracy and reliability. Based on these phases of learning, machine learning models fall into three primary categories (Tambe 2012):

• Supervised machine learning: Supervised learning is a machine learning approach that uses labeled datasets to train algorithms to classify data or predict outcomes accurately. As input data is fed into the model, the model adjusts its weights until it has been fitted appropriately (Tambe 2012). This process occurs as part of the cross-validation process to ensure that the model avoids overfitting or underfitting. Overfitting occurs when the model is too complex and fits the training data too closely, making it less accurate when predicting new data. Underfitting occurs when the model is too simple and doesn't fit the training data accurately, resulting in poor accuracy when predicting new data. Supervised learning helps companies solve a variety of real-world problems at scale, such as classifying spam into a separate folder from your inbox, identifying fraudulent transactions, and predicting customer churn. Supervised learning algorithms include neural networks, which mimic the workings of the human brain; naïve Bayes, which uses probability theory to predict outcomes; linear regression, which predicts a numerical value; logistic regression, which predicts a binary outcome; random forest, which combines multiple decision trees; and support vector machine (SVM), which separates data into different classes by finding the optimal boundary between them. As mentioned before, supervised learning is a machine learning technique where the algorithm learns from labeled data to predict future outcomes or classify new data into specific categories. The primary objective of the supervised learning technique is to map the input variables to the output variable. Supervised machine learning is further categorized into two broad categories:

• Classification: These algorithms address classification problems where the output variable is categorical (e.g., whether an email is spam or not). Some known classification algorithms include the random
forest algorithm, decision tree algorithm, logistic regression algorithm, and support vector machine algorithm (Brynjolfsson et al. 2011). These algorithms are trained on labeled data and use statistical methods to classify new, unlabeled data based on previously learned patterns.

• Regression: Regression algorithms handle regression problems, where the model learns the relationship between input and output variables. These are used to predict continuous output variables (e.g., predicting house prices). Popular regression algorithms include the simple linear regression algorithm, multivariate regression algorithm, decision tree algorithm, and lasso regression. Supervised learning algorithms can be prone to overfitting, where the model performs well on the training data but poorly on new, unseen data, or underfitting, where the model is too simple and does not capture the complexity of the underlying data. Therefore, cross-validation techniques are used to evaluate and fine-tune the models to prevent overfitting or underfitting.

• Unsupervised machine learning: Unsupervised learning is a powerful technique that is used to extract meaningful insights and patterns from unlabeled data. Unlike supervised learning, unsupervised learning does not require labeled datasets to train the model. Instead, it relies on algorithms that can identify hidden patterns and structure within the data. One of the primary goals of unsupervised learning is to uncover the underlying structure of the data, which can help businesses and organizations gain a better understanding of the relationships and dependencies that exist between different data points. This can be particularly useful in exploratory data analysis, where the goal is to identify patterns and relationships that were previously unknown. Unsupervised learning can also be used for cross-selling strategies and customer segmentation, which can help businesses optimize their marketing and sales efforts. Another important application of unsupervised learning is in image and pattern recognition, where it can be used to identify and categorize objects based on their visual characteristics. It can also be used to reduce the number of features in a model through dimensionality reduction, which can help improve the accuracy and efficiency of machine learning algorithms. Unsupervised machine learning is further classified into two types:

• Clustering: Clustering is an unsupervised machine learning technique that is commonly used to identify patterns within datasets. It involves dividing the dataset into a specific number of clusters in such a manner that the data points belonging to a cluster have similar characteristics. In other
words, clusters are groups of data points such that the distance between the data points within the clusters is minimal. Clustering is useful in identifying patterns and trends within large datasets that may be difficult to identify through manual analysis. One of the most popular clustering algorithms is the K-means clustering algorithm. This algorithm begins by randomly selecting K cluster centroids from the dataset. Each data point is then assigned to the nearest centroid, and the centroids are recalculated as the mean of all the points assigned to each cluster. This process is repeated until the centroids no longer move or until a maximum number of iterations is reached. Another commonly used clustering algorithm is the mean-shift algorithm, which is used for clustering and feature reduction. This algorithm starts with a set of data points and moves each point to a higher-density region until convergence. The final density region represents the cluster. Principal component analysis (PCA) is another unsupervised technique, often grouped with clustering methods, that is used for dimensionality reduction. PCA is used to identify the most important features in a dataset and to reduce the number of dimensions while preserving the most important information. Independent component analysis (ICA) is another unsupervised technique that is used for feature extraction. ICA separates a multivariate signal into independent, non-Gaussian components, which can be used to identify unique features within the data. Overall, clustering is a powerful technique for identifying patterns within complex datasets, and these algorithms can be adapted to a wide range of applications in various industries.

• Association: Association rule learning is a type of unsupervised machine learning algorithm used for identifying patterns in a large dataset. The goal of association rule learning is to discover relationships between items in the data and identify the rules that describe those associations. These rules can then be used to identify patterns in the data, make predictions, and provide recommendations to users. The association rule learning technique is widely used in many applications, including web usage mining, market basket analysis, and customer behavior analysis. Popular association rule learning algorithms include the Eclat algorithm. This algorithm uses a depth-first search technique to discover frequent item sets. It finds the support count of all possible item sets and then prunes the ones that do not meet the minimum support threshold. The algorithm repeats this process until all frequent item sets are discovered. Eclat is particularly useful when the dataset has a high number of transactions and a low number of items. The frequent pattern (FP)-growth algorithm is another popular algorithm used for association rule learning. It is a faster
and more memory-efficient algorithm than Eclat. It generates a compact data structure called a frequent pattern tree (FP-tree), which summarizes the input data and eliminates the need for a database scan. The algorithm recursively builds an FP-tree by finding frequent item sets and then projecting them onto the tree structure. The algorithm then extracts the frequent item sets from the FP-tree. FP-growth is particularly useful when the dataset has a low number of transactions and a high number of items. Overall, the association rule learning algorithm is a powerful tool for identifying patterns and relationships in a large dataset. The output of the algorithm can be used to make predictions and recommendations, which can be used to improve the performance of a wide range of applications.

• Reinforcement learning (RL): RL is a type of machine learning approach that focuses on the concept of learning through a series of trial-and-error interactions with an environment. The RL problem involves a model ingesting an unknown set of data to achieve a goal or to produce an accurate prediction. In RL, an agent interacts with an environment by taking actions and receiving feedback in the form of rewards or punishments. The objective of the agent is to learn to take actions that maximize its cumulative rewards over time. The RL algorithm learns by exploring the environment and taking actions based on the current state. The agent receives feedback in the form of rewards or punishments, which help it learn the optimal actions for each state. The agent updates its behavior through a process known as policy improvement, which involves updating the policy to increase the expected reward. RL is a popular approach for problems that involve decision-making, such as game-playing, robotics, and autonomous vehicles (Bahrammirzaee 2010). The RL framework can be applied to a wide range of problems, from simple games such as tic-tac-toe to complex tasks such as playing the game of Go or driving a car in traffic. There are several key components of RL, including the agent, the environment, the state, the action, the reward, and the policy. The agent is the learner that interacts with the environment. The environment is the context in which the agent operates. The state is a representation of the environment that the agent observes. The action is the decision that the agent takes based on the current state. The reward is the feedback that the agent receives for taking an action. The policy is the strategy that the agent uses to select actions based on the current state. The formal framework for reinforcement learning borrows from the problem of optimal control of Markov decision processes. Popular RL algorithms include Q-learning, state–action–reward–state–action (SARSA), and deep reinforcement learning using neural networks. These algorithms are used in a wide range of applications,
including game playing, robotics, and finance. RL is a promising area of research, and it is expected to have a significant impact on the development of intelligent systems in the future. Reinforcement learning is applied across different fields such as game theory, information theory, and multi-agent systems (Brynjolfsson et al. 2011), and it is divided into two types of methods:

• Positive reinforcement learning: This refers to adding a reinforcing weight after a specific behavior of the agent, which makes it more likely that the behavior will occur again in the future. Positive reinforcement is the most common type of reinforcement used in reinforcement learning, as it helps models maximize their performance on a given task.

• Negative reinforcement learning: Negative reinforcement learning is a type of reinforcement learning that helps the learning process avoid a negative outcome: the agent learns from such outcomes (through historical data) and improves its future predictions.

Industry verticals handling large amounts of data have realized the significance and value of machine learning technology (Tambe 2012). As machine learning derives insights from data in real time, companies using it can work efficiently and gain an edge over their competitors. In the next paragraphs, the most common machine learning algorithms are listed and explained:

• Linear regression: Linear regression is a popular statistical technique used in various fields, such as finance, economics, social sciences, and engineering, to model the relationship between two variables. The technique assumes that there is a linear relationship between the dependent and independent variables. The dependent variable is the output variable, and the independent variable is the input variable (Brynjolfsson et al. 2011). The linear regression model fits a straight line to the data points that best represents the relationship between the two variables. The line is expressed as a linear equation, y = mx + b, where y is the dependent variable, x is the independent variable, m is the slope of the line, and b is the intercept. The slope represents the change in y for a unit change in x, while the intercept represents the point where the line crosses the y-axis. The linear regression model is used to make predictions by plugging values for the independent variable into the linear equation to obtain an estimated value for the dependent variable. The accuracy of the model is assessed using various metrics, such as the R-squared value, which measures how well the model fits the data, and the mean squared error, which measures the average of the squared differences between the predicted values and the actual values (a short code sketch after this list illustrates these quantities). Linear regression can be further classified into simple linear regression and
multiple linear regression, depending on the number of independent variables used in the model. • Logistic regression: This type of machine learning algorithm is often used for classification and predictive analytics. It is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome, which is a binary variable. In other words, logistic regression models the relationship between a binary dependent variable and one or more independent variables by estimating the probability that a future event will happen. Since the result is a probability, the dependent variable is bounded between 0 and 1, representing the likelihood of the event occurring. Logistic regression can be used for both binary and multi-class classification problems. For binary classification problems, the output is either 0 or 1, representing the absence or presence of a certain characteristic or event, respectively. In multi-class classification, logistic regression can be used to classify data into multiple classes, each with its own probability estimate. Logistic regression is often used in predictive analytics and is a popular algorithm for estimating probabilities in risk management, healthcare, and finance, among others. The algorithm can also be used to evaluate the importance of different independent variables in predicting the outcome. For example, logistic regression can be used to identify the key factors that influence customer churn, predicting whether a customer will leave a company or not. • Decision tree: Decision tree is a popular machine learning algorithm that builds models in the form of a tree structure. This algorithm is useful in both regression and classification problems. The learning process divides the original dataset into smaller and smaller subsets, and the result is a tree with decision nodes and leaf nodes. A decision node has two or more branches, each representing values for the tested variable. The leaf node represents a decision on the numerical target. The topmost decision node in a tree corresponds to the best predictor, and it is called the root node. The decision tree algorithm can be used for both binary and multi-class classification problems, and for both continuous and categorical data. The advantage of decision trees is that they are easy to interpret and can handle missing data without affecting the accuracy of the results. Decision trees can also be used for feature selection, where the most important features are selected to build the model. However, one limitation of decision trees is that they are prone to overfitting, which occurs when the model is too complex and fits the training data too closely, resulting in poor performance on new data.
• Support vector machines (SVMs): SVMs are a popular algorithm for solving classification problems. SVM creates the best decision boundary to segregate n-dimensional space into classes, so that new data points can easily be placed in the correct category in the future. This best decision boundary is called a hyperplane. The support vector machine chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a support vector machine (Brynjolfsson et al. 2011). SVMs are particularly useful when the data is not linearly separable, as they can transform the input data into a higher-dimensional space where the classes can be separated by a hyperplane. SVMs have been successfully used in a variety of applications such as image classification, text classification, and bioinformatics. SVMs can also be extended to solve regression problems using the support vector regression (SVR) algorithm, which finds a hyperplane that best fits the data.

• K-nearest neighbors (KNNs): The K-nearest neighbors algorithm is used for both classification and regression problems. It stores all the known cases and classifies new data points based on their similarity to those stored examples. KNN is a supervised machine learning algorithm, wherein "K" refers to the number of neighboring points considered when classifying a new point (Tambe 2012). The KNN algorithm is widely used in various fields such as image recognition, natural language processing, and recommendation systems. It is also used in data analysis to identify patterns and clusters in data. The algorithm has several advantages, such as its simplicity, efficiency, and robustness to noisy data. However, it also has some limitations, such as high memory usage, sensitivity to irrelevant features, and difficulty in handling high-dimensional data.

• K-means: K-means is a widely used unsupervised machine learning algorithm that aims to cluster a dataset into a certain number of groups or clusters, where each data point belongs to the cluster whose centroid is closest to it. K-means computes the centroids and iterates until it finds an optimal centroid; it assumes that the number of clusters is already known. In this algorithm, the data points are assigned to clusters in such a manner that the sum of the squared distances between the data points and their centroid is minimized (Brynjolfsson et al. 2011). Less variation within a cluster means the data points in that cluster are more similar to one another. The number of clusters, k, is a hyperparameter that the user must specify beforehand. The algorithm assumes that the number of clusters is already known and tries
to find the best way to group the data points into those clusters. One way to determine the optimal value of k is by using the elbow method, which plots the number of clusters against the sum of squared errors (SSE) and selects the point where the SSE starts to flatten out, indicating that adding more clusters will not significantly improve the performance of the algorithm. K-means is a simple yet powerful algorithm that is widely used in various fields, such as image segmentation, customer segmentation, and anomaly detection. However, it has some limitations, such as sensitivity to initial centroids, difficulty in determining the optimal number of clusters, and being prone to local optima. Therefore, researchers have proposed various extensions and modifications to the algorithm, such as K-medoids, hierarchical K-means, and fuzzy C-means, to overcome these limitations and improve the performance of the algorithm in specific use cases.

• Random forest algorithm: The random forest algorithm is a widely used machine learning technique that is known for its high accuracy, efficiency, and versatility. A random forest is used to solve complex regression and classification problems, where other techniques may not work effectively. It utilizes ensemble learning, a technique that combines many classifiers to provide solutions to complex problems. A random forest algorithm consists of many decision trees, and it establishes the outcome based on the predictions of those trees. It predicts by averaging the outputs of the individual trees (for regression) or taking a majority vote (for classification). Increasing the number of trees increases the precision of the outcome (Tambe 2012). The algorithm has several advantages, including the ability to handle missing data and noisy data, the ability to handle a large number of input features, and the ability to avoid overfitting. Overfitting occurs when a model is too complex and fits the training data too closely, resulting in poor performance when applied to new data. The random forest algorithm avoids overfitting by using multiple decision trees, each trained on a random subset of the data. This reduces the chance of any one decision tree being biased by a particular subset of the data, leading to more accurate predictions. Another advantage of the random forest algorithm is its ability to identify the most important input features for making predictions. The algorithm can measure the importance of each input feature by calculating how much the accuracy of the predictions decreases when that feature is removed. This allows data scientists to focus on the most important features when building models and to improve the accuracy of their predictions.

• Artificial neural networks (ANNs): ANNs are a type of machine learning algorithm modeled on the biological neural networks of the human brain. ANNs consist of multiple interconnected layers in their computational
model that processes the input data. The first layer is the input layer, or neurons that send input data to deeper layers. The input is then processed through one or more hidden layers that contain one or more nodes that perform mathematical transformations. The components of these layers change or tweak the information received through various previous layers by performing a series of data transformations (Tambe 2012). The output layer provides the results of the neural network's processing, such as a classification or regression result. Artificial neural networks can be trained using a variety of algorithms, including supervised learning, unsupervised learning, and reinforcement learning. Supervised learning involves training the neural network on labeled input–output pairs, where the output is known for a given input. Unsupervised learning involves training the neural network on unlabeled data, with the network discovering hidden patterns and relationships in the data on its own. Reinforcement learning involves training the neural network to take actions in an environment to maximize a reward signal. ANNs are used in a wide range of applications, such as image recognition, speech recognition, natural language processing, and even game playing. ANNs have demonstrated high levels of accuracy in many areas, including medical diagnosis, fraud detection, and predictive maintenance. As a result, artificial neural networks have become a popular tool for data analysis and decision-making in many industries.

• Recurrent neural networks (RNNs): Recurrent neural networks are a specific type of artificial neural network that processes sequential data. Traditional feedforward neural networks are limited in their ability to process sequential data because they are designed to process fixed-size inputs and do not have the capability to remember the context or the sequence in which data was presented. In an RNN, the result of the previous step acts as the input to the current step. The output of the current step is then used as an input to the next step, and so on. By doing this, the RNN is able to maintain a memory of what was previously calculated (Tambe 2012) and use this information to inform future calculations. This ability to maintain a memory of previous inputs makes RNNs particularly useful for tasks that involve processing sequential data such as natural language processing, speech recognition, and time series prediction. However, RNNs are also susceptible to certain limitations such as the vanishing gradient problem, which can occur when the gradient used to update the network parameters becomes very small, leading to difficulties in training the network. To address these limitations, various modifications and improvements to RNNs have been proposed, such as long short-term memory (LSTM) networks and gated recurrent units (GRUs).
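To make the linear regression description earlier in this list concrete, here is a minimal sketch on synthetic data; the "true" slope, intercept, noise level, and sample size are arbitrary assumptions chosen only for illustration.

```python
# A minimal sketch of simple linear regression (y = mx + b) on synthetic data,
# reporting the slope, intercept, R-squared, and mean squared error discussed
# above. The underlying slope, intercept, and noise level are made-up values.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200).reshape(-1, 1)        # independent variable
y = 2.5 * x.ravel() + 1.0 + rng.normal(0, 1.0, 200)    # dependent variable with noise

model = LinearRegression().fit(x, y)
y_pred = model.predict(x)

print("slope m    :", model.coef_[0])      # change in y per unit change in x
print("intercept b:", model.intercept_)    # where the line crosses the y-axis
print("R-squared  :", r2_score(y, y_pred))
print("MSE        :", mean_squared_error(y, y_pred))
```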
3.2 Generative AI Models

Generative artificial intelligence (AI) models are a subset of deep learning models that can produce new content based on what is described in the input. The OpenAI models are a collection of generative AI models that can produce language, code, and images. OpenAI is a company focused on AI research and development. For businesses, organizations, and individuals, the generative pretrained transformer (GPT), the powerful natural language model implementation developed by OpenAI, became an instant hit because of its potential to transform businesses. There are three main categories of capabilities found in OpenAI models (summarized in Table 3.1):

Table 3.1 Main categories of capabilities found in OpenAI models
• Generating natural language. Examples: summarizing complex text for different reading levels, suggesting alternative wording for sentences, and much more.
• Generating code. Examples: translating code from one programming language into another, identifying and troubleshooting bugs in code, and much more.
• Generating images. Examples: generating images for publications from text descriptions and much more.

In the last several years, there have been major breakthroughs in how we achieve better performance in language models, from scaling their size to reducing the amount of data required for certain tasks. Large language models (LLMs) are foundation models that utilize deep learning in natural language processing (NLP) and natural language generation (NLG) tasks. To help them learn the complexity and linkages of language, large language models are pretrained on a vast amount of data and then adapted to specific tasks using techniques such as:

• Fine-tuning
• In-context learning
• Zero-/one-/few-shot learning

Hidden Markov models (HMMs) became popular in the 1970s. Their internal representation encodes the grammatical structure of sentences (nouns, verbs, and so on), and they use that knowledge when predicting new words. However, because they are Markov processes, they only take into consideration the most recent token when generating a new token.
N-grams became popular in the 1990s, and unlike HMMs, they're capable of taking a few tokens as input. However, N-grams don't scale well to a larger number of input tokens (Stollnitz 2023). Then, in the 2000s, recurrent neural networks (RNNs) became quite popular because they're able to accept a much larger number of input tokens. In particular, LSTMs and GRUs, which are types of RNNs, became widely used and could generate fairly good results. However, RNNs have instability issues with very long sequences of text. The gradients in the model tend to grow exponentially (called "exploding gradients") or decrease to zero (called "vanishing gradients"), preventing the model from continuing to learn from training data (Stollnitz 2023). In 2017, the paper that introduced transformers was released by Google, and we entered a new era in text generation. The architecture used in transformers allowed a huge increase in the number of input tokens, eliminated the gradient instability issues seen in RNNs, and was highly parallelizable, which meant that it was able to take advantage of the power of graphics processing units (GPUs). Transformers are based on the "attention mechanism," where the model can pay more attention to some inputs than others, regardless of where they show up in the input sequence. Transformers are still widely used today, and they're the technology chosen by OpenAI for their latest text generation models (Stollnitz 2023). Language models are also opening new possibilities for businesses, as they can:
• Automate processes
• Save time and money
• Drive personalization
• Increase accuracy in tasks
Large language models are first pretrained so that they learn basic language tasks and functions (Dilmegani 2023). Pretraining is the step that requires massive computational power and cutting-edge hardware. The general architecture of large language models is the following:

1. Input embedding: The input sequence is first transformed into a dense vector representation, known as an embedding, which captures the relationships between words in the input.
2. Multi-head self-attention: The core component of the transformer block architecture is the multi-head self-attention mechanism, which allows the
58
F. Lazzeri and A. Robsky
model to attend to different parts of the input sequence to capture its relationships and dependencies.
3. Feed-forward network: After the self-attention mechanism, the output is fed into a feed-forward neural network, which performs a nonlinear transformation to generate a new representation.
4. Normalization and residual connections: To stabilize the training process, the output from each layer is normalized, and a residual connection is added to allow the input to be passed directly to the output, allowing the model to learn which parts of the input are most important (Dilmegani 2023).

Training of an LLM consists of two parts: pretraining and task-specific training. Pretraining is the part of training that enables the model to learn the general rules and dependencies within a language, which takes a significant amount of data, computational power, and time to complete. To make large language models more accessible, LLM developers are offering services for enterprises looking to leverage language models (Dilmegani 2023).
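The sketch below shows how these four components fit together in a single transformer block, written with PyTorch purely for illustration; the model dimension, number of attention heads, and feed-forward size are arbitrary assumptions rather than values from any particular production model.

```python
# A minimal sketch of one transformer block following the four components
# listed above (the input embedding step is assumed to have happened already).
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 8, d_ff: int = 1024):
        super().__init__()
        # 2. Multi-head self-attention over the embedded input sequence.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # 3. Position-wise feed-forward network.
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        # 4. Normalization layers used together with residual connections.
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x)   # self-attention: query = key = value = x
        x = self.norm1(x + attn_out)       # residual connection + normalization
        x = self.norm2(x + self.ff(x))     # feed-forward + residual + normalization
        return x

# 1. Input embedding: a batch of 4 sequences, 16 tokens each, already embedded.
tokens = torch.randn(4, 16, 256)
print(TransformerBlock()(tokens).shape)  # torch.Size([4, 16, 256])
```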
3.3 Defining Business Metrics and Objectives

For most companies, lack of data is not a problem. In fact, it is the opposite; there is often too much information available to make a clear decision (Lazzeri 2019). With so much data to sort through, companies need a well-defined strategy to clarify the following business aspects:

• How can machine learning help companies transform business, better manage costs, and drive greater operational excellence?
• Do companies have a well-defined and clearly articulated purpose and vision for what they are looking to accomplish?
• How can companies get the support of C-level executives and stakeholders to take that data-driven vision and drive it through the different parts of a business?

In short, as we saw in Chap. 1, companies need to have a clear understanding of their business, decision-making process, and a better machine learning strategy to support that process (Lazzeri 2019). With the right machine learning mindset, what was once an overwhelming volume of disparate information becomes a simple, clear decision point. Driving digital transformation requires that companies have a well-defined and clearly articulated purpose and vision for what they are looking to accomplish. It often requires the
support of a C-level executive to take that vision and drive it through the different parts of a business. Companies must begin with the right questions. Questions should be measurable, clear, and concise and directly correlated to their core business. In this stage, similar to what we described in Chap. 2, it is important to design questions to either qualify or disqualify potential solutions to a specific business problem or opportunity. For example, start with a clearly defined problem: A retail company is experiencing rising costs and is no longer able to offer competitive prices to its customers (Lazzeri 2019). One of many questions to solve this business problem might include the following: Can the company reduce its operating costs without compromising quality? There are two main tasks that companies need to address to answer these types of questions:

• Define business goals: One of the primary tasks that companies need to undertake is to define their business goals. This involves working with business experts and other stakeholders to identify business problems that need to be solved. By identifying the areas where data and machine learning can provide the most value, companies can then develop a strategy to tackle these issues.

• Formulate the right questions: Another important task is to formulate the right questions that define the business goals that the machine learning teams can target. This involves identifying the relevant data and features that are needed to answer these questions. It is essential to ensure that the questions are specific, measurable, and achievable, as this will provide clarity on the outcomes that the machine learning algorithm needs to deliver. Formulating the right questions will also help to identify the necessary data inputs and outputs required to support the machine learning algorithm.
3.4 Establishing Machine Learning Performance Metrics

To continue the metrics discussion from Chap. 2, we will dive deeper into the various performance metrics for each ML problem. To successfully translate this vision and these business goals into actionable results, it is important to establish clear performance metrics (Lazzeri 2019). In the table below, we provide a summary of 20 metrics used for evaluating machine learning models; we group these metrics into different categories based on the machine learning application they can support (Table 3.2):
Table 3.2 Twenty metrics used for evaluating machine learning models

Classification models:
• Classification accuracy: the number of correct predictions divided by the total number of predictions
• Precision: how precise the model is with its positive predictions: true positives/(true positives + false positives)
• Recall: how many of the positive labels the model was able to capture: true positives/(true positives + false negatives)
• F1 score: the harmonic mean of precision and recall, summarizing both in a single number
• Sensitivity: evaluates a model's ability to predict the true positives of each available category
• Specificity: the proportion of actual negatives that got predicted as negative (true negatives)
• Receiver operating characteristic (ROC) curve: a plot that shows the performance of a binary classifier as a function of its cut-off threshold; it essentially shows the true positive rate against the false positive rate for various threshold values
• Area under the curve (AUC): the area under the ROC curve, and therefore between 0 and 1; one way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example

Regression models:
• Mean squared error: the average squared error between the predicted and actual values
• Mean absolute error: the average absolute distance between the predicted and target values
• Inlier ratio metric: the percentage of data points that are predicted with an error less than a margin
• Root mean squared error: the square root of the average of the squared differences between the target value and the value predicted by the regression model

Ranking models:
• Mean reciprocal rank: the average of the reciprocal ranks of the first relevant item for a set of queries
• Precision at k: the proportion of recommended items in the top-k set that are relevant
• Normalized discounted cumulative gain: a metric for measuring ranking quality

General statistical models:
• Pearson correlation coefficient: measures the strength of the association between two continuous variables
• Coefficient of determination: the proportion of the variance in the dependent variable that is predictable from the independent variable

ROC, receiver operating characteristic; AUC, area under the ROC curve
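As a quick illustration, the sketch below computes a handful of the classification metrics from Table 3.2 with scikit-learn; the labels, predictions, and scores are made-up values used only to show the calls.

```python
# A minimal sketch computing several classification metrics from Table 3.2.
# y_true, y_pred, and y_score are fabricated values for illustration only.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]                           # actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]                           # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.95, 0.3, 0.85, 0.05]   # predicted probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_score))
```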
Identifying the right set of metrics is a crucial step in the process of implementing machine learning in a business. To begin with, companies need to decide what exactly they want to measure. For example, if a company wants to evaluate the performance of its customer churn prediction model, it needs to decide what aspect of the model's performance it wants to evaluate, whether it is the accuracy of the predictions, the recall of the model, or the precision of the model. Once they have decided what to measure, they need to determine how to measure it (Lazzeri 2022). This involves choosing an appropriate evaluation metric that aligns with the business goals and the specific problem being addressed. Finally, it is essential to define the success metrics, which will help the company determine whether the machine learning model is performing satisfactorily. Success metrics could be defined in terms of a specific value or range of values for the evaluation metric. By focusing on these analytical aspects, companies can identify the right set of metrics that will help them make informed decisions and track the progress of their machine learning initiatives.
3.4.1 Decide What to Measure

Let's take predictive maintenance, a technique that has a wide range of end goals, each with its own set of performance metrics. For instance, predicting the root causes of failure may require monitoring and analyzing various variables and signals, such as vibration, temperature, and pressure, to identify patterns and anomalies. The performance metrics for such a use case may include accuracy, precision, recall, F1 score, and area under the curve (AUC) of the receiver operating characteristic (ROC) curve (Bose 2007). Similarly, predicting which parts will need replacement and when may require analyzing the condition of various components, such as bearings, belts, and motors, and comparing their current state to their expected life span. The performance metrics for this use case may include mean time between failures (MTBF), mean time to repair (MTTR), and mean time between replacements (MTBR). Providing maintenance recommendations after the failure happens may require analyzing the root cause and identifying the best course of action to minimize downtime and reduce costs. The performance metrics for this use case may include downtime reduction, cost savings, and customer satisfaction (Bose 2007). The success of any predictive maintenance project relies heavily on the quality and availability of data. Without sufficient data, it is difficult to build accurate models that can make meaningful predictions about future failures. Many companies today are attempting predictive maintenance and have piles
of data available from all sorts of sensors and systems. But, too often, companies do not have enough data about their failure history, and that makes it very difficult to do predictive maintenance; after all, models need to be trained on such failure history data in order to predict future failure incidents (Lazzeri 2019). This data is critical for training machine learning models that can predict future failures. So, while it's important to lay out the vision, purpose, and scope of any analytics project, it is critical that you start off by gathering the right data. It is important to note that the lack of failure history data is not the only challenge in predictive maintenance. There are also many other factors that need to be considered, such as the complexity of the machines and the systems, the accuracy of the sensors, and the availability of maintenance resources.
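To ground two of these maintenance metrics, the sketch below computes MTBF and MTTR from a hypothetical failure log; the operating hours, downtime figures, and the specific formulas used are simplifying assumptions made purely for illustration.

```python
# A minimal sketch computing MTBF and MTTR from a hypothetical failure log.
# All numbers, and the simplified formulas, are illustrative assumptions.
failures = [
    {"downtime_hours": 4.0},
    {"downtime_hours": 2.5},
    {"downtime_hours": 6.0},
]
total_scheduled_hours = 8760  # one year of scheduled operating time (assumption)
total_downtime = sum(f["downtime_hours"] for f in failures)

# MTBF: average operating time between failures; MTTR: average repair time.
mtbf = (total_scheduled_hours - total_downtime) / len(failures)
mttr = total_downtime / len(failures)

print(f"MTBF: {mtbf:.1f} hours, MTTR: {mttr:.1f} hours")
```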
3.4.2 Decide How to Measure It

Thinking about how companies measure their data is also important, especially before the data collection and ingestion phase. Key questions to ask for this sub-step include:

• What is the time frame? This could be daily, weekly, monthly, or even annually depending on the specific goals of the project.
• What is the unit of measure? For example, if a company is measuring sales data, will they use dollars, units sold, or some other metric?
• What factors should be included in the measurement? This is particularly important for complex systems where there may be multiple factors that contribute to the final outcome.
• How reliable is the data? Is it accurate and consistent over time? This is essential to ensure that the data collected can be trusted and used to inform important business decisions.

A central objective of this step is to identify the key business variables that the analysis needs to predict. We refer to these variables as the model targets, and we use the metrics associated with them to determine the success of the project. For example, if a company is looking to improve its sales forecasting, then the key variables might include historical sales data, customer demographics, and marketing expenditures. Another example is if a company is trying to detect fraudulent orders, then the key variables might include order history, user behavior, and payment information.
Once these key variables have been identified, the next step is to define the metrics that will be used to evaluate the success of the project. These metrics will depend on the specific problem being solved and the type of model being used.
3.4.3 Define Success Metrics

After identifying the key business variables, it is important to translate your business problem into a machine learning question and define the metrics that will determine your project's success. Companies typically use machine learning to answer five types of questions:

• How much or how many? (regression) Reminder: Regression is a type of predictive modeling that estimates the relationship between independent and dependent variables. Regression models can be used to forecast future values of a dependent variable based on historical data. For example, a company may use regression analysis to predict future sales revenue based on historical sales data.
• Which category? (classification) Reminder: Classification is a type of supervised learning where the machine learning algorithm is trained on labeled data to predict the class of new, unlabeled data. For instance, a company may use classification to predict whether a customer is likely to churn or not based on their transaction history.
• Which group? (clustering) Reminder: Unlike classification, clustering is a type of unsupervised learning, which means the machine learning algorithm is not given labeled data to learn from. Instead, it identifies patterns or groups in the data based on similarities or dissimilarities. For example, a company may use clustering to segment its customers into different groups based on their purchase behavior.
• Is this weird? (anomaly detection) Reminder: Anomaly detection is useful in detecting fraudulent activities, unusual customer behavior, or system failures. It involves detecting data points that do not conform to the normal pattern of the data.
• Which option should be taken? (recommendation) Reminder: Recommendation engines use historical data and machine learning algorithms to provide personalized recommendations to users. For instance, a company may use a recommendation engine to suggest products to customers based on their purchase history or browsing behavior.
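As one example of the "Is this weird?" question, the sketch below flags unusual order amounts with scikit-learn's IsolationForest; the synthetic order values and the contamination rate are assumptions made purely for illustration.

```python
# A minimal anomaly detection sketch using IsolationForest on synthetic order
# amounts. The data and the contamination rate are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
normal_orders = rng.normal(loc=50, scale=10, size=(500, 1))  # typical order values
odd_orders = np.array([[400.0], [650.0], [5.0]])             # unusually large/small orders
orders = np.vstack([normal_orders, odd_orders])

detector = IsolationForest(contamination=0.01, random_state=0)
labels = detector.fit_predict(orders)   # -1 marks points flagged as anomalies

print("Flagged order amounts:", orders[labels == -1].ravel())
```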
Determining which of these questions a company is asking, and how answering them achieves business goals, enables measurement of the results (Lazzeri 2019). At this point, it is important to revisit the project goals by asking and refining sharp questions that are relevant, specific, and unambiguous. For example, if a company wants to predict customer churn, it might require an accuracy rate of "x" percent by the end of a 3-month project. With this prediction, the company can offer customer promotions to reduce churn.
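As a minimal illustration of turning such a sharp question into a measurable success criterion, the sketch below compares a hypothetical churn model's accuracy against an agreed target; the target value and the labels are invented for the example.

from sklearn.metrics import accuracy_score

TARGET_ACCURACY = 0.85  # the agreed "x percent" for the 3-month project (hypothetical)

y_true = [1, 0, 0, 1, 0, 1, 0, 0]  # observed churn (1 = customer churned)
y_pred = [1, 0, 0, 1, 0, 0, 0, 0]  # churn predictions from the candidate model

accuracy = accuracy_score(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}, success criterion met: {accuracy >= TARGET_ACCURACY}")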
3.5 Architecting the End-to-End Machine Learning Solution

In the era of AI, there is a growing trend of accumulating and analyzing data, often unstructured, coming from applications, web environments, and a wide variety of devices. In this third step, companies need to think more organically about the end-to-end data flow and architecture that will support their machine learning solutions (Lazzeri 2019).

Data architecture is a crucial aspect of any data-driven organization, as it involves planning and designing the collection, storage, processing, and management of data. The process of data architecture begins with identifying the types of data that are needed to support the organization's objectives and goals. This involves defining the information to be collected, the data standards and norms that will be used for its structuring, and the tools used in the extraction, storage, and processing of such data. This stage is fundamental for any project that performs data analysis, as it is what guarantees the availability and integrity of the information that will be explored in the future (Lazzeri 2019). To do this, you need to understand how the data will be stored, processed, and used, and which analyses will be expected for the project.

Once the data requirements have been identified, the next step in the data architecture process is to determine how the data will be extracted, transformed, and loaded (ETL) into the organization's data storage systems. This may involve using specialized tools and technologies, such as data integration software or data warehouses, to manage the flow of data between different systems and applications.

Data architecture also involves the design of data models and schemas that describe the structure of the data being collected, as well as the relationships between different data elements. This can include creating entity-relationship diagrams, data dictionaries, and other types of documentation that help to ensure that the data is accurately and consistently captured and stored, with due consideration for privacy and legal requirements. This may include implementing data validation rules, data
access controls, and data retention policies to ensure that the data is accurate, secure, and compliant with relevant regulations and standards. It can be said that at this point there is an intersection of the technical and strategic visions of the project, as the purpose of this planning task is to keep the data extraction and manipulation processes aligned with the objectives of the business.

There are mainly seven stages in building an end-to-end pipeline in machine learning:

• Data ingestion: The initial stage in every machine learning workflow is transferring incoming data into a data repository. The vital element is that data is saved without alteration, allowing everyone to record the original information accurately. You can obtain data from various sources, including pub/sub requests, and you can use streaming data from other platforms. Each dataset has a separate pipeline, which you can analyze simultaneously. The data is split within each pipeline to take advantage of numerous servers or processors, which reduces the overall time to perform the task by distributing the data processing across multiple pipelines. For storing data, NoSQL databases are an excellent choice for keeping massive amounts of rapidly evolving organized/unorganized data. They also provide storage space that is shared and extensible (Lazzeri 2019).
• Data processing: This time- and resource-consuming phase entails taking raw, unorganized input data and converting it into data that the models can use. During this step, a distributed pipeline evaluates the data's quality for structural differences, incorrect or missing data points, outliers, anomalies, etc., and corrects any abnormalities along the way (Lazzeri 2019). This stage also includes feature engineering. Once you ingest data into the pipeline, the feature engineering process begins, and all the generated features are stored in a feature data repository. The output of features is transferred to the online feature data storage upon completion of each pipeline, allowing for easy data retrieval.
• Data splitting: The primary objective of a machine learning data pipeline is to apply an accurate model to data that it hasn't been trained on, based on the accuracy of its feature prediction. To assess how the model works against new datasets, you need to divide the existing labeled data into training, testing, and validation data subsets at this point. Model training and assessment are the next two pipelines in this stage, both of which should be able to access the application programming interface (API) used for data splitting. The API needs to produce a notification and return the dataset to protect the pipeline (model training or evaluation) against selecting values that result in an irregular data distribution (Lazzeri 2019).
• Model training: This pipeline includes the entire collection of training model algorithms, which you can use repeatedly and alternately as needed. The model training service obtains the training configuration details, and the pipeline's process requests the required training dataset from the API (or service) constructed during the data splitting stage. Once it sets the model, configurations, training parameters, and other elements, it stores them in a model candidate data repository, which will be evaluated and used further in the pipeline. Model training should account for error tolerance, data backups, and failover on training segments. For example, you can retrain each split if the latest attempt fails owing to a transitory glitch.
• Model evaluation: This stage assesses the stored models' predictive performance using the test and validation data subsets until a model solves the business problem efficiently. The model evaluation step uses several criteria to compare predictions on the evaluation dataset with actual values. A notification is broadcast once a model is ready for deployment, and the pipeline chooses the "best" model from the evaluation sample to make predictions on future cases. A library of multiple evaluators provides the accuracy metrics of a model and stores them against the model in the data repository (Lazzeri 2019).
• Model deployment: Once the model evaluation is complete, the pipeline selects the best model and deploys it. The pipeline can deploy multiple machine learning models to ensure a smooth transition between old and new models; the pipeline services continue to work on new prediction requests while deploying a new model.
• Monitoring model performance: The final stage of a pipeline in machine learning is model monitoring and performance scoring. This stage entails monitoring and assessing the model behavior on a regular and recurring basis to gradually enhance it. Models are used for scoring based on feature values imported by previous stages (Lazzeri 2022). When a new prediction is issued, the performance monitoring service receives a notification, runs the performance evaluation, records the outcomes, and raises the necessary alerts. It compares the scoring to the observed results generated by the data pipeline during the assessment. You can use various methods for monitoring, the most common of which is logging analytics. In some cases, it is not easy to calculate model performance when actions taken on the predictions might bias the training; for instance, in fraud detection, if a model detects fraud, the user gets banned from using the platform. However, can we be certain that the model accurately predicted fraud? We won't know for sure unless the user files a support ticket. In this and similar cases, a long-term holdback is necessary to continue
measuring the outcomes of the model running the predictions, even though no actions are applied. In this holdback, we can measure the metrics without worrying that the training/testing/validation data is biased.

It is now necessary to select the right tools that will allow an organization to actually build an end-to-end machine learning solution. Factors such as the volume and variety of data, and the speed with which they are generated and processed, will help companies identify the types of technology they should use (Lazzeri 2019). Among the various existing categories, it is important to consider:

• Data collection tools: These are the tools that help in the extraction and organization of raw data. Examples of data collection tools include web scraping software, APIs, sensors, and surveys.
• Storage tools: These tools store data in either structured or unstructured form and can aggregate information from several platforms in an integrated manner. Structured data can be easily organized into a format such as tables, while unstructured data includes things such as images, videos, and social media posts. Storage tools such as databases, data warehouses, and data lakes are used to store the collected data in a manner that enables easy retrieval and analysis.
• Data processing and analysis tools: These tools include data mining, machine learning algorithms, and statistical analysis tools. With these, we use the stored and processed data to create a visualization logic that enables the development of analyses, studies, and reports that support operational and strategic decision-making. Data visualization tools, such as Tableau and Power BI, enable the creation of dashboards and reports that visually represent data in a clear and concise manner.
• Model operationalization tools: After a company has a set of models that perform well, it can operationalize them for other applications to consume. Depending on the business requirements, predictions are made either in real time or on a batch basis. To deploy models, companies need to expose them with an open API interface. The interface enables the model to be easily consumed from various applications (Lazzeri 2019).

The tools can vary according to the needs of the business but should ideally offer the possibility of integration between them, so that the data can be used in any of the chosen platforms without needing manual treatments (Lazzeri 2019). This end-to-end architecture will also offer some key advantages and values to companies, such as:
• Accelerated deployment & reduced risk: An integrated end-to-end architecture can drastically minimize the cost and effort required to piece together an end-to-end solution and further enables accelerated time to deploy use cases. Rather than investing time and resources into developing custom integrations between disparate tools, an integrated architecture provides a pre-built solution that is optimized for seamless interoperability. With a pre-built solution, companies can deploy their use cases more quickly than they would with a custom solution. This means that they can start realizing the benefits of their analytics projects sooner, including increased efficiency, improved decision-making, and more.
• Modularity: Modularity is the ability of an architecture to break down into smaller, independent, and reusable modules that can be combined in various ways to create a larger system. In the context of an end-to-end architecture for data analysis, modularity is crucial because it allows companies to start at any part of the end-to-end architecture with the assurance that the key components will integrate and fit together. This means that companies can begin by implementing one module of the architecture and gradually add more modules as their data analysis evolves, without worrying about the compatibility of the different components. This not only reduces the cost and effort required to implement a new data analysis solution but also reduces the risk of incompatibility issues that can arise when trying to piece together a solution from different components. This modular approach also allows businesses to more easily replace or update components that become obsolete or less effective over time.
• Flexibility: A flexible architecture runs anywhere, including multi-cloud or hybrid-cloud environments. Multi-cloud environments refer to the use of multiple cloud computing services from different providers, while hybrid-cloud environments refer to a combination of cloud and on-premises computing. Flexibility allows companies to choose the cloud computing service that best suits their needs, based on factors such as cost, security, and scalability. Companies can also avoid vendor lock-in, where they are tied to a specific cloud computing service, by choosing an architecture that can run on multiple platforms. A flexible end-to-end architecture can make it easier for companies to migrate from one cloud computing service to another, or to move between on-premises and cloud computing. This can be important when a company needs to scale its computing resources up or down, or when it needs to move its data and applications to a different location for regulatory or compliance reasons.
• End-to-end analytics & machine learning: This enables end-to-end analytics from edge to cloud, with the ability to push machine learning models back out to the edge for real-time decision-making. For example, companies can deploy machine learning models directly on edge devices, allowing for faster decision-making and more efficient resource allocation. This is particularly important in use cases where real-time decision-making is critical, such as autonomous vehicles or industrial automation. End-to-end analytics and machine learning also enable companies to gain a more holistic understanding of their data and their business. By analyzing data across their entire ecosystem, companies can identify patterns, trends, and correlations that might not be apparent in isolated data silos. This can lead to more accurate predictions, better insights, and improved business outcomes.
• End-to-end data security & compliance: This provides pre-integrated security and manageability across the architecture, including access, authorization, and authentication. It means that data is protected from unauthorized access, modification, and destruction throughout the entire data processing and analysis process. An end-to-end architecture with integrated security and compliance features also ensures that companies meet regulatory requirements and data privacy laws. It can help them maintain a comprehensive and auditable view of the data lineage, including who accessed it, when it was accessed, and how it was used. By ensuring that data is secure throughout the entire pipeline, companies can reduce the risk of data breaches, data leaks, and other security incidents that can negatively impact their reputation and bottom line.
• Enabling open-source innovation: An architecture built on open-source projects and a vibrant community innovation model ensures open standards. Open-source software also offers flexibility and interoperability, which are critical for modern businesses. With open-source software, companies can avoid vendor lock-in, as they are not tied to proprietary solutions. This allows companies to choose the tools that best fit their needs and to customize them to meet their specific requirements. Open-source software often has lower costs and offers faster development cycles, which can help companies reduce costs and time-to-market. Additionally, by using open-source software, companies can benefit from the security of open-source projects, which are often scrutinized by a large community of developers who are continuously identifying and fixing bugs and vulnerabilities (Lazzeri 2022).
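To make the pipeline stages described in this section more tangible, the following sketch compresses data splitting, model training, model evaluation, and the hand-off to deployment into a few lines; the synthetic data, the AUC threshold, and the file name are assumptions made purely for illustration.

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Data splitting: divide the labeled data into training, validation, and test subsets.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Model training and evaluation: keep the candidate only if it meets the agreed threshold.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"validation AUC = {val_auc:.3f}")

# Model deployment hand-off: persist the approved candidate for the serving stage.
if val_auc > 0.80:  # threshold is a placeholder for the project's real success metric
    joblib.dump(model, "model_candidate.joblib")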
3.6 Summary

In this chapter, you learnt what machine learning is and how being a machine learning organization implies embedding machine learning teams to fully engage with the business and adapting the operational support of the company. Most importantly, you learnt the following four dimensions that companies can leverage to become machine learning driven:

• Understanding algorithms and the business questions that algorithms can answer
• Defining business metrics and business impact
• Establishing machine learning performance metrics
• Architecting the end-to-end machine learning solution

The goal of this chapter is to guide every company throughout the machine learning life cycle and adoption. While the specifics will vary based on the organization, scope, and skills, the strategy should clearly define how a model can become a solution and can move from stage to stage, designating responsibility for each task along the way.
References

Ajgaonkar A., MLOps: The Key to Unlocking AI Operationalization, Insight Tech Journal, 2021
Bahrammirzaee A., A comparative survey of artificial intelligence applications in finance: artificial neural networks, expert system and hybrid intelligent systems, Neural Computing and Applications, 19(8), 1165–1195, 2010
Bijamov, A., Shubitidze, F., Fernández, J. P., Shamatava, I., Barrowes, B., & O'Neill, K. (2011). Comparison of supervised and unsupervised machine learning techniques for UXO classification using EMI data. Proceedings of SPIE—The International Society for Optical Engineering, 801706, 1–11.
Bose B. K., "Neural network applications in power electronics and motor drives: an introduction and perspective", IEEE Trans. Ind. Electron., vol. 54, no. 1, pp. 14–33, Feb. 2007
Brynjolfsson E., Hitt L.M., and Kim H.H., Strength in numbers: How does data-driven decision making affect firm performance? Working paper, 2011. Available at SSRN: http://ssrn.com/abstract=1819486
Davenport T.H., and Patil D.J., Data scientist: the sexiest job of the 21st century. Harv Bus Rev, Oct 2012.
Dhankhad, S., Mohammed, E., & Far, B. (2018). Supervised machine learning algorithms for credit card fraudulent transaction detection: A comparative study. 2018 IEEE International Conference on Information Reuse and Integration, 122–125.
Dilmegani C., Large Language Model Training in 2023, https://research.aimultiple.com/large-language-model-training/, 2023
Herath, H. M. M. G. T., Kumara, J. R. S. S., Fernando, M. A. R. M., Bandara, K. M. K. S., & Serina, I. (2018). Comparison of supervised machine learning techniques for PD classification in generator insulation. 2017 IEEE International Conference on Industrial and Information Systems, 1–6.
Lazzeri F., The Data Science Mindset: Six Principles to Build Healthy Data-Driven Companies, InfoQ, 2019
Lazzeri F., What You Should Know before Deploying ML in Production, InfoQ, 2022
Lushan, M., Bhattacharjee, M., Ahmed, T., Rahman, M. A., & Ahmed, S. (2019). Supervising vehicle using pattern recognition: Detecting unusual behavior using machine learning algorithms. 2018 IEEE Region Ten Symposium, 277–28
Mishu, S. Z., & Rafiuddin, S. M. (2017). Performance analysis of supervised machine learning algorithms for text classification. 2016 19th International Conference on Computer and Information Technology, 409–413.
Moh, M., Gajjala, A., Gangireddy, S. C. R., & Moh, T.-S. (2016). On multi-tier sentiment analysis using supervised machine learning. 2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 341–344.
Robles-Durazno, A., Moradpoor, N., McWhinnie, J., & Russell, G. (2018). A supervised energy monitoring-based machine learning approach for anomaly detection in a clean water supply system. 2018 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), 1–8.
Russell S. J., Artificial Intelligence: A Modern Approach, Upper Saddle River, N.J., Prentice Hall, 2010.
Stollnitz B., How GPT models work, https://bea.stollnitz.com/blog/how-gpt-works/, 2023.
Tambe P., Big data know-how and business value. Working paper, NYU Stern School of Business, New York, NY, 2012.
4 Operationalizing Your Machine Learning Solution
Artificial intelligence (AI) presents a significant opportunity for companies to transform their operations. By leveraging AI, companies can create intelligent applications that range from systems that predict and schedule equipment maintenance, to research and development (R&D) systems able to estimate the cost of new drug development, to human resource (HR) AI-powered tools that enhance the hiring process and employee retention strategies. However, to be able to leverage this opportunity, companies must learn how to successfully build, train, test, and push hundreds of machine learning (ML) models in production and how to move models from development to their production environment in ways that are robust, fast, and repeatable (Lazzeri 2022).

To achieve this, companies need to develop a robust end-to-end AI architecture, as mentioned previously, that can support the entire machine learning workflow. This involves selecting the right data architecture, tools, and technologies for data collection, storage, processing, and analysis, as well as for model operationalization. Additionally, companies need to have a strong understanding of the data they are working with, and they must ensure that their models are accurate, explainable, and compliant with any relevant regulations or ethical considerations.

To build and deploy successful AI solutions, companies must also foster a culture of experimentation and continuous improvement. They need to encourage collaboration between data scientists, developers, and business analysts to ensure that everyone understands the business goals and objectives and to develop a shared vision for the AI solution. Furthermore, they need to implement agile development methodologies that allow for rapid iteration
and testing of models, and they need to create a feedback loop to continually refine and improve their AI solutions.

In this chapter, we will introduce some common concepts and challenges of machine learning model deployment and of the AI application building process; in particular, we will discuss the following points to enable organizations to tackle some of those challenges:

• What is AI, what are its benefits and limitations, and how does it differ from traditional software development? This will give organizations a clear understanding of the potential of AI and the scope of its applications.
• Why is successful model deployment fundamental for AI-driven companies? Building an accurate and effective model is only the first step, and deploying it in a production environment requires a comprehensive strategy that covers a wide range of aspects, including data preparation, model selection, testing, and maintenance.
• How do you select the right tools to succeed with model deployment and AI adoption? There is a wide range of tools and platforms available for machine learning and AI development, and selecting the right one depends on various factors, such as the business objectives, the technical requirements, and the budget constraints. We will present some of the most common tools and their pros and cons to help organizations make informed decisions.
• Why is machine learning operations (MLOps) critical for the successful maintenance of AI applications? MLOps is a set of practices and tools that aim to streamline the machine learning life cycle, from development to deployment and maintenance. It includes a range of tasks, such as version control, testing, monitoring, and automation, that ensure the model is performing as expected and continuously improving over time.
4.1 What Is AI

AI is a field that combines computer science, big data, and machine learning to enable problem-solving. These disciplines rely on machine learning algorithms that seek to create trained models, which make predictions or classifications based on input data (Russell 2010). Both deep learning and machine learning are subfields of artificial intelligence, and deep learning is a subfield of machine learning. Deep learning is based on neural networks (Bose 2007). Neural networks are composed of layers of interconnected nodes, and the learning process occurs by adjusting the weights and biases of the connections between nodes. Machine learning is a subset of AI that uses
statistical learning algorithms to build smart systems. Machine learning systems can learn from historical data, be deployed, and automatically learn and improve without being explicitly programmed again. Machine learning algorithms are classified into three categories: supervised, unsupervised, and reinforcement learning. In supervised learning, the model is trained on labeled data to predict the correct output for new inputs. In unsupervised learning, the model identifies patterns and relationships in unlabeled data without any prior knowledge or guidance. Reinforcement learning involves training an agent to make decisions in an environment where the agent receives feedback in the form of rewards or penalties based on its actions.

Successful model deployment is fundamental for AI-driven companies because it allows them to leverage the insights gained from data and use them to create new business value. Model deployment involves taking the trained machine learning models from the development environment and deploying them in a production environment where they can make predictions or classifications on new data.

Deep learning is a subset of machine learning, and it is a technique that is inspired by the way a human brain filters data and external input. In simple terms, deep learning models filter the input data through layers of mathematical and optimization functions to predict and classify information. Deep learning network architectures are classified into several types, each of which is suitable for a specific type of data and problem. Convolutional neural networks (CNNs) are commonly used for image classification and processing, while recurrent neural networks (RNNs) are useful for sequential data such as text and speech processing. Recursive neural networks (RvNNs) are a type of deep learning network that can process hierarchical structures such as syntax trees or molecules (Russell 2010). In Table 4.1, we summarize some of the most innovative AI applications in recent years, divided by business area.

Companies are increasingly recognizing the competitive advantage of applying AI insights to business objectives and are making it a business-wide priority. For example, targeted recommendations provided by AI can help businesses make better decisions faster. An e-commerce company can use AI to analyze customer purchase history, browsing behavior, and other relevant data to provide personalized product recommendations, improving customer satisfaction and increasing sales. Similarly, an AI-powered supply chain optimization system can use data from sensors and other sources to optimize routes, reduce waste, and improve efficiency.
Table 4.1 Most innovative AI applications in recent years

Industry                  AI Application
Retail and E-Commerce     Personalized shopping, AI-powered assistants, virtual fitting rooms
Education                 Voice assistants, personalized learning, machine translation (speech to text and text to speech)
Healthcare                Clinical trials for drug development, robotic surgeries, virtual nursing assistants
Manufacturing             Steel quality inspection, predictive maintenance, food waste reduction
Transportation            Self-driving vehicles, pedestrian detection, computer vision–powered parking management
Finance                   Fraud detection, commercial lending operations, investment robo-advisory
AI can also help companies identify new opportunities and develop innovative solutions to complex problems. For instance, an AI-powered R&D system can analyze vast amounts of data to identify potential drug targets, significantly reducing the time and cost of drug development. Moreover, AI can help companies automate many routine and repetitive tasks, freeing employees to focus on more complex and creative work. Many of the features and capabilities of AI can lead to lower costs, reduced risks, faster time to market, and much more (Bahrammirzaee 2010).
4.2 Why Successful Model Deployment Is Fundamental for AI-Driven Companies

Machine learning model deployment is a crucial step in making machine learning models useful in real-world applications. It is the process by which a machine learning algorithm is converted into a web service. The goal is to make the model available for use by other applications and services, allowing them to query the model and receive predictions. We refer to this conversion process as operationalization; operationalizing a machine learning model means transforming it into a consumable service and embedding it into an existing production environment.

The process of operationalizing a machine learning model requires several steps, including selecting the right infrastructure, packaging the model code and dependencies into a container or serverless function, and defining an
application programming interface (API) for interacting with the model. It also involves testing the model for accuracy and robustness and ensuring that it meets the necessary security and compliance standards. One of the key benefits of operationalizing a machine learning model is that it enables businesses to incorporate AI insights into their existing systems and workflows. From the previous example, an e-commerce company might use a machine learning model to recommend products to customers based on their browsing history and purchase behavior. By operationalizing the model as a web service, the company can seamlessly integrate these recommendations into its existing product catalog and checkout process.

However, deploying machine learning models can be challenging, especially at scale. As the number of models and applications grows, managing the deployment process can become complex and time-consuming. This is where machine learning operations (MLOps) comes in. MLOps is a set of practices and tools for managing the machine learning life cycle, from model development to deployment and maintenance. MLOps can help businesses automate and streamline the deployment process, reducing the risk of errors and increasing the speed at which new models can be brought into production.

Model deployment is a fundamental step of the machine learning model workflow (Fig. 4.1).

Fig. 4.1 Machine learning model workflow

Deploying machine learning models to production has become a critical step in the machine learning pipeline. When we think about AI, we focus our attention on key components of the machine learning workflow such as data sources and ingestion, data pipelines, machine learning model training and testing, how to engineer new features, and which variables to use to make the models more accurate. All these steps are important; however, thinking about how we are going to consume those models and data over time is also a critical step in every machine learning pipeline. We can only begin extracting real value and business benefits from a model's predictions when it has been deployed and operationalized. Deployed models must be performant, stable, and scalable to meet the needs of the organization. Additionally, they must be secure and compliant with industry standards and regulations. Proper monitoring of deployed models is also crucial to ensure that they remain reliable and accurate over time. By ensuring successful model deployment and operationalization, organizations can leverage the power of machine learning to drive business growth and innovation.

We believe that successful model deployment is fundamental for AI-driven enterprises for the following key reasons:

• Deployment of machine learning models means making models available to external customers and/or other teams and stakeholders in your company.
• By deploying models, other teams in your company can use them, send data to them, and get their predictions, which are in turn populated back into the company systems to increase training data quality and quantity.
• Once this process is initiated, companies will start building and deploying higher numbers of machine learning models in production and master robust and repeatable ways to move models from development environments into business operations systems.

Through machine learning model deployment, companies can begin to take full advantage of the predictive and intelligent models they build, develop business practices based on their model results, and, therefore, transform themselves into actual AI-driven businesses (Lazzeri 2022). Right from the first day of an AI application process, machine learning teams should interact with business counterparts. It is essential to maintain constant interaction from the very beginning of the AI application process. This helps in understanding the business requirements, which in turn will guide the machine learning model experimentation process. In addition, this
will also enable the model deployment and consumption process to run in parallel with business requirements, leading to better alignment between the model's output and business objectives. Most organizations struggle to unlock machine learning's potential to optimize their operational processes and to get data scientists, analysts, and business teams speaking the same language. This can lead to models being developed in isolation from the business objectives, resulting in models that are not effective in achieving the desired outcomes. The lack of clear communication between these teams can also result in the development of models that are difficult to deploy and maintain.

Deploying machine learning models can be challenging, as there are many factors that need to be considered. One of the most critical factors is the quality and consistency of the data used for training the models. Machine learning models must be trained on historical data, which demands the creation of a prediction data pipeline, an activity requiring multiple tasks including data processing, feature engineering, and tuning. These tasks can be time-consuming and require careful attention to detail to ensure that the pipeline is correctly implemented and produces accurate results. Another critical factor in deploying machine learning models is the need to ensure that the development and production environments are consistent. Each task, down to versions of libraries and handling of missing values, must be exactly duplicated from the development to the production environment. Sometimes, differences in the technology used in development and production contribute to difficulties in deploying machine learning models. This can be especially problematic when deploying models at scale, as even small differences in performance can have a significant impact on the overall effectiveness of the system (Lazzeri 2019).

As mentioned before, it is important for machine learning teams to work closely with their business counterparts and other stakeholders. Collaboration between these teams can help to ensure that everyone is working toward the same goals and that any issues or challenges are addressed quickly and effectively. It is also essential to establish clear processes for deploying and testing models, including version control and testing frameworks, to ensure that the models are deployed in a consistent and reliable manner.

Companies can use machine learning pipelines to create and manage workflows that stitch together machine learning phases. For example, a pipeline might include data preparation, model training, model deployment, and inference/scoring phases. Companies can use these pipelines to automate the process of preparing data for modeling by taking data from a variety of sources and transforming it into a format suitable for machine learning models. Each phase can encompass multiple steps, each of which can run unattended in
various compute targets. Pipeline steps are reusable, and a pipeline can be run without rerunning steps whose output hasn't changed. The pipeline can then use a variety of techniques, including feature engineering and model selection, to improve the accuracy of the model. Once the model has been trained, it can be deployed for use in various applications, such as predicting customer behavior or optimizing production processes. The inference/scoring phase is used to evaluate the performance of the model by scoring new data points and comparing them to the actual outcomes.

One of the key benefits of machine learning pipelines is their ability to automate tasks, thereby reducing the time and effort required to complete complex tasks. In addition, each phase of the pipeline can be scaled up or down depending on the size of the data set and the complexity of the model. Pipelines also allow data scientists to collaborate while working on separate areas of a machine learning workflow, and they can be combined with continuous integration/continuous deployment (CI/CD) practices so that changes to any phase are built, tested, and released automatically (Lazzeri 2022).

In order to be effective, machine learning pipelines must be carefully designed and implemented. They should be modular, allowing individual steps to be swapped out or modified as needed. They should also be flexible, allowing for different configurations depending on the needs of the specific machine learning project. Finally, they should be scalable and able to handle large datasets and compute-intensive operations.
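A minimal sketch of such a pipeline, using scikit-learn's Pipeline and ColumnTransformer as one possible implementation; the column names and the tiny dataset are invented for the example, and in practice each named step could be swapped or scaled independently.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Each named step is a reusable unit: data preparation first, then the model.
preprocess = ColumnTransformer([
    ("numeric", Pipeline([("impute", SimpleImputer()), ("scale", StandardScaler())]),
     ["age", "monthly_spend"]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["segment"]),
])
pipeline = Pipeline([("prepare", preprocess), ("model", LogisticRegression(max_iter=500))])

# Hypothetical customer records used only to exercise the pipeline end to end.
df = pd.DataFrame({"age": [34, 51, 29, 46], "monthly_spend": [120.0, 80.5, 200.0, 60.0],
                   "segment": ["a", "b", "a", "c"]})
labels = [1, 0, 1, 0]
pipeline.fit(df, labels)
print(pipeline.predict(df))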
4.3 How to Select the Right Tools to Succeed with Model Deployment and AI Adoption

Building, training, testing, and finally deploying machine learning models is often a tedious and slow process for companies that are looking to transform their operations with AI. Moreover, even after months of development that delivers a machine learning model based on a single algorithm, the management team has little means of knowing whether their data scientists have created a great model and how to scale and operationalize it. The process
can be iterative, with data scientists frequently revisiting earlier stages to refine their approach and improve the accuracy of the model. Despite the amount of time and resources that companies invest in developing machine learning models, the process of scaling and operationalizing them can be equally challenging. Once a model has been developed, it needs to be integrated into existing systems and workflows, which often involves significant customization and testing. Even then, it can be difficult to know whether a model is performing as expected or whether it could be further optimized to deliver better results.

To address these challenges, companies are turning to machine learning platforms and tools that offer end-to-end support for the entire machine learning life cycle. These platforms provide a range of features and functionality, including data management, model training and testing, deployment and monitoring, and collaboration tools that enable data scientists and other stakeholders to work together more effectively. By leveraging these platforms, companies can streamline their machine learning processes and improve the efficiency and effectiveness of their AI initiatives.

Below is a guideline on how a company can select the right tools to succeed with model deployment. The model deployment workflow should be based on the following simple steps:

• Register the model.
• Prepare to deploy (specify assets, usage, compute target).
• Deploy the model to the compute target.
A registered model is a logical container for one or more files that make up the model. For example, if data scientists and ML developers have a model that is stored in multiple files, they can register them as a single model in the workspace. After registration, they can download or deploy the registered model and receive all the files that were registered, making it easier to reproduce the model in a different environment. When deploying the model, the registered model can be used as a single entity, with all the files needed for the model to work correctly. Using registered models can help streamline the model deployment process, reducing the risk of errors and simplifying the management of models. It also allows data scientists and ML developers to easily share models with other teams or stakeholders, as they can simply share the registered model without worrying about the individual files that make it up.
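The chapter does not prescribe a specific registry, but as one illustration, the sketch below logs and registers a model with MLflow, which bundles the model files under a single registered name; it assumes an MLflow tracking server with a model registry is available, and the model name is hypothetical.

import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Log the model artifacts as one unit and register them under a single name,
# so every file needed to reproduce or deploy the model travels together.
with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="churn-classifier",  # hypothetical registry name
    )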
To deploy a model as a web service, data scientists and ML developers must create an inference configuration and a deployment configuration. Inference, or model scoring, is the phase where the deployed model is used for prediction, most commonly on production data. In the inference configuration, data scientists and ML developers specify the scripts and dependencies needed to serve the model. In the deployment configuration, they specify details of how to serve the model on the compute target (Lazzeri 2022). This includes defining the type of deployment, such as a web service or a batch job, and setting up the endpoint uniform resource locator (URL) that clients will use to interact with the model. They must also configure the number of instances to allocate for the model and set up any authentication or security requirements for accessing the endpoint. After configuring the inference and deployment configurations, data scientists and ML developers can deploy the model as a web service, allowing external clients to consume the model's predictions.

When a model is deployed as a web service, it requires an entry script to handle the requests from the clients. The entry script defines how the web service should interact with the model: it receives data submitted to the deployed web service and passes it to the model for processing, then takes the response returned by the model and returns it to the client. The script is specific to the model; it must understand the data that the model expects and returns.

When data scientists and ML developers register a model, they provide a model name used for managing the model in the registry (Bose 2007). This model name is a logical container for one or more files that make up the model, and it can be used to track and manage the model over time. By registering a model, data scientists and ML developers can store all the files that make up the model in a central location and make it easy to deploy the model as a web service. This approach helps to improve the reproducibility and scalability of the model, since all the files that make up the model are stored in a single location, making it easy to manage and track changes to the model over time.

Finally, before deploying, data scientists and ML developers must define the deployment configuration. The deployment configuration is specific to the compute target that will host the web service. These configurations include settings for the compute resources needed to host the web service, the type of environment needed for the deployment, and other operational details that ensure the smooth running of the web service. When deploying locally, data scientists and ML developers must specify the port where the service accepts requests. The port number is a network endpoint through which the service can
receive and respond to requests. The deployment configuration can also include settings for security, such as authentication and authorization mechanisms, and for scaling the service up or down based on demand. Other important settings may include logging and monitoring configurations, as well as data management and backup configurations to ensure data consistency and reliability.

To ensure the deployment configuration works as intended, data scientists and ML developers must test the configuration before deploying the model to a production environment. This testing should include validating the deployment configuration on different compute targets, ensuring that the web service is deployed with the appropriate permissions and configurations, and verifying that the service is operational and meets the performance requirements. By properly configuring the deployment environment, data scientists and ML developers can ensure that their models are ready for production use and can be scaled to meet changing business needs.
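To illustrate the entry script described above, here is a minimal sketch in the init()/run() style used by some cloud ML services (Azure Machine Learning, for example); the model file name and the JSON payload shape are assumptions made for the example.

import json
import joblib
import numpy as np

def init():
    # Called once when the web service starts: load the registered model from disk.
    global model
    model = joblib.load("model.pkl")  # assumed location of the registered model file

def run(raw_data):
    # Called for every request: parse the payload, score it, and return predictions.
    data = np.array(json.loads(raw_data)["data"])
    predictions = model.predict(data)
    return predictions.tolist()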
4.4 Why MLOps Is Critical for Successful Maintenance of AI Applications

Machine learning operations (MLOps) is a set of practices and tools that enables efficient and scalable development, deployment, and management of machine learning models. It is important for several reasons. First, as mentioned, machine learning models rely on huge amounts of data, making it very difficult for data scientists and engineers to keep track of it all. MLOps provides solutions to this problem by automating the processes of data ingestion, cleaning, preprocessing, and storage. This ensures that the data used in machine learning models is clean, consistent, and up to date.

Second, MLOps helps data scientists and ML developers to keep track of the different parameters that can be tweaked in machine learning models. These parameters include hyperparameters, regularization parameters, learning rates, and others. Sometimes, small changes to these parameters can lead to very big differences in the results that data scientists and ML developers get from their machine learning models. MLOps tools automate the process of tuning these parameters, allowing data scientists and ML developers to focus on more critical tasks such as model experimentation and optimization.

Third, MLOps helps data scientists and ML developers to keep track of the features that the model works with. Feature engineering is an important part of the machine learning life cycle and can have a large impact on model accuracy. MLOps tools enable data scientists and ML developers to manage features efficiently by automating feature selection, extraction, and transformation.
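As a small illustration of this kind of tracking, the sketch below records the parameters and resulting metric of a single experiment run with MLflow, one of several tools that can play this role; the parameter values are arbitrary.

import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Record exactly which settings produced which result, so small parameter
# changes can be traced to the differences they cause in the metrics.
with mlflow.start_run():
    params = {"C": 0.5, "max_iter": 200}
    model = LogisticRegression(**params).fit(X_train, y_train)
    mlflow.log_params(params)
    mlflow.log_metric("test_accuracy", accuracy_score(y_test, model.predict(X_test)))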
Deploying a machine learning model in production is not the end of the process, but rather the beginning of a new phase. Once in production, monitoring a machine learning model is not really like monitoring other kinds of software such as a web app, and debugging a machine learning model is complicated. Models use real-world data for generating their predictions, and real-world data may change over time. As it changes, it is important to track the model performance and, when needed, update the model. If the model's performance begins to degrade, data scientists and ML developers need to identify the root cause of the issue and fix it. Debugging a machine learning model can be complicated, as the model's inner workings are often opaque and difficult to interpret. This means that data scientists and ML developers have to keep track of new data changes and make sure that the model learns from them (Fig. 4.2) (Lazzeri 2022). They need to continually retrain the model with new data and evaluate its performance to ensure that it is still effective. To facilitate this process, MLOps teams often build monitoring and feedback loops into their pipelines, which can automatically flag performance issues and trigger retraining when necessary. By continually monitoring and updating machine learning models, data scientists and ML developers can ensure that their models remain accurate and effective over time.

Fig. 4.2 MLOps for AI applications

There are many different MLOps capabilities to consider before deploying to production. The first is the capability of creating reproducible machine learning pipelines. Machine learning pipelines allow data scientists and ML developers to define repeatable and reusable steps for the data preparation, training, and scoring processes. By breaking down the machine learning workflow into a series of steps, data scientists and ML developers can optimize each step individually, improving the overall performance and accuracy of the machine
learning model. These steps should include the creation of reusable software environments for training and deploying models, as well as the ability to register, package, and deploy models from anywhere. These environments can be created once and then reused across multiple projects, making it easier to deploy models quickly and consistently. Moreover, data scientists and ML developers can use pipelines to register, package, and deploy models from anywhere, which is particularly useful when working with remote teams or when deploying models to edge devices. Using pipelines allows us to frequently update models or roll out new models alongside other AI applications and services. This enables companies to keep up with changing business needs and customer requirements by rapidly developing and deploying new machine learning models. Additionally, pipelines can be integrated with version control systems to track changes to models and provide a clear audit trail of who made the changes and when.

In addition to creating reproducible machine learning pipelines, data scientists and ML developers also need to track the associated metadata required to use the model and capture governance data for the end-to-end machine learning life cycle. This metadata can include information such as the data sources used, the code used for training and testing the model, and the hyperparameters used to train the model. Metadata is important for reproducibility and accountability, as it allows anyone to reproduce the exact steps taken to create the model and the results obtained. This is particularly important in regulated industries such as healthcare and finance, where audits and compliance requirements may necessitate detailed documentation of the machine learning process. In terms of governance, it is also important to track lineage information: for example, who published the model, why changes were made at some point, or when different models were deployed or used in production (Lazzeri 2022). Lineage information can be used to trace back the history of a model and understand the reasons for any changes or updates made to it. This is crucial for ensuring transparency, accountability, and compliance with industry regulations.

Tracking metadata and lineage information requires a robust system for data management and version control. This system should be able to handle large volumes of data and metadata, as well as provide mechanisms for versioning and tracking changes over time. It should also be able to integrate with other systems used in the machine learning life cycle, such as data storage and model deployment platforms. With proper metadata and lineage tracking, data scientists and ML developers can ensure that their machine learning models are reliable, reproducible, and compliant with industry regulations.
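One of the recurring monitoring tasks in this life cycle is detecting when production data drifts away from the training data. A minimal sketch, assuming a single numeric feature and using a two-sample Kolmogorov–Smirnov test as one possible drift signal (the feature values here are synthetic):

import numpy as np
from scipy.stats import ks_2samp

def detect_drift(train_values, live_values, threshold=0.05):
    # Flag drift when the two samples are unlikely to share the same distribution.
    _, p_value = ks_2samp(train_values, live_values)
    return p_value < threshold

rng = np.random.default_rng(0)
train_ages = rng.normal(40, 10, 5000)   # feature values seen at training time (synthetic)
live_ages = rng.normal(47, 12, 1000)    # feature values arriving in production (synthetic)

if detect_drift(train_ages, live_ages):
    print("Data drift detected: raise an alert and consider retraining.")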
It is also important to notify and alert people to events in the machine learning life cycle. There are several events that should trigger notifications and alerts, such as experiment completion, model registration, model deployment, and data drift detection. By setting up notifications and alerts, data scientists and ML developers can take action quickly to address issues as they arise. Additionally, data scientists and ML developers must monitor machine learning applications for operational and ML-related issues. Here, it is important for data scientists to be able to compare model inputs at training time versus inference time, to explore model-specific metrics to ensure that the model is performing well, and to configure monitoring and alerting on the machine learning infrastructure. These activities are critical to detecting potential issues and taking corrective action quickly.

The second aspect that is important to consider before deploying machine learning in production is open-source integration. Open-source frameworks provide an efficient and cost-effective way to accelerate machine learning solutions. By leveraging the power of the community to develop, maintain, and extend these frameworks, data scientists and ML developers can benefit from cutting-edge research and development efforts. Some popular open-source training frameworks include TensorFlow, PyTorch, and RAY. Another important consideration is the use of open-source frameworks for interpretable and fair models. As machine learning models continue to be used in high-stakes decision-making, it is critical to ensure that they are transparent, explainable, and unbiased. To this end, many researchers and practitioners have developed open-source frameworks that enable data scientists and ML developers to interpret and analyze machine learning models. For instance, tools such as SHAP and LIME enable data scientists to understand which features a model is using to make decisions, and why (Lazzeri 2022).
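For instance, a minimal SHAP sketch, using a tree-based scikit-learn model and a public dataset purely as an example, looks like this:

import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Compute per-feature contributions for each individual prediction.
explainer = shap.Explainer(model, X)
shap_values = explainer(X.iloc[:100])
print(shap_values.values.shape)  # one contribution per feature, per explained row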
PyTorch is its mobile deployment support. PyTorch provides a mobile deployment library called TorchMobile, which allows data scientists and ML developers to run PyTorch models on mobile devices. This can be useful in scenarios where real-time predictions are needed on the edge, such as in autonomous vehicles or mobile applications. PyTorch also has excellent cloud platform support. It offers integrations with popular cloud platforms such as AWS, GCP, and Azure, providing data scientists and ML developers with the flexibility to deploy their models on a platform that best suits their needs. Finally, PyTorch provides C++ frontend support, which is a pure C++ interface to PyTorch that follows the design and the architecture of the Python frontend. This allows data scientists and ML developers to write high-performance, low-latency applications that interact with PyTorch models using C++. Overall, PyTorch is a powerful and flexible framework that offers a wide range of features for building and deploying machine learning models.
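As a small illustration of packaging a PyTorch model for deployment, the sketch below defines a toy network and serializes it with TorchScript; the resulting file is the kind of artifact that serving tools or the C++ frontend can later load, and the network itself is invented for the example.

import torch
import torch.nn as nn

class ChurnNet(nn.Module):
    # A toy feed-forward network, used only to illustrate serialization.
    def __init__(self, n_features: int = 10):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)

model = ChurnNet()
scripted = torch.jit.script(model)   # serialize the model code and weights together
scripted.save("churn_net.pt")

# Reload the artifact and score a batch, exactly as a serving process would.
reloaded = torch.jit.load("churn_net.pt")
print(reloaded(torch.randn(2, 10)))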
These search algorithms help find the optimal set of hyperparameters for a given machine learning model. Another useful library in RAY is RLlib, which is designed specifically for training RL models. RLlib provides a range of reinforcement learning algorithms, such as deep Q-network (DQN), asynchronous advantage actor critic (A3C), and proximal policy optimization (PPO). These algorithms can be used to train agents in a variety of environments, from simple grid worlds to complex video games. RLlib also includes support for distributed training, making it possible to train RL models across multiple machines in parallel. In addition to Tune and RLlib, RAY includes a library called Train, which is designed for distributed deep learning. Train provides a simple API for building and training deep learning models using popular frameworks such as PyTorch and TensorFlow. With Train, it is possible to train models across multiple machines in parallel, which can help reduce training time and improve model performance. RAY also includes a library called Dataset, which is designed for distributed data loading. Dataset provides a simple API for loading large datasets from a variety of sources, including the Hadoop distributed file system (HDFS) and the simple storage service (S3). Dataset can also be used to preprocess data, such as applying image augmentations or normalizing features. Finally, RAY has two additional libraries, Serve and Workflows, which are useful for deploying machine learning models and distributed apps to production (Lazzeri 2022). Serve is a scalable and efficient framework for serving machine learning models; it provides an easy-to-use API for deploying models in production, as well as features such as automatic model versioning and request batching. Workflows is a library for building and managing distributed workflows; it provides a simple API for defining workflows that can include multiple steps, such as data preprocessing, model training, and model deployment.

For creating interpretable and fair models, two useful frameworks are InterpretML and Fairlearn. InterpretML is an open-source package that incorporates several machine learning interpretability techniques, including partial dependence plots (PDPs), individual conditional expectation (ICE) plots, SHapley Additive exPlanations (SHAP), and accumulated local effects (ALE) plots. With this package, data scientists and ML developers can train interpretable glass box models and explain black box systems. It also helps them understand a model's global behavior and the reasons behind individual predictions.
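As a brief illustration of how such interpretability tools are used in practice, here is a minimal SHAP sketch on an illustrative scikit-learn model; the data, feature count, and model are placeholders, and the sketch assumes the shap and scikit-learn packages are installed.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

# Illustrative data: 200 samples, 5 features (placeholders for real training data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# TreeExplainer computes SHAP values for tree-based models: how much each feature
# pushed each individual prediction up or down relative to a baseline.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global view: which features matter most across the whole dataset.
shap.summary_plot(shap_values, X, show=False)
```

The same SHAP values can also be inspected for a single row, which is often what stakeholders ask for when a specific decision needs to be explained.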
Fairlearn is a Python package that provides metrics for assessing fairness and mitigating unfairness in AI and machine learning models. With Fairlearn, data scientists and ML developers can identify the groups that are negatively impacted by a model and compare multiple models in terms of fairness and accuracy metrics. Fairlearn also supports several algorithms for mitigating unfairness in a variety of AI and machine learning tasks, with various fairness definitions (Lazzeri 2022). The package has built-in methods for computing metrics such as demographic parity, equalized odds, and equal opportunity. Additionally, it provides several methods for mitigating bias in models, including reweighting examples, adjusting decision thresholds, and generating counterfactual data. With Fairlearn, data scientists and ML developers can evaluate the fairness of their models and make informed decisions about how to address any identified issues.

Our third open-source technology is used for model deployment. When working with different frameworks and tools, data scientists and ML developers have to deploy models according to each framework's requirements. To standardize this process, they can use the open neural network exchange (ONNX) format. ONNX is an open-source, community-driven project that aims to provide an open, standardized format for exchanging machine learning models between different machine learning frameworks. With the ONNX format, data scientists and ML developers can train a model in one of the many popular machine learning frameworks, such as PyTorch, TensorFlow, or RAY, convert it into ONNX format, and then consume it from a different framework, for example ML.NET. This makes the deployment process more flexible and efficient, reducing the time and resources needed to deploy models. As mentioned, the ONNX format supports a wide range of machine learning frameworks, including TensorFlow, PyTorch, MXNet, Caffe2, and more. It also supports different types of models, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and more. Additionally, the ONNX format supports a variety of hardware platforms, including central processing units (CPUs), GPUs, and specialized hardware such as field-programmable gate arrays (FPGAs). In addition to standardizing the deployment process, the ONNX ecosystem also provides a set of tools and libraries for optimizing, validating, and visualizing ONNX models, which makes it easier for data scientists and ML developers to ensure the quality and performance of their models before and after deployment. The ONNX runtime (ORT) represents machine learning models using a common set of operators, the building blocks of machine learning and deep learning models, which allows a model to run on different hardware and operating systems. ORT optimizes and accelerates machine learning inferencing, which can enable faster customer experiences and lower product costs. It supports models from deep learning frameworks such as PyTorch and TensorFlow, but also from classical machine learning libraries such as Scikit-learn.
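As a minimal sketch of this workflow, the snippet below trains an illustrative scikit-learn model, converts it to ONNX with the skl2onnx package, and runs it with ONNX Runtime. The input name, shapes, and model are placeholders, and the sketch assumes the skl2onnx and onnxruntime packages are installed.

```python
import numpy as np
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from sklearn.linear_model import LogisticRegression

# Illustrative training data and model (placeholders for a real trained model).
X = np.random.rand(100, 4).astype(np.float32)
y = (X[:, 0] > 0.5).astype(int)
model = LogisticRegression().fit(X, y)

# Convert the trained scikit-learn model to the ONNX format.
onnx_model = convert_sklearn(
    model, initial_types=[("input", FloatTensorType([None, 4]))]
)

# Load and run the converted model with ONNX Runtime.
session = ort.InferenceSession(
    onnx_model.SerializeToString(), providers=["CPUExecutionProvider"]
)
labels = session.run(None, {"input": X[:5]})[0]
print(labels)
```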
There are many popular frameworks that support conversion to ONNX. For some, such as PyTorch, ONNX export is built in; for others, such as TensorFlow or Keras, separate installable packages perform the conversion. The process is straightforward: first, data scientists and ML developers need a model trained in any framework that supports export and conversion to the ONNX format; then they load and run the model with ONNX runtime; finally, they can tune performance using various runtime configurations or hardware accelerators (Lazzeri 2022).

The third aspect that data scientists and ML developers should know before deploying machine learning in production is how to build pipelines for the machine learning solution. The first task in the pipeline is data preparation, which includes importing, validating, cleaning, transforming, and normalizing the data. This step is essential because it ensures that the data is ready for modeling and that any data anomalies or issues are resolved before training the model. Once the data is prepared, the next step is feature engineering, where the data is analyzed and relevant features are extracted. Feature engineering plays a critical role in building an accurate model, and it involves techniques such as one-hot encoding, normalization, scaling, and more. Next, the pipeline contains the training configuration, including parameters, file paths, logging, and reporting. The training configuration step involves specifying parameters such as learning rates, batch sizes, number of epochs, loss functions, and optimizers, as well as configuring file paths for data, model checkpoints, and logs. This step is crucial for making the training process efficient and repeatable, as it allows data scientists and ML developers to experiment with different hyperparameters and settings and track the results of each experiment. Once the training configuration is defined, the pipeline moves on to the actual training and validation jobs. These jobs involve training the model using the specified configuration and validating its performance on a held-out validation set. To make the training process efficient, data scientists and ML developers may use techniques such as distributed processing, which runs the training job on multiple compute nodes simultaneously, and progress monitoring, which tracks the model's performance during training and stops the training job when it no longer improves. Efficiency can also come from training on specific data subsets and from choosing appropriate hardware and compute resources. Finally, after the model is trained and validated, it is time to deploy it to production. This involves versioning the model so that different versions can be deployed and rolled back if necessary.
It also involves scaling the model so that it can handle the expected workload, provisioning the necessary compute and storage resources, and setting up access control so that only authorized users can access the model. The choice of pipeline technology depends on the needs, which usually fall under one of three scenarios: model orchestration, data orchestration, or code and application orchestration. Each scenario is oriented around a persona, who is the primary user of the technology, and a canonical pipeline, which is the scenario's typical workflow (Lazzeri 2022).

In the model orchestration scenario, the primary persona is a data scientist, who is responsible for building, testing, and deploying machine learning models. The canonical pipeline starts with data preparation, followed by training, model evaluation, and deployment. In terms of open-source technology options, Kubeflow Pipelines is a popular choice for this scenario, as it provides a platform to build, deploy, and manage machine learning workflows. In the data orchestration scenario, the primary persona is a data engineer, who is responsible for managing data pipelines and ensuring data quality. The canonical pipeline starts with data ingestion, followed by data cleaning, transformation, and storage. A common open-source choice for this scenario is Apache Airflow, which provides a platform for creating, scheduling, and monitoring data workflows and can support many different data sources and destinations. The third scenario is code and application orchestration. Here, the primary persona is an app developer, who is responsible for integrating machine learning models into applications. The canonical pipeline starts with the integration of code and models, followed by model training and deployment. A typical open-source solution for this scenario is Jenkins, which provides a platform for continuous integration and continuous deployment of software applications.

Once a pipeline is submitted, the orchestration service determines the dependencies between steps, resulting in a dynamic execution graph that reflects the workflow of the pipeline. When each step in the execution graph runs, the service configures the necessary hardware and software environment for that step, and the step sends logging and monitoring information to its containing experiment object. When the step is complete, its outputs are prepared as inputs to the next step. Finally, resources that are no longer needed are finalized and detached to prevent resource leakage and optimize resource utilization (Lazzeri 2022). This pipeline execution process is a critical step in ensuring that machine learning models are trained, validated, and deployed efficiently and effectively.
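For instance, the data orchestration scenario described above might be expressed as a minimal Apache Airflow DAG like the one below. Task names and logic are illustrative, and the sketch assumes Airflow 2.x is installed.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task logic; in practice these would call real ingestion,
# cleaning, and storage code.
def ingest():
    print("ingesting raw data")

def clean_and_transform():
    print("cleaning and transforming data")

def store():
    print("writing curated data to storage")

with DAG(
    dag_id="daily_data_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=clean_and_transform)
    store_task = PythonOperator(task_id="store", python_callable=store)

    # The >> operator declares dependencies, producing the execution graph
    # described above: ingest -> transform -> store.
    ingest_task >> transform_task >> store_task
```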
4.5 Summary

In this chapter, we introduced some common challenges of machine learning model deployment, and we discussed why successful model deployment is fundamental to unlocking the full potential of AI, why companies struggle with model deployment, and how to select the right tools to succeed with it. The goal of this chapter is to guide every company throughout the AI life cycle and adoption. While the specifics will vary based on the organization, scope, and skills, the strategy should clearly define how a model can become an AI solution and move from stage to stage, designating responsibility for each task along the way. The following questions (Ajgaonkar 2021) can also help guide conversations as companies build their MLOps strategy:

• What are the company goals with AI?
• What performance metrics will be measured when developing models?
• What level of performance is acceptable to the business?
• Where will data scientists test and execute the machine learning model?
• How will the company create alignment between the development and production environments?
• How will data ultimately be ingested and stored?
• Who is responsible for each stage in the life cycle?
• Who will build the MLOps pipeline?
References

Ajgaonkar A., MLOps: The Key to Unlocking AI Operationalization, Insight Tech Journal, 2021
Bahrammirzaee A., A comparative survey of artificial intelligence applications in finance: artificial neural networks, expert system and hybrid intelligent systems, Neural Computing and Applications, 19(8), 1165–1195, 2010
Bose B. K., Neural network applications in power electronics and motor drives: an introduction and perspective, IEEE Transactions on Industrial Electronics, 54(1), 14–33, 2007
Lazzeri F., The Data Science Mindset: Six Principles to Build Healthy Data-Driven Companies, InfoQ, 2019
Lazzeri F., What You Should Know before Deploying ML in Production, InfoQ, 2022
Russell S. J., Artificial Intelligence: A Modern Approach, Prentice Hall, Upper Saddle River, NJ, 2010
5 Unifying Organizations’ Machine Learning Vision
5.1 The Challenges of Working in Data

5.1.1 Scalability

Data is growing at an unprecedented rate, and as discussed in previous chapters, businesses are struggling to keep up with the demands of scaling their data processing and modeling infrastructure. As more and more data is generated every day by various telemetry systems, the ability to scale data science and machine learning operations becomes increasingly important. However, scaling data processing infrastructure is not as simple as adding more resources, and many challenges arise when working with increased demand for insights and large, constantly growing datasets.

One of the most significant challenges of scaling machine learning (ML) infrastructure is cost. In Chap. 1, we discussed the different stages of a business; in the early days of a business, costs are carefully measured and minimized. Adding more resources to process large datasets rapidly is expensive, and businesses must balance the cost of infrastructure against the potential benefits of scaling. For example, adding more processing power to target customers with low long-term value might not always be a timely investment. Moreover, the cost of scaling is not limited to the hardware itself; it also includes software licenses, maintenance, and operations, costs that might be hidden at first.

Another challenge that arises when scaling data processing infrastructure is the need for increased performance, reliability, and coverage. As data grows, processing time can increase and data delays can grow longer, which can negatively impact the user experience.
Businesses must be able to handle increased ML demands while maintaining optimal performance, which can be a difficult balancing act. Scalability also introduces challenges around data consistency and data integrity. As data processing infrastructure grows, there is a greater likelihood of inconsistencies or errors in the data. If data is not instrumented well, or if there are gaps in the processing infrastructure, coverage, meaning whether the data covers all users and scenarios, can suffer. Such gaps can lead to inaccurate data analysis and insights, which can ultimately impact business decisions.

Finally, scalability introduces challenges around data management and governance, which is the main theme of this book. As data grows, it becomes more difficult to manage, organize, and maintain. As discussed in previous chapters, businesses must have a robust data management strategy in place to ensure that data is stored, processed, and analyzed appropriately. Additionally, businesses must have strong governance practices in place to ensure that data is secure and compliant with relevant regulations.

In conclusion, scalability is one of the most significant challenges facing businesses today. As data continues to grow at an unprecedented rate, businesses must be able to scale their data processing infrastructure to keep up with demand. However, the challenges around cost, performance, consistency, integrity, and governance must be carefully considered to ensure that scaling is done effectively and efficiently.
5.1.2 Development Environment for Data Scientists

To be successful in driving the business in the right direction and achieving the desired impact, data science projects need a fast iterative model with the business to mine the data, hypothesize and explore solutions, build fast proofs of concept (POCs), test findings rapidly, and determine the corrections required before committing solutions to production. The lead times of POCs and of the path to production are important aspects of the data science life cycle: POCs are productive in their own right, and their insights drive the business in the right direction and help get closer to the desired impact. The iteration between forming hypotheses, building POCs to validate them, and testing the findings with the business should not be hastily concluded in favor of going to production prematurely. A solid understanding of the problem, the data, and the findings of the POCs is an essential prerequisite to going to production with mature insights.
Predictive insights are very sensitive to data changes, reflecting changes in the business environments and ecosystems. For example, insights about the top N customers with a propensity to move from one product to another depend on many factors, including usage patterns of both products and cohort growth and churn for each product. The insights' durability and time resilience should be seen as factors that can impact production readiness and production durability. As such, POCs should be considered an essential part of machine learning products, not ephemeral artifacts to be avoided or rapidly retired.

In order to be effective, data science teams require:

• A development environment
• A preproduction environment
• A production environment

The development environment needs consistent access to the same authoritative data sources, those that represent the single source of truth for a business process and adhere to service level agreements (SLAs) on completeness, accuracy, and freshness. Data scientists should avoid using data aggregators that do not own or control the quality of the data they host and that cannot offer SLAs for completeness, accuracy, and freshness. In addition, data scientists should avoid using multiple sources for the same business process and should insist on identifying and using a single source of truth per business process. Data scientists should also avoid using multiple siloed development environments (e.g., individual data science dev boxes, multiple subscriptions, or sandbox environments), as these result in inconsistent or unconfirmed data, data that is out of sync or at a different time grain across development activities, inefficiencies (repetition of collection, ingestion, processing, and storage), and poor data hygiene (with respect to privacy, security, compliance, and retention requirements). Data science teams should also avoid high-price-tag systems for storing large amounts of data for exploration and investigation; rather, they should aim for efficiency and cost savings and use platforms that favor cost-effectiveness over unnecessarily high performance at a high cost in this phase.

A preproduction environment serves as a testing environment for validating insights from the data before they land in production. Insights are tested for functionality (correctness), performance, efficacy (accuracy), and efficiency. In cases where the product is a live system with immediate customer impact, organizations strive to have a solution that is an exact replica (at a smaller scale) of the production system.
In practice, however, a more modest replica of the production environment is often sufficient to save costs without sacrificing the ability to test effectively. Finally, data science teams need two production environments to host their final models and work:

• An ML-based production environment, to leverage machine learning operations (MLOps) capabilities and maintain models and their accuracy over time
• A non-ML-based production environment, to use the platform best suited to the use case. Most use cases do not entail analytical querying in real time (for which large clusters, compute, and storage are required); rather, if the use case depends more on batch loading of insights and making them available for push or pull models, then platforms such as Azure Synapse are recommended, as they can achieve the goals while maintaining cost-efficiency.
5.1.3 Getting the Right Talent

Getting the right talent in the data science and machine learning field is one of the biggest challenges that companies face today. In any market condition, companies face a shortage of skilled data science (DS) and ML professionals. The demand for data professionals has increased significantly in recent years, but the supply of qualified professionals has not kept pace. As a result, companies find it challenging to recruit talent with the required skills and experience to work with data, and in many cases they hire the wrong talent for the right job.

The landscape of required skills is constantly changing. With new technologies, techniques, and tools emerging frequently, it is challenging for companies to keep up with the latest trends and ensure that their data professionals stay up to date. Traditionally, data science and machine learning roles were filled by computer science Ph.D.s with an emphasis on machine learning; today, however, there are many more specializations, many of which do not require a Ph.D. or even a master's degree in data science. Data is a broad field that encompasses various skills such as data analysis, machine learning, data visualization, data engineering, and data science. Finding a professional who possesses all the required skills is rare, and companies may need to hire multiple professionals to cover all the necessary skills. This requires continuous investment in training and development, which can be time-consuming and expensive.
Even when a company finds the right talent for the right job, it faces competition for that talent. Companies that offer attractive salaries, benefits, and opportunities for career growth tend to attract the best data professionals. In many cases, the top tech companies offer comparable packages, so the differentiator becomes the added value offered beyond compensation. Companies need to offer competitive compensation packages and create a positive work environment to attract and retain top talent.

To summarize, getting the right talent in data is a significant challenge for companies today. Companies need to address the shortage of skilled professionals, keep up with the rapid pace of change in the data landscape, offer competitive compensation packages, and hire professionals with diverse skills. Overcoming these challenges will enable companies to leverage data to make informed decisions, improve efficiency, and gain a competitive advantage in today's data-driven world.
5.1.4 Privacy and Legal Considerations

Data privacy regulations, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), require businesses to protect personal data and ensure that it is used only for specific purposes. Companies must ensure that they have obtained proper consent from the individuals whose data is collected, and they must take measures to ensure that data is not breached or misused. There are also many other legal requirements that businesses must comply with, such as intellectual property laws, data retention regulations, and contractual obligations. Companies that fail to comply with these laws and regulations risk facing penalties, legal action, and reputational damage. For example, in 2019 Google was fined €50 million by the French data protection regulator for violating the GDPR; the fine was issued after an investigation found that Google had not obtained proper consent from users for displaying personalized ads.

We cannot overlook the fact that working with data can also pose ethical challenges. Companies must ensure that they are using data in a responsible and ethical manner, consider the potential consequences of their actions, and ensure that they are not engaging in discriminatory practices or using data to harm individuals or groups. There is an entire field called responsible artificial intelligence (AI), which is beyond the scope of this book but is encouraged as further reading.
5.2 Managing ML/AI Projects Globally and Remotely

5.2.1 Remote Talent

In recent years, managing machine learning (ML) and artificial intelligence (AI) projects globally and remotely has become more common. One of the primary advantages of remote employees is the cost savings that can be achieved. Companies can save money on office space, utilities, and other overhead costs associated with maintaining a physical office. Remote employees also tend to be more cost-effective, as companies can hire employees from countries or regions where salaries are lower than in their home country. Moreover, companies can hire the best talent from around the world without being limited by geographical location. This allows companies to build diverse teams with different perspectives and skill sets, which can lead to better problem-solving and innovation.

However, managing remote teams also presents some unique challenges. Communication can be more difficult when team members are not in the same physical location, and it can be challenging to build team cohesion and culture. Managing time zones and ensuring that team members have the necessary equipment and infrastructure to work effectively can also be challenging. To overcome these challenges, companies can use inclusive strategies such as regular team meetings held in the time zones that work for most employees, video conferencing with cameras encouraged to be on, and collaboration tools. Establishing clear communication protocols and ensuring that remote team members feel included and valued can also help build team cohesion. Additionally, companies must ensure that remote employees receive adequate training and support to perform their roles effectively; this can include access to online learning resources, regular performance reviews, and opportunities for professional development.
5.2.2 Strong Infrastructure with a Footprint in Multiple Regions

Another important aspect of managing ML/AI projects globally and remotely is having a strong infrastructure with a footprint in multiple regions, which can offer several benefits to companies.
First, it can help ensure that MLOps projects run smoothly and that teams have access to the necessary resources and support, including reliable internet connectivity, secure data storage, and robust cloud computing resources. A strong infrastructure can also help mitigate the risks associated with downtime or other technical issues, which can have a significant impact on project timelines and deliverables. Second, having a strong infrastructure can help companies comply with data privacy and security regulations in different regions. As mentioned in the privacy section earlier in this chapter, this is particularly important for companies that operate in countries with strict data protection laws, such as the European Union's General Data Protection Regulation (GDPR) or the California Consumer Privacy Act (CCPA). Having an infrastructure that is compliant with these regulations can help companies avoid potential legal and financial risks. Third, companies can use strategies such as providing remote teams with access to a centralized project management platform, offering training and support to ensure that team members are proficient in the tools and technologies being used, and establishing clear communication protocols. This enables synergies to flow efficiently, since teams share a centralized, global view of the projects. Establishing such a centralized view is neither simple nor easy, but it can be beneficial if done correctly.
5.3 A Guide to Data Team Structures with Examples

To effectively manage and leverage data, companies need a well-structured data team with multiple roles that work together seamlessly. In this section, we provide a guide to data team structures with examples of these roles, including the Applied Data Science team, the ML Research team, the MLOps team, the BI team for reporting and dashboarding, the Program Management (PM) team (inbound and outbound), and the Data Engineering team.
5.3.1 Applied Data Science Team

The Applied Data Science Team is responsible for leveraging data to solve specific business problems. This team is composed of data scientists, who are skilled in statistical analysis, machine learning, and data visualization. They
work closely with other members of the data team to build predictive models, develop algorithms, and provide insights that help drive business decisions. The top data scientists' responsibilities are:

• Develop a deep understanding of business needs and leverage that knowledge to deliver best-in-class data and machine learning models.
• Build and own a data science roadmap (from data exploration to model building and evaluation) for the Data Science team in collaboration with the Product Development group.
• Build and improve the cross-functional baseline; work closely with the Data Science, ML Engineering, Data Engineering, and MLOps teams to optimize the strategy for cross-team work.
• Embrace a business and technical mindset to develop ML solutions at scale.
• Review and provide guidance on existing research projects.
• Use strong technical expertise to rapidly formalize, prioritize, and move forward with new data science initiatives.
5.3.2 ML Research Team

The ML Research Team is responsible for exploring new technologies and techniques in the field of machine learning. This team is composed of machine learning researchers and scientists who work closely with the Applied Data Science Team to develop new algorithms and models that can be used to solve business problems. The ML Research Team also collaborates with the MLOps Team to ensure that these new models can be deployed at scale. The top machine learning researchers and scientists' responsibilities are:

• Use statistical and machine learning techniques to create the next generation of tools that empower Amazon's selling partners to succeed.
• Design, develop, and deploy highly innovative models to interact with sellers and delight them with solutions.
• Work closely with teams of scientists and software engineers to drive real-time model implementations and deliver novel and highly impactful features.
• Establish scalable, efficient, and automated processes for large-scale data analyses, model development, model validation, and model implementation.
• Research and implement novel machine learning and statistical approaches.
• Lead strategic initiatives to employ the most recent advances in ML in a fast-paced, experimental environment.
5.3.3 MLOps Team

The MLOps Team is responsible for deploying and managing machine learning models in production environments. This team is composed of machine learning engineers who work closely with the Applied Data Science Team and the ML Research Team to ensure that models are properly trained, tested, and deployed. The MLOps Team also collaborates with the Data Engineering Team to ensure that data is properly stored, managed, and processed. The top MLOps engineers' responsibilities are:

• Work in agile pods to design and build cloud-hosted ML products with automated pipelines that run, monitor, and retrain ML models.
• Design and build effective, user-friendly infrastructure to enable scalable, auditable, and maintainable machine learning services.
• Support life cycle management of deployed ML apps (e.g., new releases, change management, monitoring, and troubleshooting).
• Act as the MLOps subject matter expert (e.g., develop and maintain enterprise machine learning standards, user guides, and release notes).
• Design AI/ML apps and implement automated model and pipeline adaptation and validation, working closely with data scientists, engineers, project managers, and others.
• Walk stakeholders and solution partners through solutions and review product change and development needs.
5.3.4 BI Team for Reporting and Dashboarding

The BI Team is responsible for creating reports and dashboards that provide insights into key business metrics. This team is composed of business intelligence analysts who work closely with other members of the data team to ensure that data is properly collected, analyzed, and visualized. The BI Team also collaborates with the PM Team to ensure that reports and dashboards meet the needs of different stakeholders. The top business intelligence analysts' responsibilities are:

• Independently conduct statistical and quantitative analyses for the assigned area. In partnership with business subject matter experts, lead the development of advanced statistical modeling and quantitative analysis projects.
• Validate and review results to ensure they address identified business needs. Report results to senior leaders as needed.
• Prepare visualizations and reports and deliver results to management and/or other leaders.
• In partnership with business subject matter experts, provide recommendations for addressing observed outcomes.
• Translate complex technical concepts and analyses to nontechnical audiences.
• Partner with cross-functional leadership to define data-related requirements.
• Create and manage the strategy and ensure data deliverables and processes drive business results for key segments of the data science organization.
5.3.5 Program Management Team (Inbound and Outbound)

The Program Management (PM) Team is responsible for managing the data team's projects and ensuring that they are delivered on time and within budget. This team is composed of program managers who work closely with other members of the data team to define project scope, set priorities, and manage stakeholder expectations. The PM Team also collaborates with the BI Team to ensure that reports and dashboards meet the needs of different stakeholders. The program managers' key responsibilities are:

• Implement communication standards across a portfolio of programs, including executive and key partner communications.
• Establish a reliable and visible cadence for program reviews, decision-making, prioritization, and code and model reviews.
• Lead a governance structure that drives effective executive decision-making.
• Ensure the governance structure effectively exposes and mitigates dependencies.
• Seek out and identify change management opportunities that increase program velocity and affect multiple teams.
• Define and manage a program portfolio of data science and machine learning products that target high business impact for the organization and product area.
5.3.6 Data Engineer

The Data Engineering Team is responsible for managing the data pipeline and ensuring that data is properly collected, stored, and processed. This team is composed of data engineers who work closely with other members of the data team to build and maintain data pipelines that support the needs of the business.
The Data Engineering Team also collaborates with the MLOps Team to ensure that models are properly trained and deployed. The data engineers' key responsibilities are:

• Build and operationalize complex data solutions, correct problems, apply transformations, and recommend data cleansing and data quality solutions.
• Design complex data solutions and architectures.
• Analyze complex sources to determine their value and use, and recommend data to include in analytical processes.
• Incorporate core data management competencies, including data governance, data security, and data quality.
• Collaborate within and across teams to support delivery and educate end users on complex data products and the analytic environment.
• Perform data and system analysis, assessment, and resolution for complex defects and incidents, and correct them as appropriate.
• Test data movement, transformation code, and data components.
5.4 Breaking Communication Barriers with a Universal Language

5.4.1 Strive for Clarity

Clarity is the first area most employers seek to improve. It takes real dedication to keep your speech clear and, more importantly, concise. Be careful of overusing business jargon, department-specific language, or technical terms that not all of your employees may be familiar with (and get in the habit of defining those you do use). Experiment by reworking the last company-wide email you sent: if it is on the longer side, try cutting it down by half and including only the most pertinent information. This will help you get a feel for using clearer language. With any luck, your continued efforts to clarify your speech will trickle down through your company culture as a whole.
5.4.2 Communicate Often

Many organizations struggle to establish frequent and consistent lines of communication. In my own company, I've learned that no matter how much I communicate openly with my staff, they could always use just a little bit
more. Frequent department updates, company-wide messages from leadership, and frequently updated employee recognition systems are key to opening up conversation in the workplace. Most important are one-on-one check-ins, which help each team member feel valued, understood, and supported. Don’t wait for annual employee reviews to have a conversation with your employees.
5.4.3 Encourage Active Listening

Listening is a skill that all of us, employers and employees alike, can work on. Inattentiveness and multitasking are all too common in the workplace and can only be eliminated by truly active listening. I recently sat in on a meeting where a department head invited attendees to close their laptops, put down their phones, and engage fully with the meeting. We have kept this habit for every company meeting since, and our conversations have been far more productive, creative, and inspiring. Find ways to help your employees be fully present while they're communicating with others, and everyone will benefit.
5.4.4 Promote Transparency

Dishonesty is the all-time enemy of effective communication and should be avoided at all costs in all areas of your company. That said, many leadership teams still rely on disseminating information on a "need to know" basis, essentially hiding or obscuring information from other team members. Though not technically dishonest, this practice breeds a lack of trust and creates a serious barrier to communication. Aim to be entirely transparent with your team across all leadership levels, and create ways to openly acknowledge employees and provide feedback that keeps everyone in the know. Public scoreboards, custom uniform swag, and recognition events are an easy place to start.
5.4.5 Allow for Emotions

Stress often runs high in the workplace, particularly in metrics-based companies such as call centers or sales development firms. If stakes are high and your team is overwhelmed, emotions are bound to come into play in many of your internal communications. Anticipate this reality and be prepared to welcome emotion and healthy coping mechanisms into your workplace. Expressing emotion is healthy for both the individual and the organization, and your
acceptance can help each employee feel heard and valued at work. Bonus tip: Being open to emotions at work will also help your employees and managers gradually become more open to giving and accepting feedback.
5.4.6 Insist on Face-to-Face

Why spend 5 minutes composing (or reading) an email when a simple 30-second conversation will do? Most companies struggle to find the balance between an overabundance of email and other delayed text communications and not enough in-person connection. This is one communication barrier that's quick and easy to solve: simply invite your team to cut down on messages by at least 30% this week and rely on face-to-face discussion instead. In-person communication allows each of us to practice active listening and absorb the body language and subtext that accompany the speaker's main message. It also fosters personal connection and helps build friendships, which is key to improving motivation in the workplace.
5.4.7 Understand Diversity

Finally, it's important to acknowledge that cross-cultural barriers to communication exist in most, if not all, American workplaces today. Whether you're doing business internationally, have a percentage of your team who speak English as a second language, or welcome employees with different backgrounds or ability levels, your company will benefit from endeavoring to understand and celebrate the diversity that surrounds us all. Train your staff on English as a second language (ESL) communication tips, research helpful disability accommodations, and learn from your employees how to respect and honor their cultures and ethnicities in a productive and meaningful way through your communication.
5.5 How Data Storytelling Can Make Your Insights More Effective

5.5.1 Making Sure There Is an End-to-End Story

Data storytelling is not just about presenting data; it's about crafting a story that connects with the audience. The best data stories are those that provide an end-to-end narrative that takes the audience on a journey from the
beginning of the data analysis to the final insights. The end-to-end story must be structured in a way that is easy to follow and understand. Here are some tips for creating an end-to-end story:

• Start with the problem statement: The beginning of the story should explain the problem that you are trying to solve. This helps the audience understand the context and why the analysis was conducted.
• Provide background information: Describe the data sources and how the data was collected. This helps the audience understand the quality of the data and the potential biases that may be present.
• Analyze the data: This is where you analyze the data and identify any patterns or trends. It's important to explain the analysis in a way that is easy for the audience to understand. "Know your audience" should be the go-to rule for every presentation; for example, if the audience is fairly technical, show the technical elements of the analysis, but if the audience includes high-level leaders, leave the technical details for an appendix and show top-line business metrics instead.
• Explain the insights: The insights are the key takeaways from the analysis. It's important to explain the insights in a way that is relevant to the audience. Think one step ahead and imagine how the audience will act upon your insights, what questions they might have, and whether the story is interesting enough for them to engage with.
• Provide recommendations: The recommendations are the actions that the audience can take based on the insights. It's important to provide clear and actionable recommendations; the impact of the work can be measured by the actions taken, as discussed in Chap. 2.
• Wrap up the story: The end of the story should summarize the problem statement, the analysis, the insights, and the recommendations. It should also provide a call to action for the audience. Another good ending is a timeline for a follow-up update or for a new project that grew out of the current one, which shows continuation of the work.

By creating an end-to-end story, you make it easier for the audience to understand and remember the insights. The story helps to provide context and meaning to the data, making it more relevant to the audience. It also helps to build a stronger emotional connection with the audience, which makes it more likely that they will take action based on the insights.
5.5.2 Data Visualization

Data visualization is the process of presenting data in a visual format, such as charts, graphs, and maps. Visualization can help simplify complex data sets and make it easier for the audience to understand the insights being presented. A well-designed visualization can highlight patterns and trends in the data, making it easier for the audience to identify key insights. When creating visualizations, it's important to keep in mind the following tips:

• Choose the right visualization type: There are many different types of visualizations available, such as bar charts, line charts, and scatter plots. It's important to choose the right visualization type based on the data being presented and the insights you want to highlight.
• Simplify the data: Visualizations should be simple and easy to understand. Remove any unnecessary data and focus on the key insights you want to highlight.
• Use color and contrast effectively: Color and contrast can be used to highlight important data points and draw the audience's attention to key insights. However, use color and contrast sparingly and deliberately, as too much can be overwhelming.
• Provide context: Visualizations should provide context to help the audience understand the data being presented. This can be done with labels, annotations, and titles.
• Tell a story: Visualizations should be used to tell a story and support the narrative of the data analysis. They should be integrated into the overall data storytelling process.

By following these tips, you can create effective visualizations that support the overall data storytelling process. Visualizations should support the insights being presented and should be integrated into the overall story being told, as in the brief sketch below.
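The following minimal matplotlib sketch applies several of these tips at once: a single highlighted bar, a descriptive title that states the takeaway, an axis label for context, and an annotation that points the audience to the key insight. All names and numbers are illustrative.

```python
import matplotlib.pyplot as plt

# Illustrative figures; in practice these would come from the analysis.
regions = ["North", "South", "East", "West"]
revenue = [1.8, 2.4, 1.1, 3.2]  # revenue in $M

fig, ax = plt.subplots(figsize=(6, 4))

# Muted color everywhere, a contrasting color only for the key data point.
colors = ["#b0b0b0", "#b0b0b0", "#b0b0b0", "#d95f02"]
ax.bar(regions, revenue, color=colors)

# Context: a title that states the takeaway, an axis label, and an annotation.
ax.set_title("West region drives the largest share of quarterly revenue")
ax.set_ylabel("Revenue ($M)")
ax.annotate(
    "Key insight:\nhighest-revenue region",
    xy=(3, 3.2), xytext=(1.6, 3.0),
    arrowprops=dict(arrowstyle="->"),
)

plt.tight_layout()
plt.show()
```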
5.6 Summary

In this chapter, we have explored the various challenges of working with data. Scalability is one of the biggest challenges as our world moves toward ever faster and larger-scale data-driven decision-making. With scalability comes the
challenge of getting the right talent in a timely manner. Once companies establish the infrastructure and talent, they need to make sure that data is handled in accordance with privacy regulations globally. The globalization of the remote workforce and infrastructure calls for attention to the challenges of, and solutions for, managing ML projects globally, using established frameworks and a clear organizational structure. Once the right team and processes are in place, it is important for the company to present its achievements, data, and opportunities with an end-to-end story that is easy to digest and that informs data-driven decisions.