Strategy Guide for Automation Scale your business with IT automation
Magnus Glantz
www.bpbonline.com
Copyright © 2024 BPB Online
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor BPB Online or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
BPB Online has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, BPB Online cannot guarantee the accuracy of this information.
First published: 2024
Published by BPB Online
WeWork
119 Marylebone Road
London NW1 5PU
UK | UAE | INDIA | SINGAPORE
ISBN 978-93-55515-650
www.bpbonline.com
Dedicated to
My dad, who got me into computers and programming.
My colleagues over the years and the open source community, who taught me so much.
My wife, who pushed me along, supported and helped.
My son Vilhelm and daughter who are teenagers.
About the Author
Magnus Glantz has done IT automation for over 20 years, as an operator, developer, consultant, product owner and architect across industries such as telco, retail, FSI, cloud computing and public sector. He has successfully created internationally recognised automation architecture and strategy for many organizations. He has automated many tens of thousands of systems across most technical domains using many different proprietary and open source based automation solutions, including but not limited to cloud provider specific automation, Puppet, Chef, Terraform and Ansible. He sits on the board of Open Source Sweden, Sweden’s industry organization for open source, where he works to further the open source ecosystem in Sweden and in Europe. Today his main job is to advise on automation strategy and implementation for customers, working as a Principal Solution Architect at Red Hat, where he also specializes in DevSecOps and Ansible specifically.
About the Reviewers
Johnny Westerlund has over 20 years of experience in the IT industry, working in different roles. He is passionate about technology in general and open source technology in particular, and about how it can help organizations scale. Automation is a key capability for enabling scale, and Johnny has helped numerous customers adopt and implement automation technology.
Johnny is currently working as a Principal Solution Architect at Red Hat.
Ilkka Tengvall has expertise in various areas of IT. His professional career includes work as an R&D developer at global companies, project management, architecture, and later consulting and sales. In his current role as Associate Principal Solution Architect at Red Hat, he helps enterprise and governmental organizations design software and services development and delivery operations that are smooth, robust and efficient, wherever in the hybrid cloud they are best implemented. Automation is a crucial element of these designs.
Acknowledgement
Special thanks to Ilkka Tengvall, Johnny Westerlund and Peter Gustafsson. You are all awesome, so very clever, and smart, and I am lucky to have you as colleagues.
Further special thanks to Kenny Säfström, my best friend, who has been one of my most significant supporters. Without our trips into nature, I am not sure I would have had the energy to get this book done.
Also, thank you to BPB Publications for making sure this book got written. Without all the planning, process, and much reviewing, this book would have never seen the light of day. That is for sure.
Finally, to you who are reading this: If a book is written, published, and thrown into a forest, and no one reads it, was it actually written? Yes, it was, but I am very thankful for you doing so, nevertheless.
Preface
Creating an organization-wide IT automation strategy and the related architecture is exceedingly difficult. This is in part because there is little literature on the topic, specifically on IT automation strategy, but also on general-purpose, vendor-neutral architecture that can support the strategy. This book attempts to fill this void and is based on the author's far-reaching field experience of successful automation, automation strategy and automation architecture.
While this book is thorough in nature and includes many details, it does not assume that the reader is already familiar with the topics covered. As such, the book has three parts. The first is an introduction, where the reader is introduced to the concept of automation. IT automation is also defined in a way that lets the reader more effectively understand what it means to organizations.
The second part is written to be a comprehensive guide to what should be in a successful organization-wide IT automation strategy. Step by step, each core component of a successful IT automation strategy is reviewed, including elements such as budget and ownership, strategy performance monitoring, tools strategy, skills development, and key processes for development and collaboration. The reader is taught the impact of both successful implementations and flawed or faulty ones. This allows the reader to look at their current organization and identify things that need to be fixed in existing strategies as well. It also equips the reader with the knowledge needed to make cost and return-on-investment estimates related to a strategy.
The third and final part is written to be an extensive review of a vendor-neutral, robust, secure, and highly scalable automation architecture. At the implementation stage of the strategy, architecture needs to be created. There are furthermore many ways in which architecture and strategy interact, which means that you need to understand both to properly understand either. Topics covered in this part include federated automation, high availability and disaster recovery, security and separation of duty concerns, key trends, and Automation-as-a-Service (AaaS). It includes useful architectural drawings that can be used to build solutions that also scale for large global organizations with significant security requirements.
Part 1: Introduction
Chapter 1: Success of Automation - To create a solid foundation which better allows you to understand and think about automation, this chapter takes you through how automation has influenced the world through history and features key examples from the past 200 years. This helps you to look at automation of complex IT systems with new eyes. We then move from the past struggles of initial industrialization to modern times featuring global enterprises, modern industry, and digitalized services. You will then review the many different key benefits of automation reaped by enterprises around the world, moving beyond simple reduction of cost and increased speed.
Chapter 2: Ways to Redefine Automation - Today in the world, the prevailing and traditional view of automation is that automation is something external to our IT landscape of applications or something which at best is wrapped around them. We will review what is wrong with this view and what severe consequences it has for our effort to digitize our businesses.
Chapter 3: Key Elements of Implementing Automation Strategy - Strategy comes before architecture and implementation. If it does not, it is rarely strategy, but either a retrofit of what already exists in the organization or something that deals with non-strategic concerns. Successful automation strategy has several distinct areas, which are important to know before diving into each one, as they are related both to each other and to architecture and implementation. Issues in one area also commonly bleed into another. To decrease creation time and make the rollout of the strategy as problem free as possible, there is an optimal sequence in which to do it, which this chapter outlines together with the distinct elements of automation.
Part 2: Creating Successful Automation Strategy
Chapter 4: Things that Matter: Budget and Ownership - While the way we operate in the IT space has become increasingly agile, when it comes to both budget and ownership, it is more common that we work in a more static fashion. Planning of budgets happens most often on a yearly basis. Ownership is commonly not reconsidered except during more impactful re-organizations, when ownership of strategic components and technologies may change. This often means that we must navigate this organizational rigidity when both creating and implementing our automation strategy. You will not always have the luxury of creating things from scratch. With that in mind, in this chapter we will not only review key topics within budget and ownership, but also explore what may happen when we get this wrong. This enables you to recognize and fix problems in an existing rollout of an automation strategy.
Chapter 5: Performance Monitoring of Automation Strategy - If there is no measuring in place that can tell us about the impact on the organization and our systems, we are driving blind. It goes beyond not knowing exact results, though. If we step back and think about what we have learned in the past chapters, which is that automation is a prerequisite for digital transformation, each year of running an ineffective automation strategy is a year in which we have postponed the digitalization of our organization. Learn what happens when you do not do performance monitoring and which objectives and outcomes you should look at.
Chapter 6: Selecting Right Tools and Platform - In the world of IT automation tools and platforms, the one ring to rule them all eludes most organizations, and will continue to do so for the foreseeable future. There are many reasons for this, such as that a single automation platform for the complete organization would have to be able to manage, and often integrate with, all systems and software. At the same time, if we do not manage the growing complexity that is born out of having an ever-increasing pool of different technologies, our scaling of IT (and our business) may easily grind to a halt due to increasing inefficiency, costs and time-to-deliver. We have some tough decisions to make when it comes to selecting the right tools and platforms, and this chapter attempts to guide you through them.
Chapter 7: Approach to Automation Skill Development - Let us start by reflecting on something fundamental. All change starts with people. Even if that is so, in the world of information technology, we often end up talking about technology instead. To help you tackle skills development related to your automation strategy, this chapter teaches how to understand, implement, and support skills development as a part of your successful automation strategy.
Chapter 8: Key Processes for Development and Cross-team Collaboration - The relationship between process and technology will be explored in depth in this chapter, including a review of the most important processes for development of automation and cross-team collaboration. We will also focus on how we can make things more future proof, as changing how hundreds or thousands of people work is the most demanding thing you can do in an organization.
Chapter 9: Catering for a Digitized Future - Now that you know about the main components that need to be dealt with in your automation strategy, it is time to discuss current and future trends and how they may and will impact your strategy. The Greek philosopher Heraclitus is credited with the idea that the only constant is change. The broader question you should ask yourself related to this, as you construct your automation approach, is what change would disrupt your plans. Related to this, we will delve into a discussion about how an increasingly heterogeneous landscape impacts us and how this can be dealt with in your automation approach.
Part 3: Automation for Architecture that Matters
Chapter 10: Scaling Up Automation to Organization-wide - There is one thing which alone breaks all systems, and that thing is scale. Your system can manage 5 million customers, but it cannot manage 5 billion customers rushing through the system requesting data and interacting. In this chapter you learn how federated automation can provide a solution to your scaling needs.
Chapter 11: Establishing High Availability and Disaster Recovery - As automation becomes more fundamental to your IT landscape and digitalization strategy, many organizations' automation journeys arrive at the point where there is a need to re-assess the availability and recovery requirements for the automation platforms used. This is natural, as lower automation maturity involves less important use cases, a missing or lacking automation strategy, and few standardized platforms. This chapter discusses common automation use cases to be on the lookout for when it comes to high availability (HA) and disaster recovery (DR) requirements, and outlines a general-purpose architecture for HA/DR that can be applied to different automation platforms.
Chapter 12: Security and Separation of Duty Requirements - As a part of automation touching more systems in the organization, it is natural to assess the security of your automation. At the end of the day, automation gives you better security, so it makes sense that your automation systems touch your systems which have the highest requirements for security. At the same time, this means that if you have not hardened your automation systems, they can easily become the weakest link in your security chain. With all this said, this chapter delves into both security for automation and automation for security.
Chapter 13: Explore Automation-as-a-Service (AaaS) - There is a good reason this is the last topic of this book, which is that it is one of the final developments in organizations' development journeys, going from opportunistic islands of automation to automated cross-organizational processes which enable new business or organizational capabilities. Automation-as-a-Service (AaaS) means that we provide a service which not only runs automation for others, but also helps to create high quality automation by providing related services. Review a general-purpose, vendor-neutral architecture for AaaS. When done correctly, it is exactly what the doctor ordered.
Coloured Images
Please follow the link to download the
Coloured Images of the book:
https://rebrand.ly/wb14i7p
We have code bundles from our rich catalogue of books and videos available at Check them out!
Errata
We take immense pride in our work at BPB Publications and follow best practices to ensure the accuracy of our content and to provide our subscribers with an engaging reading experience. Our readers are our mirrors, and we use their inputs to reflect and improve upon human errors, if any, that may have occurred during the publishing processes involved. To let us maintain the quality and help us reach out to any readers who might be having difficulties due to any unforeseen errors, please write to us at:
[email protected]
Your support, suggestions and feedback are highly appreciated by the BPB Publications’ Family.
Did you know that BPB offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.bpbonline.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at :
[email protected] for more details.
At you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on BPB books and eBooks.
Piracy
If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author
If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit We have worked with thousands of developers and tech professionals, just like you, to help them share their insights with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions. We at BPB can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about BPB, please visit
Join our book’s Discord space
Join the book’s Discord Workspace for Latest updates, Offers, Tech happenings around the world, New Release and Sessions with the Authors:
https://discord.bpbonline.com
Table of Contents
Part 1: Introduction
1. Success of Automation
Introduction
Structure
Objectives
Historical view of automation
Modern time automation
Benefits of automation
What successful companies do differently
Conclusion
2. Ways to Redefine Automation
Introduction
Structure
Objectives
The traditional view of automation
There is no automation, there are only apps
Conclusion
3. Key Elements of Implementing Automation Strategy
Introduction
Structure
Objectives
Main components of automation
Components of automation strategy
Budget and ownership
Performance monitoring
Tools strategy
Skills development
Key processes for development and cross-team collaboration
Sequence of implementation
Conclusion
Part 2: Creating Successful Automation Strategy
4. Things that Matter: Budget and Ownership
Introduction
Structure
Objectives
Budget management challenges
Traditional (incremental) budgeting
Zero-based budgeting
Driver based budgeting
Budget challenges
Importance of ownership
When things are not as they should be
Conclusion
5. Performance Monitoring of Automation Strategy
Introduction
Structure
Objectives
Why organizations are lacking performance monitoring
What happens when you do not do monitoring
Common key outcomes
Efficiency gains
Speed gains
Relationship with the business
Employee retention
Common key objectives
Organizational
Enablement
Efficiency gains
Capability gains
Conclusion
6. Selecting Right Tools and Platform
Introduction
Structure
Objectives
Tools, platforms, and languages
Automation tools
Automation platforms
Automation languages
One solution versus many and why consolidation can be detrimental
No return on investment
Delayed return on investment
Focus on platforms instead of tools
Using a saw to hammer a nail
Selection criteria for automation platforms
Hybrid compute support
Ecosystem
Integration
Security
Scalability
Sustainability
Conclusion
7. Approach to Automation Skill Development
Introduction
Structure
Objectives
The state of the modern IT workforce
Skill gaps challenges related to automation
Management
Automation creators
Enablement and training which scales
Commercial training offers
Train the trainer
Ad-hoc training
Conclusion
8. Key Processes for Development and Cross-team Collaboration
Introduction
Structure
Objectives
Relationship between process and technology
Changing and rolling out processes
List of key processes and how they should be performing
Cross-organizational collaboration
Documentation
Development
Integration
Change management
Conclusion
9. Catering for a Digitized Future
Introduction
Structure
Objectives
Current trend—cloud
What to consider
Current trend—statelessness
What to consider
Current trend—Internet of Things
What to consider
Current trend—artificial intelligence
What to consider
Conclusion
Part 3: Automation for Architecture that Matters
10. Scaling Up Automation to Organization-wide
Introduction
Structure
Objectives
Reflection on how different types of automation impact segmentation
Reflection on automation strategy and budget for segmentation
Organizational makeup and segmentation impact
Federated automation which scales
Conclusion
11. Establishing High Availability and Disaster Recovery
Introduction
Structure
Objectives
How communication breakdown and complex integration silently put HA/DR requirements on your automation platform
Common automation use cases to be on the lookout for
HA/DR specific requirements on automation tool chains
An example of common-purpose HA/DR architecture for automation platforms
Conclusion
12. Security and Separation of Duty Requirements
Introduction
Structure
Objectives
DevSecOps in the world of automation
List of must-have security controls for automation
Supply chain security controls
Vulnerability management
Role-based access controls
Logging
How to mitigate the cost of security
Conclusion
13. Explore Automation-as-a-Service (AaaS)
Introduction
Structure
Objectives
Managing adoption challenges
A minimal viable product
All the bells and whistles of a complete platform
Example architectural pattern for Automation-as-a-Service
Conclusion
Index
Part - 1
Introduction
CHAPTER 1 Success of Automation
Introduction
The world of automation is a difficult place to make it in, and it is full of enterprises and organizations with failed automation strategies and initiatives. For you as a creator, architect, or manager to successfully author, oversee, and execute a successful IT automation strategy and architecture, you first need a solid pair of glasses through which you can properly view and assess the things that matter in your enterprise or organization. In a complex IT world full of buzzwords and ever-changing trends and efforts, it is easy to lose track of what is important, what to focus on, and what to spend less time worrying about. If you are a creator, you can easily get lost in technology choices. If you are an architect, you can get lost in requirements and frameworks that no longer apply to today’s business challenges. If you are a manager overseeing things, you can get lost in all these things as people try their best to create budget space for well-meant but misdirected programs of change. This chapter hopes to make things clearer to you.
To create a solid foundation that better allows you to understand and think about automation, this chapter takes you through how automation has influenced the world through history and features key examples from the past 200 years. This helps you to look at the automation of complex IT systems with new eyes. We then move from the past struggles of initial industrialization to modern times featuring global enterprises, modern industry, and digitalized services. You will then review many different key benefits of automation reaped by enterprises around the world, moving beyond simple reduction of cost and increased speed. At the end, we will review what the enterprises that have succeeded with large-scale automation do differently from the many that fail.
Structure
In this chapter, we will discuss the following topics:
Historical view of automation
Modern time automation
Benefits of automation
What successful companies do differently
Objectives
This chapter provides the reader with a good fundamental understanding of automation, how it influences the world, and how to think about it. This understanding is paramount to being able to reap the benefits of automation.
Historical view of automation
Automation can simply be viewed as a task being done in an automatic fashion instead of a human doing it manually. From that perspective, bicycles automate the task of walking. Automation has long been a key factor for disruptive change across markets and society, even though the word automation was first coined by the automotive industry in the 1940s. My favorite historical automation example starts with what many consider one of the most important inventions in modern times, the steam engine, which can be traced back to the start of the 17th century. The steam engine would become the cornerstone of another great invention later, in the 19th century. This invention was the train. Before the introduction of trains, the only way to move people and goods over greater distances was by horse or by boat, where the latter would often not be an option. Trains enabled people to travel in a completely new way, moving thousands of kilometers in days instead of weeks or months. They also enabled companies around the world to cut their time-to-market many times over. It is easy to imagine the business impact of this if your company made a living by selling products to a market some distance away.
Imagine if you used horseback to freight your products to market and your competitor was an early adopter of trains. Your competitor would not only be able to deliver goods many times quicker than you but also at far greater volumes. The risk that you would become disrupted would be high.
Let us have a look at the disruption in the 19th century caused by the adoption of trains, as shown in the following figure:
Figure: The world in the 19th century
Modern time automation
Later in history, during the 20th century, the computer became another example of automation. Instead of humans making calculations using a counting frame or slide rule, computer programs can today almost flawlessly perform calculations over and over again. Instead of humans filing a piece of paper in a filing cabinet, computers can now file and transfer information in milliseconds. With these fantastic capabilities, computers have helped companies cut time-to-market and also allowed them to increase the volume of products they can ship. Let us have a look at a prominent example, the distribution of videos and film. First, we had humans visiting stores where video cassettes were sold or rented out. This developed into streaming media distributed almost instantly by computers to TVs, mobile phones, and computers in our homes or wherever we happen to be. There is a huge difference in time-to-market and distribution volume between the two. Consider what a video store would look like if it carried the current content of today’s video and streaming services, and you quickly realize that such a store would not even be physically possible to build.
But let us not get lost in the complexity of computerized systems. At the end of the day, we automate because of the same reasons companies came to establish increasingly automated factories.
Benefits of automation
Time-to-market and distribution volume are two things that we see benefit greatly from automation. But there are more benefits, such as delivery precision. It is challenging for humans to perform even simpler tasks over and over again with the same method and result. Moving forward to more complicated tasks consisting of dozens or more steps, delivery precision suffers increasingly. Due to this, in order to produce large amounts of products that have similar quality and experience, automation is required. Because of the large number of manual tasks in the IT world, scale is increasingly a challenge at the heart of many types of Information Technology systems. Examples of such manual tasks may be to install a computer, install an application, patch a system, add a user to a system, add capacity to existing systems, or manage security and networking.
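The effect of step count on delivery precision can be illustrated with a small sketch. It assumes, purely for illustration, that each manual step succeeds independently and with the same probability. The function name and the 99% figure are hypothetical and not taken from any measurement.

```python
def flawless_run_probability(step_success_rate: float, steps: int) -> float:
    """Probability that every step of a task succeeds.

    Illustrative assumption: steps fail independently of each other and
    share the same success rate. Real manual work is rarely this tidy.
    """
    if not 0.0 <= step_success_rate <= 1.0:
        raise ValueError("step_success_rate must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_success_rate ** steps


if __name__ == "__main__":
    # Hypothetical figure: a person gets each individual step right 99% of the time.
    for steps in (1, 12, 50):
        print(steps, round(flawless_run_probability(0.99, steps), 3))
```

Under these assumptions, a 50-step task is completed without a single mistake only about 60% of the time, even though each individual step is almost always done correctly.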
If we start by looking at how scale is introduced into IT, the number of customers is a main driver of scale. The interconnection of traditional physical places of business, like a store or a sales office, with IT then applies the same scale factor to IT. For example, a retail chain with thousands of stores will have several computers running in each physical location, which translates to many thousands of computer systems. Furthermore, when services are digitized, they can more easily reach more customers, meaning more scale. Another thing that creates scale in IT is data. More data means more storage, more computers to process it, more network services to transfer the data, and so on. If each customer generates a lot of data that is used in the service, that means scale. As companies digitize more of their services, the amount of data increases. An example is cars. Modern cars both consume and generate increasing amounts of data, more so as autonomous driving services get more advanced.
The car needs both to retrieve static information, such as map data, and to process and send information about what is happening at the moment, such as whether there is a traffic jam on the road it is traveling on. If we are not already there, the automotive industry will soon reach a point where the ability to scale IT will dictate competitiveness in the car market.
At this point, we must not forget that, just like in factories, automation grants the ability to scale in IT. Just like it is not scalable to manually create millions of beer cans, we cannot keep manually managing IT systems which are much more complicated than beer cans. The complexity is compounded as these IT systems often need to be maintained for decades.
The level of automation also impacts organizations’ ability to scale in less obvious and more indirect ways, for example, by affecting their ability to allocate staff. If you use qualified staff to automate an advanced task and provide an automated workflow that runs at the push of a proverbial button, you can now have less skilled staff performing that task. This is becoming paramount as the availability of highly skilled resources in IT is a global challenge. Examples of this can be found in the latest (2021) Report on Labor Shortages and Surpluses from the European Labor Authority, where software professionals account for a large part of the reported skills shortage.
The following figure shows how the Report on Labor Shortages and Surpluses points out key occupations in the IT space:
Figure: Report on labor shortages and surpluses
Similar shortages are reported across North and South America and Asia Pacific. The conclusion to draw from this is that adding staff is unlikely to solve scale-related challenges in IT. Many organizations and companies of the world are digitizing, but there are simply not enough people for everyone to get the resources they need. Even less so when we talk about more qualified resources which can automate IT systems. But let us stay here for a bit and consider what is needed for someone to automate an IT system. It takes considerable knowledge of both the system to automate and the applicable automation frameworks, which is a rare combination of skills. A majority of IT systems do not set themselves up, configure themselves to requirements and then maintain themselves automatically. This is automation which most often needs to be added using some additional system.
These highly skilled resources that can do effective automation are often needed in order to deliver new and increasingly complex IT solutions that are key to the business, often creating situations where companies have to make difficult prioritizations between automating and delivering what is in the pipeline. This leads us to another benefit of automation, which is the ability to cope with change. For example, changes to personnel. When highly skilled resources change employers, you can reduce disruption if tasks performed by those employees have been automated. This means you can continue to deliver the same quality for some time until maintenance and an update of automation are required. This is instead of directly impacting your ability to deliver and operate key parts of your business.
Other types of changes that automation makes you more resilient to are changes in demand and supply. Take the example of a bank providing the service of a loan, and it used to be that taking out a loan involved a lot of human interaction, estimations, and meetings at physical bank offices. Today in many countries, loans can be provided en masse using completely automated workflows running on computers, allowing a bank to seamlessly scale out their ability to provide loans as the economy changes. The same applies to a lot of different organizations, from the government tax agency, which allows a majority of the population to file their taxes at the same time, retail stores which can scale out online shops, and many businesses which have customer services that can be made digital with automated AI bots. Overall, business agility increases in the company as more and more arbitrary tasks can be executed in an automated fashion. This is because rare and highly skilled resources can be reallocated more easily and not get stuck in the maintenance of existing complex or high-value systems.
Automation also has a direct positive impact on organizational performance and reduces production and operational costs. The higher the degree of automation, the more easily we can scale out our business and, therefore, profits and market share. For example, automation can greatly decrease the time it takes to scale out the capacity of online services. Automation can also reduce the time it takes to establish a new retail store, a new production factory, or a warehouse in a world where IT systems are an integral part of the physical supply chains.
Obviously, when standard IT operations are automated, the cost is also reduced. This is as system outages are reduced by automating standard changes in the IT landscape and as fixing outages can be done quicker. Furthermore, automation can speed up the time-to-market of any service which is partly or fully digitized.
In the following figure, we review what a lot of digital disruption looks like in reality:
Figure: The world in the 21st century
Looking at companies and organizations that are now on an automation journey and those that have already arrived at the finish line, there is one important difference. That difference is how they view and treat automation in their organization. That change of view was forced by scale, just as it was in the past surges of industrialization. Because of the importance of scale in the modern world, my favorite way to ascertain if a system is sustainable is to apply scale. Ask yourself, can we do this task 10,000 times per day? All systems break at some point, which tells you a lot of things about the system itself, its relationships with the business, and more. A simple example is if the company’s Web frontend can process 100 concurrent users, we will have to
be able to scale it out in order to cater to an influx of over 100 users. Predicting an influx of new users is often complicated due to it being related to business developments or market changes. This scenario teaches us that if we can automatically scale the company’s Web frontend, we can simplify business development. This is as the business no longer has to move in lockstep with IT in case business developments significantly increase the influx of users, which means such business developments can happen faster, such as launching a new campaign. This can be considered business agility and shows in a good way what automation can bring to the table in regard to business capabilities. A more useful but more advanced exercise would be to put a strain on a business service and explore scaling of all associated IT solutions.
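The frontend example above can be sketched as a small calculation. This is only a minimal illustration, assuming each frontend instance handles 100 concurrent users as in the text; the function name and the other load figures are hypothetical and not taken from any specific autoscaling product.

```python
import math


def replicas_needed(concurrent_users: int, users_per_replica: int = 100) -> int:
    """Return how many frontend instances are needed for a given load.

    users_per_replica defaults to the 100 concurrent users used in the
    text; every other number in this sketch is an assumption.
    """
    if users_per_replica <= 0:
        raise ValueError("users_per_replica must be positive")
    # Keep at least one instance running, even when there are no users.
    return max(1, math.ceil(concurrent_users / users_per_replica))


if __name__ == "__main__":
    # Hypothetical load levels, for example before and during a campaign.
    for users in (80, 100, 101, 2500):
        print(f"{users} users -> {replicas_needed(users)} instance(s)")
```

When a calculation like this drives an automated scale-out, the business does not have to ask IT for capacity before launching a campaign, which is the point the example makes.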
What successful companies do differently
If you are going to produce millions of cars in a year, you can no longer build the cars by hand. This is what the automotive industry realized as it embraced automation as something beyond useful, something as fundamental as the engines powering the cars. In the same way, when modern IT companies deliver IT services at great scale, such as social networks, streaming video, vast marketplaces, and financial or retail services, automation is the foundation that they stand on.
You need automation to be able to operate and, more importantly, maintain vast complex systems numbering in the tens of thousands, hundreds of thousands, or millions of systems. Otherwise, your delivery time will keep on increasing as you scale until you come to a standstill. The lack of delivery precision will generate unhappy customers and frustrated coworkers. The lack of flexibility will have you lag behind critical market changes. Resource issues will decrease the retention of key personnel, and all the resulting organizational challenges make scaling to market demands impossible within any reasonable time frame.
It is not enough to just have computers, software, and data centers. It is not enough to hire consultants. You need to stand up and scale out complex environments, which include operating systems, networking, storage, security, observability, development pipelines, and test and production environments at the push of a button, or better, fully automatically and on-demand. When you can do that, IT stops being considered a cost and starts being considered an investment. IT can then become an integrated part of your business and go-to-market strategy. And this all comes from the simple realization that automation is not a nice-to-have thing but the required foundation on which modern digitized companies stand. This is what in the end makes all the difference, as it will allow you to create and execute an effective automation strategy and succeed with your IT automation.
In the following figure, we can review how scale enabled by automation relates to modern enterprise capabilities and how the lack of automation creates difficult-to-handle timelines:
Figure: Scale and IT-related enterprise capabilities
Conclusion
Hopefully, by now, you have a clearer view of what automation is at its core: any and all tasks being done in an automatic fashion instead of manually. Automation is something that we as human beings have used as the foundation for our modern world and often as a means to disrupt competition. Furthermore, automation often arises as a solution to demands for scale, and the key to successful automation is simply to see it for what it is: a fundamental and non-negotiable prerequisite.
In the upcoming chapter, you will learn how to further view and understand automation when zooming into complex IT landscapes that are not fully understood by any single person.
Join our book’s Discord space
Join the book’s Discord Workspace for Latest updates, Offers, Tech happenings around the world, New Release and Sessions with the Authors:
https://discord.bpbonline.com
CHAPTER 2 Ways to Redefine Automation
Everything great that has happened in the world first happened in some human’s imagination -Astrid Lindgren
Introduction
Automation is often viewed as something external and often nice to have. As we have reviewed in the previous chapter, this stands very much in contrast with what is required to succeed with organization-wide automation. In this chapter, the reader learns how to redefine automation for themselves at a time when IT landscapes are so complex that they are not fully understood by any one single person.
Today in the world, the prevailing and traditional view of automation is that automation is something external to our IT landscape of applications or something which, at best, is wrapped around them. We will review what is wrong with this view and what severe consequences it has for our effort to digitize our businesses. Our view of automation has an impact on organizational structure, system architecture, and implementation and operations of systems and applications. For our effort to be able to scale, we need to redefine how we view automation so that we can properly and effectively organize. The reality is that we have fooled ourselves into believing that the many automation frameworks we work with are external to our applications. On a low level, this can be technically true, but this is what we need to forget. This way of thinking does not benefit our business. The reality is that it is time to wake up from the Matrix and realize that there is neither a spoon nor automation; there are only applications. Or, if you will, the opposite, meaning there are no applications but only distinct pieces of automation. Enabled with new insights, we will review how we could and how we should organize in a way that makes us scale far into the distant future. Failing to do so will
only make the inevitable re-organization more painful or, even worse, create a lock-in effect, which makes re-organization much more difficult.
Structure
In this chapter, we will discuss the following topics:
The traditional view of automation
There is no automation, there are only apps
Objectives
This chapter allows the reader to define what automation is in the context of information technology. This is the final building block, which the reader will stand on while designing, overseeing, and implementing successful automation strategies.
The traditional view of automation
First, many people view the IT landscape as a collection of applications or systems. Depending on how technical you are, you may view the IT landscape as a collection of applications running on computers. Or you may describe each operating system instance running on a computer as a collection of hundreds or thousands of applications running together, almost indistinguishable from the business application also running in that operating system. Let us review some common ways that people visualize IT landscapes.
In Figure 2.1, we have three common visualizations of the IT landscape:
Figure 2.1: Different views of IT
The common denominator between these three separate ways to visualize IT is that we recognize different things, such as applications, services, or a virtualization platform. Our recognition of these items being separate from each other in a meaningful way is likely to cause us to try and organize around them. For example, we can do this by creating a Web shop team, a billing system team, a hardware team, a network team, and various application teams. The 500 different applications that make up a normal operating system are either viewed as one thing, or recognized as being so intimately connected to each other that they should be treated as one thing.
Now, let us have a look at what modern enterprise automation systems look like. There is normally a distinct and different system where we create and manage the automation. Therefore, most companies and organizations create a separate automation team, which is no issue in itself. What is more important here is how the automation and the automation systems are viewed. The automation written to automate things on the various systems also lives in the central automation system and is, therefore, also viewed as something external to the systems, applications, or services being automated.
Let us have a look at how, from a strictly technical perspective, automation for systems lives outside of the system to be automated.
In the following figure, we visualize how automation lives outside of the automated system:
Figure: Simply, the automation for system 1 and 2 lives in the central automation system
This is where things start to go wrong from several points of view. One consequence of us viewing the automation for system 1 as separate from system 1 is that it feels natural for us to assign the automation of system 1 to a central automation team. This is history repeating itself, where companies and organizations for decades struggled with implementing centralized monitoring teams, which were responsible for configuring monitoring for every application. We also made the same mistake when it came to security, having centralized security teams, which were to be specialists in how to keep all the systems secure. When you think about it, it is obvious why this does not work. To create automation, monitoring, or security for a system, you need to be a domain specialist of that system. A central team can never be domain specialists for all things because that would require the central team to hire one or more domain specialists for each different thing. If we persist in this context and force these central teams to be responsible for the external
systems, only one thing can happen, and that is bad deliveries and massive bottlenecks.
Figure 2.3 illustrates how the concept of a central team fails:
Figure 2.3: Central automation team becomes a painful bottleneck
If you just look at the preceding picture and imagine the number of people in the 30 new teams consisting of actual experts in those systems, it makes less sense to use the three people responsible for the automation system to do the work.
Another consequence of us viewing the automation for system 1 as separate from system 1 is that we do not recognize that availability requirements of systems automated often should be transferred to the system automating
them. And if system 2 depends on system 1, you may also argue that the availability requirements of system 2 also should be applied to the automation system. In many companies, availability requirements are passed from system to system, up to three or more levels, to ensure that the highly available systems are not taken down by a system they depend on, which happens to run off an IT engineer’s desk.
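Passing availability requirements along a chain of dependencies can be sketched as follows. This is a minimal illustration, not a description of any particular tool: the function name is hypothetical, the 99% and 98% figures follow the example used later in this chapter, and the 95% starting requirement for the automation system is an assumption.

```python
from collections import deque


def propagate_requirements(own_requirement, depends_on):
    """Raise each system's availability requirement to at least that of
    every system that depends on it, directly or indirectly.

    own_requirement: system name -> required availability in percent.
    depends_on: system name -> list of systems it depends on.
    """
    effective = dict(own_requirement)
    queue = deque(effective)
    while queue:
        system = queue.popleft()
        for dependency in depends_on.get(system, ()):
            # A dependency must be at least as available as its dependants.
            if effective.get(dependency, 0.0) < effective[system]:
                effective[dependency] = effective[system]
                queue.append(dependency)
    return effective


if __name__ == "__main__":
    # 95.0 for the automation system is an assumed starting point.
    own = {"system 2": 99.0, "system 1": 98.0, "automation system": 95.0}
    deps = {"system 2": ["system 1"], "system 1": ["automation system"]}
    for name, requirement in propagate_requirements(own, deps).items():
        print(f"{name}: {requirement}%")
```

In this sketch the automation system inherits the strictest requirement in the chain, which is the consequence the text argues is often overlooked.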
Figure 2.4 is an illustration of how the automation system’s availability requirements are lower than those of both system 1 and system 2, which is a common thing to see in modern companies and organizations:
Figure 2.4: By viewing automation as an external thing, we fail to see why the preceding can be wrong
What makes this discussion more difficult is that when companies and organizations in the past started automating things on a larger scale, they often started with automating things that were viewed as non-business critical, such as the creation of a new system. If such automation breaks, it
does not take down production, which was the normal argument; we just must do that manually for a limited time. Today though, failing to create new systems is often considered a business-critical event. In modern architecture, we often depend on being able to scale out capacity for peak demand, and to scale out, you need to create new systems. If we fail to scale out, the result is the same as if systems went down; a lot of customers are not able to get service. What makes this even more difficult is that the users of the “create a server” automation may not notify the owners of the automation system that their use now is business critical. This is especially true in larger organizations.
There is no automation, there are only apps
Now, it is time to realize that this whole time, there was no spoon. If you remember from our first chapter, we established that automation comes in many shapes and forms. One of those shapes is applications. For example, an e-mail application automates the job of writing a physical letter, walking with the letter to a mailbox where you insert it into a small metal slit, and then waiting for a potential reply. So, it is all just automation. But, due to how established the understanding of an application is, it is more useful to say that it is all just an application. We then think about automation running in an external central system as an extension of our applications, just as we would view an application server and its database as one thing providing a service to a customer.
Let us try this out in the following figure:
On the left-hand side is the old way of seeing things; on the right-hand side, we view automation as a part of the application:
Figure 2.5: Seeing automation of an application as a part of it can be especially useful
The 500+ different applications that make up a normal operating system are either viewed as one thing, or recognized as being so intimately connected to each other that they should be treated as one thing. The same thing goes for automation. Automation is intimately connected to the application, and that becomes more so the more we automate and digitize. Let us have a look at how the issues we had earlier, with understanding how to organize around automation, with central teams, and with confusion around availability concerns, almost disappear with this new view of things.
First, let us have a look at how we view the organization of resources around automation. Is it clear who should be responsible for the automation? As we view automation as a part of the applications, creating automation for the 30 new systems is simply viewed as doing development on those systems.
Figure 2.6 illustrates how one central team performing what is key development (automation) of arbitrary systems makes less sense:
Figure 2.6: Illustration of what is wrong with the preceding picture
It comes naturally that the development teams for the 30 new systems take on the development of their applications. It makes little sense that the development teams outsource key application development to such a small team as the automation team, which does not have domain knowledge about the applications, and which is extremely limited regarding capacity.
Next, let us have a look at availability concerns in the following picture. It becomes painfully obvious that the automation system should have at least 98% availability, if not 99%, as system 1 is viewed as being run inside of the automation system. Again, this reflects how we would view the relationship between, for example, a Web front end and its database. If the Web front end has 99% availability requirements, the same should apply to the database. Take a look at the following figure for reference:
Figure 2.7: Suitable availability requirements are more easily understood than the preceding
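One way to see why a component should match the availability of what it serves is to multiply availabilities. The sketch below assumes that the components fail independently and that the service is only up when all of them are up; the function names are hypothetical, and the 95% figure is an assumed example of a weaker component.

```python
HOURS_PER_YEAR = 365 * 24


def combined_availability(*availabilities: float) -> float:
    """Availability of a service that needs all of its parts to be up.

    Assumes independent failures, so the availabilities (fractions
    between 0 and 1) are simply multiplied.
    """
    result = 1.0
    for availability in availabilities:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        result *= availability
    return result


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    frontend = 0.99          # requirement used in the text
    weaker_dependency = 0.95  # assumed figure for a less available part
    total = combined_availability(frontend, weaker_dependency)
    print(f"combined availability: {total:.4f}")
    print(f"downtime per year: {downtime_hours_per_year(total):.1f} hours")
```

Under these assumptions, a 99% front end that relies on a 95% component cannot deliver 99% itself, which is why the same requirement should apply to the database or, in our case, to the automation system.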
When it comes to security, a new term was invented, DevSecOps, which was used to convince the world that pushing responsibility to the actual experts out in the development teams, was a clever idea. Too bad DevAutoOps does not have the same nice ring to it, as DevOps does not seem to have made the message clear enough.
Conclusion
In this chapter, we have learned how redefining automation as being a part of the applications it automates makes it clearer how we should organize around automation and even what types of requirements we should be putting on automation systems. Doing this prepares us for a future where automation is significantly scaled out and much more business-critical.
In the upcoming chapter, we will give an overview of the key elements and the sequence of creating, overseeing, and implementing a successful automation strategy and automation architecture.
CHAPTER 3 Key Elements of Implementing Automation Strategy
Introduction
To have successful automation organization-wide, you need a thought-through strategy, a robust architecture, and systems that provide automation in a more central manner. Together, the three form a foundation for successful automation in the organisation. Without all three, we fail. If you do not have a proper strategy, impact and return on investment will suffer, as only parts of the organization improve. If you forget key architectural requirements for your implementation, you are risking both the availability and security of the organization. With poor automation systems, we struggle with scale and widespread adoption due to a lack of functions and complexity. Understanding the components which underpin automation success is vital.
Automation strategy in modern companies is often lacking. One example is the wrong things being put into the strategy. It is common to see a focus on details around the execution of the strategy instead of strategic components, such as detailing which automation tools should be used instead of defining a strategy for tool selection. This common mistake not only turns the strategy into a non-strategic solution design document but also makes the approach fragile as it exposes it to the rapid change of our technical world.
At the same time, many implementations of automation are lacking. For example, it is common that implementations do not cater to business-critical workloads. This may be because business-critical use cases were not considered when the systems of automation were conceived. This can be explained by the absence of an organisation-wide automation strategy outlining business impact. It may also be because IT and business simply are not talking to each other.
There are often many different automation systems at the enterprise. In the absence of automation strategy and architectural requirements, these systems often suffer from scalability and complexity issues.
Strategy comes before architecture and implementation; if it does not, it is rarely strategy, but either some retrofit of what is already at the company or something which deals with non-strategic concerns. A successful automation strategy has several distinct areas, which are important to know before diving into each one, as they are related both to each other and to architecture and implementation. Issues in one area also commonly bleed into the others. To decrease creation time and make the rollout of the strategy as problem-free as possible, there is an optimal sequence in which to do things, which this chapter will outline together with the distinct elements of automation.
Structure
In this chapter, we will discuss the following topics:
Main components of automation
Components of automation strategy
Sequence of implementation
Objectives
The chapter provides a proper overview of what components make up successful organisation-wide automation. It also covers an optimal implementation sequence, which reduces issues and implementation time.
Main components of automation
To have successful organisation-wide automation, we need strategy, architecture, and tools. In larger companies, this commonly involves various parts of the organization. Organisation-wide automation strategy is something that should be owned at the CIO or COO level, depending on the organization. A successful automation strategy is about business and wide-reaching digitization, which means that the owner of the automation strategy needs to be close to the business. If the automation strategy owner does not sit on the board of executives together with the CEO and CFO, it is unlikely that an organisation-wide strategy will be possible to implement. As we review the various parts of a successful automation strategy, this will become clearer.
The second part which underpins successful organisation-wide automation is architecture. This is something that should be owned at the IT architectural level. The reason for this is that it is likely that there will be several different systems in which automation is centrally provided. At the IT architecture level, we can ensure that all systems of automation abide by the same requirements. If we instead have engineers, product owners, or such deciding on key aspects, the risk is that different systems will cater differently to the business, making business development more difficult.
Third, we have various automation systems which provide automation to the systems they manage. These systems are owned by product or platform owners residing more centrally in the IT organization. It is
common for larger domains in IT, such as computing, networking, and cloud, to have different automation systems and owners, even though that is not required.
In the following figure, we have a depiction of how strategy informs architecture, which informs the implementation of different systems of automation:
Figure: Key components of successful automation
Components of automation strategy
The fundamental element of successful organisation-wide automation is strategy. For larger companies with hundreds or many thousands of employees, it is rarely possible to succeed without it in place. A successful automation strategy has several different elements, which we outline as follows:
Budget and ownership
Performance monitoring
Tools strategy
Skills development
Key processes for development and cross-team collaboration
Let us have a quick look at each element and how they are related to each other.
Budget and ownership
Without budget(s) that cover the full ambition of the system strategy, the strategy is but a paper product, which is more likely to confuse and frustrate the organization than get something done. Because of how common it is to use more than one automation system, it is highly likely that more than one budget will have to be adjusted to cater to the strategy.
Who owns the automation strategy is important for several reasons: one is budget, and another is perspective. Because of this, a successful organisation-wide automation strategy is owned at CIO or COO level.
Performance monitoring
To evaluate the success of the automation strategy, we need to set proper objectives and outcomes for the strategy. We are not doing automation to gain technical outcomes, but strategic ones, so target outcomes should be related to the business of the company or organization. Once we have meaningful outcomes to aim for, we can devise performance monitoring for our strategy.
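As a minimal sketch of what monitoring against business outcomes could look like, the example below tracks progress from a baseline towards a target. The class, the outcome names, and all numbers are hypothetical and only illustrate the idea of measuring business outcomes rather than technical ones.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """A business outcome the automation strategy is expected to move."""
    name: str
    baseline: float  # value when the strategy started
    target: float    # value the strategy aims for
    current: float   # latest measured value

    def progress(self) -> float:
        """Fraction of the way from baseline to target, clamped to 0..1.

        Works both for values that should go up and values that should
        go down, since only the direction baseline -> target matters.
        """
        span = self.target - self.baseline
        if span == 0:
            return 1.0
        fraction = (self.current - self.baseline) / span
        return max(0.0, min(1.0, fraction))


if __name__ == "__main__":
    # Hypothetical outcomes and figures, for illustration only.
    outcomes = [
        Outcome("Days to establish a new retail store", 90, 30, 60),
        Outcome("Share of loan applications handled end to end (%)", 20, 80, 35),
    ]
    for outcome in outcomes:
        print(f"{outcome.name}: {outcome.progress():.0%} of the way to target")
```

Note that both outcomes are stated in business terms; a technical measure such as the number of automated tasks would say little about whether the strategy is succeeding.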
Tools strategy
Tools strategy is about defining strategic capabilities or elements of automation tools. The tools strategy is often misunderstood to mean the selection of specific technical tools or automation systems. As explained previously, defining specific products in the tools strategy is an effective way to make the strategy into something non-strategic, such as a solution design document. Furthermore, defining things which are sure to be fleeting in nature and quickly changing will make your strategy fragile, meaning you will have to change it every time a new tool comes out. Tools selection is better left to tested IT and solution architectural processes.
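The difference between naming capabilities and naming products can be sketched as follows. The capability names and the candidate tools are purely hypothetical; the point is only that the strategy holds the capability list, while candidates are checked against it later in the architectural process.

```python
# Strategic capabilities stated in the tools strategy (hypothetical examples).
REQUIRED_CAPABILITIES = {
    "runs in hybrid cloud",
    "supports business-critical workloads",
    "usable by application teams",
}


def missing_capabilities(required, offered):
    """Return the required capabilities a candidate tool does not offer."""
    return sorted(set(required) - set(offered))


if __name__ == "__main__":
    # Candidates are evaluated outside the strategy itself; these entries
    # are made up and do not describe any real product.
    candidates = {
        "candidate A": {"runs in hybrid cloud", "usable by application teams"},
        "candidate B": set(REQUIRED_CAPABILITIES) | {"graphical workflow editor"},
    }
    for name, offered in candidates.items():
        gaps = missing_capabilities(REQUIRED_CAPABILITIES, offered)
        print(f"{name}: {'meets the strategy' if not gaps else 'missing ' + ', '.join(gaps)}")
```

When a new tool comes out, only the list of candidates changes; the capability list, and therefore the strategy, stays the same.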
Skills development
We are currently in a world where the successful execution of our automation strategy will depend on skills supplied in many different technical domains. In order to implement an organisation-wide automation strategy, we will need to develop skills in-house, as we require vast amounts of long-term resources to succeed.
Key processes for development and cross-team collaboration
Organisation-wide implementation of automation is about collaboration. It is no longer the case that any single team understands the full stack of technology underpinning our business services. Because of this, it is key that we both have a common approach to the development of automation and have powerful cross-team collaborative processes in place, providing low-resistance collaboration across the complete enterprise.
Sequence of implementation
Because automation is often viewed as something detached from our IT landscape, and often as something that is not business critical, companies rarely follow the strict flow of creating strategy first, then creating a common architecture, and then following up with the selection and implementation of tools and automation. That does not make it less important to follow the correct sequence. If you feel tempted to go full steam ahead, select some automation products, and get started, it is good to understand that the domain of automation often undergoes significant changes, which turn automation products into legacy and otherwise cause challenges and issues.
A good example is the introduction of the public cloud, which is currently forcing many companies and organizations to re-evaluate their choices for automation tools. With a good automation strategy stating both that everything needs to be automated and that automation responsibility falls on application teams, it would have been natural to immediately think about the ability to run automation systems in a hybrid cloud world as soon as cloud became a discussion. This would have both saved a lot of the pain related to these migrations and, at the same time, sped up cloud adoption.
Following is a depiction of how an automation strategy dictating development teams being responsible for the automation of their application would have made cloud adoption easier for a lot of companies:
Figure: What automation strategy can do
The reason why long-term strategic goals should guide our implementations is simply that it is exceedingly difficult to foresee how technologies will develop over even a few years, whereas it is much easier to foresee what is important for your company or organization.
At organisations older than just a few years, it is most common that the selection of automation tools pre-dates any organisation-wide automation strategy or common architecture. Because of this, various parts of our IT organization may have to revisit tool choices to match up to both the strategy and common architectural requirements. While a strategy and common architectural requirements are being created, it is useful to halt sourcing of new systems until these prerequisites to successful organisation-wide automation are complete.
Conclusion
In this chapter, we have reviewed the different key elements of a successful automation strategy. That is, having a multifaceted strategy in place, creating key architectural requirements for your automation systems, and finally, having some automation tools and platforms for the organization to automate with.
In the upcoming chapter, we will deep-dive into what a successful automation strategy looks like, starting with budget and ownership.
Part - 2
Creating Successful Automation Strategy
CHAPTER 4 Things that Matter: Budget and Ownership
Introduction
Creating an automation strategy is simple, as there are few known standards or well-known best practices. Ad-hoc or improvised strategies tested by baptism of fire during their implementation often fail, though. We are now moving into the part of the book where we will identify the most important fundamentals for an automation strategy, which will increase the chances of successful implementation, even in larger enterprises with tens of thousands of employees.
In the previous chapter, we reviewed the fundamental components of a successful automation strategy, and we will now dive into two of these, which can doom the most promising strategy executed in the most promising environments. These two fundamentals are budget and ownership, items that need to be considered properly before strategy execution. Even when all other things in our strategy are on point, and we have an organizational environment that is very conducive to change, getting budget and ownership wrong can easily turn any automation strategy into a paper product that has only limited tactical effect.
While the way we operate in the IT space has become increasingly agile, when it comes to both budget and ownership, it is more common that we work in a more static fashion. Planning of budgets happens most often on a yearly basis. Ownership is commonly not reconsidered except for during more impactful re-organizations, when ownership of strategic components and technologies may change. This often means that we must navigate this organizational rigidity when both creating and implementing our automation strategy.
For sure, it is not always that you have the luxury of creating things from scratch. With that in mind, in this chapter, we will also explore what may happen when we get this wrong so that you can recognize and fix problems for an existing rollout of an automation strategy.
Structure
In this chapter, we will discuss the following topics:
Budget management challenges
Importance of ownership
When things are not as they should be
Objectives
This chapter provides insights into the importance of budget management and of who owns the automation strategy and budget, in a world where budgets are decided on a yearly basis and getting things wrong can easily doom an otherwise great strategy.
Budget management challenges
There are some different approaches to budgeting, which have some unique challenges when it comes to handling larger investments, such as the ones associated with an automation strategy or digitalization strategy. For readers who are not familiar with common types of budgeting, here are some distinctly different and common approaches.
Traditional (incremental) budgeting
The most common way to budget is by taking the previous period’s numbers and incrementing them by some percentage. This is both an uncomplicated way to budget and an effective way to get continuity over some period of time, which makes it popular. Higher-ups in the organization still need to be aware of costs to prevent the budget from increasing year by year, no matter what.
Zero-based budgeting
In zero-based budgeting, a new budget is created from scratch each budget period. Each department in the company or organization needs to justify all their costs to get them into the budget. This is still a common approach when costs need to be controlled or cut.
Driver-based budgeting
Driver-based budgeting starts with setting the sales budget and then assessing what is required from operations to achieve the sales goals in terms of people and equipment. A set of rules which governs the operational budget then allows automatic adjustment if sales goals later change. Linking sales numbers to operational costs reduces the ability of lower-level managers to include padding into various operational budgets.
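To make the difference between the three approaches concrete, the following is a minimal sketch of the arithmetic behind each of them. The function names, parameters, and all numbers are illustrative assumptions and are not taken from any real budget. The driver-based rule in particular is simplified to a fixed cost plus a cost per sales unit.

```python
def incremental_budget(previous: float, increase_pct: float) -> float:
    """Traditional budgeting: the previous period's number plus a percentage."""
    return previous * (1 + increase_pct / 100)


def zero_based_budget(justified_costs: dict) -> float:
    """Zero-based budgeting: only costs justified for this period are included."""
    return sum(justified_costs.values())


def driver_based_budget(sales_target: float, cost_per_sales_unit: float,
                        fixed_cost: float) -> float:
    """Driver-based budgeting: a rule derives operational cost from the sales goal."""
    return fixed_cost + sales_target * cost_per_sales_unit


# Hypothetical numbers, for illustration only.
print(incremental_budget(previous=1_000_000, increase_pct=3))
print(zero_based_budget({"licenses": 400_000, "training": 150_000}))
print(driver_based_budget(sales_target=2_000, cost_per_sales_unit=250,
                          fixed_cost=300_000))
```

Note how only the driver-based function reacts automatically when the sales target changes, while the other two need a new decision by a human.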
Budget challenges
It is common for companies to apply one of the preceding budget approaches, a mix of two or all three of them, or other approaches entirely. In all common cases, larger-than-normal investments need to be justified to higher-ups in the organization. Of course, if the organization applies a more fiscally conservative budgeting approach, justifying larger investments often becomes more challenging.
To execute an organisation-wide automation strategy, there is a lot of cost that needs to be justified. There are three main types of costs. First, there are higher costs related to large central automation systems. These costs are commonly high enough that they need to be cleared at the CxO level (CTO, CIO, COO, CFO, or CEO). The second type of cost is related to the common support systems required. That includes things such as version control systems for source code, artifact repositories, CI engines for automating development processes, and other things that may be missing. If these things are missing, they may end up becoming a sizable portion of the overall cost. The third and final type of cost is related to the automation of the applications across the IT landscape. Considering what we learned in the second chapter, the automation of an application should be viewed as development of that application, and costs should be included as application development or maintenance costs.
Figure 4.1 is a depiction of the three main costs related to organisation-wide automation strategies:
Figure 4.1: Different types of cost related to automation strategies
Another large budget-related challenge is the budget period. Normally that period is a calendar year long, with a budget input period measured in weeks or a couple of months. Because of this, it is common that budget decision-making and decision-making regarding the automation strategy are uncoordinated with each other. The worst-case scenario is that this delays the execution of the automation strategy by a calendar year plus some months. The way to prevent this is to plan early. When you start to plan the automation strategy, find out when the budget input period ends so that the investments required for executing the strategy can be added in time.
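The timing problem can be illustrated with a small sketch. It assumes calendar-year budgets and that the input deadline belongs to the budget for the following year. The function name and the dates are hypothetical, and the sketch ignores the case where the plan is finished more than one budget cycle late.

```python
from datetime import date


def earliest_funded_year(plan_ready: date, input_deadline: date) -> int:
    """Return the first budget year that can carry the strategy's investments.

    Assumes calendar-year budgets, where input_deadline closes the input
    period for the budget year that follows it.
    """
    next_budget_year = input_deadline.year + 1
    if plan_ready <= input_deadline:
        return next_budget_year
    # The deadline was missed, so the investment waits for the next cycle.
    return next_budget_year + 1


# Hypothetical dates, for illustration only.
deadline = date(2024, 10, 31)
on_time = earliest_funded_year(date(2024, 9, 15), deadline)
too_late = earliest_funded_year(date(2024, 11, 15), deadline)
wait_days = (date(too_late, 1, 1) - date(2024, 11, 15)).days
print(on_time, too_late, wait_days)
```

In this example, finishing the plan two weeks after the deadline instead of six weeks before it pushes funding out by a full budget year.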
Importance of ownership
Now that we have established the budgetary nature of the automation strategy, it is time to discuss ownership. First, the importance of automation and the role it plays in creating the next generation of business services is poorly understood, even by people working hands-on in the various technical domains. This means that it is complicated to create an easy-to-understand justification for the larger investments needed for a widely impacting automation strategy.
Here are some examples of what makes it even more difficult. First, an organisation-wide automation strategy is an expensive enough venture in terms of software and resource costs that investments normally need to be cleared at the CxO level in the company. Second, contrary to popular belief, it is not easy to make a complete business case for an automation strategy. One reason is that to make a proper business case, you need wide-ranging and unfettered access to the business and operational areas of the organization. The kind of person who has that type of access sits higher up in the organization. This means that if the automation strategy is not owned by someone higher up, the business case for the automation strategy can easily become flawed due to missing information and not showing all benefits. This, in turn, often leads to higher-ups reducing the scope of the automation strategy and, therefore, also the impact of the strategy. This happens more often when the organization uses a more fiscally conservative budgeting approach.
Another reason is that for some investments, such as a move from old-school automation systems to more modern cloud-native automation systems, short-term ROI can be zero, and long-term ROI is difficult to calculate, even with all information available. In the short term, moving things that are already automated from one system to another brings little business value. But in the long term, it may be a prerequisite for enabling the creation of future business services in a cloud-native world. At the same time, it is often not known what future business services can be created, a bit like how we could not imagine all the things that would be made possible by the train when it was introduced to the world. This means that when clearing a budget for our strategy, we sometimes need to take a leap of faith and trust that by modernizing and changing together with the world, we increase our chances of success in the future. Such a leap of faith can be difficult for higher-ups in the organization to take. Therefore, if someone higher up in the organization is not the owner of the strategy, they should at least be a highly involved sponsor.
A further reason why an organisation-wide automation strategy should be owned high up in the organization is the impact that a proper strategy has on the business. A director responsible for the organization’s IT operations is less likely to have deep insight into the business of the organisation and may therefore find it challenging to own something as central as an organisation-wide automation strategy.
When things are not as they should be
A common indicator of budgetary issues related to the automation strategy is that there are a lot of discussions and frustration about resources. When a budget is not properly funded, that risks killing the spirit of the very people needed to successfully execute the strategy—such as specialized architects and engineers. These people are often highly motivated as they get hired with the promise that they will lead or be a part of change in the company. When that is not delivered, these people tend to move on to a company where they can make a difference.
Furthermore, if the automation strategy is outright failing to deliver specific capabilities or expected improvements, that can also be due to budget-related issues. It is quite common that automation strategies do not have proper ownership and/or sponsorship, which leads to the strategy’s owner having to “beg for scraps” from various budgets around the company. This is rarely successful and tends to delay implementation by several years due to the difficulty of synchronizing the execution of the strategy across multiple departments’ budgets. This, of course, includes convincing several budget owners and their management that the strategy is something to invest money and resources in.
Conclusion
It is key that the automation strategy is properly funded. Otherwise, the most ambitious strategy becomes a paperweight or an e-mail that tells people to improve without them having any tools or resources to do so. This causes unnecessary frustration, which can impact employee turnover. Ensure that your strategy is either owned by or has a highly involved sponsor from higher up in the organization, often CxO level.
In the upcoming chapter, we will cover performance monitoring of an automation strategy. This is, of course, very much also related to budgets getting signed off as the strategy progresses.
CHAPTER 5 Performance Monitoring of Automation Strategy
Introduction
The domains of monitoring, measurement, and observability are starting to gain prominence as key parts of modern development practices. As a part of a new release of a customer-facing service, observations are made of customer behavior to understand whether the release was successful or not. An example is observing purchasing-related behavior after releasing a new set of colors for our e-commerce site. Another example may be measuring response times between an application server and a database after having made an update to a database schema. Finally, if a website simply goes down, we have monitoring tools that notify operational teams. This all comes naturally to us, but when we are discussing more abstract things, such as an automation strategy, it is less common that any advanced measuring is implemented. If a website goes down and you do not know about it, people will question the sanity of your monitoring. At the same time, there are companies running ineffective automation strategies for years without having a clue about it. The difference becomes obvious when we compare with the downed website. When we hit the “nothing is working” scenario for an automation strategy, angry customers are not banging on the door, telling you that things are not working. That only makes performance monitoring of your strategy so much more important.
When there is no measuring in place that can tell us about the impact on the organization and our systems, we are driving blind. It goes beyond not knowing the exact results, though. Step back and think about what we have learned in the past chapters. Automation is a prerequisite for digital transformation, which means that each year of running an ineffective automation strategy is a year in which we have postponed the digitalization of our organization.
Structure
In this chapter, we will discuss the following topics:
Why organizations are lacking performance monitoring
What happens when you do not do monitoring
Common key outcomes
Common key objectives
Objectives
This chapter explains and outlines what successful performance monitoring of an automation strategy looks like, including common key objectives and outcomes. We also explore what happens when we do not monitor our strategy.
Why organizations are lacking performance monitoring
For as long as we have developed software, we have tested and monitored said software. It is even standard to see central teams responsible for monitoring in organizations. When it comes to business strategies, it is also common to see performance monitoring of them. This often comes naturally in enterprises as that type of information is featured in yearly reports. Also, investors often want to know the effectiveness of the company’s business strategies. If we do not know how to do performance monitoring of our business strategies, there are plenty of business consultancy companies that can help.
When we venture into the domain of IT-related strategies, though, we are not in the same position. The reasons for this are many, but among them is a lack of knowledge. When an automation strategy is owned by someone not well versed in performance monitoring of strategies, we are less likely to set it up. Look out for situations where people lower down in the organization own the automation strategy, such as an architect, a product or service owner, or a line or middle manager in IT. As you move higher up in the organization or go outside of IT, leadership is more likely to understand that strategies need performance monitoring. Another reason for not having any performance monitoring in place is that we do not have any standard requirements for automation strategies. A third reason is that an automation strategy fits poorly with other technology-related strategies and generally has more in common with a business strategy than with an IT strategy. A common technology strategy is that the company should use a specific technology for a specific task. For example, servers should be virtualized. Measuring such a strategy is simple. Just measure the total number of servers and see how many of them are virtualized, along with some related information. When it comes to automation, it is less clear what we should measure. What we consider an automated state is not as black and white as “do we use this specific technology or not”. Finally, there are not as many consultancy companies out there that focus on performance monitoring of IT strategies as there are for business strategies.
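The virtualization example can be measured with very little code, which is what makes it such a simple strategy to monitor. The following is a minimal sketch, assuming a hypothetical inventory where each server is a dictionary with a yes/no flag. The inventory format and the names are assumptions for illustration only.

```python
def adoption_rate(inventory: list, flag: str) -> float:
    """Return the percentage of inventory items where the yes/no flag is set."""
    if not inventory:
        return 0.0
    adopted = sum(1 for item in inventory if item.get(flag))
    return 100 * adopted / len(inventory)


# Hypothetical inventory, for illustration only.
servers = [
    {"name": "app01", "virtualized": True},
    {"name": "app02", "virtualized": True},
    {"name": "db01", "virtualized": False},
    {"name": "web01", "virtualized": True},
]
print(adoption_rate(servers, "virtualized"))
```

A single yes/no flag works for virtualization. For automation, a single flag per system would hide whether only parts of a process are automated, which is why the measurements discussed later in this chapter look at processes, tasks, and changes instead.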
What happens when you do not do monitoring
When we do not do performance monitoring of our automation strategy, it becomes difficult to understand if the strategy is at all effective. That also means that any ROI calculations related to investments made as a part of the strategy become lacking or plain faulty. If we do not have accurate data to act on, decisions will instead simply be made based on gut feelings, politics, or, at best, qualified guesses. This, in turn, often leads to bad or sub-optimal technological choices or vendor strategies.
The performance monitoring of your automation strategy further provides significant insights into the state of your digital transformation. If you do not know what the state of automation is organisation-wide, you know less about the state of digital transformation at the enterprise.
The following figure depicts, in a general way, how automation strategy, digital transformation strategy, and business strategy impact each other. Of course, the reality is more complex, but broadly speaking, automation is the foundation of digital transformation, which in turn enables and impacts business strategies:
Figure 5.1: How different types of strategies influence each other in a general way
Common key outcomes
Outcome is a fancy way of saying result. The outcomes of the automation strategy are the results we are looking for. In this section, we will review common outcomes that we look for in automation strategies. To arrive at an outcome, you normally need to achieve several different objectives. Outcomes are the most important results we are looking for, and as such, there should not be too many of them, so that we can focus better. At the same time, they should be coupled with digital transformation and our business.
Common outcomes for strategic organisation-wide automation strategies are as follows:
Overall development efficiency improved by X%
Overall IT operational efficiency improved by X%
The IT organization receives a customer satisfaction score of X or better from the business
Time-to-market less than X hours|days|weeks|months
Employee retention is X over Y year(s)
Common outcomes focus on four different things: efficiency gains, speed gains, the relationship with the business, and employee retention or turnover.
Efficiency gains
If we are not gaining efficiency when automating, that can indicate that we have not automated complete processes and still rely on humans performing manual tasks in parts of our processes, which prevents efficiency gains from being realized.
Speed gains
A key reason for digitalization overall is to improve time-to-market. In digitalized organizations, it is common to measure time-to-market, measuring from idea to product launch. But depending on the organization, we may also measure time-to-market from factory to store and the like.
Relationship with the business
As the automation strategy supports the overall digitalization of our enterprise, that is, our business, a successful automation strategy is expected to improve the relationship between the business and IT, making IT a more strategic partner.
Employee retention
This may for some seem counterintuitive. But in most organizations, as we automate, we remove a lot of drudge tasks and allow employees to focus on more important and often more fun things. At the same time, when doing automation, we are investing more in our employees with, for example, training and time to improve the state of things, which improves morale and further improves employee retention. If we are not improving employee retention while doing automation, that can be a sign that we have not communicated our strategy properly. It is common for people to fear that automation will make them redundant.
Common key objectives
The following are key objectives that we follow in an automation strategy. They allow us to measure indications that we are on our way to delivering the overall outcomes:
Automation adoption team, dojo, or center of excellence created
Operational team for automation platform X in place
Architectural and production guidelines adjusted
Budgets aligned
All key personnel have been trained in automation technology X
X% of key personnel have been certified in automation technology X
All employees and consultants trained in the automation strategy
Employee satisfaction average at X
X% of IT processes automated
Y% of operational IT tasks automated
Z% of changes automated
Task X is now delivered in Y time
X% of incidents received automatic remediation
Y% of systems are fully self-healing | self-managed
X number of automation systems consolidated to Y number of automation systems
The act of provisioning and decommissioning IT systems does not include the involvement of any humans except for the end-user requesting the system
The act of scaling IT systems does not include the involvement of any humans
Common key objectives can be grouped into organizational, enablement, efficiency gains, and capability gains.
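One way to keep track of such objectives is to record a target and a measured value for each of them and summarize the result per group. The following is a minimal sketch under stated assumptions: the class, the group names, and all values are hypothetical, and the sketch treats every objective as "higher is better", which would need adjusting for objectives such as delivery time.

```python
from dataclasses import dataclass


@dataclass
class Objective:
    name: str
    group: str       # organizational, enablement, efficiency, or capability
    target: float    # the "X" in the objective
    measured: float  # the latest measured value

    def met(self) -> bool:
        """An objective is met when the measured value reaches the target."""
        return self.measured >= self.target


def progress_by_group(objectives: list) -> dict:
    """Return {group: (objectives met, total objectives)}."""
    summary = {}
    for objective in objectives:
        met, total = summary.get(objective.group, (0, 0))
        summary[objective.group] = (met + int(objective.met()), total + 1)
    return summary


# Hypothetical targets and measurements, for illustration only.
objectives = [
    Objective("Operational team in place", "organizational", 1, 1),
    Objective("Key personnel trained (%)", "enablement", 100, 80),
    Objective("IT processes automated (%)", "efficiency", 50, 55),
    Objective("Incidents with automatic remediation (%)", "capability", 30, 10),
]
print(progress_by_group(objectives))
```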
Organizational
There are several key objectives related to organizational changes. This includes the creation of support teams, such as an automation adoption team. If we are adopting a new automation platform, we require people to be in place to support it as well. In larger organizations, new technology requires architectural approval. More significant, though, is to adjust existing architectural and production guidelines so that they allow automation. It is common to have requirements for human controls in various IT-related processes, such as case-by-case approvals for both adoption of new technologies and any changes to existing systems. Of course, budgets need to be aligned with the strategy, which sometimes requires some organizational changes as well. Finally, we look at employee satisfaction to catch issues in the communication of the strategy and enablement of personnel.
Enablement
Without enablement in place, many people will not know how to automate their systems and processes, and all related items will suffer. This is considered basic hygiene for an automation strategy. With that said, depending on what enablement looks like in your organization, you can add more items related to this.
Efficiency gains
Upper leadership in the organization will often have a focus on efficiencies, as ROI calculations may be the foundation for short-term and long-term budget approvals. There are many ways efficiency can be measured, and on many distinct levels as well. It may be prudent to establish efficiency measurements per department, team, platform, service, or task. This may seem straightforward, but it requires a large amount of work, as you first measure the manual work and then technically measure the same things once they have been automated.
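For a single task, the before-and-after comparison can be sketched as follows. The function names and the numbers are assumptions made for illustration. In practice, the manual baseline has to be measured before the task is automated, which is where most of the work lies.

```python
def efficiency_gain_pct(manual_minutes: float, automated_minutes: float) -> float:
    """Return the percentage of time saved per run compared to the manual baseline."""
    if manual_minutes <= 0:
        raise ValueError("the manual baseline must be greater than zero")
    return 100 * (manual_minutes - automated_minutes) / manual_minutes


def hours_saved_per_year(manual_minutes: float, automated_minutes: float,
                         runs_per_year: int) -> float:
    """Return the yearly time saving in hours for one task."""
    return (manual_minutes - automated_minutes) * runs_per_year / 60


# Hypothetical task: 120 minutes by hand, 30 minutes automated, 200 runs a year.
print(efficiency_gain_pct(120, 30))
print(hours_saved_per_year(120, 30, 200))
```

The same two functions can be applied per team, platform, or service by summing the minutes of the tasks that belong to it.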
Capability gains
Capability gains measure the arrival of strategic capabilities. Examples of such capabilities are self-healing systems, fully automated provisioning and scaling, and self-operating systems, meaning systems where all Day 2 tasks except for troubleshooting and root cause analysis are automated.
Conclusion
In this chapter, we have learned some common reasons why organizations do not measure their automation strategies, including a lack of knowledge and a lack of standard requirements for IT strategies. Furthermore, we reviewed suggestions for high-level outcomes and lower-level objectives, which we can use to do proper performance monitoring of our automation strategy.
The upcoming chapter is a major one, where we will dive into the selection of the right tools and platforms.
CHAPTER 6 Selecting Right Tools and Platform
Introduction
Constant change is a current constant in the business world. For example, over 90% of the companies featured in the Fortune 500 in 1955 were bumped out of the list by 2022. Change is also seen in the digital domain. Since the introduction of computers and software, we have had many disruptive changes to the digital domain, including how we work, how we package software, how we architect software, how we operate it, and where and how we run it. As there are no companies that have wholesale changed their whole IT landscape every time something new was invented or became popular, any organization that has been around for longer than a few years is running many programming languages, tools, application servers, databases, platforms, and operating systems. It is more difficult to manage many different things than many things that are the same. Because of this, when asked to cut costs, it is common that IT leadership goes on a journey to consolidate. The IT vendors of the world quickly noticed this and have for decades used consolidation as an argument to sell various IT solutions. But in the world of IT automation, the one ring to rule them all eludes most organizations and will continue to do so for the foreseeable future. There are many reasons for this, such as that a singular automation platform for the complete organization would have to be able to manage and often integrate into all systems and software. At the same time, if we do not manage the growing complexity, which is born out of having an ever-increasing pool of different technologies, our scaling of IT (and our business) may easily grind to a halt due to increasing inefficiency, costs, and time-to-deliver. We have some tough decisions to make when it comes to selecting the right tools and platforms, and this chapter attempts to guide you through them.
Structure
In this chapter, we will discuss the following topics:
Tools, platforms, and languages
One solution versus many and why consolidation can be detrimental
Selection criteria for automation platforms
Objectives
This chapter guides you through the most important things to consider when it comes to selecting automation tools and platforms. It outlines a healthy approach to consolidation and selection criteria for new platforms and discusses what to cover when you introduce a new tool.
Tools, platforms, and languages
To better understand consolidation challenges, we need to start by defining some key topics in the world of automation. When talking about automation in general, words such as tools, languages, and platforms are often used interchangeably. However, it is useful to think about these things as separate from each other. Let us review these useful definitions one by one.
Automation tools
Automation tools are simpler and more specific in nature. They are, in general, programs that run locally on one computer and perform specific automation tasks. As such, it is common to have many automation tools, and it is not necessarily a bad thing either. Using the correct tool for the correct task is often a good thing. Automation tools are often limited in the sense that they do not provide automation as-a-service, APIs, or GUIs. Furthermore, these tools are often accessed via command line interfaces and are executed by humans on an as-needed basis. They rarely have integration with surrounding systems such as identity and authentication systems, observability platforms, or ITSM systems.
Automation platforms
Automation platforms provide broader, general-purpose automation capabilities and allow automation across technical domains by integrating with many different automation tools. They are more complex IT solutions, often consisting of many different components, such as Web frontends and databases. They provide more advanced features, such as the ability to provide automation as a service via Web or API interfaces to both humans and machines, and they normally also integrate with other systems such as identity, authentication, logging, and ITSM systems.
Automation languages
IT automation tools and platforms use different programming or scripting languages to describe automation. It is useful to view an automation language as a programming language. Automation languages are often unique to a specific automation solution, although this is not always the case. The difference from a normal programming language is that the language, in general, cannot be used to develop advanced applications but instead is used to describe a state of automation to their corresponding automation tools and platforms.
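The idea of describing a state, rather than writing a program, can be sketched in a few lines. The following is a simplified illustration and not how any specific automation tool works. The dictionary format and the function name are assumptions. The description says what state is wanted, and the tool works out which changes, if any, are needed to get there.

```python
def plan_changes(desired: dict, actual: dict) -> list:
    """Compare the described state with the actual state and list needed changes."""
    actions = []
    for package, state in sorted(desired.items()):
        # Anything not known to the system is treated as absent.
        if actual.get(package, "absent") != state:
            actions.append(f"set {package} to {state}")
    return actions


# Hypothetical description of a desired state, for illustration only.
desired_state = {"nginx": "present", "telnet": "absent"}
actual_state = {"telnet": "present"}

print(plan_changes(desired_state, actual_state))
# Once the actual state matches the description, nothing is left to do.
print(plan_changes(desired_state, desired_state))
```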
One solution versus many and why consolidation can be detrimental
Let us view the consolidation of automation systems through the lens that there is no automation, only apps. If we understand automation as an extension of the applications we automate, this can be enlightening. For example, most organizations have consolidated the number of programming languages in which they write applications. But again, there is no organization that only uses a single programming language. This tells us that consolidation is possible, but that consolidation to a single automation language is not only impossible but also makes little sense. The reason we have different programming languages in the first place is that they are good at different things. The same applies to automation languages and, by extension, IT automation platforms.
One common reason organizations initiate consolidation projects is that a new solution enters the scene, or vice versa, meaning that a new solution is procured in order to consolidate. ROI calculations in both cases commonly include the replacement of existing tools or platforms. As covered previously in this chapter, another reason to consolidate is to manage the complexity and the sprawl of tools and platforms. With this said, the dangers of initiating large consolidation projects are multiple. So let us review them and see how we can mitigate related risks.
No return on investment
If the introduction of a new automation platform is not well-grounded, the worst-case scenario is that we spend a lot of time migrating automation from platform A to B just to arrive in a less efficient state, either because the new platform is less efficient or more difficult to work with, or because the world of IT has already moved on to something better. Mitigate this risk by doing a cross-organizational assessment of a proper pilot project for your new system. If you have not involved your organization’s many users of automation, including the business, you are guessing what people want, and by extension, you are putting the investment at significant risk. Also, follow the section in this chapter regarding “Selection criteria for automation platforms”.
Delayed return on investment
If you have introduced a new automation platform, and you are not specifically looking to get rid of excessive costs related to an existing solution, return on investment can sometimes benefit from starting with things that are not currently automated, rather than starting with a massive migration from the existing system to the new one. If it is urgent to decommission existing systems, mitigate the delay by ensuring that you at least also focus on automating new things. As an example, while moving heavy-duty automation related to technical standards and security baselines, also focus on creating new automation use-cases such as self-healing capabilities for systems.
Following is a depiction of two different approaches to consolidation and how they impact return on investment. The left-hand side has a medium to long-term ROI, and the right-hand side describes how new automation use-cases benefit ROI in the short term:
Figure 6.1: How different approaches to consolidation can influence return on investment
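The difference between the two approaches can be illustrated with a small calculation. All numbers below are invented for illustration and are not taken from the figure or from any real project. The sketch assumes a fixed monthly platform cost, that migrated automation only delivers benefits once the migration is complete, and that new automation use-cases start to deliver benefits after a few months.

```python
def cumulative_net_benefit(monthly_cost: float, monthly_benefits: list) -> list:
    """Return the running total of benefit minus cost, month by month."""
    total = 0.0
    series = []
    for benefit in monthly_benefits:
        total += benefit - monthly_cost
        series.append(total)
    return series


def break_even_month(series: list):
    """Return the first month (1-based) where the running total is not negative."""
    for month, value in enumerate(series, start=1):
        if value >= 0:
            return month
    return None


# Hypothetical numbers: cost and benefit in arbitrary units per month.
cost = 10
migrate_first = [0] * 12 + [30] * 12               # benefits only after migration
new_use_cases_too = [0] * 3 + [15] * 9 + [30] * 12  # early benefits from new use-cases

print(break_even_month(cumulative_net_benefit(cost, migrate_first)))
print(break_even_month(cumulative_net_benefit(cost, new_use_cases_too)))
```

With these assumed numbers, the approach that also automates new things reaches break-even considerably earlier, even though both approaches end up with the same monthly benefit.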
Focus on platforms instead of tools
A proper automation platform can wrap a host of different automation tools under one umbrella. It is natural that a large organization has multiple automation tools to solve specific tasks. It is not conducive to ROI to focus on consolidating all tools into one tool. That is not even possible to do today. If you focus on the integration of some tools instead of the consolidation of all of them, that can save a lot of time, which you can then spend on automating new things to gain more efficiency and business value. Tools that are low-hanging fruit can be consolidated into fewer ones. More challenging or complex tools can instead be integrated with a platform. The following figure illustrates the different approaches:
Figure 6.2: How different approaches to tools consolidation can improve benefit timelines
Using a saw to hammer a nail
Currently, in the domain of IT automation, there is simply no single ring to rule them all. If you force a single automation system to manage your complete IT landscape, you will inevitably spend time hammering nails with saws. At the same time, it may sometimes seem that you are heading off in the wrong direction when what you actually need is training for a team that is not used to the new tool or platform. To mitigate this, always ensure proper enablement and training are in place. Another reason people try to use proverbial saws to hammer nails is that cross-organizational collaboration is lacking. If one department does not collaborate well with other departments or, for other reasons, is not viewed in a good light, then others are more likely to create their own solutions, simply to be able to better solve their challenges.
Selection criteria for automation platforms
First, what we want to select to solve the challenges of the future are automation platforms. This means solutions that can provide automation as-a-service via APIs, CLIs, and GUIs, and that can wrap automation tools and integrate with your surrounding systems and processes.
When we approach the selection criteria, we need to consider some current trends. One of these trends we have already dealt with in this chapter: the seemingly everlasting change across IT and business. The second is the hybrid future of computing. Organizations have for quite some time now been moving out of their data centers to the public cloud, hosted data centers, and the edge. At the same time, there is broad consensus that the future is not one where the world outsources all its computers to one of three global public cloud companies for the rest of time. Technical developments, geopolitical developments, and the increasing role of IT in how we build our business services all indicate that the future will also require momentous change. While compute workloads are getting increasingly flexible regarding placement, the future is a changing hybrid mix of multiple places where computing happens.
Figure 6.3 is a depiction of how the hybrid future will require abstraction layers, which of course include automation, to prevent full-stack snowflakes of operating systems, automation platforms, infrastructure services, applications, services, toolchains, organizations, and collaborative abilities for each place of compute:
Figure 6.3: The hybrid technology stack
Not creating abstraction layers for the various places of compute creates an unmanageable sprawl of technology, process, and organization and makes cross-organizational collaboration exceedingly difficult. Moreover, for the public cloud, not doing so creates a level of lock-in that we have rarely seen before in the history of IT, meaning exit costs of a size rarely seen before. Automation equals externalized functions of our applications, so if we run the cloud provider’s own automation, that means our applications stop working if we try to lift and shift them to a new place.
With this in mind, we can now review more future-proof requirements for a new automation platform.
Hybrid compute support
Far from everyone runs workloads in the public cloud yet. At the same time, companies are just now discovering the edge, running software far from the comfort of data centers and public clouds. That means there is still time for a lot of organizations to avoid creating a lot of expensive legacy in the future.
It is common for automation platforms to change when an organization moves into the public cloud, as the current systems either do not support the public cloud or do not run in it. This further underscores how important it is that your automation platform supports these new places of compute. Support, in this context, means several things:
Ability to integrate with services: Public cloud-type offerings come with a range of built-in services which you, in some cases, simply need to use to be able to run applications there. For example, network services are built into the offering, so networks are created by talking to your public cloud provider’s APIs. If you cannot integrate with these, you either need to get a new automation platform or use the cloud provider’s own automation, which locks you in. Furthermore, cloud providers may have their own preferred authentication systems.
Ability to automate technology: Operating systems, applications, storage, network, security, and compute layers are often both different and work differently in the different places of compute. If your platform does not support most of the technology where you run, or will run, services, moving to a new place involves a potentially significant effort, where you need to run both existing automation and some new automation in a new automation platform. This both creates lock-in and slows the adoption of a new place of compute.
Ability to run in your places of compute: Due to technical reasons covered later in this book, such as availability, latency, and security, you will need to be able to run the automation platform itself where you deploy your workloads. Again, if you cannot do that, you will have to get a new automation platform.
It is worth noting that across modern places of compute, such as public cloud and edge, open-source solutions are most common across the complete stack; due to this, it is also more common to find open-source automation platforms and tools there.
Ecosystem
As touched on in the previous section about hybrid compute support, the ability to support the complete ecosystem in the various places where you run your business is vital. It prevents a sprawl of new automation platforms and tools for each new place of compute and allows automation to become an abstraction layer, which allows things further up in the stack to not care as much about where they are running.
Different automation platforms work differently, but in general, the automation ecosystem consists of the following two main types of assets.
Automation modules: Called many different things depending on the platform, these are the small programs that do the actual automation. They are called upon in user-defined definitions of what automation to do. Think about modules as tools in a toolbox.
Automation patterns: Using the different types of modules, automation creators define automation. As an example, one automation pattern often uses many distinct types of modules to achieve something. Think about automation patterns as instructions for how to use various tools to build a house.
Figure 6.4 is a depiction of the two main assets in automation ecosystems: modules and patterns. The pattern describes how to get an application to run. It uses several fictional modules, including one that transfers an installation file to a server, one that installs the application, and one that starts the application:
Figure 6.4: A fictional automation pattern that runs an application
Out of the two, the automation patterns are more important, as some tools can be very versatile and accomplish many different tasks. In the world of automation, you always have access to many tools; what is difficult is getting them to do something together in an automated fashion. As an example, just because you have access to all the tools required to build a simple house, such as a hammer, a saw, and an axe, does not mean you can do just that. Due to this, having access to patterns that describe the complete process of automating different things is of high value.
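The relationship between modules and patterns can also be sketched in code. The following is a minimal illustration only, loosely mirroring the fictional pattern in Figure 6.4. All names (the three module functions, `RUN_APPLICATION`, `run_pattern`, the file paths, and the host name) are hypothetical and do not belong to any real automation platform; the modules only report what they would do, so the sketch runs without any servers.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical modules: small programs that each perform one piece of automation.
# Here they only describe the action they would take on the given host.
def transfer_file(host: str, params: Dict[str, str]) -> str:
    return f"{host}: copied {params['src']} to {params['dest']}"


def install_application(host: str, params: Dict[str, str]) -> str:
    return f"{host}: installed {params['package']}"


def start_application(host: str, params: Dict[str, str]) -> str:
    return f"{host}: started {params['service']}"


Module = Callable[[str, Dict[str, str]], str]
Step = Tuple[Module, Dict[str, str]]

# The pattern: an ordered, user-defined description of which modules to call
# and with which parameters, in order to get an application running.
RUN_APPLICATION: List[Step] = [
    (transfer_file, {"src": "app-1.0.pkg", "dest": "/tmp/app-1.0.pkg"}),
    (install_application, {"package": "/tmp/app-1.0.pkg"}),
    (start_application, {"service": "app"}),
]


def run_pattern(pattern: List[Step], host: str) -> List[str]:
    """Run every step of the pattern, in order, against one host."""
    return [module(host, params) for module, params in pattern]


if __name__ == "__main__":
    for line in run_pattern(RUN_APPLICATION, "server01"):
        print(line)
```

Note that the modules are generic and reusable, while the value sits in the pattern: the ordering and the parameters that turn three tools into a working procedure.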
The ecosystem is one of the most important things, as having access to a lot of automation patterns allows you to save a lot of time. Having to learn by trial and error how to automate something can be very time-consuming and error-prone.
Open-source automation platforms are currently king in this space, as they feature significantly richer ecosystems. These ecosystems are often open, allowing anyone to upload their own automation patterns. As an example, the world’s most popular open source-based automation platforms have from tens of thousands to over a hundred thousand automation patterns, which can be downloaded from the internet, free for use. In the long term, AI technology may offset this. But again, open source also rules in the domain of artificial intelligence. If the ecosystem is the most important thing for you, your platform will be open-source-based. You should require modules and patterns to support most of the things you need to do. Missing modules mean you will need to do development. Missing patterns mean you will need to use trial and error to figure out what works and, in the worst case, discover that modules are missing as well.
Integration
Integration concerns the typical systems that you need to integrate with. Examples of commonly required integrations for an automation platform are as follows:
Identity and authentication systems: Most organizations already have identity and authentication systems in place. To avoid having to re-create users and groups, you need to be able to connect to an existing system where information about your users is kept.
Credential and secret vaults: If you have a central system that stores credentials or other secret information, you may have to integrate with it.
Logging systems: To accomplish goals for observability, monitoring, security, and ability to audit things centrally, you need to integrate with logging systems.
Monitoring or alert management systems: If something goes wrong, the platform needs to be able to notify a monitoring or alert management system. Of course, this also means that it is good if your current monitoring systems can monitor the automation platform out of the box.
Version handling systems (code repositories): Automation is code and should be treated as such. Automation should be possible to store in your version handling system.
ITSM systems: Many organizations still have requirements regarding IT service management, including creating a ticket containing a description of what is going to be done before a change and closing that ticket when the change has been completed. If that is a requirement for you, ensure that your automation platform also can integrate with your ITSM system. If there is no off-the-shelf integration to be found, ensure that one can be created and assess the complexity of doing that.
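The ITSM requirement above, a ticket opened before a change and closed once the change has been completed, can be sketched as a wrapper around the automation itself. This is a minimal sketch under stated assumptions: `InMemoryItsm` and `change_ticket` are hypothetical names, and the in-memory class stands in for a real ITSM system, which would normally be reached through its API. Closing the ticket as "failed" when the change raises an error is an assumption of this sketch, not something the requirement prescribes.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


class InMemoryItsm:
    """Hypothetical stand-in for an ITSM system; it only keeps tickets in memory."""

    def __init__(self) -> None:
        self.tickets: List[Dict[str, str]] = []

    def open_ticket(self, description: str) -> int:
        self.tickets.append({"description": description, "status": "open"})
        return len(self.tickets) - 1  # the ticket id is its position in the list

    def close_ticket(self, ticket_id: int, outcome: str) -> None:
        self.tickets[ticket_id]["status"] = f"closed ({outcome})"


@contextmanager
def change_ticket(itsm: InMemoryItsm, description: str) -> Iterator[int]:
    """Open a ticket before the change and close it when the change ends."""
    ticket_id = itsm.open_ticket(description)
    try:
        yield ticket_id
    except Exception:
        # Assumption: a failed change still closes the ticket, marked as failed.
        itsm.close_ticket(ticket_id, "failed")
        raise
    else:
        itsm.close_ticket(ticket_id, "completed")


if __name__ == "__main__":
    itsm = InMemoryItsm()
    with change_ticket(itsm, "Patch web servers"):
        pass  # the actual automation would run here
    print(itsm.tickets)
```

Wrapping the change this way means the ticket handling cannot be forgotten by whoever writes the automation, which is the point of having the integration in the platform rather than in each individual automation.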
Security
Your automation platform is going to have access to many parts of the enterprise. That means the security of your automation platform is paramount. Security requirements can be divided into the following two areas:
Security of the platform: This includes the vendor’s ability to fix security issues in the product and your ability to patch the system on a regular basis (think once a week rather than once every quarter). It also includes the use of encryption and the ability to harden the system according to the security standards that are important to your organization.
Security of the managed systems: This includes whether software needs to be installed on the managed systems; if it does, you affect the security of the managed systems by extending their attack surface. It also includes how information sent to or from the managed systems is protected, including the credentials and tokens used to access them.
Scalability
This is a difficult one, as vendors do not always have information about how their systems behave at larger scale. If your vendor has not tested managing thousands of systems, and that is what you have, include a scale test in your pilot project. The upside of the public cloud is that anyone with a credit card can spin up thousands of virtual machines and other devices for testing purposes. A few hundred dollars in public cloud cost is cheap compared to finding out that your selected platform has massive scalability issues.
Sustainability
When selecting an automation platform, ensure that you evaluate the sustainability of your vendor and the solution. In a domain as fluid as IT, it is easy to accidentally jump onto an already sinking ship. In automation, the ecosystem is king, so in addition to reviewing financial numbers, ask about the number of customers and success stories where the platform has been used at scale. For evaluating the sustainability of a company, look at the following:
Balance sheet: Be on the lookout for liabilities.
Income and cash flow statements: Income, revenue, and where the company spends its resources.
Annual report: Here, you will normally be able to review the financial condition of the company and see what priorities the company has.
Reports from third parties: Research and consultancy companies. Is the product considered competitive? Outside of that, do not focus too much on these reports.
Number of customers: This feeds into the health of the automation platform’s ecosystem and is especially important.
Large-scale success stories: You want to be able to see that people have used the platform at scale, across organizations.
Number of developers: Working on the product (this is normally not public information but is key).
Employee turnover numbers: How long people stay in the company.
As open source-based solutions are common in this space, be sure to evaluate the health of any open source project used. Open source projects cannot be evaluated like companies, so we need to look at some different metrics. In some cases, we can also review items of interest that often are not disclosed by companies, which further reduces risk. Here are some key things to be on the lookout for:
The number of people contributing to the project. This assesses risk; if there are just a few people contributing to a project, that is a big red flag.
Number of companies contributing to a project. This also assesses risk. If there is only one company backing a project, the project can go stale if the company stops contributing.
Number of users and/or downloads: Projects do not always disclose this, but sometimes they do. Sometimes it is possible to review this by looking at third-party repository information.
Did significant contributors recently stop contributing to the project? That means the project has lost a lot of knowledge and may struggle to maintain the code they contributed.
Do you have developers who can dedicate time to contributing to the project? If you are not paying for support from a commercial vendor, then this is paramount. Contrary to widely held belief, there is no such thing as a free lunch. If you are not getting support from someone, you need to be able to support the software yourself. That includes fixing bugs, security issues, and backporting patches. Even if you are getting commercial support for the open-source solution, contributing enables you to get an edge by gaining influence over the roadmap and building up skills.
Outside of looking at the open-source project’s web pages, there are several different tools that you can use to evaluate open-source projects. Sometimes the code repository used will provide information. Distribution hubs for packages such as npm and pip will provide download statistics. You can also use third-party tools for analyzing and extracting useful information from code repositories.
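To make the contributor-related metrics above concrete, here is a minimal sketch of how they could be computed once commit data has been extracted from a code repository. The `Commit` record, its fields, the `project_health` function, and the sample data are all hypothetical; how you obtain the data (repository hosting statistics, a third-party analysis tool, or the version history itself) is left open.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Commit:
    author: str   # hypothetical field: who wrote the change
    company: str  # hypothetical field: the author's employer, if known


def project_health(commits: List[Commit]) -> Dict[str, float]:
    """Summarize how widely the work on a project is spread."""
    if not commits:
        raise ValueError("need at least one commit to assess a project")
    per_author = Counter(c.author for c in commits)
    companies = {c.company for c in commits}
    _, top_count = per_author.most_common(1)[0]
    return {
        "contributors": len(per_author),
        "companies": len(companies),
        # A share close to 1.0 means one person does almost all of the work.
        "top_contributor_share": top_count / len(commits),
    }


if __name__ == "__main__":
    sample = [
        Commit("alice", "ExampleCorp"),
        Commit("alice", "ExampleCorp"),
        Commit("alice", "ExampleCorp"),
        Commit("bob", "OtherCorp"),
    ]
    print(project_health(sample))
```

Few contributors, a single backing company, or a very high top-contributor share are the red flags described above. Running the same calculation on different time windows would also show whether significant contributors have recently stopped contributing.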
Conclusion
Try to find a balance between consolidation and integration of automation tools and platforms. As stated, there is little business value in moving existing automation from one system to another. When selecting a new automation platform, ensure that it supports the workloads of the future and that it allows your business and your IT solutions to move between various relevant places of compute. The ecosystem is the most important aspect of any automation platform, but do not forget to assess things broadly.
In the upcoming chapter, Approach to Automation Skill Development, we will learn how to build skills that allow your automation to scale.
Join our book’s Discord space
Join the book’s Discord Workspace for Latest updates, Offers, Tech happenings around the world, New Release and Sessions with the Authors:
https://discord.bpbonline.com
CHAPTER 7 Approach to Automation Skill Development
Introduction
Let us start by reflecting on something fundamental. All change starts with people. Even so, in the world of information technology, we often end up talking about technology. We often look out over the IT landscape and conclude that various technologies, such as mainframe computers, monolithic application architecture, and so on, are legacy. This is often done at a higher level, where several types of IT-related architects identify and communicate organization-wide which technologies are legacy and should be migrated away from and which technologies are modern and current and should be preferred instead. It is currently uncommon to see the same type of focus and clarity when it comes to different types of skills or skill levels. It is often taken for granted that when modern technology is adopted, so are all related skills, but that is rarely the case.
Our lack of focus on skills has many different detrimental effects, which we will review in this chapter. By spending just a fraction of the money we spend on various technologies on skills instead, the gains are many, and it can be argued that skills development is a prerequisite for many types of technical change. It most definitely is the case when we are rolling out and executing an organization-wide automation strategy. The reason is that to fully automate, we need many of the people involved in the creation of business services, from management, development, and operational departments, to start doing things differently. What makes this a more complicated challenge is that different employees will require distinct types of skills development. To help you tackle skills development related to your automation strategy, this chapter teaches how to understand, implement, and support skills development as a part of your successful automation strategy.
Structure
In this chapter, we will discuss the following topics:
The state of the modern IT workforce
Skill gaps challenges related to automation
Enablement and training which scales
Objectives
The purpose of this chapter is to guide you through the common state of the modern IT workforce, teach you how to tackle the most common skills-related challenges, and show you what training and enablement that scales across an organization looks like.
The state of the modern IT workforce
A vast majority of the current IT workforce has not worked in companies that are fully digitized and automated. Large companies still outsource to low-cost consultancy houses to increase cost efficiency, and many people work in companies where there is no serious and successful automation strategy. Furthermore, outside of actual development teams, especially in the operations part of IT, many people are not developers and do not understand development or development practices. They often have good domain knowledge of various technologies but little or no experience of automation in that domain. As an example, consider people who work with the organization’s central storage services but who are not able to build storage-as-a-service for the organization. At the same time, they can tell you all about the central storage array, the efficiency of distinct types of disk drives, and data replication challenges. This challenge does not only relate to the people on the floor; sometimes, the issues are more severe in the organization’s management layer. People in management are often older and less often have an understanding or experience of development. They do not have to perform development, but if they do not have a good understanding of what a massive challenge it is to implement an organization-wide automation strategy, things will fail for many of the reasons already covered, such as budgetary ones.
Out of the many selection criteria for automation platforms that were dealt with in the previous chapter, there is one elusive feature that can help us with the described challenge of the modern IT workforce, and that feature is simplicity. To automate your complete organization, a lot of people who have little or no experience with development or organization-wide automation need to do just that. This should be top-of-mind when you consider the enablement part of your automation strategy.
Skill gaps challenges related to automation
The skills gap which needs to be closed differs depending on the role. Before we dive into the different challenges, there is some important context to cover. The context is that by doing automation, we are inevitably driving up complexity for people. Not always because what we moved to is more complicated, but because we changed how things work, and that means a lot of people now do not know how things work.
The following figure depicts significant transformations in the world of IT during the last 20-plus years across three areas: process, application architecture, and compute, that is, how we run applications and the infrastructure where we keep it all:
Figure 7.1: Significant transformations related to IT in the last 20-plus years
It is important to consider that the implementation of your automation strategy is not happening in a vacuum. By automating the complete organization, you are going to change how a lot of things work, which means that, for some time, that drives up the complexity as people learn and get used to how things work. At the same time, a lot of other changes are happening across your organization. Development teams are learning new programming languages and frameworks, operational teams are learning about new types of compute and infrastructure, and everyone is working according to new processes. If enablement overall has not been properly prioritized, that compounds the challenge, as you then are adding a lot to an already significant existing pile of things people need to learn. With this said, let us review skill gaps challenges specific to different organizational roles.
Management
It used to be common to hear managers at all levels boast that they did not understand IT. Those days have ended. The key to being able to support or collaborate with others is to understand as much as possible about those people. This is one of the things which underlies ways of working, such as DevOps. Developers need to understand more about operations and vice versa. More specifically, you need to understand how the things you do affect specific things which other people do. If you are in management and have budget responsibility, ensuring that there is room in the budget for enablement and that people have time to do that enablement is key. It is quite common to see management commitments to enablement go out the window because some project was badly planned and now needs all hands on deck to meet a deadline that has already passed. Moreover, this author has stopped keeping count of the number of times that he has seen a multibillion-dollar organization acquire software for millions and then spend zero time and zero dollars on training. With this said, the three main challenges for management are as follows:
Ensuring that the team has time to do enablement
Ensuring that there is a budget for enablement
Ensuring that the team does do enablement
Apart from making sure that there is time and money for enablement, management needs to ensure that people take the time and do the enablement needed. To help with this, the automation strategy must include measurements for enablement to ensure people get trained. This helps to send a message that enablement is not optional but vital.
Outside of this, management themselves need to be enabled so that they properly understand automation, what it is, and what role it plays in a modern digitized organization. If this last item is forgotten, everything else may fall, so it cannot be overstated how important it is. There will not always be a central budget that can take all the costs related to the automation strategy, which means that depending on the budgetary approach, there may be a lot of work to ensure specific budgets are adapted to meet the challenge.
The consequences of failing to close automation-related skill gaps in management include, but are not limited to, the following:
Insufficient budgets to reach objectives and outcomes
A widening knowledge gap, which makes it difficult to make the right decisions
Automation creators
People who are automation creators are often people who, more than others, are faced with a lot of other new things, such as changing application architecture, ways to run applications, and new infrastructure. If there is an enablement deficit, that means some people may first need to learn how to walk before they learn how to run. Learning a new automation system can be compared with learning a new programming language. When approaching enablement for automation creators, it is useful to adjust enablement for the two main types of automation creators:
People who are not developers. If the person who is going to create automation is not currently a developer, this person will also need a lot of general knowledge about development toolchains and best practices.
Developers. People who are developers or who have previous experience may be bored with a lot of general-purpose enablement and may want to be able to get hands-on quicker. Ensure that there is fast-track enablement for those who want that.
The consequences of failing to close automation-related skill gaps for automation creators include, but are not limited to, the following:
Delayed time-to-deliver and time-to-market
Reduced savings for automation due to inefficiency
Increased business risk due to lack of best practices in use
Automation users
An automation user is not developing automation but is someone who uses it directly or indirectly. Of course, there is a line here to be drawn; for example, a bank teller may not need to understand what keeps the computer systems updated. Automation users do include the following:
Business developers who use IT to build services
All developers who do work directly with the automation platforms
Services and product owners
Architects
Project and program managers
All operational staff—including people in the support organization
If you are unsure who needs enablement, it is better to include more people than strictly necessary rather than too few. The reason is that it is sometimes difficult to ascertain who could use the information and who could transition to become a contributor and automation creator.
The consequences of failing to close automation-related skill gaps for automation users include, but are not limited to, the following:
Reduced overall impact of automation, including reduced savings and efficiency due to automation not getting used as much as it could.
A widening knowledge gap between people and departments, which causes collaborative issues.
Diversion from the selected technology path, creating islands of automation that are not aligned with your strategy.
Enablement and training which scales
Implementing enablement and training for large parts of your organization may seem like a daunting task, but there are things that can make it easier.
Commercial training offers
If the automation platform(s) you have selected come with commercial offerings for training, that is useful. If you are training a lot of people this way, there are often also commercial offerings where trainers can come onsite to do the training. This helps you to reduce travel-related costs, so you can train more people for your money. It may also reduce the amount of time that people are away from work. If onsite training is done, it is important that management makes it clear that this does not make it OK to skip class and do other things. It can be helpful to communicate this in general: if someone is in training, they are not available for other tasks.
This is the more expensive option, but also the most effective one, as everyone learns from professional trainers who are subject matter experts. It speeds up adoption considerably and enables benefits earlier rather than later.
Train the trainer
An alternative that can help you reduce your budget, compared with everyone getting commercial training services, is to train a smaller internal team of trainers, who then train the rest of the organization. When this works as intended, there are extra benefits as well. It gives people who engage in the automation strategy exposure, thereby creating champions for your cause, and people are more likely to listen to a trusted colleague than to an external resource. It can be immensely powerful if the trainer can give direct examples from within the organization.
Challenges related to this approach are as follows:
If the trainers you are training do not already have experience with the subject matter, they need time to build relevant experience. This can be done by engaging and involving these people early in the process of creating the automation strategy.
Trainers are going to be valuable resources when it comes to several other things as well, not just enablement. After they have been enabled, keeping their schedules free can be extra challenging. Also, keep in mind that people who are suitable for the job may already be valuable resources who are difficult to take away from their normal job. Make sure that the people involved are properly allocated to the task, meaning that it should not be enough that their manager’s manager needs help with something for them to be pulled away from it.
Ad-hoc training
To be clear, ad-hoc approaches to enablement will fail. In the best-case scenario, they will only reduce the implementation speed, efficiency, and ROI of your effort, but more likely, they will cause your automation strategy to fail. Avoid this approach.
Conclusion
Closing the skills gap as you roll out your automation strategy is paramount for success. You should have a similar approach to skills development as you have to application development. Consider what is legacy and what is current and preferred when it comes to skills and knowledge. Make sure you professionally train and enable your organization, including many more people than just a few automation creators. If you can afford to, engage with commercial training offers. If there is no way to clear a budget for this, use the train-the-trainer approach to scale out enablement and training. Do not forget to measure training-related objectives and outcomes in your strategy to ensure things are going as planned.
In the upcoming chapter, we will go through something related to skills development: the key processes for development and cross-team collaboration.
CHAPTER 8 Key Processes for Development and Cross-team Collaboration
Introduction
The precursor to some of the most significant changes in IT as of late has been a change to our ways of working. As an example, the Agile and DevOps movements pushed forward new architecture and recent technologies, such as public cloud, microservices, and container technology. These modern ways of working continue to shape many of today’s technical stacks. It is difficult to imagine using monolithic application architecture without APIs when developing something new, and one reason is that it does not support agile and collaborative ways of working as well.
There are many good reasons for spending time studying the role and impact of processes, but the chief one is that processes are the roads that your automation strategy drives on. Without a minimal required set of processes, implementing your strategy is like asking all workers in a city to build their own required roads, bridges, and tunnels.
The processes related to collaboration are the most important ones, but with that said, it is not always clear which processes impact collaboration and how they impact it. In some cases, the impact is indirect, such as how an old-fashioned take on IT security can make the necessary interconnection of systems difficult or simply forbidden, which will wreak havoc on collaboration. Today, perhaps because of this, many organizations which claim to be working in an agile manner are missing many of the fundamentals required for your automation strategy to become the foundation on which you can realize a successful digitalization strategy.
The relationship between process and technology will be explored in depth in this chapter, including a review of the most important processes for the development of automation and cross-team collaboration. We will also focus on how we can make things more future-proof, as changing how hundreds or thousands of people work is the most demanding thing you can do in an organization.
Structure
In this chapter, we will discuss the following topics:
Relationship between process and technology
Changing and rolling out processes
List of key processes and how they should be performing
Objectives
The objective of this chapter is to teach you about how processes impact your technology landscape and key decisions around it. Furthermore, we will review what processes you need to have in place to succeed with your automation strategy, including detailed requirements and expected outcomes, which you can add to your automation strategy.
Relationship between process and technology
On the positive side of things, modern ways of working drive the adoption of modern technology stacks and automation. On the other hand, with badly implemented or missing key processes, the risk that your automation strategy struggles as you go into implementation is significant. When you have a software bug blocking you, a support contract or some engineering can bail you out. When you must create a new process or change an existing one, that means people must change how they are working. When we are talking about widely used processes that hundreds or thousands of people work with, the challenge is significant. Because of this, legacy or missing processes are as common as legacy or missing IT systems. The differences between fixing a system and fixing a process are as follows:
The processes require more effort to change or implement.
A missing process or a legacy process can block the roll-out of modern technologies, locking you into using old IT systems.
The following are some common examples of how defunct or missing processes can lock you into using legacy systems:
All changes are required to be reviewed by a change advisory board, making it impossible to fully automate roll-out of changes.
There are no processes that support cross-team development, and this leads to an absence of collaboration and to inefficiencies related to teams reinventing the wheel over and over again, as well as less efficient IT systems, as users of systems cannot easily suggest improvements or otherwise influence the roadmaps of other teams.
The process for implementing new security policies does not include calculating increased costs or impact on efficiency, and this leads to the security department shutting down access to useful tools and services on the internet. Furthermore, implementing a new distributed compute platform is made impossible due to a network security policy from 2003.
IT systems have life cycle management plans, and releases have end-of-support dates. Processes can go on running the initial release for years without a question asked, and that must change. Just as you are ensuring that your IT ecosystem is modern, ensure that your processes are as well. If you are seemingly stuck with legacy systems, that may be an indication that somewhere, you have a bad or missing process. Failing to do life cycle management for your ways of working means it is a matter of time before innovation, time-to-market, and efficiency take a hit.
Changing and rolling out processes
Where people mainly fail is training. When we approach technology, it is easy to understand that people need qualified trainers, onsite training, and hands-on experience. When it comes to processes, a common training approach is a web page with some online-only training consisting of some illustrations and perhaps some audio or video. Not all processes require qualified onsite training, but for many collaborative processes where people work together, it is important that people get appropriate hands-on experience.
List of key processes and how they should be performing
When it comes to automation strategy, there are a few processes that are must-haves to be able to successfully implement the strategy organization-wide. We will review these processes one by one and detail how they should be working. Requirements for these processes should be a part of your automation strategy. Pay extra attention to making sure that your process requirements and the processes themselves can stand the test of time and do not have to be revised significantly as you move forward. On the detailed level, processes will be different, depending on organizational makeup, regulatory requirements, and culture, so instead of focusing on details, we will review the types of processes and outline key requirements and outcomes for them. Here follows the minimal set of must-have types of processes.
Cross-organizational collaboration
This is the most commonly missing type of process. Sometimes there will be processes that govern cross-team collaboration within a part of the organization, but nothing that allows the complete organization to collaborate. That means collaboration happens within IT but not between the business and IT, and so on.
Key requirements
Training required.
Allows any person in the organization who is a user, developer, operator, or business stakeholder for automation to collaborate in the development process of said automation. This includes:
Being able to create support tickets (issues, incidents, suggestions for features, and questions).
Access to documentation.
Access to automation source code (repositories). And yes, this can be done in most cases without any significant security concerns. Sensitive parts can be encrypted or split out and stored in separate private repositories. Corner cases exist, such as systems dealing with national security, specific areas within banking and payment, public cloud vendors, or specific sectors where legal, regulatory, or contractual restrictions apply.
Outcomes
Increased cross-organizational collaboration.
Increased cross-team collaboration.
Users, developers, operators, and business stakeholders can easily collaborate in the development process of automation.
Documentation
Documentation is normally considered a development discipline in the context of automation, but we will deal with it separately as it is so important. Documentation is closely related to all other types of processes, especially collaborative ones. Documentation is often missing, which heavily impacts collaboration efforts, as it is difficult to contribute to something that is difficult to understand. Furthermore, it impacts operational costs, as maintaining things becomes more difficult without proper documentation. It also impacts onboarding, where it can make the difference between months and days of time to fully onboard people to a new project. Finally, it creates bottlenecks, as fewer people sit on the knowledge required to contribute; this also counters collaboration.
Key requirements
Training required.
All automation is documented.
Non-technical resources can use the documentation to understand how automation works at a basic level.
Technical resources can use the documentation to understand how the automation works at a detailed level.
Outcomes
Increased cross-organizational collaboration.
Increased efficiency and decreased operational costs.
Decreased onboarding time.
Development
Creating automation is development. That means there need to be processes in place for how development should be done. If there are no standards for how development is being done, scaling automation becomes challenged by high maintenance costs and difficulties related to collaboration.
Key requirements
Training required.
Guiding principles (For example: keep things simple and readable).
Style guidelines, including naming conventions.
Version control guidelines that describe how to use version control.
Release guidelines that describe CI/CD, for example, how a release is pushed to production in an automated manner, which includes testing. More about change management specifically below.
Outcomes
Increased cross-organizational collaboration.
Increased efficiency and decreased operational costs.
Decreased onboarding time.
Integration
Doing automation means integrating with different systems. This is done for each task in an automated workflow, to get something done, to fetch information, to log something, and so on. It is, therefore, essential that there are processes for how to integrate with systems and how to integrate with your automation. Without the latter, you cannot create automation as a service. Here you will sometimes find old processes or policies from the past two decades, which make integration exceedingly difficult. That can include putting automation systems on isolated networks which are more or less completely closed network-wise. API management solutions can sometimes resolve integration challenges, but keep in mind that it is difficult to hide everything behind an API management solution.
Key requirements
Training required.
Integration into a system can always be requested, and such requests are logged and evaluated with automation in mind.
Standardized ways to integrate with systems (SSH, WinRM, REST API, XML-RPC, SQL, and so on).
Outcomes
Increased security.
Increased cross-organizational collaboration.
Increased efficiency and decreased operational costs.
Change management
It may seem strange to some, but it is still common to see antiquated change management processes which require human review and mid-flight approval for changes. It is OK to have manual review processes if they happen during the development of the automation, not as the automation is rolling out changes into the landscape. Modern development practices such as peer review and CI/CD during the development process are the way to go and will often much improve the quality of the review process. Concerns that conflicting changes might be performed at the same time can easily be automated away completely by using a quality automation platform.
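As an illustration of how conflicting changes can be ruled out by automation rather than by a review board, here is a minimal sketch. It assumes a hypothetical in-process `ChangeCoordinator` that keeps one lock per target system; the class, method, and target names are invented for this example, and a real automation platform would provide the equivalent through its own scheduling and locking features.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class ChangeCoordinator:
    """Hypothetical helper: allows only one change per target at a time."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    @contextmanager
    def exclusive(self, target):
        # Look up (or create) the lock for this target under a guard lock.
        with self._guard:
            lock = self._locks[target]
        # Fail fast instead of queueing, so the caller can reschedule.
        if not lock.acquire(blocking=False):
            raise RuntimeError(f"a change is already running on {target}")
        try:
            yield
        finally:
            lock.release()


def apply_change(coordinator, target, change, log):
    """Apply `change` (any callable) to `target` without human approval."""
    with coordinator.exclusive(target):
        change(target)
        log.append((target, change.__name__))
```

A second change against the same target is rejected while the first is running, whereas changes to other targets proceed in parallel. Rejecting rather than waiting is a design choice made here for simplicity; queueing would work equally well.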
Key requirements
Training required.
Allows fully automated changes without human intervention.
Outcomes
Increased security.
Decreased time-to-market.
Increased efficiency and decreased operational costs.
Conclusion
Without the required processes in place, an organization-wide automation strategy is bound to fail due to challenges with collaboration and the ability to scale both the automation itself and the impact it has. In the upcoming chapter, “Catering for a digitized future”, we will gaze into the future to gain knowledge about challenges down the road.
Join our book’s Discord space
Join the book’s Discord Workspace for Latest updates, Offers, Tech happenings around the world, New Release and Sessions with the Authors:
https://discord.bpbonline.com
CHAPTER 9 Catering for a Digitized Future
Introduction
Now that you know about the main components that need to be dealt with in your automation strategy, it is time to discuss current and future trends and how they may and will impact your strategy. The Greek philosopher Heraclitus is credited with the idea that the only constant is change. The broader question you should ask yourself related to this, as you construct your automation approach, is what change would disrupt your plans.
Are you putting solutions, processes, and guidelines in place, which are likely to not age well? This chapter will outline some of the more prevalent technical trends and probe how these may impact various aspects of your strategy. Let that be an inspiration for a useful exercise, which is to continuously monitor and consider if developments happening out in the world merit a change to your automation strategy or implementation. Do not fall into the trap of thinking that you can construct flawless solutions for the next 10 years to come. No one can predict what the world will look like in ten years. At the same time, your strategy and implementation need to provide a structure that does not change on a yearly basis. The key to finding that balance is to understand common current and potential future trends, which is something this chapter will try to help you with.
Having concluded that things are changing constantly, we will spend some time considering what this means outside of the impact of specific technical trends. Even though our landscape is in constant flux, not everything is changing or can even change. The reason for this is that there is not enough budget or resources to execute wholesale changes of a host of different things in our IT landscapes on a time horizon which is meaningful for an automation strategy to focus on. Related to this, we will delve into a discussion about how an ever-heterogeneous landscape impacts us and how this can be dealt with in your automation approach.
Structure
In this chapter, we will discuss the following topics:
Current trend—cloud
Current trend—statelessness
Current trend—IoT
Current trend—artificial intelligence
Objectives
The objective of this chapter is to teach you about current and potential future technical developments and how they may impact your automation strategy. Catering for the future of tech is important to achieve a robust automation strategy which ages well.
There are many technical trends today. This part tries to deal with the most impactful ones and avoids going out on a limb by focusing on developments which may not pan out.
Current trend—cloud
As a result of increased pressure to modernize and automate, a lot of organizations have selected the path of outsourcing. The outsourcing in question is called the public cloud. The public cloud market is currently dominated by three large American companies. Public cloud providers sell everything from basic infrastructure services to high-level platforms, SaaS services and development tool chains. Although it is not necessarily intuitive to outsource something (IT) which is becoming more important year by year, this is where the world is moving, due to lack of skills, perceived difficulty of creating such services in-house, or budgetary reasons. The public cloud provider’s services get integrated with the organization’s IT landscape, processes, and team structure. The alternative to the public cloud is the private cloud, where organizations provide the same type of services under their own management in their own data centers. For most organizations, this means modernization of existing private datacenters. The mix of public and private clouds is called a hybrid cloud. What drives hybrid cloud strategy is that general-purpose public cloud environments cannot always cater for organizations’ business needs. For example, in some countries, public cloud outsourcing is not allowed when dealing with sensitive government information. In other cases, pay-as-you-go models become too expensive compared to owning your own datacenter, where costs may not increase as much with increased use.
Other cloud adoption models are multi-cloud or hybrid multi-cloud, where several public cloud providers are used without or with a private cloud component. The following is a picture that outlines these different cloud adoption strategies:
Figure 9.1: Different cloud adoption strategies
What drives multi-cloud and hybrid multi-cloud strategies is control and choice, that is, being able to move between public cloud outsourcing providers. How this can be a challenge will be made clearer in a bit.
Following is a rough overview of what public cloud providers provide and what things the customers often provide. This is to give a simplified view of what a split of responsibility and integration commonly look like:
Figure 9.2: High-level utilization of public cloud services, including the split of responsibility
Reviewing the preceding picture, the challenge with the public cloud may become apparent. The challenge is lock-in. Most organizations would today be hesitant to commit to renting a house for several decades, but that is what organizations are doing in practice with public cloud providers when there are no realistic exit or business continuity plans for public cloud outsourcing. If integration to public cloud services is up to each team to decide on, the effort to untangle resulting applications, platforms, tool chains, processes, and organizational structure will, for many organizations, end up costing too much money and effort, turning them into permanent customers of these companies. This is important to understand as this is a major driver for hybrid, multi-cloud, and hybrid multi-cloud, where the strategy is to retain more control and choice. The key to being able to realize any hybrid and multi-cloud strategies is automation. This is because automated services are more well-defined and documented and can, therefore, be moved more easily. This is further described in the “Selection criteria for automation platforms” section. The importance of hybrid and multi-cloud strategies to counter permanent lock-in to public cloud providers cannot be overstated, as the ability to make free choices to adapt to changes in the world often decides the fate of organizations and companies alike.
What to consider
Your ability to move into the public cloud in general is tied to automation, as the public cloud is built to host automated cattle and not manual pets. Furthermore, automation can only be an enabler of hybrid and multi-cloud strategies if it acts as an abstraction layer to the public cloud. This means using the public cloud providers’ own vendor-specific automation frameworks counters your ability to adopt hybrid or multi-cloud.
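To make the idea of automation as an abstraction layer more concrete, the following is a minimal sketch. It assumes a hypothetical provider-neutral interface called `CloudProvider`; the class names, the `create_server` method, and the returned dictionaries are placeholders invented for this example and do not correspond to any vendor's actual API.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Provider-neutral interface; each subclass hides one vendor's API."""

    @abstractmethod
    def create_server(self, name: str, size: str) -> dict:
        ...


class PublicCloudA(CloudProvider):
    # Placeholder: a real implementation would call the vendor's SDK here.
    def create_server(self, name, size):
        return {"provider": "public-a", "name": name, "size": size}


class PrivateCloud(CloudProvider):
    # Placeholder for an in-house virtualization or private cloud API.
    def create_server(self, name, size):
        return {"provider": "private", "name": name, "size": size}


def provision_web_tier(provider: CloudProvider, count: int) -> list:
    """Automation written once against the interface, not a vendor."""
    return [
        provider.create_server(f"web{i:02d}", "small")
        for i in range(1, count + 1)
    ]
```

Because `provision_web_tier` only depends on the interface, moving the web tier to another provider means swapping the object passed in, not rewriting the automation. Vendor-specific frameworks tend to remove exactly this seam.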
Current trend—statelessness
Driven by requirements from public cloud vendors, statelessness is a significant trend. You often hear the slightly tasteless saying, “Deal with your systems like they are cattle, not pets. If an animal gets sick, you do not tend to it as you would with a loved pet; instead, you shoot it and get a new animal”. This stems from cloud-native architecture assuming workloads are distributed in nature as a means to solve availability challenges. This is because when public cloud services were built, they did not provide failover between public cloud datacenters. Instead of having infrastructure which deals with the failover of workloads, you simply spread systems across different availability zones so that if one zone is down, you still have working systems in other zones which deal with incoming requests. Having stateless systems created from images also helps with scaling, making it easy to start up new systems to service incoming traffic instead of creating, installing, and configuring systems.
The most popular example of statelessness is today’s container technology, which is a way to package software into a stateless portable image. Container technology became prominent in part because of how well it works together with DevOps, microservices, and modern development practices such as CI/CD. Containers also rose to prominence as they provide a more resource-efficient way to run applications than virtual machines. The most common way to manage containers is using Kubernetes, a widely popular orchestration engine for containers. When running containers on Kubernetes, running workloads are not modified per se; instead, they are created and destroyed. If something goes wrong with a running container, it is destroyed and recreated. With that said, the configuration of workloads does happen in the provisioning layer (Kubernetes, and so on) using the injection of files, variables, and data into stateless workloads. This means the need for automation moves from the workloads themselves to the platforms managing them.
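The destroy-and-recreate behavior described above can be sketched in a few lines. This is a simplified model and not how Kubernetes is implemented; the function names, the dictionary layout, and the `healthy` flag are assumptions made for the example.

```python
import itertools

_ids = itertools.count(1)


def start_instance(image, config):
    """Create a fresh instance from an immutable image plus injected config."""
    return {
        "id": next(_ids),
        "image": image,
        "config": dict(config),
        "healthy": True,
    }


def reconcile(instances, image, config, desired):
    """Replace unhealthy instances and top up to the desired count.

    Running instances are never modified; they are only kept or discarded.
    """
    kept = [i for i in instances if i["healthy"]][:desired]
    while len(kept) < desired:
        kept.append(start_instance(image, config))
    return kept
```

Note that configuration is passed in when an instance is created rather than applied to a running one, which mirrors how the provisioning layer injects files and variables into stateless workloads.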
What to consider
Automating stateless workloads is different from automating stateful workloads, although they still need a lot of automation. You often do not configure running workloads, and a lot of automation is replaced with a build process putting things in place, as they should be, inside of a virtual machine or container image. Automation can also happen by configuration of the runtime (Kubernetes) platform. Outside of that, automation focuses on creating and destroying systems and, in some cases, operational Day 2 type tasks, like troubleshooting and security response.
When it comes to workloads running on Kubernetes, a new set of automation tool chains has appeared, focused on containerized workloads and Kubernetes specifically. This means you may not necessarily want to run the same automation solution for containerized and non-containerized workloads.
Current trend—Internet of Things
Also referred to as IoT, this trend is about smaller computers, often with sensors, running closer to people. Examples are light bulbs which can be controlled via a central control device or app, or devices which detect the temperature of the water in a lake or the humidity in a house. What makes IoT special is scale. While a normal medium-sized organization may have several thousand computers, an IoT system may consist of tens of thousands or even millions of devices.
In the following figure, we can see how scale is impacted as we move out of the more traditional data centers and into the sometimes-called device edge:
Figure 9.3: Different places of compute
These smaller computers are often limited in what they can do and rarely run normal full-blown operating systems. Instead, they are managed via simple APIs that can run on devices with limited computing power. Another thing that stands out is that IoT devices are often distributed across different physical locations, much more so than normal computers. Take the example of intelligent light bulbs that are installed in each customer’s home at a scale of many devices per customer, totaling many millions of IoT devices. IoT is predicted to significantly scale in the coming years and is likely to significantly outpace growth in traditional datacenters. It is important to note that IoT commonly also requires some additional places of computing, like regional data centers and various compute edges, which are located closer to the IoT devices themselves.
What to consider
IoT devices are simpler in nature, meaning that automation systems that assume management software to be installed on automated systems do not work. Your automation system not only needs to be able to communicate via APIs but also needs to be able to manage the distributed nature of IoT systems, regional data centers, and compute edge. Furthermore, if there is a need to interact with the devices, you need to be able to scale. Likely, due to the distributed nature of IoT devices and communication to central locations often being unreliable, you need a system which IoT devices can call home to or which is located physically close to the devices managed.
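The "call home" pattern mentioned above can be sketched as follows. The `EdgeController` class and its methods are hypothetical and exist only for illustration; the sketch assumes devices initiate contact whenever they have connectivity and that tasks are queued for them in the meantime.

```python
from collections import defaultdict, deque


class EdgeController:
    """Hypothetical controller placed close to the devices it manages."""

    def __init__(self):
        self._pending = defaultdict(deque)

    def queue_task(self, device_id, task):
        # Tasks wait here until the device next manages to connect.
        self._pending[device_id].append(task)

    def check_in(self, device_id, max_tasks=10):
        """Called by the device; returns at most `max_tasks` queued tasks."""
        queue = self._pending[device_id]
        batch = []
        while queue and len(batch) < max_tasks:
            batch.append(queue.popleft())
        return batch
```

Since the device pulls work instead of the automation system pushing it, an unreliable link only delays a task rather than failing it, and the controller never needs to hold open connections to millions of devices at once.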
Current trend—artificial intelligence
Artificial intelligence (AI) is gaining prominence across most areas of IT. Not only does it help us create smarter services, improve availability and security with automated AI IT operations, but it also impacts productivity by assisting us in the creation of automation. Artificial Intelligence technologies, such as Large Language Models (or transformer models), are plagued with fundamental challenges, such as often not having a definition of truth, meaning output from these models needs to be tested before use in critical applications.
Overall, AI has the potential to lower the cost of knowledge as AI-powered chatbots can help people understand how complex systems work and even help them with automating them. It is difficult to predict the full impact of AI in the future, but at this point, most importantly, it helps people across IT to become more productive. Keep your eyes on this field, as it is likely to disrupt organizations across all industries in the future, and AI is already useful for the development of automation, including migration between automation frameworks.
A key development will be when AI becomes available locally in an organization’s own networks, allowing privacy and data-related challenges to be resolved. Further key developments will be when Large Language Models get a definition of what is true and when legal concerns regarding copyright and licensing are addressed. Until then, limited disruption is expected in the field of IT automation.
What to consider
Lack of resources and lack of skills are common reasons why automation does not get done. AI presents a rare opportunity to deal with the status quo without adding significant additional resources to the organization, as AI can make current employees more efficient.
To be able to efficiently use AI-generated code and automation, you will need to have automated testing in place. It is rare that current AI models have a sense of truth, meaning that output is just based on predictions; this means code needs to be linted and tested properly before use to see that it is both syntactically correct and does what it is supposed to do. This will have to be done in an automated fashion. Furthermore, to use the code generated, workloads and processes need to be automated. In this sense, automation and AI have an intertwined future. If you do not have fundamental things in place, adopting AI will be very painful or simply not possible.
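A minimal sketch of such an automated gate is shown below. It assumes the generated code is Python and defines a single function to be checked; the `passes_gate` helper and its parameters are hypothetical names for this example. The syntax check uses the standard library `ast.parse`, and the behavioral check simply runs a list of known input and output pairs.

```python
import ast


def passes_gate(source, function_name, cases):
    """Return True only if `source` parses and satisfies every test case.

    `cases` is a list of (args, expected) pairs. Sketch only: exec() gives no
    isolation, so a real pipeline would run untrusted code in a sandbox.
    """
    try:
        ast.parse(source)  # syntax check, nothing is executed yet
    except SyntaxError:
        return False
    namespace = {}
    try:
        exec(compile(source, "<generated>", "exec"), namespace)
        func = namespace[function_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Any runtime failure, or a missing function, fails the gate.
        return False
```

For example, `passes_gate("def add(a, b):\n    return a + b\n", "add", [((2, 3), 5)])` accepts the code, while a version that subtracts, or one that does not parse, is rejected before it reaches any workload.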
For AI to become an integrated part of our application landscape, it will have to be able to run disconnected within the confines of companies’ and organizations’ networks due to common requirements on high availability, disaster recovery, security, and privacy. This is not as much the case with today’s smart AI chatbots, systems, and tools, which often require data to be sent across the internet and to be processed outside of the organization’s networks.
There are current legal concerns regarding what Large Language Models output when it comes to copyright and license issues of the data they have been trained on. These issues will need to be addressed.
There is not currently a lot of legislation regarding AI, and this can be expected to bring some degree of disruption, as normally occurs when a previously unregulated area of technology becomes more heavily regulated.
Conclusion
The only thing which is certain regarding the future is that things will change. A hallmark of a high-quality automation strategy and architecture is its ability to remain intact in the face of that change. Ensure that you monitor trends and consider what different developments would mean for your automation and automation strategy. Even if you do not have a plan to roll out AI operations or to move into the cloud, consider what would happen if you did. In the upcoming chapter, you will learn how to scale your automation initiatives to the next level.
Part - 3
Automation for Architecture that Matters
CHAPTER 10 Scaling Up Automation to Organization-wide
Introduction
There is one thing which alone breaks all systems, and that thing is scale. Your system can manage 5 million customers, but it cannot manage 5 billion customers rushing through the system requesting data and interacting. Even public cloud providers run out of capacity because, at the end of the day, you can only put so many servers within those datacenter walls. When we discuss application architecture, catering for scale comes naturally, and even then, we get issues during Black Friday or other events when all customers come rushing at the same time. Considering automation an extension of our applications does us a favor in that way, as it becomes natural for us to apply the same requirements for scale on our automation systems. You can execute 100 automated tasks at the same time, but can you execute 10,000 automated tasks at the same time? Scale is a non-optional requirement for automation systems to be able to provide services which we can rely on.
Scale is also not only something which makes technical architecture break apart when it is stressed, but the same goes for everything from how we organize people to how we manage our budgets.
A common reason for scaling our automation strategy organization-wide is segmentation. Segmentation happens within our organization, where we end up with teams living in an almost separate world from the rest of the organization. Segmentation also happens in our IT landscape, creating islands of automation that are difficult to integrate with. This chapter explains why segmentation happens and what you can do to manage harmful segmentation. Furthermore, a flexible federated approach to automation is introduced as an architectural solution to this. It allows you to overcome technical scalability challenges, but more importantly, it also allows you to tackle the challenge of having more than just one system and one team which does your automation.
Structure
In this chapter, we will discuss the following topics:
Reflection on how different types of automation impact segmentation
Reflection on automation strategy and budget for segmentation
Organizational makeup and segmentation impact
Federated automation which scales
Objectives
The objective of this chapter is to teach you how to scale your automation to encompass your complete organization while keeping it consistent. In the end, you will know what to watch out for as you grow your organization and your business, making sure automation becomes an enabler instead of an inhibitor of growth.
Reflection on how different types of automation impact segmentation
First off, we need to understand that having several diverse types of automation systems in our landscape is both a reality and a current necessity. Forcing various hammers to become saws, screwdrivers, and wrenches simply does not scale due to bad technical fit. With that said, as different teams organize around different tools, that creates some degree of segmentation, even if it is necessary. The segmentation which happens relates chiefly to collaboration. Different tools have different approaches to solutions and describe the IT landscape in different ways. This can easily build segmentation between teams if the teams build up conflicting views on how to approach challenges and other things, which builds imaginary walls and makes collaboration more difficult. Vendors sometimes help with building these walls and smaller islands of automation by creating consensus around the idea that their vendor-specific systems and approaches are far superior, without consideration for implications to the broader context. Considering this, and the significant challenges in making each of these systems not only scale but also scale together, it is natural to try and consolidate systems when it is possible. Still, the only way to gain scale is to use the right tool for the right job. Even though there are automation platforms which can deal with large chunks of the IT landscape, no automation platform was created with all specific technical domains in mind. This is the most difficult tightrope to walk from an architectural perspective. To ensure that the systems you do select are right, you not only need to test systems properly but also need to test them in conjunction with other tools which you will integrate with.
Reflection on automation strategy and budget for segmentation
It is important to understand that harmful segmentation, like different teams building walled gardens consisting of automation systems and a lack of collaboration, only happens in a vacuum created by the absence of an organization-wide automation strategy. When teams collaborate on a regular basis and share both goals, objectives and how to get there, this creates a natural resilience against these walled gardens and islands of automation. If you see collaborative dysfunction and walled gardens, that can be an indication that your automation strategy is not communicated, understood, or enforced properly. It can also be that the strategy has flaws that need to be addressed. Most commonly, your issues will be communication-related, and it is time to sit down with the teams and identify what needs solving.
Apart from technology influencing teams to build walled gardens, another common factor is budget. As stated several times before, if your automation strategy is not backed by budget, it becomes a paperweight or a doorstop. When there are no budgets which support change, teams will work with whatever they have, as that requires the least budget change, even if a change would improve efficiency and save money. Many technical solutions and platforms include some type of native automation, which will then be what is used, resulting in a myriad of different automation systems, which most definitely will not scale. Sometimes teams have signed a significant Enterprise-Level-Agreement for a technical platform, in which automation is included; such a team is again unlikely to select something else if there is no budget.
Organizational makeup and segmentation impact
When it comes to organizational makeup, there is one main thing to consider, which is that centralization of responsibility and influence can be a bad thing when it comes to organization-wide automation. To introduce automation to the whole organization, you need everyone to get involved. In larger organizations, consider reporting paths, decision centers, and everyone who can say no. It is surprisingly common to see all-powerful and central teams without actual expert domain knowledge about the IT landscape or automation turning down automation initiatives. A classic example is the central architect group, disconnected collaboration-wise from large parts of the organization, which reports straight to the C-level with the ability to say no to any decision. Sometimes the architect group is instead an automation team, and sometimes, it is a management team responsible for things like a key business unit, IT operations, and so on. With your automation strategy anchored high enough up, budget backing, and wide engagement and collaboration with all key stakeholders, this becomes less of an issue, especially when it comes to people saying no simply because of budgetary reasons, but it is still important that you collaborate and involve the whole organization. Make people part of driving the effort instead of treating them as simple passengers on a train.
Federated automation which scales
Federated automation is the architectural solution to automating a segmented IT landscape. It allows flexibility regarding what tools perform automation while providing consistency for users of automation and great scale by supporting the distribution of tasks across multiple systems.
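As a rough illustration of the idea, the sketch below shows a single entry point that distributes requests to whichever automation system is registered for a technical domain. The `FederatedAutomation` class, the domain names, and the callable backends are all assumptions made for this example; in practice the backends would be separate automation platforms reached over their APIs.

```python
class FederatedAutomation:
    """Hypothetical front door that routes requests to registered backends."""

    def __init__(self):
        self._backends = {}

    def register(self, domain, backend):
        # `backend` is any callable taking a request dict and returning a result.
        self._backends[domain] = backend

    def submit(self, domain, request):
        """Same call for every consumer; the backend behind it may differ."""
        if domain not in self._backends:
            raise LookupError(f"no automation system registered for {domain!r}")
        return {"domain": domain, "result": self._backends[domain](request)}
```

Consumers always call `submit` in the same way, which gives them consistency, while each team remains free to register the tool that fits its domain best.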
To make sense of a federated architecture for automation, it helps to start by considering that there are two different types of users of automation: humans and computer systems. In general, humans are looking for similar things from their interaction with an automation system, such as ease of use, responsiveness, and easy-to-understand feedback. When it comes to computer systems, requirements are different, focusing on ease of integration, API features, security, and so on. We have established that it is unlikely to be possible for you to craft the one ring which rules all automation. What is possible, though, is to create a federated architecture for automation which provides a smaller number of interfaces for your two different sets of users. Let us have a look at what that can look like.
First, to serve your two distinct types of users, we need four different roles, depending on the capabilities of the systems used in those roles. We may be able to merge the orchestration and system automation roles into a single system, but it is unlikely that any other two can be merged. In Figure we can review the different system roles we will use to build our federated automation architecture:
Figure Distinct roles in a federated automation architecture
Our human interface will often be a so-called service catalog and is normally implemented by one or two systems. In the case of two systems, this is because human users are split between normal users and developers. This is where human end users can easily interact to order arbitrary things, which then triggers automation in the backend.
The orchestration role is normally implemented by one single system; in some cases, a system can share the orchestration and system automation roles, even though it is rarer currently. It is for when we have complex workflows of automated services and a more considerable number of automation systems that do the actual automation or support automation otherwise, like ITSM systems.
API management is basic hygiene in a modern API-driven landscape. You will normally have a single API management system, but it can be implemented across several instances located in separate locations. Besides being useful when you expose automation from the system automation role as a service, API management tools can help with the integration between automation platforms performing the system automation role. They can also simplify or provide security for integration with systems being automated or integration between distributed setups of automation platforms.
The system automation role is for systems which do automation. You will normally have several different systems doing this role. This includes the systems which manage automation for the various parts of your IT landscape.
In Figure we can review an example of a federated automation architecture featuring all four roles:
Figure Example of high-level federated automation architecture
Now, we can see how the pieces of the puzzle fit together and provide us with a flexible and scalable architecture which allows us to standardize how our two different sets of users interact with the automation landscape while allowing us to use the right tools for the right job when it comes to the automation itself.
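To make the idea concrete, here is a minimal Python sketch of the orchestration role routing requests to different systems in the system automation role. The class, the domain names and the two backend functions are hypothetical and only illustrate the pattern; a real setup would call the APIs of your actual automation platforms, typically through the API management layer.

```python
from typing import Callable, Dict

# A backend is any callable that performs automation for one technical domain.
Backend = Callable[[dict], str]


class Orchestrator:
    """Single entry point that routes requests to system automation backends."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, domain: str, backend: Backend) -> None:
        """Attach an automation system responsible for one domain."""
        self._backends[domain] = backend

    def request(self, domain: str, parameters: dict) -> str:
        """Route a request to whichever system automates the given domain."""
        if domain not in self._backends:
            raise ValueError(f"no automation system registered for {domain!r}")
        return self._backends[domain](parameters)


# Hypothetical backends standing in for two different automation tools.
def network_automation(parameters: dict) -> str:
    return f"network tool configured VLAN {parameters['vlan']}"


def cloud_automation(parameters: dict) -> str:
    return f"cloud tool created VM {parameters['name']}"


orchestrator = Orchestrator()
orchestrator.register("network", network_automation)
orchestrator.register("cloud", cloud_automation)
print(orchestrator.request("cloud", {"name": "web-01"}))
```

Users of automation, whether a service catalog or another computer system, only see the single `request` interface, while each domain keeps the tool best suited to it.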
Conclusion
Segmentation across our organization and IT landscape is an unavoidable fact. By having that top of mind when we implement our automation strategy, we can reduce the risk of harmful segmentation. By adopting a federated approach to automation, we can allow the right level of flexibility for our implementation while not sacrificing productivity due to a lack of standardization. In the upcoming chapter, Establishing High Availability and Disaster Recovery, we will look at the architectural reality of creating HA/DR for your automation roll-out.
Join our book’s Discord space
Join the book’s Discord Workspace for Latest updates, Offers, Tech happenings around the world, New Release and Sessions with the Authors:
https://discord.bpbonline.com
CHAPTER 11 Establishing High Availability and Disaster Recovery
Introduction
As automation becomes more fundamental to your IT landscape and digitalization strategy, many organizations' automation journeys arrive at the point where there is a need to re-assess the availability and recovery requirements for the automation platforms used. This is natural, as lower automation maturity comes with less important use-cases, a missing or lacking automation strategy, and few standardized platforms. This chapter discusses common automation use-cases to be on the lookout for when it comes to high availability and disaster recovery requirements, and outlines a general-purpose architecture for HA/DR which can be applied to different automation platforms.
High availability and disaster recovery form the foundation which decides what types of use-cases you can automate. Consider this: if your automation platforms are not highly available and cannot be recovered in a disaster scenario, you cannot automate things of importance without great business risk. Let us further consider what it means when your automation platforms can suddenly be down for hours or days and cannot be easily recovered. To stay safe, you would need a process to evaluate availability and recovery needs both before you automate something and for each use of that automation. This should all feel counterintuitive if you have an automation strategy worth the paper it is written on, because what automation strategy would exclude all things business critical? The opposite should make sense, as most want their most critical services to be some of the first to draw classical benefits from automation, such as reduced time-to-market for new features, reduced downtime, and improved security.
Furthermore, if your automation strategy does not include business-critical items in your company, your digitalization strategy does not include more than parts of your organization, as automation builds the foundation for digitalization. More critically, it does not include the parts of your organization which depend on business-critical services.
Structure
In this chapter, we will discuss the following topics:
How communication breakdown and complex integration silently put HA/DR requirements on your automation platforms
Common automation use-cases to be on the lookout for
HA/DR specific requirements on automation tool chains
An example of common-purpose HA/DR architecture
Objectives
The objective of this chapter is to teach you about what role HA/DR plays in automation. We will probe how the need for such requirements is created by common automation use-cases and further discuss what the specific requirements often are. The chapter ends with a common-purpose HA/DR architecture which can be applied to different automation platforms.
How communication breakdown and complex integration silently put HA/DR requirements on your automation platform
For automation to become the foundation which not only enables but also supercharges your digitalization strategy, you need to provide automation as a service and stop building a lot of islands of automation. That means you will have central teams which provide automation platforms to the rest of your development and operations teams. At the same time, we have hopefully accepted that automation is not an external thing and instead something which should be viewed as a part of what it automates. In this lies the challenge. When you provide the ability to create and run arbitrary automation for many others (often hundreds or thousands in larger organizations), you often do not know what type of automation they create and run. Even when you know what various automation does, such as creating a virtual machine, you do not always know for what purpose others use this automation.
The following is a depiction of the communication required to know what the availability and recovery requirements for your platform are:
Figure Key communication regarding availability and recovery requirements
In organizations with hundreds or thousands of people creating and consuming automation, it is exceedingly difficult to know whether there is any business-critical automation or business-critical use of automation on a given platform. That leads us to one thing, which is that automation platforms, in general, need to be both highly available and possible to recover if a disaster happens. Forbidding business-critical automation or business-critical use cases, or forcing heavy-handed evaluation processes of automation and use cases, would be counterproductive to serious automation or digitalization strategies. This reality has yet to set in at many organizations, and it is at the same time common that there is little to no communication between the various teams creating automation and the users of this automation, which delays this realization further. To prevent issues, as a rule, automation used outside of a team should always be highly available. Having such highly available platforms is also a requirement for being able to execute an organization-wide automation strategy.
Common automation use cases to be on the lookout for
If you are uncertain whether there is automation, or use of automation, which is business critical, here are some common automation use cases which often create HA/DR requirements, listed and explained:
Table 11.1: Automation use cases
HA/DR specific requirements on automation tool chains
For completeness, we will outline key requirements when creating automation platforms with high availability and disaster recovery capabilities. This will make it easier to understand the common HA/DR architecture in the next section of this chapter.
First, what is high availability? It describes a service which is rarely down, meaning that a system will be available for a high percentage of a period, often a year. A highly available (HA) service is often considered one that will be available 90% to 99.9999999% of the year.
A common highest requirement in organizations is 99.999% availability or “five nines”, which translates to 5.26 minutes of downtime in a year. Requirements depend on context, and our context is an automation system which will deliver vital functions to the organization’s most critical services. A good starting point for an availability level for your automation platform is then the highest level you have for services, as a minimum.
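The arithmetic behind such figures is simple enough to sketch in Python. The function name is our own, and the calculation assumes an average year of 365.25 days:

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime, in minutes, allowed by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


# Compare a few common availability targets.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.2f} minutes of downtime per year")
```

For the “five nines” target this gives roughly 5.26 minutes per year, matching the figure above.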
When modeling probable availability, we must consider the availability of all the dependencies of a service, such as networking services. Automation fits into the same category of services as networking, one which many things depend on. This means we may want to provide a higher level of availability for the automation platform than for the systems we automate, to decrease the risk of a single automation platform “breaking the bank”.
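A rough way to model this is to multiply the availability of everything a service needs in order to function. The sketch below assumes that failures are independent and that the service requires every listed dependency at all times, which is a simplification; the numbers are made up for illustration.

```python
from math import prod


def combined_availability(*availabilities_percent: float) -> float:
    """Availability of a service that needs all listed components up at once.

    Assumes independent failures, which real systems only approximate.
    """
    return prod(a / 100 for a in availabilities_percent) * 100


# Hypothetical availability levels, in percent.
service = 99.99
network = 99.99
automation_platform = 99.9

print(f"{combined_availability(service, network, automation_platform):.3f}%")
```

With these example numbers the combined result is about 99.88%, lower than any single component. An automation platform with lower availability than the service it supports therefore drags down what the service can realistically deliver.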
The following figure depicts an example of how we may decide to increase the availability of an automation platform to better support the system with the highest availability.
Figure Considering availability levels for your automation platform
So, what then is Disaster Recovery? Can you not just make your availability fifteen nines? Well, no. There are always things outside of our control, like a flood or an earthquake which takes out both our datacenters. Or maybe, more likely, a serious human mistake. Disaster Recovery is our ability to recover from such a disaster and is often a requirement for the organization’s most important services, which should include central things such as a central automation platform. Recovering from a disaster will normally take longer than our goal for service availability allows.
Items of importance which often are discussed regarding HA/DR are as follows:
Availability: Percentage of something that is available over a period of time.
Recovery: Time it takes to recover from a normal outage (non-disaster).
Disaster recovery: Time it takes to recover from a disaster.
Redundancy: Having more than one of something, often allowing for components in a solution to fail without taking down a service.
Active-passive: When a redundant instance of a component is on standby, ready to get started up in case of failure of the active component.
Active-active: When two redundant components are started and ready to receive traffic.
Availability zone: A concept made popular by cloud computing where computer systems within a zone are separate, with their own basic services such as storage, networking, and power. The idea is that if an availability zone fails, that does not affect other availability zones. Automation can be considered a vital central service like networking, meaning separate instances of automation platforms need to run within separate availability zones.
An example of common-purpose HA/DR architecture for automation platforms
Here follows a common-purpose architecture that can provide the required availability and recovery which a central automation platform requires.
The following is a depiction of said common HA/DR architecture:
Figure 11.3: Common architecture for an automation system
Let us dive into what is happening in this picture. First, this is a so-called active-active HA setup, where you install the automation platform in two separate locations (Location A and Location B), either in two different datacenters or availability zones. Each platform installation is assumed to be highly available by itself.
If a complete location fails, incoming traffic will fail over via the load balancer cluster, which stretches across the two locations. This will normally happen in less than a second, but it depends on the load balancer’s capabilities.
The challenge with having a single installation of an automation platform stretched across separate locations is that stretched configurations like that are not always supported. Furthermore, such configurations, when supported, are often overly complex and easy to get wrong. This architecture instead depends on an automation job running in the platform, which synchronizes the configuration of the platform from a version control system. In simple terms, automation creates your automation in the two separate installations of your automation platform, automatically. Some automation platforms allow you to define the platform configuration and the automation it runs as code in version control. If this is not the case, you may have to implement this yourself by writing custom API integration (an API should most definitely exist for your platform). You then trigger this synchronization of automation via a Webhook. If you look at the picture, you see 1–3 being noted. That describes the following events:
1. A user changes automation which is defined as code in the version control system.
2. Webhooks trigger a sync of the automation as changes are detected, by calling the API endpoint of the “platform automation sync job” automation running in the platform.
3. Automation running within the platform (platform automation sync job) then puts in place what the platform should look like, including things such as what automation should be created in the platform. This leaves both instances of your platform looking the same.
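The three events above can be sketched as a small Python example. Everything here is hypothetical: `PlatformInstance` stands in for one installation of your automation platform, job definitions are reduced to plain strings, and the function that a Webhook would call works on in-memory data instead of a real platform API or version control system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PlatformInstance:
    """Stand-in for one installation of the automation platform (hypothetical)."""
    name: str
    jobs: dict[str, str] = field(default_factory=dict)  # job name -> definition


def sync_platform(instance: PlatformInstance, desired: dict[str, str]) -> list[str]:
    """Make the instance match the desired state and return what changed."""
    changes = []
    for job_name, definition in desired.items():
        if instance.jobs.get(job_name) != definition:
            action = "update" if job_name in instance.jobs else "create"
            instance.jobs[job_name] = definition
            changes.append(f"{action} {job_name}")
    # Remove jobs which no longer exist in version control.
    for job_name in sorted(set(instance.jobs) - set(desired)):
        del instance.jobs[job_name]
        changes.append(f"delete {job_name}")
    return changes


def handle_webhook(desired: dict[str, str],
                   instances: list[PlatformInstance]) -> dict[str, list[str]]:
    """Entry point a Webhook would call after a change in version control."""
    return {instance.name: sync_platform(instance, desired) for instance in instances}


# Desired state as it would be read from version control (made-up content).
desired_state = {"create-vm": "v2", "patch-os": "v1"}
location_a = PlatformInstance("location-a", {"create-vm": "v1"})
location_b = PlatformInstance("location-b", {"old-job": "v1"})

report = handle_webhook(desired_state, [location_a, location_b])
print(report)
```

The sync is written to be idempotent: running it a second time against the same desired state changes nothing, which is what makes it safe to trigger on every change in version control.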
Conclusion
High availability and disaster recovery are hard requirements for any automation platform which is of importance for your automation strategy. Much like networking or storage are vital components for most things which happen in the IT landscape, executing an organization-wide automation strategy requires you to have automation platforms which are highly available.
In the upcoming chapter, Chapter 12, Security and Separation of Duty Requirements, we will explore security-related automation challenges.
CHAPTER 12 Security and Separation of Duty Requirements
Introduction
As a part of automation touching more systems in the organization, it is natural to assess the security of your automation. At the end of the day, automation gives you better security, so it makes sense that your automation systems touch your systems with the highest requirements for security. At the same time, this means that if you have not hardened your automation systems, they can easily become the weakest link in your security chain. Security compliance often infects systems that are connected, which means that there is a substantial risk that requirements for security compliance will impact your automation systems as well.
When you have successfully rolled out an organization-wide automation strategy, your automation systems will touch all aspects of your organization and moreover also have administrative rights on those systems. Hackers know this and will try to target these systems, as that would mean they gain administrative access to large parts of the organization.
Furthermore, as we have learned, automating something is simply another word for developing something, and automation is just another word for applications. With this in mind, we know that security concerns and best practices applied in the world of apps and development also apply in the world of automation. As this view of automation is more uncommon, it also means that few people talk about DevSecOps and supply chain security for automation, even though both topics are central to securing your automation.
Finally, automation is a solution to a balancing act that many are struggling with, which is cost versus security. Security which is not automated can indeed be a large source of cost, but the solution is simply to apply automation rather than creating requirements for a lot of expensive manual processes.
With all this said, it is time to delve into both security for automation and automation for security.
Structure
In this chapter, we will discuss the following topics:
DevSecOps in the world of automation
List of must-have security controls for automation
How to mitigate the cost of security
Objectives
The objective of this chapter is to teach you about the importance of security for automation, including specific requirements and security controls. Moreover, we will explore how automation can undo the negative relationship between security and cost.
DevSecOps in the world of automation
As noted in the introduction of this chapter, it is natural for us to apply security best practices in the world of automation. This is because pieces of automation are applications, and the act of automating something is equal to development. DevSecOps specifically means applying security best practices to DevOps, something which then also naturally applies to automation and automating things.
Besides DevSecOps also applying to automation, it is important to note that the highest security requirement of any system being automated is what you should apply to your automation system. In cases where you have strict compliance requirements for things such as PCI DSS, HIPAA, or government standards, connected systems often come to infect each other with compliance requirements.
The following figure depicts how an automation system often comes to inherit the security compliance requirement from an automated system:
Figure Inheritance of security compliance to your automation system
Now that we understand that DevSecOps is also for automation, a good place to start is to have automation teams engage with your central security team. It is common that automation systems are developed in a vacuum without engagement with the security department. Engaging has one main challenge, which is finding all your automation systems. This does not refer to central automation systems but to the many islands of automation, of which larger organizations often have hundreds. An island of automation can be some scripts that we depend on to pick up the result from some batch job, or a collection of scripts and programs which the ops departments run off a bastion host. An advantage of viewing automation as a part of applications and platforms is that you then document it as such. This means you can more easily identify any islands of automation.
Once you have found your automation systems, the challenge is that security departments rarely have domain-specific knowledge, and that includes knowledge about automation and automation systems. It is rarer still that security departments themselves use general-purpose automation systems to solve their day-to-day tasks, even though automation is used for accomplishing things such as scanning for security issues. On the other hand, people who own and maintain automation systems are rarely security specialists and often know less about what types of security threats the organization faces. This means any engagement between automation and security teams needs to start with some robust enablement regarding automation technology (platforms and frameworks), how automation needs to be viewed (there is no automation, there are only apps), automation architecture and development processes, and all things security, including best practices for how to defend against security breaches and existing DevSecOps processes. If you have a well-defined automation strategy that includes training, this becomes simpler to achieve.
List of must-have security controls for automation
What has been included in different organizations’ DevSecOps toolboxes differs, so in this section, we will list and describe the various security controls necessary to secure your automation.
Supply chain security controls
Automation often requires dependencies outside of your automation systems for automating something, such as integration libraries, CLI-tools, and general-purpose libraries for different programming languages. Supply chain security in the context of IT automation is about two things:
Understanding what software is involved in automating a thing.
Risk management for software that you get from various external sources and vendors.
If we understand automation as applications and development, we are immediately familiar with the process of identifying dependencies and assessing how they impact both security and life cycle management. To accomplish this, there are luckily already a lot of tools that can be used to create a Software Bill of Materials (SBOM). The challenge is that those same tools do not always understand the automation framework itself; instead, they are often focused on normal programming languages. This means you may have to create a custom solution to automatically create an SBOM for a discrete piece of automation, which can then be audited. Regarding auditing automation, there are a lot of tools that can scan software for security issues. These tools can also be used to scan your automation code repositories, with the downside that they often do not understand the automation framework itself.
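As a rough illustration of what such a custom solution could produce, the following Python sketch builds a simple inventory for one piece of automation. The structure and field names are our own invention and do not follow any standard SBOM format; in practice you would most likely map this information onto whichever SBOM format your auditing tools consume.

```python
import hashlib
import json


def build_inventory(name: str, files: dict, dependencies: list) -> dict:
    """Describe one piece of automation: its files (with digests) and dependencies.

    `files` maps a path to the file content as bytes; `dependencies` is a list
    of dicts with at least "name" and "version" keys (hypothetical structure).
    """
    return {
        "automation": name,
        "files": [
            {"path": path, "sha256": hashlib.sha256(content).hexdigest()}
            for path, content in sorted(files.items())
        ],
        "dependencies": sorted(dependencies, key=lambda d: (d["name"], d["version"])),
    }


# Made-up example content for illustration only.
inventory = build_inventory(
    "provision-webserver",
    files={"tasks/main.yml": b"- name: install web server\n"},
    dependencies=[
        {"name": "vendor-cli", "version": "2.4.1", "source": "vendor"},
        {"name": "integration-lib", "version": "1.0.3", "source": "internal"},
    ],
)
print(json.dumps(inventory, indent=2))
```

Recording a digest per file means the same inventory can later be used to check that the audited automation is also the automation that ends up being run.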
Methods to ensure that your supply chain is intact and not compromised can be re-used, including the following:
Using trusted sources and not downloading random software and automation from the internet
Cryptographic verification that software and automation have not been tampered with
Cryptographically signing automation developed in-house
If your automation framework does not supply cryptographic signatures for automation, this can sometimes be accomplished by signing files containing automation and validating those files as a part of a custom CI/CD pipeline where automation is published for use.
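A minimal sketch of such a sign-and-validate step is shown below, using only the Python standard library. Note the assumption: it uses an HMAC with a shared secret key as a stand-in for a real signature. That detects tampering, but anyone who can verify can also sign, so a real pipeline would more likely use asymmetric signatures from a dedicated signing tool. The key and the file content are made up.

```python
import hashlib
import hmac


def sign(content: bytes, key: bytes) -> str:
    """Return a hex-encoded HMAC-SHA256 tag for a file containing automation."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify(content: bytes, key: bytes, signature: str) -> bool:
    """Check a tag in constant time; False means the content or tag was altered."""
    return hmac.compare_digest(sign(content, key), signature)


# Hypothetical key; in practice it would come from a secrets management system.
signing_key = b"example-key-do-not-use"
automation_file = b"- name: restart service\n"

signature = sign(automation_file, signing_key)

# In the publishing pipeline: refuse to publish automation which fails validation.
print(verify(automation_file, signing_key, signature))
print(verify(automation_file + b"# tampered\n", signing_key, signature))
```

The first check succeeds and the second fails, since the content no longer matches what was signed.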
Vulnerability management
Automation platforms can of course have security vulnerabilities, which then means greater than normal security risk, as these platforms manage many other systems. This means there needs to be aggressive patch management and a well-oiled process for identifying security issues. This may sound simple, but when community-supported open-source software is used, it can often become a challenge, as the publication of security patches is not always done in a standard way or in a way where you can easily get automatic notifications. Please note that using open-source software in the domain of automation is a best practice and strongly advisable. It can, however, be a challenge if you are not getting enterprise-grade support for your software from a vendor and do not have significant internal resources invested.
Furthermore, automation is just another word for code and applications, so it is easy to understand that automation can have security vulnerabilities. Depending on how this automation is executed and how it is made available to the world outside the automation system, security vulnerabilities in automation may not be as serious as security vulnerabilities in, for example, a Web application serving users. This is in part because normally fewer users and systems can interact with the automation and in part because the code may only be running when automation is triggered.
Role-based access controls
As described in earlier chapters, automation is best served via an automation platform. This platform needs to be able to apply role-based access controls to related resources. This means you, at a minimum, need to be able to control what users can:
Execute automation
Edit automation
View automation
Edit resources related to what systems automation can run against
View resources related to what systems automation can run against
Use credentials or keys used when authenticating against automated systems
Edit credentials or keys used when authenticating against automated systems
Perform administrative tasks in the automation platforms (add users, change authentication settings, logging, and the like)
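The list above can be expressed as a simple permission model. The Python sketch below is illustrative only: the role names and their permission sets are assumptions, and a real automation platform will provide its own role-based access control rather than requiring you to build it.

```python
from enum import Enum, auto


class Permission(Enum):
    """One entry per capability in the list above."""
    EXECUTE_AUTOMATION = auto()
    EDIT_AUTOMATION = auto()
    VIEW_AUTOMATION = auto()
    EDIT_TARGETS = auto()        # resources describing what systems automation runs against
    VIEW_TARGETS = auto()
    USE_CREDENTIALS = auto()
    EDIT_CREDENTIALS = auto()
    ADMINISTER_PLATFORM = auto()


# Hypothetical roles; adjust to how your organization separates duties.
ROLES = {
    "operator": {Permission.EXECUTE_AUTOMATION, Permission.VIEW_AUTOMATION,
                 Permission.VIEW_TARGETS, Permission.USE_CREDENTIALS},
    "developer": {Permission.EDIT_AUTOMATION, Permission.VIEW_AUTOMATION,
                  Permission.VIEW_TARGETS},
    "auditor": {Permission.VIEW_AUTOMATION, Permission.VIEW_TARGETS},
    "admin": set(Permission),
}


def is_allowed(user_roles: list, permission: Permission) -> bool:
    """A user is allowed if any of their roles grants the permission."""
    return any(permission in ROLES.get(role, set()) for role in user_roles)


print(is_allowed(["operator"], Permission.EXECUTE_AUTOMATION))
print(is_allowed(["developer"], Permission.EDIT_CREDENTIALS))
```

Separating the ability to use credentials from the ability to edit them, as in the example roles, is one way to let people run automation without ever seeing or changing the secrets involved.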
Using external systems which store credentials and keys and not having humans manually configure the automation platform (something we will cover in the next chapter) makes these tasks easier and the platform more secure.
When it comes to who can edit automation, that does not only include things configured in a central automation platform but is also related to code stored in code (for example, git) repositories. Organizations often struggle with this, as it is not always clear how to organize automation across code repositories. Automation platforms that do not support storing automation in code repositories cannot always track what changes are made to automation, which, in turn, is a significant security risk.
Logging
Producing and sending logs to an external location, where they can be audited, is central to being able to preserve audit trails for automation that has been run. This is also important for other reasons, such as people wondering what changes were made during last weekend’s service window. Items worth sending to an external logging platform include the following:
What automation has been run
What user ran the automation
When was the automation run
Changes made to the automation and related resources
Output from automation which has run
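As a sketch of what one such log entry could look like, the following Python example builds a structured record covering the items above. The field names are assumptions made for illustration; shipping the record to an external logging platform is left out, as that depends on the platform you use.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(automation: str, user: str, changes: list, output: str,
                 when: Optional[datetime] = None) -> str:
    """Return one audit log entry as a JSON string, ready to send to a log platform."""
    when = when or datetime.now(timezone.utc)
    return json.dumps({
        "automation": automation,      # what automation has been run
        "user": user,                  # what user ran the automation
        "timestamp": when.isoformat(), # when the automation was run
        "changes": changes,            # changes made to automation and related resources
        "output": output,              # output from the automation
    }, sort_keys=True)


# Made-up example values.
print(audit_record(
    automation="patch-os",
    user="jane.doe",
    changes=["inventory: added host web-01"],
    output="12 packages updated",
))
```

Using a structured format such as JSON makes it far easier to later answer questions like who ran what during a given service window.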
How to mitigate the cost of security
By now, it should be clear how you mitigate the cost of IT security. The solution is of course automation. The challenge is that security departments that create requirements and processes for the organization to follow too often do not focus on either automation or cost.
With this said, the first step is to require that your security department calculate the costs imposed on the organization by any security controls or processes it requires people to follow or implement. This creates a natural focus on automating said security controls and processes, and its importance cannot be overstated. Your author has too many times seen manual security controls and processes being rolled out, at the cost of hundreds of man-years, without any reaction or reflection from the ones requiring these controls and processes. That kind of approach to security is why many people equate good security with high cost. That approach also does not scale, forcing people to choose between security and excessive cost.
The second step is to apply your automation strategy to the area of security as well. This will mean that when people implement security controls, they will default to automating them. It should also mean that the security department has automation in mind when creating requirements for security. DevOps in some organizations means that developers and operations work in the same team. It is not a bad idea to implement DevSecOps in such a way that security, developers, and operations staff, at a minimum, engage together as a team.
Conclusion
Briefly, security is as important for your automation as automation is for your security. Most of the challenges can be solved by a serious organization-wide automation strategy. Succeeding with both means your automation does not come with unmanageable security risk and that your security does not come with an unmanageable cost. In the upcoming chapter, Chapter 13, Explore Automation-as-a-Service (AaaS), we will dive into both how we can make automation available to more people and how we can automate the automation itself.
CHAPTER 13 Explore Automation-as-a-Service (AaaS)
Introduction
This is the last topic of this book, and there is a good reason for that: it is one of the final developments in organizations’ automation journeys, going from opportunistic islands of automation to automated cross-organizational processes that enable new business or organizational capabilities. Automation-as-a-Service means that we provide a service that not only runs automation for others but also helps to create high-quality automation by providing related services. Automation-as-a-Service is not a hard requirement for everyone to be able to automate a complete organization, but it does help in different ways that we will explore in this chapter. When we scale out our automation, we naturally hit the type of challenges that we get in all scaled-out systems. Those challenges are solved with standardization and automation, both of which are things we do get from AaaS.
There is a good reason AaaS should not be one of the first things on your agenda, and that reason is complexity. The people in your organization will need to embark on many individual journeys for the organization to end up at the automation finish line. Those journeys are made more difficult if adopting the new ways of working is complex. Even though AaaS does solve serious problems, it is more complicated than running a simple command line tool or clicking on a graphical user interface. So, when are you ready for this more advanced topic? A good indication is that your current way of working has significant issues. If you are doing all the right things and still are struggling with automation platform issues related to standardization, security, and scaling automation across footprints such as on-premises and public clouds, you just may be ready for the deep side of the pool. As we are adding complexity, it is key to address that. We will cover some of that in this chapter, but briefly, what we need is training and interfaces that humans can easily use. Executed correctly, AaaS is just what the doctor ordered for an organization with growing automation initiatives.
Structure
In this chapter, we will discuss the following topics:
Managing adoption challenges
A minimal viable product
All the bells and whistles of a complete platform
Example architectural pattern for Automation-as-a-Service
Objectives
The objective of this chapter is to teach you both how to properly adopt AaaS and how it can be constructed as such. We will also review what a minimal viable version of AaaS can look like and what capability a fully featured version has. Finally, we will look at an example architecture pattern for AaaS.
Managing adoption challenges
Central to adoption challenges is that organizations (read: people) cannot go from being unfamiliar with a topic to mastering it in a short amount of time. Even when you get help from an external expert who helps you set up systems and processes, it takes time to mature skills. We have covered this challenge previously in the chapter Approach to Automation Skills, but it is worth considering once more how the status of automation advancement at the company is almost completely tied to your ability to develop people in the organization.
Shown in the next picture, we line up how personal development maps well with distinct states of automation advancements in an organization. This is not to say that people cannot master automation if faced with advanced environments, but it does mean that it is a different challenge that you may want to face when your organization is more mature when it comes to automation. For comparison, people in general do not start to learn to run and jump before learning to walk.
Figure 13.1 depicts how people and automation advancement go hand in hand:
Figure 13.1: How people and organizations’ automation maturity develop
Consider what happens if you implement very advanced automation systems, including Automation-as-a-Service, from day one, when most people in the company are still assessing if automation is for them at all. What they now must assess is not something simple, like a command line tool, which is easy to get an overview of, but instead something which requires actual training to understand. Automation-as-a-Service will normally include various abstraction layers and systems which help users to create, test and run automation. The more complicated something is, the more difficult it is to assess, which in turn increases the risk that people’s assessments will be negative.
Whether we like it or not, when we are changing a complete organization, people will need time to assess and adopt before they are thrown into the deep end of the pool. If we do not do this, it is easy to turn people hostile to the change we so desperately need. If you force someone to do something they are not comfortable with, they may end up inventing reasons for not doing it, just to get some breathing space.
The difficult part is to assess where people in your organization are now, but in general, if you do not have standardized automation platforms already, jumping straight to AaaS may be risky and may end up negatively impacting adoption. To reduce the risk of this happening, assess the maturity of your people and organization before going ahead and implementing AaaS.
If you are ready for AaaS, then you still need to ensure that you provide proper training for people. That includes hands-on exercises where people can experience the advantages of AaaS in a controlled environment. You do not want people to run into a wall and, based on that experience, conclude that AaaS is a dreadful thing. And as we have learned in Chapter Approach to Automation Skills Development, it is better that you train too many people rather than too few; at a minimum, automation creators and management should be trained. If we also provide some training to automation users, we lessen the gap between creators and users, and some users may transition to become creators.
A minimal viable product
Now, let us review what capabilities are included in the minimal viable version of AaaS. This is a good place to start, as it avoids creating a complex big bang release that has a higher likelihood of failing.
Figure 13.2 depicts how these capabilities together create an AaaS platform:
Figure 13.2: Features of a minimal viable AaaS platform
We will now explore these features in the context of AaaS.
High availability
In the earlier chapter on establishing high availability and disaster recovery, we reviewed why having HA/DR for your automation platform is a good thing. If you are to provide AaaS to large parts of your organization, it is easier for you to assume that people will automate things of the highest importance than to set up a complex assessment process for automation, which more than likely will fail.
Security compliance
In the previous chapter about security, we concluded that it makes sense for automation systems to automate our most secure systems and, at the same time, that insecure automation systems easily become the weakest link in our security chain. Review what types or levels of security compliance are needed in your organization and ensure that your system passes muster.
Automation creation
A service should be provided which helps with the creation of automation. Consistency and standardization are fundamental to solving long-term life cycle management challenges, so when we provide automation to everyone, we need to ensure that those challenges are resolved; otherwise, we may be unleashing untold technical debt on the organization, where maintenance of the automation quickly becomes impossible to manage. One proven way to tackle this is to standardize how automation is created and written.
Common features of such an automation creation service are as follows:
Ability to create boilerplates for a new piece of automation
Automated flow, which makes said automation available via automation platforms and/or API gateway solutions
In a minimal viable version of AaaS, interfaces are often more technical and seldom include graphical interfaces. More commonly, we use existing interfaces such as version control and different command line tools.
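To make the boilerplate capability more concrete, here is a minimal sketch in Python of what such a command line helper could look like. The directory layout, the file names and the function name create_boilerplate are assumptions made for illustration only; your own standard will differ.

```python
import tempfile
from pathlib import Path

# Hypothetical standard layout for a new piece of automation.
# The file names and contents are illustrative assumptions.
BOILERPLATE = {
    "README.md": "# {name}\n\nOwner: {owner}\n",
    "automation/main.yml": "# Automation content for {name} goes here\n",
    "tests/test_main.yml": "# Tests for {name} go here\n",
}


def create_boilerplate(root: Path, name: str, owner: str) -> list[Path]:
    """Create the standard skeleton for a new piece of automation."""
    project = root / name
    created = []
    for relative_path, template in BOILERPLATE.items():
        target = project / relative_path
        # Create sub-directories such as automation/ and tests/ as needed.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(template.format(name=name, owner=owner))
        created.append(target)
    return created


if __name__ == "__main__":
    # Demonstrate the helper in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_boilerplate(Path(tmp), "patch-servers", "platform-team"):
            print(path.relative_to(tmp))
```

Because every new piece of automation starts from the same skeleton, including a place for tests, the resulting repositories look alike, which is what makes later life cycle management feasible.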
Automation testing
A service should be provided which helps with the testing of automation. Automation is code, and code needs to be tested, which has been touched upon in previous chapters. If we do not provide an automated way to test automation when we roll out AaaS to an organization, chances are that people will not create tests for the automation. This creates technical debt that will need to be resolved in the future.
In a minimal viable version of AaaS, interfaces are often more technical and seldom include graphical interfaces. More commonly, we use existing interfaces such as version control and different command line tools.
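As an illustration of an automated testing service at this level, the following Python sketch runs a set of standard checks against a submitted piece of automation before it may be published. The Automation model, the two checks and the expected file names are hypothetical; in practice, such checks would typically run from your version control system or pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Automation:
    """A piece of automation submitted to the platform (illustrative model)."""
    name: str
    files: dict[str, str]  # relative path -> file content


def check_has_tests(automation: Automation) -> Optional[str]:
    """Return a problem description if no tests are included."""
    has_tests = any(path.startswith("tests/") for path in automation.files)
    return None if has_tests else "no tests found under tests/"


def check_has_readme(automation: Automation) -> Optional[str]:
    """Return a problem description if documentation is missing."""
    return None if "README.md" in automation.files else "README.md is missing"


# The organization-wide standard: every submission goes through all checks.
STANDARD_CHECKS = [check_has_tests, check_has_readme]


def run_standard_checks(automation: Automation) -> list[str]:
    """Return all problems found; an empty list means the automation passes."""
    problems = []
    for check in STANDARD_CHECKS:
        problem = check(automation)
        if problem is not None:
            problems.append(problem)
    return problems


if __name__ == "__main__":
    submission = Automation("patch-servers", {"automation/main.yml": "..."})
    for problem in run_standard_checks(submission):
        print(f"{submission.name}: {problem}")
```

The point of the sketch is that the checks are applied automatically and identically to everyone, so that testing does not depend on each creator remembering to do it.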
All the bells and whistles of a complete platform
Here follow descriptions of the capabilities of a fully featured AaaS platform.
First, we need to implement all features of the minimal viable product. Beyond that, we add both new and expanded capabilities in the following areas.
Figure 13.3 depicts how we add two major capabilities to our platform to make it fully featured. Those are “hybrid and multi-cloud” and “human interfaces”, which we will explore shortly:
Figure 13.3: A fully featured AaaS platform
Hybrid and multi-cloud
Considering that most organizations are already rolling out workloads in public clouds in combination with other places of compute, being able to provide AaaS across the different footprints is a natural feature of a fully featured platform. From an architectural point of view, this often means establishing separate platforms in the various places of compute. This is due to availability requirements: automation should be viewed as a base service, like networking and storage, which needs to be available wherever workloads run.
Figure 13.4 shows how our AaaS platform is installed across our different compute footprints:
Figure 13.4: Separate AaaS platforms per compute footprint
As we are already catering to high availability and disaster recovery requirements, this should not be an overly complicated task to achieve. It does mean that we must consider the impact on integration, meaning that different AaaS platforms need to reach each other to realize services that span multiple footprints. Architecture that caters to this can be reviewed in the chapter on scaling up automation.
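A service that spans footprints can be thought of as a list of steps, each of which is handed to the AaaS platform installed in the footprint where the step has to run. The Python sketch below only plans that dispatch; the platform URLs, footprint names and the function plan_dispatch are hypothetical and stand in for whatever integration mechanism your platforms offer.

```python
# Hypothetical endpoints: one AaaS installation per compute footprint.
PLATFORMS = {
    "on-premises": "https://aaas.onprem.example.com",
    "cloud-x": "https://aaas.cloud-x.example.com",
    "cloud-y": "https://aaas.cloud-y.example.com",
}


def plan_dispatch(steps: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Map each (step, footprint) pair to the platform that should run it."""
    plan = []
    for step, footprint in steps:
        if footprint not in PLATFORMS:
            raise ValueError(f"no AaaS platform registered for footprint {footprint!r}")
        plan.append((step, PLATFORMS[footprint]))
    return plan


if __name__ == "__main__":
    # A service that touches two footprints.
    service = [
        ("provision database", "on-premises"),
        ("deploy frontend", "cloud-x"),
    ]
    for step, platform in plan_dispatch(service):
        print(f"{step} -> {platform}")
```

Keeping the mapping between footprints and platforms in one place means that adding a new footprint is a matter of registering one more platform, rather than changing every service definition.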
Human interfaces and integration
When we mature our minimal viable product, we reduce the technical complexity of the interfaces to our AaaS platform and focus more on the human experience of creating automation. This often means more integration into existing and/or preferred development tool chains, and increased focus on human-type interfaces, such as those described in the chapter on scaling up automation. Examples of such integration would be a custom extension in developers’ favorite IDE, a custom graphical service portal where automation can be ordered, or the ability to create automation from a standardized CI/CD pipeline.
Automation creation
Where the amount of automation that is automatically created for users is more limited in the barebone version of AaaS, in the fully featured platform, we can create most of the automation automatically. Artificial intelligence holds great promise in this area, where there are already multiple AI-driven services that can convert natural language to programming code. In a non-AI-driven world, we use building blocks of already existing automation to help users construct their own automated workflows or processes.
Figure 13.5 depicts how the creation of automation can be served:
Figure 13.5: Ways to deliver automation creation services
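The building block approach can be sketched in a few lines of Python. Here, pre-created blocks are registered under a name, and a user composes a workflow simply by listing block names. The block names, their behavior and the helper functions are illustrative assumptions and not part of any specific product.

```python
from typing import Callable

# Registry of pre-created building blocks, keyed by name.
BLOCKS: dict[str, Callable[[dict], dict]] = {}


def building_block(name: str):
    """Decorator that registers a function as a reusable building block."""
    def register(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        BLOCKS[name] = func
        return func
    return register


@building_block("create_vm")
def create_vm(state: dict) -> dict:
    # Stand-in for real provisioning logic.
    return {**state, "vm": f"vm-for-{state['app']}"}


@building_block("configure_firewall")
def configure_firewall(state: dict) -> dict:
    # Stand-in for real network configuration.
    return {**state, "firewall": "open:443"}


def run_workflow(block_names: list[str], state: dict) -> dict:
    """Run the named blocks in order, passing the state from one to the next."""
    for name in block_names:
        state = BLOCKS[name](state)
    return state


if __name__ == "__main__":
    print(run_workflow(["create_vm", "configure_firewall"], {"app": "shop"}))
```

The user only decides which blocks to combine and in which order; how each block does its work remains owned and maintained by the platform.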
Automation testing
At a more developed level, the creation of automation and the creation of tests go hand in hand. When we are talking about AI-driven large language models, if the model or platform we are using has no built-in way to verify that what it generates is correct, we need to add one.
Figure 13.6 depicts how we either identify automation generated by large language models and add tests to it, or have pre-defined tests associated with all pre-created automation building blocks:
Figure 13.6: Ways to deliver automation testing services
Example architectural pattern for Automation-as-a-Service
Putting all the pieces of the puzzle together is not always straightforward, and there is seldom one size that fits all. With that said, to inspire how you can architect a fully featured Automation-as-a-Service platform, here is an example architectural pattern.
Figure 13.7 depicts an example architectural pattern for one installation instance, of which there are multiple, one for each compute footprint (on-premises, cloud x, and cloud y):
Figure 13.7: An architectural pattern for a fully featured AaaS platform
Conclusion
This concludes the last chapter in this strategy guide to automation. Hopefully, it has left you with useful insights regarding what a successful automation strategy looks like and how you can create requirements and implementation for automation platforms that help you roll out your strategy. At the end of the day, my final advice is that the most fundamental challenges and difficult-to-find solutions are not technical but are related to humans’ ability to communicate and collaborate. Talk to each other, and the rest will follow.