Jim Keogh
Need-to-Know Technologies for a Successful Future
ISBN 978-1-5474-1692-9
e-ISBN (PDF) 978-1-5474-0081-2
e-ISBN (EPUB) 978-1-5474-0083-6
Library of Congress Control Number: 2018963410

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2019 Jim Keogh
Published by Walter de Gruyter Inc., Boston/Berlin
Printing and binding: CPI books GmbH, Leck
Typesetting: MacPS, LLC, Carmel
www.degruyter.com
This book is dedicated to Anne, Sandy, Joanne, Amber-Leigh Christine, Shawn, Eric and Amy. Without their help and support, this book couldn’t have been written.
About De|G PRESS Five Stars as a Rule De|G PRESS, the startup born out of one of the world’s most venerable publishers, De Gruyter, promises to bring you an unbiased, valuable, and meticulously edited work on important topics in the fields of business, information technology, computing, engineering, and mathematics. By selecting the finest authors to present, without bias, information necessary for their chosen topic for professionals, in the depth you would hope for, we wish to satisfy your needs and earn our five-star ranking. In keeping with these principles, the books you read from De|G PRESS will be practical and efficient and, if we have done our job right, will yield many returns on their price. We invite businesses to order our books in bulk in print or electronic form as a best solution to meeting the learning needs of your organization, or parts of your organization, in a most cost-effective manner. There is no better way to learn about a subject in depth than from a book that is efficient, clear, well organized, and information rich. A great book can provide life-changing knowledge. We hope that with De|G PRESS books you will find that to be the case.
Contents

Chapter 1: Talking Intelligently About Technology 1
  More than Companies and Industries 3
  Make Your Life Easier 3
  From Idea to Reality 4
  Finer Details of Your Idea 5
  The Business Case 6
  The Project 7
  Project Management 9
  The Project Team 10
  Virtual Project Teams 12
  Shared Values 12
  Relationship Capital 12
  The Project Sponsor and the Steering Committee 13
  The Project Plan 13
  Project Phases, Milestones, and Deliverables 14
  Work Package: Tasks and Subtasks 14
  Sequencing Tasks 15
  Duration 15
  The Schedule 16
  Resources 17
  The Resource List 18
  Resource Usage 18
  Cost to Bring an Idea to Reality 19
  Tools to Manage Development 20
  Change Management 20
  Risk and Managing Risk 21
  Your Idea Is Delivered 23
  Simple Project: No Need for Complex Project Management 25
  The Focus Board 25
  The Scrum 26
  Sprint 26
  Daily Scrum 27
  Backlog Grooming 27
  The Project Team 27
  Extremely Quick, Changing Needs 28
  Rapid Planning 28
  The Design Process: Sketch the Idea 29
  Pseudo Code 31
  Object Oriented Design and Programming 32
  Data Capture 34
Chapter 2: Talking Intelligently About Communication Networks 35
  The Neighborhood of Networks 35
  Converting the Message 37
  Catching a Wave 39
  Analog and Digital Waves 41
  At the Speed of Light 42
  From the Keyboard to the Communications Network 43
  Protocols 44
  Packets 44
  A Closer Look at Communication Networks 45
  Network Addresses 47
  The Internet 48
  Web Server 49
  A Behind-the-Scenes Look at Email 51
  SMTP 52
  POP3 52
  Internet Mail Access Protocol Server 53
  Attachment 53
  The Intranet 54
  Wi-Fi 55
  Cell Phone Technology 56
  Transmission 56
  Virtual Private Network (VPN) 57
  Network Administration 58
  Network Operating System 59
  Network Monitoring 60

Chapter 3: Talking Intelligently About Computers 61
  Larger Computers 62
  A Box of Switches 63
  Building Blocks of a Computer 64
  The Central Processing Unit (CPU) 67
  The Process Control Block 69
  Memory Management 69
  Device Management 70
  Measuring the Processor 72
  (Kind of) Doing More Than One Thing at the Same Time 74
  How the Processor Makes Decisions 75
  Memory 76
  The RAM Divide 77
  Types of RAM 78
  Types of ROM 78
  Video Memory 79
  Inside a Memory Chip 80
  Memory Errors: Few and Far Between 81
  BIOS and Starting the Computer 82
  Changing BIOS Options 83
  The Operating System 83
  Behind the Scenes of Running a Program 85
  Interrupts 86
  The Keyboard 87
  The Mouse 89
  Touchscreens 90
  Permanent Secondary Storage 91
  Saving Data to a Disk 92
  Important Measurements of a Disk Drive 94
  Deleting a File 93
  Disk Fragmentation 93
  Compact Disc (CD) and Digital Optical Disc (DVD) 94
  Flash Storage 95
  Monitors 95
  Inside Look at LCD Monitors 97
  Landscape, Portrait, or Multiple Monitors 98

Chapter 4: About Computer Applications 99
  Application Architecture 100
  Tier Architecture 101
  Inside an Application 105
  From Pseudo Code to a Program 107
  High-Level Programming Languages 109
  Breaking Down a Program into Parts 110
  Function at Work 111
  Decisions, Decisions, Decisions 113
  Feeling Loopy 115
  Sharing Parts of Programs 116
  The Tool Box: Integrated Development Environment 117
  A GUI Program 119
  Web Applications 121
  Cascading Style Sheets (CSS) 123
  JavaScript 125
  Mobile Apps 126
  Building an App 127
  Creating a Dynamic Webpage 128
  Testing the Application 129
Chapter 5: Data and Databases 133
  Storing Data 133
  Memory and Disk 136
  Developing a Database Application 136
  More About Structured Data 138
  Database Management System (DBMS) 141
  Data Modeling 141
  Relational Database 145
  The Relational Database Advantage 146
  Referential Integrity 146
  Relational Database Design 147
  Normalization 147
  Index 148
  Structured Query Language (SQL) 149
  SQL Basics 150
  Creating a Table 151
  Automatically Validating Data 152
  Modifying the Database 153
  Creating an Index 154
  Insert Data 155
  Pattern Matching 157
  Searching for Ranges 157
  Changing Values 158
  Deleting a Row 158
  Calculating Values in a Column 158
  Remove Duplicates 159
  Organizing Data 160
  Joining Together Tables 161
  Writing SQL 163

Chapter 6: Talking Intelligently About Cybersecurity 165
  Challenges of Cybersecurity 165
  Cybersecurity Audit 168
  Database Access Points 168
  Physical Access 170
  Biometrics 170
  Handwriting 171
  Hands and Fingers 172
  Voice Prints 172
  Iris Scanning 172
  Fingerprints 173
  Veins 173
  Facial 174
  Password 175
  Access Rights 177
  Disabling Services 178
  Proxy Server 179
  Demilitarized Zone (DMZ) 180
  Firewall 180
  Firewall Controls Traffic Flow 180
  Configuring a Firewall 181
  Breaking Through the Firewall 181
  Encryption 182
  Classes of Information and the Need For Protection 183
  Categories of Encryption 184
  Digital Certificates, SSL and TLS 184
  Hash Value 185
  Cyclic Redundancy Check and Checksum 185
  Wireless Network Security 186
  Bluetooth Security 186
  Hacking 187
  Computer Viruses 187
  Inside a Computer Virus 188
  Rootkit 189
  Computer Virus Protection 189
  Macro Virus 189
  Keylogger 190
  Denial of Service 191
  Wi-Fi Access Points 191
  Identity Theft 192
  Cookie Theft 193
  Network Sniffing 193
  Detecting a Hacker 193
  Mobile Computing Device 194

Chapter 7: Risk Management and Disaster Recovery 195
  Disaster 195
  Risk Assessment 197
  Disaster-Based Risk Assessment 198
  Asset-Based Risk Assessment 198
  Business Impact Analysis 198
  Legacy Systems 199
  Points of Failure 200
  Recovery Point Objective (RPO) and Recovery Time Objective (RTO) 201
  Data Synchronization Point 202
  Unseen Fail Points 202
  Disaster Recovery 203
  Disaster Recovery Team 204
  Disaster Recovery Plan 204
  Elements of a Disaster Recovery Plan 206
  Assumptions 206
  Risk Tolerance 206
  Risk Management 207
  Detail Analysis Is Critical 208
  Low-Level Focus 209
  Disaster Recovery Options 210
  Service Level of Agreement 213
  Disaster Recovery Operations 214
  Emergency Operations Center (EOC) 215
  Downtime Procedures 215
  Contact Lists 216
  Disaster Drills 216

Chapter 8: Vendor Negotiations and Management 219
  Procurement 219
  Procurement Process 220
  Finding a Vendor 220
  Contacting Vendors 221
  The Proposal 222
  Risks of Procurement 223
  Negotiation 224
  Preparing for Negotiations 225
  Negotiation Strategy 226
  Value 227
  Do the Math 227
  Payment 228
  General Contractor 229
  Face-to-Face Negotiation 229
  Negotiating Terms 230
  Terminate Negotiation 231
  Conflict Resolution 232
  Fact Finder, Mediator, and Arbitrator 232
  Suing 233
  The List 233
  Stages of Adoption 234
  Contract 235
  Elements of a Contract 236
  Breach of Contract 237
  Industry Standard 238
  Contract Interpretation 238
  The Uniform Commercial Code (UCC) in the US 239
  United Nations Convention on Contracts for the International Sales of Goods (CISG) 239
  Warranty 239
  Remedies 240
  Modifying a Contract 241
  Memorandum of Understanding 241
  Contract Termination 241
  Working the Contract 241
  Reasonableness 242
  Penalty Clause and Performance Incentives 242
  The Contract Manager 243
  Service-Level Agreement 243

Chapter 9: The Importance of Cloud Computing 245
  It Can Rain Too 245
  Governmental Access 246
  The Cloud and Data Science 246
  The Cloud Services 247
  The Private Cloud 249
  The Public Cloud 250
  Hybrid Clouds 250
  Why Implement a Cloud? 251
  Why Not Use the Cloud? 252
  Mitigating Risk 254
  The Cloud Life Cycle 256
  Cloud Architecture 257
  Serverless Computing 259
  DevOps 260
  The DevOps Maturity Model 262
  Compliance 263
  Cloud Security 263
  Levels of Security 264

Chapter 10: Decision Support Systems, Data Analysis, and Big Data 267
  The Decision Process 268
  Business Intelligence 271
  Data and Business Analytics 271
  Technology Supports the Decision Maker 271
  Simon’s Decision-Making Process 272
  Business Reporting 273
  Performance Dashboard 273
  Models and How Models Are Used to Help Make Decisions 274
  Mathematical Models 275
  Certainty and Uncertainty 276
  Decision Tree 277
  Search 277
  Simulation Model 277
  Automated Decision Systems and Expert Systems 278
  Knowledge Management and Collaborative Systems 280
  Data Warehousing and Data Mining 280
  Text Analytics, Text Mining, and Sentiment Analysis 282
  Web Analytics, Web Mining, and Social Analytics 283
  Big Data and Analytics 284

Chapter 11: Forensic Computing and How People Can Be Tracked 287
  Protecting Your Computing Device 289
  The Legal Environment 289
  Criminal Trial 290
  Civil Trial 291
  Decisions and Appeals 292
  Evidence 293
  A Computer Forensics Investigation 294
  Types of Computer Forensics Investigations 295
  Tools of Computer Forensics 296
  Legal Consequences of Computer Forensics 296
  Conducting a Computer Forensics Investigation 299
  Preserving Data Using Write Blockers 299
  Hashing 300
  Hexadecimal Level of Investigation 301
  Offset: Locating Data 302
  Mounting: Hiding Data 303
  Bit Shifting 304
  Bit Flipping 304
  Live Data Acquisition 305
  Remote Acquisition 306
  Deleted Data 307
  Anti-Forensics Tools 307
  Cell Phones 308
Appendix A: Information Technology Auditing 311
  The Information Technology Audit Process 311
  Auditing Technology 313
  Controls 314
  COBIT 315
  The Audit Charter 315
  The Audit Committee 316
  Preplanning the Audit 317
  Audit Restrictions 318
  The Audit Planning 319
  Tasks and Subtasks 320
  Duration 320
  Dependencies 321
  Critical Path 322
  Resources 322
  Resource Cost 323
  Cost of the Audit 323
  Responsibilities of the Auditor and Auditee 324
  Audit Risk Assessment 326
  Types of Risks 327
  Audit Quality Control 328
  Techniques to Ensure a Quality Audit 329
  Data Collection 330
  Review Existing Controls 332
  Evidence Life Cycle 332
  Identifying Evidence 333
  Grading Evidence 334
  Material Relevance 334
  Competency of the Evidence Provider 334
  Evidence Independence 334
  Recording Evidence 335
  Analysis of the Evidence 335
  Preparing Audit Documentation 336
  Audit Samples 336
  Statistical Sampling 337
  Non-statistical Sampling 337
  Compliance Testing 337
  Substantive Testing 338
  Reporting Audit Findings 339
  The Audit Report and Stakeholders 340
  Detecting Irregularities 341

Index 343
Introduction

The tidal wave of technology over the past decade is rapidly changing the world in which we live. Automation and computing power have led to massive changes in virtually every job category. The need for re-education in areas that were formerly thought to be “safe” has become intense. The rapid changes in all aspects of medicine and scientific research are well-worn examples, but as a prime example, marketing is no longer just the 4Ps. Advertising, marketing, and social media campaigns driven by data analysis, artificial intelligence, and other applications are changing the landscape completely. In criminal law, forensic technologies must be understood in detail. In most factory and warehousing environments, robotics, nanotechnology, packaging, and delivery methods have been driven by the desire for efficiency, which has often become the difference between success and failure. Most importantly, in all areas of business management, if you are not aware of and able to understand the technologies of today in data analysis, business intelligence, big data, artificial intelligence-based media, supply chain, and more, you had better be great at what you do. In finance and technology, the availability of smartphones, combined with open API technologies, has driven a revolution in investment and banking technologies around the world. These and many other technologies are redefining every business, and managers MUST know these technologies and many more to remain competitive. What does this mean for you? It means that those without this knowledge have fewer options and frankly can expect worse pay. A lot worse. Some jobs have simply been eliminated, while others have been given new birth. This trend is going to continue. The demand for people with the right technology skills is gaining intensity, and around the world governments are doing what they can to promote technology skills to make their citizens competitive in the worldwide job market.
But this is not a book about jobs. It is a book about helping you get a good one. We are not talking here about a basic knowledge of using a cell phone, Word, or Excel, though all of those can be impressive and necessary skills for many jobs. We are talking about a broad understanding of the technologies used today, with a focus on business, that we believe to be ESSENTIAL to your success in the future. That is why Need-to-Know Technologies for a Successful Future has been written. It introduces you to a very broad spectrum of computing, information technology, and business technologies. The idea is to provide the most information possible in the least amount of space in a palatable way. Everything you need to know and then some. Caution! Learning—really learning—about technology is addictive—so much so that you might get the “byte” and find technology a way to spark your career in new directions. If you can perform addition and subtraction and find a way of doing something, then you have the basic knowledge to understand basic computing technology—not a joke. Need-to-Know Technologies for a Successful Future won’t make you a genius, but it demystifies all aspects of computing and related technologies, giving you a foundation on which to build your technological knowledge and technical skills and placing you squarely on the road to mastering technologies that professionals need to know.

You’ll be exposed to a broad array of technologies in rapid succession. The idea is to give you clear and efficient instruction on the foundations, enabling you to pick out areas that may be of interest to you. If you are a business manager, it may be more than you bargained for, but if you stay with it, you will learn a lot and be able to understand what your developers, network engineers, and IT department can do and what they are telling you. In fact, the book was conceived to improve on what has been taught in basic computing or computing-and-society courses, especially in terms of new technologies, but with far more breadth of coverage. So we invite professors to consider it as a colloquium course book or a first course in technology at the college level. But the book was written with a goal to provide people of all ages with an opportunity to improve their chances of success in the future. Some may wonder why we chose to present topics in a certain way or in a certain order. In part it was to make what might have been a deadly dull book more interesting, and in part to weave in different aspects of technology that most people, even people in the profession, do not know. It is likely that we missed some things, or you may want more depth (it would be next to impossible to include everything). But the book will serve as a launching pad for your investigation into technologies that are in high demand. We hope that you will enjoy the book, and it is our sincere wish that the effort to pull these technologies together as efficiently as we could pays off by improving your chances of success.
Chapter 1: Talking Intelligently About Technology

Technology drives business, and you need to embrace technology to be competitive in the marketplace. These are words that you’ve heard before; however, you might be less than enthusiastic about applying technology because it seems too complex to understand. It is not, as you’ll discover throughout this book. Technology seems complicated at first, but when reduced to its fundamental elements, technology is no more challenging than addition and subtraction. Technology seems magical until you see how the magic is performed—then it all makes sense, and you can put technology to work for you. However, sitting back and hoping that you’ll get by without knowing technology is wishful thinking. Why? Because you are missing career and business opportunities, and someday you’ll wake up one morning to discover that your value in the marketplace has diminished.

Sears was America’s number one retailer, with stores in practically every community across the United States and Canada. And if you couldn’t get to a Sears store, you could order anything from the Sears catalog and have the merchandise delivered to your door. Americans then moved to online shopping, and Sears didn’t. Sears stayed with its time-tested brick-and-mortar and catalog sales model rather than embracing technology to become a leading online retailer. The rest is history.

Kodak was synonymous with photography. Kodak cameras and film were used in every household—in the United States and abroad—all trying to capture that Kodak moment. Commercial photographers didn’t use Kodak cameras, but they did use professional-grade Kodak film. Kodak processed film too. Kodak’s business fell apart when digital photography took hold and Kodak failed to embrace it. But this is not what you might be thinking: in 1975, Kodak invented the first digital camera, and then kept it under wraps to protect its film business.
Competitors redesigned their business models around digital imaging while Kodak focused its marketing efforts on pushing film-based imaging. The rest is history.

Xerox was at the forefront of computer technology. Mention Xerox and you think of copiers, yet in 1973 Xerox developed an innovative computer featuring a mouse and a graphical user interface (GUI) that enabled the user to click and drag rather than type commands into the computer. Xerox PARC researchers also developed “what you see is what you get” (WYSIWYG) editing and the concepts of folders and icons, and they developed Ethernet computer networking, enabling computers to talk to each other. Xerox showed off the technology to a few interested members of the public, including Steve Jobs, who then hired engineers from Xerox. Xerox continued to see itself as a document company while its computer technology was implemented by Apple and eventually Microsoft, revolutionizing the way everyone interacts with computers. The rest is history.
There are moments when technology disrupts the ways of business and practically brings down mainstay products: there is simply no longer a need for the product or service. The U.S. Postal Service is a profound example, as email replaced letter writing at the turn of the 21st century. E. Remington and Sons, IBM, Royal, Smith Corona, and Underwood—all leading typewriter manufacturers—realized by the 1990s that word processing software on personal computers had made typewriters obsolete. There was no way to compete with the disruptive technology.

New technology can also practically destroy an industry and then give it life again, as happened in the recording industry. Vinyl records became the backbone of that industry in 1948. They were relatively inexpensive to make and distribute. Although the size of the record changed over the years, the basic technology remained the same—and secure: there wasn’t any practical way of copying a record. That was until 1964, when cassette tapes took over the record industry. Recording tape was nothing new; reel-to-reel tape was already used to record songs that were later transferred to vinyl records for distribution. However, Philips introduced a 30-minute tape cartridge that could be played on portable audiotape players. Cassette tapes sold for a dollar more than a vinyl album. This led the way for equipment manufacturers to sell inexpensive cassette tape recorders that were easier to use than reel-to-reel tape recorders. Consumers could record their own tapes and copy their favorite songs from vinyl records and cassette tapes. Music sales gradually dropped in the 1970s as teens used blank cassettes to record their favorite songs. The record industry wanted a tax imposed on blank cassettes, payable to the record industry, to replace lost sales, and it pushed Congress to pass the Sound Recording Amendment to the 1909 Copyright statute, making it illegal to copy recordings.
Technology changed again in the early 1980s with the introduction of the Compact Disc (CD) and CD players. The CD materially increased the quality of the recording and challenged cassette recordings, whose quality decreased each time a recording was re-recorded onto another cassette. The 1990s brought another recording phenomenon: Moving Picture Experts Group-1 Audio Layer 3, better known today as MP3. MP3 is a format for storing audio in a computer file, making it easy for anyone to send the audio to another computer, and the audio quality does not degrade when the file is copied. Although the recording industry succeeded in requiring manufacturers to build features into digital recorders to prevent serial copying, consumers used personal computers rather than digital recorders to play recordings, and small computers called MP3 players were sold that were not considered digital recorders. The record industry spent the next fifteen years battling file-sharing networks that posted downloadable MP3 files of practically any song; battling peer-to-peer networks that let consumers post and share any files, including recordings; and even suing individuals who downloaded recordings from those sites. Then in 2003, the recording industry embraced the new technology and changed its business model when Apple Computer created an online music store, iTunes, that
offered consumers hundreds of thousands of songs (a catalog that later grew into the tens of millions) to purchase legally at $0.99 each, and the recording industry received a dependable revenue stream. A decade later, the recording industry saw another change in technology: streaming. Consumers can subscribe to services that stream music to their devices—at home, in the car, on the computer, and on the TV. Instead of selecting a song, consumers can select a music genre; the need to illegally download music has diminished, and the recording industry has found a new revenue stream.
More than Companies and Industries

Big-name companies that missed embracing new technology are barely holding on to their businesses. And there are countless smaller companies, suppliers to the industrial giants, that lost their business because they didn’t keep up with technology. It is convenient to focus on companies, but we are really talking about you: employees, throughout the ranks, who also have not embraced new technology. Companies that miss out on business because new technology got away from them survive in some form. Employees don’t fare as well unless they gain a firm understanding of new technology. You need to know how new technology works and how it can enhance business operations. You don’t need to know how to implement it. Technology changes rapidly. What was science fiction yesterday is in every home today. Knowing how new technology works broadens your imagination, enabling you to come up with ideas to change the business and then let the team make those ideas a reality.
Make Your Life Easier

A last-minute review of your lengthy proposal by a colleague reveals that you misspelled the product name . . . hundreds of times! A nightmare for sure. Fixing the problem could take hours or a few seconds, depending on your knowledge of everyday technology. The brute-force fix is to carefully proof the proposal word by word, hoping to find every misspelling. Alternatively—if you know how to use the technology—you could use Word’s find-and-replace feature and let Word do the work without missing a single misspelling. Everyday technology like this illustrates the importance of embracing technology and putting it to work for you. Basic office software products such as Word and Excel have a wealth of features that make your life easier if you know how to use them. Fundamental knowledge of Excel is all that is needed to perform simple math and develop your own complex formulas that streamline your job. There’s the brute-force way of entering each calculation into a formula, or the smart way of using one of the
many formulas and built-in functions available in Excel. The smart way requires you to supply the necessary values and let Excel do the work for you. There is a broad range of built-in functions available, from simple addition to sophisticated financial, engineering, and statistical formulas. All you need is to know how to use the technology.

You can have Excel make decisions for you by using a logic statement as part of your formula. Logic statements such as “if . . . then” include a condition that you create. Excel “looks” at the data and performs the tasks specified in the logic statement based on your condition. You can have Excel perform a calculation and then do something based on the result (the condition), such as calculating a salary increase based on an employee’s performance appraisal. You don’t make the decision. You tell Excel how to make the decision and then let Excel do the work—if you know how to use the technology.

Repetition is a tedious aspect of every job. You may want to hand off a task to a colleague, but there is a risk that your colleague will make errors. You could continue using the brute-force way—doing it yourself—or the smart way, by automating the process using a macro. A macro is a series of instructions stored under a name. Enter the name when you want to execute the macro, and each instruction in the macro executes in sequence. A macro can be recorded: start recording, perform each task, then end the recording and give the macro a name. A macro saves time and ensures that a process is repeated accurately by anyone who knows how to run the macro. Word, Excel, and other software products let you automate processes through the use of a macro—if you know how to use the technology. You can create your own software application (kind of) using Visual Basic for Applications (VBA), which is available in Office.
VBA is a tool built into Office that enables you to create a professional-looking software application by dragging and dropping (and perhaps writing instructions here and there). Your software application has all the bells and whistles (push buttons, radio buttons, drop-down lists, scroll bars) found in your favorite commercial software. It’s a kind of software application because it works only with Word, Excel, and other Office applications. It has the look and feel of a real software application built by software engineers, but it works only if you have Office. It simply provides a fancy way to use Office (the interface). This is all possible today—if you know how to use the technology.
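The “if . . . then” decision described above can be sketched outside of Excel as well. Here is a minimal Python sketch of the salary-increase example; the rating thresholds and raise percentages are hypothetical values chosen only for illustration, not anything prescribed by Excel or by this book:

```python
def salary_increase(current_salary, rating):
    """Return the new salary implied by a performance rating.

    The thresholds and percentages below are hypothetical,
    chosen only to illustrate if/then decision logic.
    """
    if rating >= 4:      # top performer
        raise_pct = 0.05
    elif rating >= 3:    # meets expectations
        raise_pct = 0.03
    else:                # below expectations
        raise_pct = 0.00
    return round(current_salary * (1 + raise_pct), 2)

print(salary_increase(50000, 4))  # 52500.0
print(salary_increase(50000, 2))  # 50000.0
```

In Excel the same decision would be a single formula built on the IF function; the point is identical either way: you state the condition once, and the tool makes the decision for you.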
From Idea to Reality

The marvels of today started as an idea, a dream, someone’s curiosity, and a will to make something better, often through technology or simply by finding a better, more efficient way. Visionaries are the dreamers with a general knowledge of technology, who therefore know their ideas are possible and know how to convey those ideas to the engineers who make the dreams a reality. It was once said that Steve Jobs wanted a computing device in the palm of his hand that could play music, store and retrieve
information, and let him speak to anyone in the world with the swish of his thumb. Jobs was not an engineer or a designer; however, Jobs knew technology and was able to explain his vision to the engineers and designers who then created the iPhone. The pathway from idea to reality is at times long and challenging for the visionary, designers, and engineers alike. Some wishes seem impossible to grant at first but come about after head scratching and thinking outside the box. The glass face on test models of the iPhone was prone to scratches. Jobs presented the problem to Wendell Weeks, the CEO of Corning. Corning engineers developed a highly durable glass product called Gorilla Glass, which became the face of the iPhone and other similar devices.

An idea can translate into a goal, and then into the outcome of a process, such as entering text into a smartphone. The first step on the road to making this idea a reality is to describe the process of how you want to enter text. This group of steps is referred to as an algorithm. An algorithm is a process for doing something. You develop algorithms all the time, probably without realizing it. Any time you come up with a way of doing something, you’ve developed an algorithm. An algorithm is described at the level of detail needed to convey your idea. You might say, “Drive to the supermarket and buy a loaf of bread.” This is a high-level description of the algorithm to purchase a loaf of bread, and it is fine if the person knows how to find the supermarket and knows how to drive. If not, a more detailed description of the algorithm is necessary: drive down Gordon Street to the corner; make a right turn; drive three blocks; and make a left into the supermarket parking lot. This provides a level of detail missing from the original description, yet still describes the same algorithm. It assumes the person doesn’t know how to find the supermarket but does know how to drive and how to make the purchase.
Additional levels can be used to describe the algorithm in finer detail until nothing is assumed. Everything needed to complete the process must be defined. The lowest level contains the logic of each step in the process. A leveling diagram (see “The Design Process: Sketch the Idea” later in this chapter) is used to define an algorithm. The highest level presents an overview that is easily understood by everyone. The lowest level contains the details needed, perhaps by engineers, to translate the process into reality. Levels in between provide the details needed to explain the algorithm to others who are affected by the process, referred to as stakeholders, who may require more information than the overview provides and less than the fine details of the process.
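As a toy illustration of this leveling idea, the supermarket errand above can be written as lists of steps, where each level refines the one before it. The step wording comes from the text; the list structure and the `describe` helper are our own sketch:

```python
# Two levels of detail for the same algorithm, as described above.
HIGH_LEVEL = [
    "drive to the supermarket",
    "buy a loaf of bread",
]

DETAILED = [
    "drive down Gordon Street to the corner",
    "make a right turn",
    "drive three blocks",
    "make a left into the supermarket parking lot",
    "buy a loaf of bread",
]

def describe(steps):
    """Return an algorithm as a numbered, printable sequence of steps."""
    return "\n".join(f"{n}. {step}" for n, step in enumerate(steps, start=1))

print(describe(HIGH_LEVEL))
print(describe(DETAILED))
```

Either list is a valid description of the same algorithm; the detailed version simply assumes less of the person carrying it out.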
Finer Details of Your Idea

"The devil is in the details" is a true expression, especially when translating your idea into plans to make your idea come to fruition. You probably don't know what it takes to make your idea a reality. Jobs didn't know how to make scratchproof glass for the
Chapter 1: Talking Intelligently About Technology
iPhone, but he knew that scratchproof glass was necessary. Engineers figured out the rest by identifying the finer details Jobs couldn't identify. A systems analyst (see "The Project Team" later in this chapter) is the type of engineer who translates an idea into step-by-step instructions, thinking through every aspect of the idea down to its finest elements so that other engineers can make it a reality.

Translation begins at the highest level, with you telling the systems analyst your idea in your own words. The systems analyst identifies processes, sub-processes, and sub-sub-processes. A process is a series of steps that produces a desired outcome. The lowest level contains the logical steps necessary to complete the sub-sub-process. Logical steps are written in a language referred to as pseudo code (see Pseudo Code), consisting of conditional statements (if . . . then) and English words, making it easy for anyone to understand the detailed steps in performing a process.

Your idea is only the starting point. No matter how well you thought through your idea, the systems analyst will verify everything by exploring each element of your idea with subject matter experts who know more than you about the elements of your idea. A subject matter expert is anyone who can shed light on an aspect of a process. For example, an accounts payable specialist in your organization is a subject matter expert on how the current accounts payable process works, as compared with the accounting manager, who is likely to have a general knowledge of the accounts payable process and, at times, of how the process should work, which is not necessarily how the process really works. The accounts payable specialist helps the systems analyst validate the pseudo code of your process, confirming that each step is feasible and identifying missing steps in the proposed process.
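For instance, a fragment of pseudo code for one step of an accounts payable process might read as follows (the business rule itself is invented for illustration):

```
IF the invoice amount matches the purchase order amount
THEN approve the invoice for payment
ELSE route the invoice to the accounts payable specialist for review
```

Anyone, technical or not, can read this and spot a missing step, such as what to do when no purchase order exists at all.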
Transforming your idea into a reality is a team effort where systems analysts, engineers, stakeholders, and subject matter experts come together to work out details, identifying and resolving problems along the way. Maybe you are the visionary and the team of experts assesses the feasibility of the vision and together brings it to reality. Or perhaps you are a part of the team.
The Business Case

The idea must be a worthwhile venture before an organization invests time and money to bring the idea to fruition. You convince the organization that your idea is a good investment by building your business case. The business case is the reason why the organization should invest in your idea. The organization's leadership sets goals for the organization and develops a strategic plan for reaching these goals. The business case illustrates how your idea complements the strategic plan, that is, how your idea helps the organization achieve its goals.

There is a structure within the organization for how leadership evaluates ideas proposed from within the organization and from outside the organization. Formally this is a component of enterprise project management. Enterprise project management is a way that projects can be selected and controlled within the organization. Each project sponsor presents an idea to leadership in the form of a business case. Proposals that make the strongest business case are approved, and leadership issues a project charter—the formal authorization of a project.

There are many ways in which leadership selects worthwhile ideas. Some are strictly based on the financial value to the organization, or the return on the investment. Others use a project prioritization worksheet that considers more than an idea's financial value. A project prioritization worksheet consists of selection criteria such as strategic value, ease, financial benefit, cost, and resource impact. Each member of the leadership team scores each business case using the grading system (Table 1.1). Scores for each criterion are summarized in an overall score. Business cases are then ranked and a selection is made. (Of course, ideas proposed by top leadership members such as Steve Jobs probably get approved without going through a vote.)

Table 1.1: Project prioritization worksheet used to rank business cases.
–– Strategic value: The importance of the idea to the organization's strategy. 1 = Highly important to 5 = Not important
–– Ease: Is the idea easy to bring to fruition? 1 = Very easy to 5 = Very difficult
–– Financial benefit: Will the idea produce a financial benefit to the organization? 1 = Highly likely to 5 = Unlikely
–– Cost: Is the idea costly to produce? 1 = Low cost to 5 = Highly expensive
–– Resource impact: What impact will the idea have on resources? 1 = Low impact to 5 = High impact
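A worksheet like Table 1.1 can be reduced to a small calculation. The sketch below is a minimal, hypothetical version of that ranking: the criterion names follow Table 1.1, but the two business cases and their 1-to-5 scores are invented. Because 1 is the best score on every criterion, the lowest total ranks first.

```python
CRITERIA = ["strategic_value", "ease", "financial_benefit", "cost", "resource_impact"]

def total_score(scores):
    """Sum the 1-5 scores a reviewer assigned to each criterion (lower is better)."""
    return sum(scores[c] for c in CRITERIA)

# Illustrative business cases scored by one member of the leadership team.
business_cases = {
    "Mobile app": {"strategic_value": 1, "ease": 3, "financial_benefit": 2,
                   "cost": 4, "resource_impact": 3},
    "Data warehouse": {"strategic_value": 2, "ease": 4, "financial_benefit": 3,
                       "cost": 5, "resource_impact": 4},
}

# Rank business cases: the strongest (lowest total) comes first.
ranked = sorted(business_cases, key=lambda name: total_score(business_cases[name]))
print(ranked)  # ['Mobile app', 'Data warehouse']
```

In practice each leadership team member would score independently, and the per-reviewer totals would be combined before ranking.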
The Project

A project is created once leadership gives the go-ahead to develop the idea. A project is a temporary set of activities that has a beginning and an end and delivers an outcome. It sounds like a technical definition, and it is. Jobs's idea was to create the iPhone. A project was launched that set out to make the iPhone. Once the iPhone was in production, the project ended. Enhancements to the iPhone became a new project.

The authorization of the project is in the form of a project charter. The project charter is a formal document that translates the business case into a general understanding of the project based on what is known. Elements of the charter are described in broad terms to convey the intent of the leadership and serve as a guide throughout
the development of the project. Details of the project are developed during the planning stage of the project. Elements of the project charter typically include:
–– The business case: The charter contains the agreed-upon business case that links the project to the strategic goals of the organization. This might be a refined summary of the business case that was proposed to leadership.
–– Deliverable(s): The charter contains a statement that describes the project deliverable(s): the expectation that serves as an objective measurement of the project's success.
–– Constraints: The charter contains limitations within which the project must be developed and within which the product of the project must operate, based on current knowledge and the leadership's expectations. Constraints can include budgets, deadlines, resource availability, and existing contracts.
–– Assumptions: The charter contains a statement of assumptions that were understood when the leadership approved the project. An assumption is a projection of fact that is not verified at the time the charter is written and becomes the foundation for making decisions. Some assumptions may be proven invalid during the execution of the project, at which time the project manager and project sponsor may need to re-evaluate the feasibility of the project.
–– Risks: The charter contains a statement that identifies potential risks that may jeopardize the project. Based on the risk tolerance of the organization defined by leadership, the leadership identifies and accepts potential risks stated in the charter. The project manager must take steps to mitigate those risks during the execution of the project plan and address contingencies for risks that might arise after the product of the project is delivered.
–– Stakeholders: The charter may contain a list of stakeholders. A stakeholder is a person who may have an interest in the project or whose work activity may be affected by the development of the project or the product that results from the project. This is a preliminary list of stakeholders that changes once the project is underway.
–– Change control system: The charter may identify the organization's change control system, if one exists. A change control system is a structure that is followed to implement a material change to the organization or to change elements of the project once leadership approves the project.
–– Project impact statement: The charter may include a statement that clearly acknowledges how the project will affect other initiatives within the organization, including other projects. The project impact statement helps project sponsors and project managers of those initiatives set priorities when initiatives have competing interests.
–– Operational procedures: The charter may reference the organization's operational procedures that must be followed during project development. These can include
purchasing and financial control procedures. The charter refers to the related organizational policies and procedures rather than defining those procedures in the charter.
–– Budget: The charter may set an order-of-magnitude budget for the project, generally based on the expected value the product of the project will add to the organization's assets. The business case usually contains a cost-benefit analysis that projects an estimate of investment and expected returned value. The returned value may or may not be a dollar value.
Project Management

Making your idea a reality involves time and resources to flesh out every detail of the idea and then come up with the tasks necessary to bring the idea to fruition. And, of course, it involves hiring the right people, giving them the right tools, and providing the financing to make this all happen within an acceptable timeframe. This process is called a project, and managing the process is called project management. Project management is a discipline that uses a systematic approach to transform a raw concept into a reliable asset to the organization. The approach is much like a waterfall: there is a linear sequence that moves from one phase to the next, eventually ending in a working product. This is referred to as the waterfall model. Let's take a look at the inner workings of project management.

The pathway to making your idea a reality is called the development process. It is also referred to as the project life cycle and consists of five stages: initiation, planning, execution, monitoring/control, and closure. Project initiation occurs once leadership approves the project and issues a project charter, and the project sponsor appoints a project manager. The project manager then takes the lead to manage the project through its other stages.

Once assigned, the project manager carefully examines the project charter to ensure there is a clear understanding of what leadership expects during the project and of the project deliverables. Not all the information needed to assess the project is in the project charter, however. The project charter clearly delineates authority to manage the project, especially discretionary powers such as the freedom to hire, manage, and terminate members of the project team. There is a clear statement indicating when the project manager is required to receive approval from the project sponsor and when the project sponsor requires approval from leadership. The project charter also identifies the process for resolving conflicts.
The conflict resolution process clearly specifies the decision maker who will resolve conflicts; the process for presenting the conflict to the decision maker; and the timeline for resolving the conflict. The conflict resolution process should be in place before the project is launched.
The project manager reviews completed projects in the organization that can provide insights into how well those projects met expectations. Past projects can provide a wealth of lessons learned and a project template for future projects. A project template is a project plan that contains tasks, subtasks, durations, dependencies, and estimates based on empirical data from previous projects. It can be used as a guide to determine whether the high-level schedules and cost estimates contained in the charter are realistic. Looking back at completed projects also gives a glimpse into relationships among leadership, project sponsors, the project team, and stakeholders. Past relationships might indicate relationships in the new project.

The project manager understands what needs to be performed to complete the project but does not need to develop a complete project plan at this point. The project manager walks through the general tasks necessary to meet the expectations of leadership as stated in the project charter. Each general task is characterized as feasible, questionable, or not feasible based on the project manager's experience, the organization's culture, and outside influences. Further analysis is necessary to clarify questionable general tasks. Concerns about general tasks that are not feasible are discussed with the project sponsor and leadership before moving to planning.

The project manager identifies and assesses the impact that current and potential projects throughout the organization might have on the new project. Competing interests for scarce resources from other projects might interfere with the project manager's ability to deliver the product specified in the project charter.

The project manager meets with stakeholders to identify stakeholder alignment and expectations. Stakeholders are characterized as supporters, neutral, or detractors. Further assessments are made if stakeholders are biased against the project, to determine the reason for the negative opinions and the feasibility of changing those opinions. The project requires stakeholders' support to succeed. Stakeholder expectations are incorporated into the project specifications. Meeting expectations determines whether the project is successful or not; a successful project is one that meets expectations. During the course of the project, stakeholder expectations are managed by the project manager to ensure that deliverables meet expectations. The project manager also enlists influential stakeholders who might assist with managing the expectations of other stakeholders.
The Project Team

The project sponsor presents the business case and advocates for the implementation of the idea to leadership and is authorized to oversee the project and to form a project team to develop the idea. Consider yourself the owner of the project. The project team is led by a project manager, usually of your choosing. A project manager is the person responsible for planning the project and executing the plan to deliver the end product.
Certified Project Manager

Some project managers are certified as a Project Management Professional (PMP) by the Project Management Institute, a nationally recognized credentialing organization, and follow the project management standards specified in the Project Management Institute's The Project Management Body of Knowledge (PMBOK). Other project managers are not certified but usually have the experience to form a project team and manage a project.
Those who use your idea—once it is fully developed—are considered customers. Customers can be in-house staff who use the idea as part of their jobs. Customers could also be actual customers not employed by the organization. Regardless, customers are part of the project team and provide valuable input on how they expect the product or service to work. Small groups of customers, called focus groups, may be formed at various stages of the project to evaluate the idea and prototypes of the product or service. Their recommendations help other project team members fine-tune the project.

The development team consists of a group of developers, engineers, and technicians responsible for transforming the idea into a reality. The development team also includes one or more systems analysts whose job it is to identify and translate requirements into specifications that tell everyone how the idea should work. The systems designer then designs the idea, providing technical specifications for the engineers who will build it. There may be many systems analysts, designers, and engineers on the project team, each focusing on one or more aspects of the development.

Projects may be created to build any number of products. Often, though, they involve the use of software, or the end product includes or actually is some sort of software solution. For example, a typical computer application has a user interface (the screens used to interact with the application); back-end processing (the processing that occurs after you click the OK button); and the database (the place where data is stored). Each requires expertise in that area.

RACI Chart

A common technique a project manager uses to organize the project and the project team—and reduce the opportunity for conflict—is to develop a RACI chart for the project. A RACI chart defines:
–– Responsibility: Each task is assigned to a resource or a group of resources who are responsible for completing the task.
–– Accountability: A resource is assigned accountability for the delivery of the task and for ensuring that the task meets expectations.
–– Consulted: The project manager identifies resources who must be consulted about how a task is performed.
–– Informed: The project manager identifies resources that must be kept informed on the progress of developing the task.

For example, a work group may be responsible for developing specifications. The project manager, the work group leader, and members of the work group are held accountable that the specifications are developed and meet expectations. Stakeholders must be consulted during the development of
specifications. The project manager and project sponsor must be kept informed about the progress of developing the specifications.
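The RACI assignments in the specifications example can be sketched as a simple data structure. This is a hypothetical sketch; the task name and the resources assigned to each role follow the example above, but any real chart would list the project's actual tasks.

```python
# One row of a RACI chart: a task mapped to the resources holding each role.
raci = {
    "Develop specifications": {
        "Responsible": ["Work group"],
        "Accountable": ["Project manager", "Work group leader", "Work group members"],
        "Consulted":   ["Stakeholders"],
        "Informed":    ["Project manager", "Project sponsor"],
    },
}

def roles_for(task, resource):
    """Return every RACI role a resource holds on a task."""
    return [role for role, names in raci[task].items() if resource in names]

# A resource can hold more than one role on the same task.
print(roles_for("Develop specifications", "Project manager"))
# ['Accountable', 'Informed']
```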
Virtual Project Teams

The project team typically consists of a small core group that is augmented by a virtual project team. A virtual project team consists of subject matter experts who join the project to address a specific issue related to their subject matter. Once the issue is resolved, the subject matter expert returns to normal duties. Virtual project teams offer flexibility and reduced cost because the size of the team is scalable to the needs of the project at that moment in time. Issues can be addressed without hiring full-time staff, and existing staff subject matter experts can be temporarily assigned to the project as needed. Virtual team members can work part-time on the project or serve on ad-hoc committees as consultants.

The primary drawback of using virtual teams is that members may have little or no loyalty to the project because they focus on only one aspect of it. Virtual team members share little experience and camaraderie with other team members, which can lead to limited trust.
Shared Values

The success of the project depends on shared values among members of the project team. Shared values are the elements that bond the team together:
–– Trusting: There must be mutual trust among the project team, stakeholders, and the project manager; otherwise, mistrust may perpetuate and build during the project, presenting a high risk of failure.
–– Openness: There must be a free flow of information among the project team. All information must be presented factually, without embellishment and without half-truths.
–– Proactive: Each member of the project team must be proactive in identifying problems and suggesting solutions.
–– Participative: Each member of the project team must have meaningful participation in the project. Each must help work toward completing the project.
Relationship Capital

The project manager brings to the project team members who are influential within the organization and have a professional or informal relationship with stakeholders.
Their reputation establishes credibility and trust and is referred to as relationship capital, which can be used to effectively manage the project. The relationship has intrinsic value. For example, a stakeholder may have doubts about the project until a respected colleague endorses the project and asks the stakeholder to reserve judgment until evidence supporting the project materializes. The stakeholder is likely to give the benefit of the doubt and consider the project worthwhile—for now. The relationship prevents concerns from escalating and negatively impacting the project.

Relationship capital diminishes in value each time the relationship is used to enlist support; at some point the stakeholder wants to see the issue resolved. Relationship capital is accumulated over time by delivering on promises and speaking the truth. If an issue is resolved, then the promise is kept and the project manager's relationship capital becomes stronger than when the project began. Until that time, the project manager needs to bring on staff who already have relationship capital built throughout the organization.
The Project Sponsor and the Steering Committee

You, as the project sponsor, are an integral member of the project team and more than the overall boss of the project. You must be proactive and become an active participant in the project when necessary. You make critical decisions quickly to keep the project on target. For example, you might need to choose between delaying the project or authorizing an expenditure to bring in a vendor to pick up the slack. You are also called upon to intervene in conflicts among stakeholders. This is referred to as the Captain Kirk (Star Trek) project sponsor: Captain Kirk was noted for beaming down to personally resolve difficult issues on a planet.

A steering committee is a group of advisors composed of influential leaders within the organization who know policies, procedures, and workflows from both a theoretical and a pragmatic position. Steering committee members are middle managers and other recognized leaders within the organization. Collectively, they have the experience to either directly respond to issues or know the right parties who can respond to issues. The steering committee is the sounding board where issues and proposed solutions are presented; steering committee members also provide advice to the project manager and the project team. The steering committee also monitors key project metrics to determine if the project is on track.
The Project Plan The project manager’s focus is to develop a project plan that lays out the direction as to what needs to be done; when it needs to be done; how long it will take; who is going
to do it; and what it will cost. This is a massive responsibility because the project plan dictates how to make the idea a reality. The challenge is to understand the details of the idea well enough to develop the initial draft of the project plan.

The project manager begins developing the project plan by using a work breakdown structure. The work breakdown structure is a process of decomposing a simple request into smaller and smaller components that are easy to comprehend and analyze. The process begins with the whole—the outcome—such as the iPhone. The whole is dissected into logical components, and then each component is dissected into smaller components. These are called levels of the work breakdown structure. Work breakdown levels are related using a numbering system: 1 is always the whole; 1.1, 1.2, and so on are second-level components. The progression continues until there is a manageable component that can be easily analyzed. This is called a work package.

Each work package is analyzed to determine the tasks necessary to complete it. It is at the work package level that all the work is performed. Think of a work package as a piece of the whole. When one work package is completed, one piece of the whole is completed. The entire project is completed when all work packages are completed. Completed work packages are then assembled to form the whole—for example, the iPhone.
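The numbering scheme lends itself to a simple sketch. Below, a hypothetical work breakdown structure for a smartphone-like product is stored as a flat dictionary keyed by WBS code (the decomposition itself is invented for illustration); the deepest entries are the work packages where the work is actually performed.

```python
wbs = {
    "1":     "Smartphone",        # the whole
    "1.1":   "Handset",           # second-level component
    "1.1.1": "Case",              # work package
    "1.1.2": "Display glass",     # work package
    "1.2":   "Software",          # second-level component
    "1.2.1": "User interface",    # work package
}

def children(code):
    """Return the WBS codes exactly one level below the given component."""
    return [c for c in wbs
            if c.startswith(code + ".") and c.count(".") == code.count(".") + 1]

print(children("1"))    # ['1.1', '1.2']
print(children("1.1"))  # ['1.1.1', '1.1.2']
```

A component with no children is a work package; completing every work package completes the whole.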
Project Phases, Milestones, and Deliverables

The project can be divided into phases. A phase delivers a discrete product that can stand alone without the need for future phases. For example, the initial phase of the iPhone may require plug-in earphones; a later phase might use wireless Bluetooth earbuds. Each phase has a start and finish date and its own project plan, schedule, and budget. The project team moves on to the next phase of the project once a phase is completed.

Progress toward the end of the project is measured in milestones and deliverables. A milestone is an event that identifies a significant accomplishment in developing the project. For example, transmitting a text message in a prototype iPhone is a milestone. A deliverable is the result of performing a task. For example, the design team produces the design of the iPhone case. This is a deliverable but not a milestone, because by itself it doesn't show significant progress toward the end of the project. A milestone is a measurement of significant progress; a deliverable is the result of a task.
Work Package: Tasks and Subtasks

A work package is divided into tasks. A task is an action that has a beginning and an end and produces a result. A task can be divided into one or more subtasks. A subtask
is an action that has a beginning and an end and produces a result that is used by another task. A task can be related to other tasks. For example, one task may not be able to start until another task finishes. These connections are called task relationships. The project manager identifies tasks and subtasks and the relationships between them.
Sequencing Tasks

Once task relationships are identified, the project manager determines the sequencing of tasks by deciding which tasks should be performed first, which second, and which can be performed concurrently. Sequencing is based on the dependencies among tasks. Task sequencing impacts the project schedule.
Duration

Duration is the time to perform the task plus elapsed time. For example, a task might take one week to complete, but the duration may be two weeks due to resource availability. Duration is estimated using a standard measurement of time that is determined at the beginning of a project. Duration can be measured in days—an 8-hour day or a 7.5-hour day—or in work hours, which are referred to as effort.

Determining duration is challenging if the project manager has never before performed the tasks. Here are common techniques used to estimate duration, each with a caveat:
–– Prior experience (analogous) estimating: No two projects are alike.
–– Historic statistical relationships (parametric) estimating: History may not repeat itself.
–– Expert judgment (subject matter expert): The expert may not have the time or inclination to gather all the facts to produce a realistic estimate. The expert may not really be an expert and may have no vested interest in the estimate—there is nothing to lose if the estimate is wrong.
–– Bottom-up (decomposition) estimating: Analyzing the details of all tasks and subtasks takes time.

The best approach is to ask the person who is going to perform the task how much time is necessary, because that person has something to lose if the task falls behind schedule. Any assumptions made by the person giving the estimate—such as hours in a day, availability of staff, and availability of resources—are noted and monitored during the project. Changes in the underlying assumptions might affect the actual duration of the task.

A task may be effort-driven (of variable duration) or duration-driven (fixed duration). Effort-driven means more resources can be assigned to the task to reduce
duration. Duration-driven means that adding more resources to the task will not reduce duration.
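The effort-driven versus duration-driven distinction can be sketched arithmetically. The function below is a minimal, hypothetical model (the 80-hour task and 8-hour day are invented): for an effort-driven task, duration shrinks as resources are added; for a duration-driven task, it does not.

```python
def duration_days(effort_hours, resources, hours_per_day=8.0, effort_driven=True):
    """Return task duration in days under a simple scheduling model."""
    if effort_driven:
        # Effort is spread across resources working in parallel.
        return effort_hours / (resources * hours_per_day)
    # Duration-driven: fixed length regardless of how many resources are assigned.
    return effort_hours / hours_per_day

print(duration_days(80, 1))                       # 10.0 days with one resource
print(duration_days(80, 2))                       # 5.0 days with two resources
print(duration_days(80, 2, effort_driven=False))  # 10.0 days regardless
```

Real schedulers are less linear than this (new resources ramp up, and coordination overhead grows), but the model captures why adding people only helps effort-driven tasks.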
The Schedule

The schedule of when the idea becomes a reality is established once tasks and subtasks are identified, durations are established, and relationships among tasks are defined. The schedule specifies when each task begins and ends and, collectively, when the project begins and ends. The schedule is also the basis for the project manager to estimate the impact a change to a task has on the project.

Tasks that, if delayed, delay the project are identified as being on the critical path. The critical path is the sequence of tasks that determines the delivery date of the project. Increasing the duration of a task that is on the critical path extends the delivery of the project. Shortening the duration of a task that is on the critical path may shorten the delivery of the project, but not always: shortening such a task may cause it to fall off the critical path, with another task taking its place on the critical path, rather than speeding up the project delivery.

Project management software such as Microsoft Project is used to develop and manage the project plan. Tasks and subtasks are entered into the software, which automatically generates an image of the schedule in the form of a Gantt chart (Figure 1.1). A Gantt chart lists tasks and subtasks and plots durations as bars across a calendar. Dependencies are shown as a line that connects the bars of the tasks. Changes made to tasks are automatically reflected in the Gantt chart.
Figure 1.1: Tasks, duration, and dependencies are viewed on a Gantt chart.
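The forward-pass logic behind a critical path calculation can be sketched in a few lines. The tasks, durations (in days), and dependencies below are invented for illustration; tools like Microsoft Project perform this calculation automatically across the whole plan.

```python
tasks = {                        # name: (duration_days, [predecessor tasks])
    "Design": (5, []),
    "Build":  (10, ["Design"]),
    "Test":   (4, ["Build"]),
    "Docs":   (3, ["Design"]),
    "Ship":   (1, ["Test", "Docs"]),
}

finish = {}  # memoized earliest finish day for each task

def earliest_finish(name):
    """Earliest finish = own duration + latest finish among predecessors."""
    if name not in finish:
        dur, preds = tasks[name]
        finish[name] = dur + max((earliest_finish(p) for p in preds), default=0)
    return finish[name]

print(earliest_finish("Ship"))  # 20 (days): the project's total duration
# The critical path here is Design -> Build -> Test -> Ship (5+10+4+1 = 20).
# "Docs" finishes on day 8 but isn't needed until day 19, so delaying it by
# up to 11 days would not delay the project: it is off the critical path.
```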
Resources

As mentioned earlier, a resource is a thing or person required to complete a task or subtask. The project manager identifies all resources needed to complete each task—some resources are obvious and others are easily overlooked. For example, a systems analyst is a resource necessary for identifying specifications (a task) for the project. Less obvious are the resources that the systems analyst requires to perform the task—such as a computer, a telephone, office space, a desk, a chair, electricity, heat, air conditioning, lights, a parking space, and access to the facility—which are all typically assumed to be available but may not actually be.

Tasks necessary to acquire a resource are also part of the project plan. Here's a partial list of tasks that might be required to acquire the systems analyst. Each task has a duration, dependencies, and its own resources necessary to perform it. Some tasks, such as acquiring office space, may have a six-month duration and affect the completion of the project.
–– Write a job description.
–– Set a salary range based on the organization's salary scale.
–– Write the justification for the position.
–– Submit a form for new headcount for approval.
–– Submit a request for a new employee to Human Resources.
–– Human Resources posts the job position in-house for ten days.
–– Human Resources reviews in-house applicants.
–– Human Resources posts the job position for non-employees.
–– Human Resources reviews resumes.
–– Human Resources gives the project manager five qualified candidates.
–– Human Resources interviews candidates.
–– The project manager interviews candidates.
–– The project manager tells Human Resources to hire one candidate.
–– Human Resources sends the candidate an offer and terms of employment.
–– Human Resources negotiates terms of employment with the candidate.
–– The candidate accepts the offer.
–– A start date is set.
And then there is always the chance that a resource is inappropriate for the task—for example, the systems analyst doesn't work out, and this becomes known only several weeks into the job. During this honeymoon period the project manager decides whether the new systems analyst remains on the project team. If it becomes apparent that the systems analyst is not a good fit and the analyst is terminated, then the hiring process starts from the beginning, delaying the project.
18
Chapter 1: Talking Intelligently About Technology
The Resource List

The resource list contains resources that can be assigned to tasks. Think of this as a roster of players who are ready to go into the game. The resource list contains a resource by title, the availability of the resource to the project, and the cost of the resource. The availability of the resource is critical because not all resources are available to work 100% of their time on the project. The cost of the resource enables the project manager to assign the most economical resources to a task. Resource allocation is the process of assigning resources to tasks. Only resources that appear on the resource list can be allocated to a task because they have been identified and acquired to work on the project. The project manager assesses the allocation of resources using the allocation chart after allocating resources to tasks. The allocation chart shows the percentage of the resource’s available time that is allocated each day of the project. An allocation over 100%—called overallocation—indicates that the resource is asked to perform more than a day’s work in a day. In this case, the project manager then reallocates resources as necessary to avoid overallocation, or asks the resource to work overtime. Once resources are properly allocated, the project manager produces a work schedule for each resource that clearly shows what the resource is scheduled to do and when to do it.
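The overallocation check described above can be sketched in a few lines of code. This is a toy illustration with invented resource names and percentages, not the output of any particular tool:

```python
from collections import defaultdict

# Each assignment: (resource, day, percent of the resource's available time).
assignments = [
    ("systems analyst", "Mon", 60),
    ("systems analyst", "Mon", 50),  # a second task scheduled the same day
    ("systems analyst", "Tue", 80),
    ("programmer", "Mon", 100),
]

# Total each resource's allocation per day, as an allocation chart would.
daily_total = defaultdict(int)
for resource, day, percent in assignments:
    daily_total[(resource, day)] += percent

# Overallocation: more than a full day's work scheduled in a single day.
overallocated = {key: total for key, total in daily_total.items() if total > 100}
print(overallocated)  # {('systems analyst', 'Mon'): 110}
```

A tool such as Microsoft Project performs this same arithmetic across the entire project calendar and flags the result on its allocation chart.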
Resource Usage

The project manager controls the cost of a project by adjusting resource usage. Here are techniques used to adjust resources:
–– Use a less expensive resource: This may lower the cost of performing a task; however, the project manager must assess the impact on the quality of work and whether the resource may be less productive, leading to increased duration of the task.
–– Use a more expensive resource: This may increase the cost of the task; however, productivity may increase and the duration of the task may decrease.
–– Overallocate: A resource that works overtime may decrease the duration of a task; however, the cost of the task may increase.
–– Outsource: Assigning a task or a group of tasks to another organization may decrease overallocation of existing resources and provide an alternative to the time-consuming task of hiring a resource. However, outsourcing may decrease the control the project manager has over performance and may increase costs since outsourcing typically commands a premium price for the service.
–– Reassign resources: Resources can be reassigned from tasks that are not on the critical path to tasks that are on the critical path in order to keep the project on schedule.
Cost to Bring an Idea to Reality

How much will it cost to bring your idea to fruition? There are two techniques used to estimate cost: a top-down estimate, and a bottom-up estimate, sometimes referred to as zero-based budgeting. A top-down estimate is based on a gross comparison of the idea to other similar ideas that have already been implemented and whose actual cost is known. The estimate is adjusted for costs that have escalated over the years since the implementation and for contingencies. A top-down estimate is arrived at quickly and isn’t expensive to produce. The drawback is that no two ideas are exactly the same, making the top-down estimate less accurate than the bottom-up estimate. The bottom-up estimate technique requires the project manager to identify tasks, duration of tasks, and resources for the project first, and then sum the estimated cost of each task into the cost estimate of the project. The bottom-up estimate is more accurate than the top-down estimate because the cost analysis is based on your idea, not a different idea. The drawback is that the bottom-up estimate technique is time-intensive and costly to produce. Both cost-estimating methods are used. The top-down estimate is used to produce a magnitude estimate commonly called a “ball park” estimate to determine if your idea is economically feasible. This is similar to finding the price range for a car to determine if it is within your budget. Once your idea is approved by leadership based on the top-down estimate, the project manager uses the project specifications to develop the bottom-up estimate. Estimates are divided into developmental cost and ongoing cost. Developmental cost is the expense needed to bring your idea to fruition. Ongoing cost is the annual expenditure to maintain the product or service for a projected five-year period. The projection reflects changes to the organization required to support the product or service.
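A bottom-up estimate is ultimately just the sum of per-task estimates, as a short sketch shows. The tasks and dollar figures are invented for illustration:

```python
# Each task carries its own estimated cost; the project estimate is their sum.
task_costs = {
    "identify specifications": 12_000,
    "design screens": 8_500,
    "build and test": 40_000,
}

bottom_up_estimate = sum(task_costs.values())
print(bottom_up_estimate)  # 60500
```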
Costs are monitored throughout the life of the project and the bottom-up estimate is revised as necessary as the project incurs real expenditures. The difference between the estimate and the actual cost is called a cost variance. Although the goal is to have zero cost variance, there will always be a cost variance because not all factors were known when the estimate was created. In addition, assumptions that are the basis for the estimate may change during the course of development. A new cost estimate is required each time the scope of the project changes to reflect the financial impact of the change. Cost variance is measured using a cost performance index (CPI) and is calculated using the following formula: CPI = Estimated Cost/Actual Cost. A perfect CPI is 1. Less than 1 means that the actual cost is higher than the estimated cost. More than 1 means that the actual cost is lower than the estimated cost. Project managers use the CPI throughout the development process to gauge cost performance. A cost variance of +/- 10% is generally acceptable. A greater cost variance requires the project manager to justify the difference.
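The CPI formula is easy to verify with a few lines of code; the dollar figures below are invented for illustration:

```python
def cost_performance_index(estimated_cost, actual_cost):
    """CPI = Estimated Cost / Actual Cost; a perfect score is 1.0."""
    return estimated_cost / actual_cost

def cost_variance(estimated_cost, actual_cost):
    """Variance as a fraction of the estimate; +/- 10% is generally acceptable."""
    return (actual_cost - estimated_cost) / estimated_cost

cpi = cost_performance_index(100_000, 110_000)
print(round(cpi, 2))  # 0.91 -- below 1, so actual cost ran higher than estimated
print(cost_variance(100_000, 110_000))  # 0.1 -- a 10% overrun, at the edge of acceptable
```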
Tools to Manage Development

The project team begins developing your idea once planning is completed. The project manager uses a tool set to manage development and keep the project moving according to the project plan. A project management tool such as Microsoft Project contains the primary tool set used by the project manager. The tool set consists of:
–– The scope statement: The scope statement is a brief statement—a sentence or paragraph—that specifies the goal of the project and reminds the project team of the goal. You may see the scope statement as a banner or poster in meetings or on the header of every document related to the project.
–– The project charter: The project charter is the contract between the project team and the project sponsor/leadership team and lists the goals, constraints, and other terms within which the project is developed. The project charter is referenced before decisions are made to ensure that the decision complies with the contract.
–– The project plan: The project plan is the step-by-step guide on how the idea is transformed into reality. The project plan is also referenced before any decision is made about the project. A key element of the project plan is the critical path because it contains tasks where changes in duration may influence the duration of the project.
–– The Gantt chart: The Gantt chart is a visual image of the project plan showing all tasks, duration, dependencies, and resource allocation and serves as a touchstone for the project team during the development of the project.
–– The schedule: The schedule is a list of task assignments along with start and end dates for each project team member. It is distributed to team members weekly or monthly.
–– Status reports: Status reports are updates on how well the project is following the project plan. There are various status reports, each designed to meet the needs of the project team and stakeholders.
At the beginning of the project, the project manager identifies who requires updates on the project and the type, format, frequency, and method of delivery of the report.
Change Management

In the ideal world, you are able to provide the project manager with all the information needed to bring your idea to fruition. However, this rarely occurs. Challenges arise during development that weren’t anticipated during planning and that lead to changes in the project charter and the project plan. In addition, analysis during development may suggest improvements to your idea. A change management process is created at the beginning of the project to manage these changes.
Change management is a process that is initiated at the beginning of the project to deal with changes to the project scope once leadership approves the project. The change management process requires that an impact analysis be performed on each proposed change to assess the impact the change will have on the project. The impact analysis defines new tasks required to make the change including duration, resources, and costs. The person proposing the change, called “the owner of the enhancement,” presents the advantages and disadvantages of incorporating the change to the change management committee, usually consisting of the project sponsor and the leadership team. The change management committee reviews the analysis and approves or rejects the change. Once the change is approved, the project manager revises the project plan, deliverables, milestones, and other aspects of the project to incorporate the change into the project plan. The existing project plan and budget are replaced with the revised project plan and budget. Previously announced schedules, deadlines, and budgets are ignored because there are new schedules, deadlines, and budgets based on the revised project plan. Change management prevents “scope creep,” where additional requirements are added to the project without considering the impact the change will have on the project. It avoids new changes being squeezed into the existing project plan and budget, resulting in a missed deadline and an over-budget project.
Risk and Managing Risk

Risks are inherent in every project and there is no way to eliminate them. Funding may no longer be available; required resources may not be available; and underlying assumptions may prove wrong. Steps are taken to reduce exposure to risks and mitigate risks should they occur by creating a risk management plan. A risk management plan identifies fail points in a project. A fail point is a situation where something can fail and jeopardize successful completion of the project. The project manager identifies fail points by examining each facet of the project and assessing the probability that what is in the plan will be realized. A fail point typically falls into one of several categories of risk. These are:
–– Inherent Risk: Inherent risk is a natural risk that exists in any project. For example, key employees are unable to work due to illness.
–– Business Risk: Business risk is a risk associated with doing business. For example, the business encounters adverse market conditions.
–– Detection Risk: Detection risk is a risk that occurs because the project manager or an advisor uses faulty procedures or bases a decision on unknowingly false information. For example, duration for a task is set based on invalid historical data.
–– Technological Risk: Technological risk is the chance that a key system will fail, such as the infrastructure being unable to handle the data load.
–– Operational Risk: Operational risk happens when the business logic is faulty, resulting in unexpected outcomes from the project.
–– Residual Risk: Residual risk is a risk that remains after the project manager has mitigated other risks.
The response to the risk assessment is based on the risk tolerance of the project manager and the organization. A highly profitable organization with sustainable revenue growth has a high risk tolerance, since there is revenue to cover failures. There is less risk tolerance in organizations that are less financially strong or that are facing market challenges. There are four common responses to a fail point. These are:
–– Acceptance: The project manager acknowledges and ignores the risk. This is a common response for fail points that have a low probability of occurrence or require a large effort to mitigate or avoid. The decision is to take your chances rather than take a proactive stance to address the risk.
–– Mitigation: A fail point that has a relatively low chance of occurring but a major impact on the project if it does occur should not be ignored. Instead, a reasonable effort can be made to do something that lowers the probability that the fail point will occur. This is referred to as mitigation. Mitigation lowers the probability of occurrence by taking a proactive role in trying to prevent the occurrence.
–– Transference: A risk can be moved to another party, thereby limiting the risk to the project manager. For example, the project manager may decide to outsource a portion of the project. Risks associated with fail points in that portion of the project are transferred to the outsource firm.
Of course, the risk still exists and an occurrence may impact the project, which is something that the project manager must address.
–– Avoidance: The project manager may decide that the fail point has a high probability of occurring and therefore an effort must be made to remove the fail point from the project. This is typically addressed by redesigning the project to avoid the fail point.
The risk management plan identifies risks and states how each risk is managed: a fallback plan, should the risk occur. The objective is to devise solid fallback plans prior to a failure. Fail points are clearly identified and a response predefined. If a failure occurs, the risk management plan is executed. This reduces the need to devise a response at the time the event occurs. For example, preliminary arrangements with an outsource firm to take over a key component of the project are made in advance and implemented should resources be unavailable to work on the project. If resources
become unavailable, the project manager calls in the outsource firm. There is no scrambling trying to identify and evaluate options when the event occurs.
Your Idea Is Delivered

Many tests happen before the project team delivers the product. The objective is to be sure that the deliverable meets minimum requirements (specifications stated in the project charter) and anticipated expectations (stakeholders’ expectations). The deliverable undergoes a set of progressive tests to ensure that it hits the mark. These tests are:
–– Unit Test: A unit test, sometimes referred to as a component test, determines if a functional piece meets specifications. For example, a fuel pump is a component (unit) of an automobile. The fuel pump is assembled and tested (unit test). If it passes, then the fuel pump is installed on the engine and the entire engine is tested.
–– Regression Test: A regression test is performed to determine whether a change to one functional piece breaks other functional pieces that have already been tested successfully.
–– Integration Test: The integration test exercises all components—existing and new—to determine if the components work well together.
–– Stress Test: The stress test determines how all components perform under higher than expected usage to determine at what point they will stop functioning: the failure point. Results of the stress test are later used to monitor operations to ensure that performance doesn’t reach the failure point.
–– Quality Assurance Test: The quality assurance test determines if the deliverable meets standards of quality established by the organization. An internal group of quality testers or a vendor tries all possible scenarios for using the deliverable including scenarios that are unlikely to occur. The goal is to find faults.
–– User Acceptance Test: The user acceptance test determines if the deliverable is acceptable to the project sponsor and stakeholders (see the “acceptance process” outlined below).
–– System Test: The system test is the final test performed before the deliverable is used in production. The goal is to ensure that the deliverable works with other components that are live.
–– Installation Test: The installation test occurs immediately after the deliverable is installed in the production environment and before the operation goes live. The goal is to uncover any problems that may prevent the organization from using the deliverable. The organization reverts to the existing system if the installation test fails, at which point the project team addresses problems that prevented the deliverable from going live.
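In software, a unit test exercises one functional piece in isolation against known inputs and expected outputs. A minimal sketch in Python; the function under test is invented for illustration:

```python
def apply_discount(price, percent):
    """The unit under test: apply a percentage discount to a price."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_apply_discount():
    # Known inputs and expected outputs -- if these pass, the unit meets its spec.
    assert apply_discount(200.0, 25) == 150.0
    assert apply_discount(99.99, 0) == 99.99
    try:
        apply_discount(200.0, 125)
    except ValueError:
        pass  # invalid input is rejected, as specified
    else:
        raise AssertionError("expected a ValueError for an out-of-range percent")

test_apply_discount()
```

A regression test is this same suite rerun after later changes, confirming that pieces which already passed still do.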
The acceptance process is the way that the project sponsor and stakeholders verify that the project deliverable meets expectations and terms that were specified in the project charter. The project charter specifies objective acceptance criteria that can be measured during the user acceptance test. The actual user acceptance test is developed shortly after the project is launched and before development begins. The objective measurements are used by the project team throughout development to ensure that specifications and expectations are met. Any defects identified are fixed long before the project sponsor and stakeholders perform the acceptance test. Once the deliverable is accepted, the project manager develops a migration plan to move the deliverable into production. The migration plan consists of:
–– Conversion of existing data, if necessary
–– The effect migration will have on the organization
–– Operational constraints, including the effort required to migrate the system
–– The criteria for a successful migration
–– Developing and implementing the installation test
–– The level of business disruption that is acceptable
–– The latitude to adjust the go-live date
The project manager works with the project sponsor and stakeholders to decide on the best migration strategy for the organization. Migration strategies are:
–– Gradual: A gradual migration occurs when the deliverable is introduced to a portion of the organization at a time, usually beginning with the unit that would be least affected should problems arise.
–– All-at-once: The all-at-once migration strategy requires the entire organization to use the new deliverable. The old system is turned off and the new system is turned on. All data is converted prior to the changeover.
–– Hybrid: The hybrid strategy requires the entire organization to use the new system; however, only recent data is converted to the new system at launch. Older data is gradually converted to the new system.
–– Parallel: The parallel migration strategy requires that both the old and new systems work at the same time. New data is entered into and processed by both systems. The project manager and the project sponsor compare results. Once the project sponsor is comfortable that the new system is producing acceptable results, the old system is turned off.
Once the deliverable is implemented, the project manager and the project team close out the project. The team reflects on successes and failures, referred to as lessons learned, that are used to better manage future projects. Project documents are archived, to be used as references for revisions or for a new project.
Simple Project: No Need for Complex Project Management

Not all projects are large and complex. Many are relatively small, routine projects that require a relatively small project team. The goal is set and the project team determines tasks, subtasks, duration, and resources necessary to perform the task without a detailed formal project plan. The methodology most often used for managing small projects is called agile project management. It is a way to organize and manage small- to medium-sized projects with relatively small teams of specialists who need only general direction from the project manager. Agile project management focuses on small batches of tasks that require intense collaboration of team members and stakeholders in face-to-face communication. The team is self-organizing. That is, the team sees the problem and determines what has to be done and who should solve the problem. Each team member has a primary skillset and one or more secondary skillsets. Team members are cross-functional and can step into multiple roles as required. The goal is to deliver a small batch of tasks and then continuously improve those batches to deliver quality results in a short timeframe. There can be multiple agile teams, each working on different sets of items that overlap or require integration. In this situation, representatives of each team form a cluster. The cluster addresses mutual issues. The project manager provides light-touch leadership. Each team member is empowered to learn about a problem, determine a solution, and engage stakeholders directly without the structure and constraints that come with managing complex projects. Conversations between a team member and a stakeholder likely elicit new needs that can be brought back to the team and addressed based on the priority of the need. The objective is to develop small features at a time from a work list called a focus board.
The small feature is ready to use, which is referred to as a product-ready deliverable. Small features are continuously integrated, addressing the larger needs of stakeholders.
The Focus Board

Stakeholder needs are identified and entered into a focus board. The focus board is placed in a common area that every team member can view. The focus board is divided into three columns: To Do, In Progress, and Done (see Table 1.2). An entry on the focus board is called a story, referred to as a churn. A story is a brief narrative that describes a need, such as to install a downtime computer/printer as illustrated in Table 1.2. No tasks or resources are listed on the focus board. The project team knows what needs to be done and how to do it. The team acquires additional resources as needed to address the need. The team talks with stakeholders to determine priority and then works on the story with the highest priority.
Table 1.2: The focus board contains stakeholder needs.

To Do                                  In Progress    Done
Install a downtime computer/printer
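A focus board is simple enough to model as three lists with one operation: moving a story between columns. A toy sketch (the columns follow Table 1.2; the code itself is not from the text):

```python
board = {
    "To Do": ["Install a downtime computer/printer"],
    "In Progress": [],
    "Done": [],
}

def move(board, story, source, destination):
    """Move a story from one column of the focus board to another."""
    board[source].remove(story)
    board[destination].append(story)

# The team picks up the highest-priority story and begins work on it.
move(board, "Install a downtime computer/printer", "To Do", "In Progress")
print(board["In Progress"])  # ['Install a downtime computer/printer']
```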
The Scrum

Agile project management uses the rugby approach. In rugby, the game is restarted after each minor infraction using a scrum. Initially, stakeholders describe a need in general terms. Requirements are not fully defined. Stakeholders are likely to change the need throughout the project. The project team begins with a scrum—working on the initial request. When a change occurs, the project team stops, reassesses, and starts another scrum. The team’s focus is on the fast delivery of emerging requirements and then modifying delivered results to conform to the new requirements.
Sprint

A sprint is the basic unit of development and has a fixed duration such as a week, which is referred to as a time box. Sprints can be as long as a month. The priority item on the focus board is worked on during a sprint. If requirements are not completed, then the item is returned to the focus board and becomes a focus of another sprint. Priorities can change between sprints, requiring the team to focus on completing an item different from the incomplete item of the previous sprint. The sprint-planning meeting is an eight-hour-long meeting held at the beginning of a sprint. The first four hours are spent reviewing work completed and presented to stakeholders and work that remains incomplete. The team determines what went well during the sprint and things that need to be improved during the next sprint. The backlog of items on the focus board sets the team’s priorities. The remaining four hours are dedicated to planning the next sprint—to identify the work that has to be done, the work that is likely to be accomplished during the sprint, and the amount of effort required to perform the work. Each team member agrees to an assignment based on the priorities and then the sprint begins.
Daily Scrum

The project team meets daily at the same time in the same location to hold a team meeting called the daily scrum. The meeting lasts no more than fifteen minutes and everyone stands at the meeting. Each member is asked three questions. These are:
–– What have you done since yesterday?
–– What are you planning to do today?
–– Are there any impediments in your way?
The focus is to place the responsibility for planning and accomplishing daily activities on each team member. Each team member is encouraged to ask for assistance when something impedes the team member’s capability to deliver the result.
Backlog Grooming

Backlog grooming is the process by which the team reviews items on the focus board, typically in no more than an hour. The review involves dividing large stories into smaller stories and refining requirements with the stakeholder to create more manageable stories. Stories are not broken into tasks and subtasks.
The Project Team

The agile project team consists of:
–– Scrum master: The scrum master facilitates the scrum, removes impediments to completing stories, and keeps the team focused on sprints by enforcing rules and buffering the team from distractions. Although the role of the scrum master has similarities to that of a project manager or team leader, the scrum master does not take on a leadership role.
–– The Team: The team consists of up to nine members who have cross-functional skills and are responsible for delivering work at the end of each sprint.
–– Project Owner: The project owner represents stakeholders who requested the work. The project owner writes stories, prioritizes stories, and adds stories to the focus board. The project owner also ensures that the team delivers value to stakeholders.
–– Stakeholder: A stakeholder identifies a need for the organization and requests that the need be addressed by the project team. The stakeholder receives deliverables and either accepts and implements the deliverable into the organization’s workflow or rejects the deliverable, requiring the project team to rework the item. The stakeholder has no formal role in the project and is rarely involved in the agile process.
Extremely Quick, Changing Needs

A problem may arise urgently with little time to develop a solution, requiring a team to quickly assemble and immediately respond to the problem. Such a situation happened when a fire destroyed a part supplier’s factory that provided three key parts to General Motors, Fiat Chrysler, Mercedes-Benz, and Ford Motor Company, halting production of some of their most profitable vehicles. Real-time planning is used to react to a problem that occurs suddenly and has a mission-critical effect on the organization, jeopardizing the sustainability of the organization. Real-time planning is a technique for modifying plans based on the current situation. It was championed by General Norman Schwarzkopf during Desert Storm when he said, “Timing is everything in battle, and unless we adjusted the plan, we stood to lose the momentum of the initial gains.” The challenge is how to respond to unexpected events not covered in the operational plan. General Schwarzkopf determined that the higher in the chain of command, the less knowledgeable a person is about the unexpected event and the less reliable the person’s input into the decision on how to address the issue. Traditionally, unexpected events are analyzed and a potential solution is proposed to the change management committee, who determines if the change should be implemented. General Schwarzkopf questioned the value of the chain of command being involved in every unexpected event and the solutions to address the event. The change management process delayed addressing the event. Furthermore, the remoteness of the change management committee from the event meant that they did not add any value to the decision. Their involvement did not change the decision that would have been made by those closer to the event. Real-time planning requires meetings between the project manager, the project team, and relevant stakeholders to review current events.
The meeting is short and focuses on the managerial issues related to the changing event. The goal is to keep everyone updated on changes, not to decide how to address the change.
Rapid Planning

Rapid planning occurs when little is predictable about the situation. This can be used with agile project management or with any project where there are unknowns that are identified during the development process. Initially, the project team and appropriate stakeholders gather to review changes and collectively determine the first task. The task is performed and the result is reviewed. If the task is successful, then the next logical task is added to the plan and is performed. If the task is unsuccessful, the group decides on a different direction or task. These concepts are referred to as planning, de-planning, and re-planning.
This is similar to how we find our way to a new destination (before GPS). There are three roads. Pick one, and drive down the road looking for indications that you are on the right road (planning). After a mile or so, you determine this isn’t the right road. You backtrack (de-planning) to the starting point and then pick another road (re-planning). A systematic way to plan, de-plan, and re-plan is to use the OODA loop model. The OODA loop model is used to support rapid planning (see Figure 1.2) by defining four stages in the decision process. These are:
–– Observe: The initial step is to gather current information from as many sources as are available within the short timeframe needed to make the decision.
–– Orient: Next, analyze the information to determine if the information changes your current sense of reality related to the situation.
–– Decide: Next, determine the next action that needs to be taken.
–– Act: Act on your decision. The action provides feedback, and then you return to the beginning of the process and start over.
This type of process, where the process repeats, is referred to as a loop. Loops are quite commonly used in programs as simple ways to handle repetitive (iterative) processes, where the computer truly excels. More on loops in programming can be found in Chapter 4.

Figure 1.2: The OODA loop is a model used to support rapid planning (Observe → Orient → Decide → Act, with feedback returning to Observe).
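The four stages above map naturally onto a loop in code. A minimal sketch: the stage functions are supplied by the caller and stand in for real observation, analysis, decision, and action:

```python
def ooda_loop(observe, orient, decide, act, iterations=3):
    """Cycle through Observe -> Orient -> Decide -> Act; each pass feeds the next."""
    feedback = None
    for _ in range(iterations):
        information = observe(feedback)  # Observe: gather current information
        situation = orient(information)  # Orient: update your sense of reality
        decision = decide(situation)     # Decide: choose the next action
        feedback = act(decision)         # Act: carry it out; the result feeds back
    return feedback

# A trivial run with arithmetic stand-ins for each stage:
result = ooda_loop(
    observe=lambda feedback: (feedback or 0) + 1,
    orient=lambda info: info * 2,
    decide=lambda situation: situation + 1,
    act=lambda decision: decision,
)
print(result)  # 21
```

In a real project the stages are meetings, analyses, and actions rather than functions; the point is only that the process repeats, with each pass informed by the last.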
The Design Process: Sketch the Idea

Your idea needs to be fleshed out before a team of engineers can begin to transform the idea into reality. The idea can be sketched using several analysis tools that help engineers identify and understand details. A data flow diagram is one of those tools. A data flow diagram, also known as a context diagram, is used to analyze a work package by depicting a workflow (Figure 1.3). Symbols are used to define elements of the workflow. Here are symbols commonly used in a data flow diagram:
–– A circle is a process
–– An arrow is a data flow
–– A rectangle is an element external to the system
–– Parallel lines are used to indicate a place to store data called a data store
Figure 1.3: This data flow diagram shows the workflow for entering and processing an order.
A leveling diagram is another tool used to analyze a process (Figure 1.4). A leveling diagram is a form of a data flow diagram that defines levels of detail in the process. Each level is used to discuss the process at a specific level with stakeholders. The director discusses the processes at level 1, the most general level, and the operational manager discusses the process at a lower, more detailed level.
Figure 1.4: The leveling diagram describes further details of the ordering process.
Pseudo Code
Fine details of each process are defined using pseudo code. Pseudo code is a step-by-step description of the logical flow of a process. Pseudo code represents, in plain words, what a computer programming language would implement in code that the computer will understand. The computer does not understand pseudo code, which is meant to be rewritten into a programming language for use on the computer. Programming languages are roughly characterized as compiled or interpreted. In an interpreted language, interpreter software reads the programming code and translates it, as the program runs, into the machine language that the computer understands. Machine language is not terribly friendly, but some types of programmers need to understand it. In a compiled language, the code is translated into machine language ahead of time, before the program runs. This gains speed, so most large, fast applications are written in compiled languages such as C and C++. A C compiler or a BASIC interpreter refers to the programming language software that converts the program to machine language: a C compiler lets a C program run on your computer, and a BASIC interpreter runs a BASIC program. Today, many languages both compile and interpret depending on the task at hand. Both approaches eventually convert a program into the machine language that the computer understands.
Using pseudo code, a system analyst walks stakeholders step-by-step through the process, enabling stakeholders to correct any misunderstandings and avoid logical errors. Pseudo code contains steps. Steps can be:

Sequential:
    Prompt the user to enter a user ID
    Prompt the user to enter a password
    Validate the password

IF-THEN-ELSE statements:
    If the medication is due within an hour then
        Display a yellow indicator
    Else If the medication was due an hour or more ago then
        Display a red indicator
    Else
        Display a green indicator

Iteration (loop):
    Until the patient is discharged
        Determine if medication is due for the patient
    End Until

As discussed before, loops in computer programs are iterative; that is, they repeat a process until a condition is reached, or satisfied. In the example above, until the
patient is discharged, someone or something is to determine if medication is due for the patient, over and over again. In some cases, a loop may never reach its condition; it is then an infinite loop and theoretically would continue forever.
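The pseudo code above maps directly onto a real programming language. Here is a sketch in Python; the patient record and the rule that the patient is discharged after three checks are illustrative assumptions, and times are expressed in minutes relative to now (negative values mean the medication was already due).

```python
# IF-THEN-ELSE: pick an indicator color for a medication.
def medication_indicator(minutes_until_due):
    if 0 <= minutes_until_due <= 60:
        return "yellow"   # due within an hour
    elif minutes_until_due <= -60:
        return "red"      # was due an hour or more ago
    else:
        return "green"

# Iteration (loop): keep checking until the patient is discharged.
patient = {"discharged": False, "checks": 0}
while not patient["discharged"]:
    patient["checks"] += 1
    # ...determine if medication is due for the patient...
    if patient["checks"] == 3:   # hypothetical discharge after 3 checks
        patient["discharged"] = True

print(medication_indicator(30))   # yellow
print(medication_indicator(-90))  # red
print(medication_indicator(120))  # green
```

Note how the `while` loop is the Python form of the "Until ... End Until" iteration, and the `if`/`elif`/`else` chain matches the IF-THEN-ELSE steps one for one.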
Object Oriented Design and Programming
Pseudo code is the basis for programmers to design and program the application. Object oriented design is a common design technique used to translate pseudo code into a working application. Object oriented design is simple yet at times challenging to understand. Object oriented design uses a natural approach to designing. That is, we see things as objects that have certain characteristics and functions. It makes sense that an application is designed the same way as we see things. An object is a thing such as a car. Each object has information that describes the object. For a car, this information includes make, model, color, dimensions, and other types of information that tells you about the car. This information is referred to as the attributes of the object. Each object has functionality. A car, for example, turns on/off, moves forward/back, turns left/right, doors open/close. All of these and more are functionalities of a car. Functionality is referred to as a behavior of an object. Behaviors are also known as functions and methods. A programmer defines an object within a program by creating a class. The class has attributes and behaviors. The attribute portion of the definition describes the type of data (information) associated with the object. For example, the programmer creates a class called car. The class definition states that the car must have attributes of make, model, color, dimensions, and other data associated with a car. Each behavior is defined based on pseudo code that describes how the behavior works. For example, turning the key far right causes the starter to start the car. Turning the key far left disrupts the flow of fuel to the engine and the engine turns off. Each of these behaviors is described in sufficient detail for the program in the car to actually start the car and turn off the car. Associating attributes and behaviors with a class is called encapsulation. This is a natural way we see things.
When we think about starting a car, we know we need a car. Likewise, when we say the car seats five people, we know we need a car. We never separate starting a car, or the maximum occupancy of a car, from the car itself. We can say that attributes and behaviors are all encapsulated within the class that we call a car. Once the class is defined, the programmer can create as many copies of the class in the program as needed. These copies are referred to as instances of the class. Let's say the programmer is creating a computer animation that displays cars on the screen. The programmer defines a class called car that contains specific attributes and behaviors. The programmer then creates cars within the program: each is an instance of the class car, and there is one instance for each car that will appear in the animation.
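The class and instances described above can be sketched in a language such as Python. The attribute names and behaviors below are illustrative, not drawn from any real animation program.

```python
# A sketch of the car class described in the text.

class Car:
    def __init__(self, make, model, color):
        # Attributes: data that describes the object (encapsulated here)
        self.make = make
        self.model = model
        self.color = color
        self.running = False

    # Behaviors (methods): the functionality of the object
    def start(self):
        self.running = True

    def stop(self):
        self.running = False

# Each variable below is an instance of the class Car.
car1 = Car("Ford", "Mustang", "red")
car2 = Car("Toyota", "Corolla", "blue")
car1.start()

print(car1.running, car2.running)  # True False
```

Calling `start()` on `car1` changes only that instance; `car2` has the same behaviors but its own attribute values, which is exactly the point of creating instances from one class.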
Attributes for each instance (each car) can have different values (i.e., makes, colors, dimensions) depending on the type of car that is required by the program design. This enables the programmer to create cars of different makes, models, and sizes. Some behaviors are the same for each instance (each car) because every car has the same behaviors (engine on/off, doors open/close). However, other behaviors, such as the image of the car on the screen, acceleration, stopping distance, and similar actions that are affected by the make, model, and size of the car, are unique to the instance (car) based on the attributes of the instance (car). For example, a heavier car requires more time to stop than a lighter car. The behavior that stops the car is modified based on the weight of the instance of the car. The concept that a class can appear in different forms is referred to as polymorphism. For example, the class car can appear as a coupe, hatchback, sedan, convertible, or sport utility vehicle, each made by a different manufacturer, yet each is a car. Attributes and possibly behaviors are similar for each car but are modified for a specific make and model of car. That is, they take on different forms. In the real world, an object is frequently composed of other objects. A car, for example, has an engine, transmission, wheels, and other components that are themselves objects, each having its own attributes and behaviors. Just as components of a car are put together on an assembly line to create a car, the programmer creates classes for each object (engine, transmission, wheels) and assembles them into the class called car. The class car is said to inherit the classes engine, transmission, wheels, and other components of the car. And similar to designing a real car, the component classes (engine, transmission, wheels) can be designed and programmed by different programmers.
For example, one programmer focuses on designing an engine (actually converting pseudo code that describes the engine into the design of a class called engine). The same process occurs for each component. The programmer who designs the car (defines the class called car) often assembles the design from other classes that may have been created by others (for example, the engine, transmission, and wheels). The programmer who is designing the car only needs to know that an engine is required, not how the engine works. That programmer needs to know how and when to start/stop the engine. The programmer who designed the engine is responsible for what happens to the engine when the start/stop engine commands are executed. This use of other classes is referred to as inheritance: the car inherits the attributes/behaviors of other objects (classes). A key element of inheritance is that changes made to a class flow through to the classes that inherited the class. For example, changes made by the programmer to the engine class flow through to the car classes that inherited the engine class. Only the programmer who created the engine class needs to know how the change is implemented. The programmer who designed the car class isn't concerned because the change flows through to the car class.
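The ideas above, a car assembled from component classes, different forms of car, and behaviors that depend on an instance's attributes, can be sketched as follows. This is a minimal Python illustration; the class names, weights, and stopping-distance formula are invented for the example. (In Python terms, building the car from an Engine object is called composition, while Coupe and Sedan use class inheritance.)

```python
# A sketch of composition, inheritance, and polymorphism.

class Engine:
    def __init__(self):
        self.running = False

    def start(self):
        # Only the engine's designer needs to know how starting works.
        self.running = True

class Car:
    def __init__(self, weight):
        self.weight = weight
        self.engine = Engine()   # the car is assembled from an engine

    def start(self):
        self.engine.start()      # the car only knows *when* to start it

    def stopping_distance(self):
        # A heavier car takes longer to stop (illustrative formula).
        return self.weight * 0.05

class Coupe(Car):                # Coupe inherits everything from Car
    pass

class Sedan(Car):                # Sedan is another form of Car
    pass

# Polymorphism: the same class appears in different forms, and the same
# behavior gives different results based on each instance's attributes.
for car in (Coupe(weight=1200), Sedan(weight=1600)):
    car.start()
    print(type(car).__name__, car.engine.running, car.stopping_distance())
```

A change inside `Engine.start()` would flow through to every car automatically, which is the flow-through property described in the text.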
Inheritance, polymorphism, and encapsulation are traits of object-oriented programming that make programming more flexible and simpler, and they enable designs in which different parts of the program can be developed at the same time. As a result, large-scale projects that one programmer could not possibly handle alone can be developed efficiently and professionally. Object-oriented design and programming is more involved than this; however, this overview gives you a good idea of how it works.
Data Capture
Data capture is another step in the design process that stems from pseudo code. Data capture describes how an application will acquire data. Primarily, data capture focuses on the screen of the application and the devices used to capture data. These devices are referred to as input devices and include the keyboard, mouse, touch screens, bar code readers, smart cards, and biometric readers. Design of the data capture screen follows a well-accepted rule of thumb: keep the number of clicks needed to perform a task to a minimum. The data capture screen should:
–– Only capture data that cannot be calculated or looked up. This is called variable data.
–– Highlight errors on the screen so the user can easily see and fix the problem, such as by undoing a task.
–– Use a menu bar containing common functions that can be quickly selected.
–– Enable shortcuts to functions, such as a right-click popup menu.
–– Provide navigational and procedural help: where to find something and how to do something.
–– Follow a logical flow of data entry.
–– Make the screen appear similar to the paper form currently in use. This is referred to as a metaphor for a screen and reduces the need for training. Users can intuitively navigate the screen because the screen design is familiar.
–– Use appropriate graphical user interface controls for data entry. Most controls are familiar to users.
–– Validate all data, if possible.
–– Make sure required data are entered.
–– Enforce data formats.
–– Make sure that data makes sense by checking for an acceptable data range. Let's say that a typical order is never greater than $1,000. The application should automatically question orders that are greater than $1,000.
The programmer creates prototype screens based on the pseudo code and object-oriented design. A prototype screen is a working screen of the application that stakeholders can test and recommend changes to.
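The validation rules above can be sketched in code. This is a minimal Python illustration; the field names are hypothetical, and the $1,000 threshold comes from the example in the text.

```python
# A sketch of data-entry validation for a data capture screen.

def validate_order(order):
    """Return a list of problems found in the entered data."""
    errors = []
    # Make sure required data are entered.
    for field in ("customer_id", "amount"):
        if field not in order or order[field] in ("", None):
            errors.append(f"{field} is required")
    # Make sure the data makes sense by checking an acceptable range.
    amount = order.get("amount")
    if isinstance(amount, (int, float)) and amount > 1000:
        errors.append("amount exceeds $1,000; please confirm the order")
    return errors

print(validate_order({"customer_id": "C42", "amount": 250}))  # []
print(validate_order({"customer_id": "", "amount": 2500}))
```

An empty error list means the screen accepts the entry; otherwise the application highlights each problem so the user can see and fix it.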
Chapter 2
Talking Intelligently About Communication Networks

It was the eureka moment. On vacation in Paris, a marketing executive stumbled across a unique display design in a shop and snapped a picture, attached it to a text message, then sent it to the marketing team in the States, which was preparing a new marketing strategy for the annual meeting in London the following week. The next few hours were spent sipping coffee at the Coutume Café while having a group text discussion on how to incorporate the design into the new strategy. The focus was on the design, not the technology. No one gave a second thought to how the marketing executive and the team switched gears in seconds, effectively collaborating on a new strategy while separated by thousands of miles and an ocean. And this was more than texting. Slides for the presentation were updated and shared during the impromptu meeting. Several cups of coffee later, the strategy and presentation were revised and approved. Communication network technology made the eureka moment possible by providing the means to exchange text and pictures practically anywhere in the world in the blink of an eye. Here's what wasn't seen during the impromptu meeting. Each text character entered into a cell phone is digitized, then assembled into an electronic envelope and sent, with hundreds of thousands of other people's messages, along a complex communication network of cables, radio transmitters, and routers to each member of the marketing team, arriving within a fraction of a second, where it is converted to text and displayed on the screen. Each picture element is also digitized, placed into an electronic envelope, and follows a pathway similar to the text's. The communication network is really a network of communication networks, many owned and operated independently yet able to talk with each other without impairing the flow of messages.
The same basic communication network technology used in cell phone transmission is also used to share computer and voice information at home and in the office. You know this as Wi-Fi, the internet, the intranet, or simply the computer network. Communication network technology is the backbone of every business, and knowing how it works gives you the foundation to incorporate communication network technology into strategies that let you outpace the competition.
The Neighborhood of Networks
Let's begin by exploring a typical office communications network. A communications network consists of both physical and wireless connections among network devices, much like roadways connect houses and communities together. A network device is any device connected to the network—such as computers, but also includes printers,
scanners, copiers, telephones, and even light switches and refrigerators—that can be accessed remotely. Each network device has a unique address called an Internet Protocol (IP) address similar to the street address of a house. Information such as text is digitized, then placed into an electronic envelope called a packet. The packet is addressed to the destination network device’s IP address and sent along the network to the destination network device where the packet is opened; the digitized text is extracted, decoded, then processed by the network device and placed on the screen. The office communications network is comprised of smaller communications networks called subnets. Think of the office communications network as a group of towns and a subnet as your town. A subnet and network devices on the subnet are uniquely identified by the IP address much like how a town and a house in the town are identified by a combination of postal code and street address such as 07660-55 Smith Street. The postal code of the town is 07660 and the address of the house is 55 Smith Street. Likewise, the first portion of the IP address identifies the subnet and the latter portion identifies the network device. Each subnet has a router. A router is a network device that redirects packets to the destination network device. Think of a router as a post office. A packet is sent to the network router. The router examines the destination IP address in the packet. If the destination is within its subnet, then the packet is delivered to the corresponding network device, otherwise the packet is sent to the router of the appropriate subnet for processing. Think of this as dropping off a package to your local post office. The postal clerk looks at the postal code. If it is your town’s postal code, then the postal clerk has the package delivered—otherwise, the package is sent to the post office in the town with the corresponding postal code for delivery. 
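The split between the subnet portion and the device portion of an IP address can be demonstrated with Python's standard ipaddress module. The addresses below are illustrative private addresses, not from the text.

```python
# A sketch of subnet membership: the first portion of the IP address
# names the subnet (the "town"), the rest names the device (the "house").
import ipaddress

subnet = ipaddress.ip_network("192.168.1.0/24")   # this router's subnet
printer = ipaddress.ip_address("192.168.1.55")    # a device on this subnet
laptop = ipaddress.ip_address("192.168.2.10")     # a device on another subnet

print(printer in subnet)  # True: the router delivers the packet locally
print(laptop in subnet)   # False: the packet is forwarded to another router
```

This membership test is essentially what a router does with each packet's destination address: deliver locally, or forward it toward the subnet whose prefix matches.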
Subnets that comprise your office communication network are connected by routers much like how regional post offices link together town post offices to form the postal network. Packets are sent along the shortest path to their destination—the town post office to the regional post office to the town post office that corresponds to the postal code on the package. There are times when the shortest path is blocked, such as in a traffic jam of other packets or when the router isn’t working. The packet is then diverted to the next best pathway to the designated network device and takes a detour to the next available router. Say that the closest regional post office (router) is unavailable when the package arrives, so the mail truck is rerouted and delivers the package to the next closest regional post office for processing. Each router (regional post office) that processes or attempts to process a packet is referred to as a hop. The goal is to deliver the packet with the fewest number of hops—the more hops, the longer it takes to deliver the packet. No doubt, sometimes your office communication network is slow; it takes forever to communicate with the printer or to send email. The time it takes to receive a response is referred to as response time. Poor response time is usually caused by problems with the communications network. A router malfunctions and packets are rerouted (many
hops), which is usually a sudden, short-term problem rectified by fixing the router. One or more network devices on your subnet might be sending or receiving lots of packets, more than the router or the network cables can handle: simply a traffic jam. This usually develops gradually, the problem takes time to recognize, and the fix is to move you, or the users causing the problem, to a different subnet, like moving to a different town. No one really moves; the network device is given an IP address of another subnet. Network devices, including routers, are connected primarily by network cables and secondarily by Wi-Fi. There are three types of cables commonly used for communication networks: twisted pair, coaxial, and fiber optic cables. Twisted pair cables are thin cables you'll find connected directly to most network devices (your telephone). Coaxial cables are thicker cables connected to some network devices (cable TV). Fiber optic cables are comprised of glass-like fibers that connect to high-traffic network devices (internet cables), although fiber optic cables are also replacing coaxial cables. Think of network cables as roads in your town. Outside your house is a single-lane road that can handle one car at a time (twisted pair cable), which is fine because there isn't a lot of traffic. Single-lane roads lead to roadways with more lanes to carry more traffic (coaxial cable). And superhighways have lots of lanes to handle high-speed traffic (fiber optic cable). Cables have logical, not physical, lanes of traffic, and those lanes have a similar effect on network traffic (packets) as roadway lanes have on automobiles. The number of lanes is referred to as the bandwidth of the cable. Narrow bandwidth cables (twisted pair) have fewer lanes than broad bandwidth cables (fiber optic cable). The more lanes, the faster packets (traffic) move. The office communication network has a special subnet commonly referred to as the backbone.
The backbone is a superhighway that connects network devices to data centers that contain mission critical applications and services necessary to keep the organization operational. Requests for data from the organization’s database travel from your subnet along the backbone to the organization’s database application. If the backbone subnet slows down, so do business operations.
Converting the Message
Now that you have a general understanding of communication networks, let's take a closer look at how characters entered at the keyboard or in a cell phone are transmitted over the communication network. The initial step is to translate each character into a number that is then converted into a format that can be stored in a network device and sent over cables and Wi-Fi. This is referred to as digitizing. Each letter, number, and symbol on the keyboard is assigned a number by a standards body such as the International Organization for Standardization (ISO), the American National Standards Institute (ANSI), and the Unicode Consortium. A standard is an agreed-upon way of doing something that enables technology developed by different firms to interact. The Unicode Standard is the coding
system used to encode letters, numbers, and symbols on the keyboard. You really don't need to know this standard, but seeing how the coding system works helps you understand how your message entered at the keyboard is transmitted. Table 2.1 shows the numbers assigned to the more common numbers, characters, and symbols on your keyboard. There are more than a million possible numbers, enough to represent practically every character in every language in the world.

Table 2.1: Numbers used to represent numbers, letters, and symbols on the keyboard

Decimal  Character    Decimal  Character    Decimal  Character
32       SPACE        64       @            96       `
33       !            65       A            97       a
34       "            66       B            98       b
35       #            67       C            99       c
36       $            68       D            100      d
37       %            69       E            101      e
38       &            70       F            102      f
39       '            71       G            103      g
40       (            72       H            104      h
41       )            73       I            105      i
42       *            74       J            106      j
43       +            75       K            107      k
44       ,            76       L            108      l
45       -            77       M            109      m
46       .            78       N            110      n
47       /            79       O            111      o
48       0            80       P            112      p
49       1            81       Q            113      q
50       2            82       R            114      r
51       3            83       S            115      s
52       4            84       T            116      t
53       5            85       U            117      u
54       6            86       V            118      v
55       7            87       W            119      w
56       8            88       X            120      x
57       9            89       Y            121      y
58       :            90       Z            122      z
59       ;            91       [            123      {
60       <            92       \            124      |
61       =            93       ]            125      }
62       >            94       ^            126      ~
63       ?            95       _
Notice that the numbers are specified as decimal values. Decimal is the numbering system you learned in elementary school; it contains 10 digits, from 0 to 9. After
9 we carry over one place, call it 10, and then start counting again from zero in the rightmost digit. There are other numbering systems; binary is one of them. The binary numbering system contains 2 digits: 0 and 1. And like the decimal numbering system, we carry over one place when we go beyond 1. The value looks like the decimal number 10, but it isn't; its decimal value is 2. The count is 0, 1, then carry over to the next place. You can perform math using any numbering system, which is fundamentally how computers process data. Stop! Let's leave numbering systems here because they become unnecessarily confusing. Simply remember that every number, letter, and character on the keyboard has an equivalent numeric code that can be written as a decimal number and as a binary number. The binary number representation is critical to encoding numbers, letters, and characters because each binary digit (bit) is either a 0 or a 1 and can be represented as off or on. Let's see an example to better understand this principle. The capital letter "J" is assigned the number 74 (decimal value). This is 1001010 in the binary number system (trust me on this). Say that you want to "send" the letter "J" to your neighbor who lives across the street. You can do so by using a flashlight. Turning the light on is a 1 and turning the light off is a 0. Your neighbor needs to write down the sequence of flashes of light as 0s and 1s, convert the binary value to a decimal value, and look up the decimal value on the code sheet to identify the letter. Table 2.2 shows the name Jim encoded in decimal and binary numeric values. Yes, this is more work than necessary, but it illustrates how any character on the keyboard, and characters not on the keyboard, called control characters, can be transmitted. A control character is a character you don't see but that is inserted into the text to help format the text, such as by indicating a new line.
Table 2.2: Here the word "Jim" is encoded in decimal and binary numeric values.

Character  Decimal Value  Binary Value
J          74             01001010
i          105            01101001
m          109            01101101
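The encoding in Tables 2.1 and 2.2 can be reproduced with Python's built-in functions: ord() returns the number assigned to a character, and format() writes that number as binary digits.

```python
# Encode the word "Jim" as decimal and 8-bit binary values,
# matching Tables 2.1 and 2.2.

for character in "Jim":
    decimal = ord(character)         # the number assigned to the character
    binary = format(decimal, "08b")  # the same value as 8 binary digits
    print(character, decimal, binary)
```

Running this prints J 74 01001010, i 105 01101001, and m 109 01101101, the same values shown in Table 2.2.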
Catching a Wave
Imagine a still body of water interrupted by a stone dropped in the water, causing the formation of a wave. The energy of the stone striking water molecules causes each water molecule to push its neighboring molecule. This appears as a wave motion over the water that continues until the energy dissipates: water molecules stop pushing, causing the body of water to be still again.
Your message is transmitted using waves: not waves of water, but waves of electrons, or of photons (light particles) in the case of fiber optic cables. A wave has properties, regardless of whether it is a wave of water molecules, a wave of electrons, or a wave of photons. A wave has a baseline, where there is no wave; this is the body of still water. The height of the wave above the baseline is called the amplitude. The number of waves that occur in a second is called the frequency (Figure 2.1). The no-wave state (baseline) and the wave state (amplitude) are used to represent the characters in your message once it is encoded into binary numbers. A zero value is assigned to the baseline and a one value is assigned to the amplitude of the wave. By controlling when the wave is at baseline and at its amplitude, the message can be transmitted as 0s and 1s on a wave.
Figure 2.1: The baseline is where there is no wave action. The amplitude is the height of the wave above the baseline. The frequency of the wave is the number of waves that occur per second.
The frequency of a wave determines the wave's characteristics, as defined in the electromagnetic spectrum (Figure 2.2), where waves are grouped into bands of frequencies, each band having similar characteristics. Waves in the audio frequency band can be heard by humans. Waves in the radio band can travel 360 degrees and through walls. Waves in the microwave band travel in one direction. Waves in the light wave band travel approximately 90 degrees and cannot go through most objects. Although wave bands have their own characteristics, all waves have the same properties of amplitude and frequency and, therefore, can be encoded with information. A transmitter is a device that generates a wave at a specific frequency. The power applied to generate the wave determines the distance that the wave travels. A receiver is a device that detects a wave of a specific frequency. These functionalities are combined into one device called a transceiver. The circuits that connect your computer to a network are transceivers, as is your cell phone. They can send and receive waves at a specific frequency.
Figure 2.2: Waves of similar characteristics are defined as bands in the electromagnetic spectrum.
Analog and Digital Waves
Some waves fluctuate above and below the baseline. These are called analog waves (Figure 2.3). The height of the wave (amplitude) above the baseline is called positive voltage, and the wave below the baseline is called negative voltage (going in the opposite direction). Voltage is the force used to push electrons. Fluctuations are caused by alternating the current. Information is encoded on an analog wave by designating the negative voltage as zero and the positive voltage as one.
Figure 2.3: An analog wave fluctuates between being above and below the baseline.
A digital wave is another type of wave used to transmit information, and is sometimes referred to as a square wave because the digital wave appears to have corners. A digital wave has only positive voltage, no negative voltage. Information is encoded on a digital wave by designating the baseline as zero and the positive voltage as one (Figure 2.1).
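Encoding bits on a digital wave can be sketched as mapping each 0 to the baseline and each 1 to the positive voltage. The voltage levels below are illustrative.

```python
# A sketch of encoding bits on a digital (square) wave: the baseline
# is a zero and the positive voltage is a one.

BASELINE, POSITIVE = 0, 5  # volts (illustrative values)

def to_digital_wave(bits):
    """The transmitter maps each bit to a voltage level."""
    return [POSITIVE if bit == "1" else BASELINE for bit in bits]

def from_digital_wave(levels):
    """The receiver decodes voltage levels back into bits."""
    return "".join("1" if level == POSITIVE else "0" for level in levels)

wave = to_digital_wave("01001010")  # the letter "J" from Table 2.2
print(wave)                         # [0, 5, 0, 0, 5, 0, 5, 0]
print(from_digital_wave(wave))      # 01001010
```

Because the receiver only has to decide "baseline or positive voltage," small distortions along the way do not change the decoded bits, which is why digital signaling resists interference.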
The analog wave is the foundation of communication transmission because it can easily be generated using electricity and can travel over long distances. Telephone communication uses analog waves. Waves generated by the vocal cords cause a microphone element in the mouthpiece of the telephone to vibrate. The microphone element is attached to an electromagnet that causes an electrical current to fluctuate in synchronization with the audio. The electrical signal is transmitted to the receiver, where it causes the electromagnet in the earpiece of the telephone to vibrate an element that generates sound waves that can be heard. Although an analog wave can be transmitted over long distances, other analog waves that have similar frequencies can disrupt the wave. This is referred to as interference. Digital waves are not prone to interference because the wave is discrete: it is either there (amplitude) or not (baseline). However, a digital wave cannot be transmitted over long distances, which is why analog waves are commonly used to transmit information. The interference common in traditional voice communication is dramatically reduced by digitizing information, including voice, before it is transmitted over the analog wave. The information is encoded as 0s and 1s and then represented on the analog wave. The receiver looks only for positive or negative voltage, not the subtle voltage changes caused by interference from other analog waves.
At the Speed of Light
No one wants to wait. Even the slightest delay in receiving information is frustrating. You expect to see any movie immediately on any device regardless of your location, even while walking down the street. The speed of information across a communication network is a critical component in meeting the expectation of nearly instant transmission. Speed is measured as the number of bits per second successfully transmitted over the communication network. This is referred to as the data transfer rate. A bit is a binary digit (0 or 1) that represents the encoded information. The data transfer rate is influenced by many factors, including the type of network transmission media (twisted pair, coaxial, fiber optic) used to create the communications network; the length of the communications network; and the network devices. Engineers need to know these details; you don't. You do need to know that fiber optic cables transmit at the speed of light, over long distances, and with little interference. Light waves are transmitted over thin tubes of glass, isolating the light signal from ambient light. Coaxial cables transmit data over a solid core of copper at less than the speed of light, over long distances, with the risk of interference from electromagnetic waves of similar frequencies. Twisted pair cables use thin strands of copper wire that transmit at a slower rate than coaxial and fiber optic cables, and are also at risk for interference. (Twisted pair and coaxial cables are enclosed in a coating that reduces the risk of interference.) Communication networks are a combination of twisted pair, coaxial, and fiber optic cables based on the transmission requirements and the cost of installing and maintaining the media.
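The effect of the data transfer rate is simple arithmetic: the number of bits to send divided by the number of bits per second the medium carries. A sketch, with illustrative rates rather than measurements of real cables:

```python
# How the data transfer rate determines transmission time.

def transmission_seconds(size_bytes, rate_bits_per_second):
    """Time to move size_bytes at the given data transfer rate."""
    return (size_bytes * 8) / rate_bits_per_second  # 8 bits per byte

one_megabyte = 1_000_000  # bytes
print(transmission_seconds(one_megabyte, 10_000_000))     # 0.8 s at 10 Mbit/s
print(transmission_seconds(one_megabyte, 1_000_000_000))  # 0.008 s at 1 Gbit/s
```

The same megabyte takes one hundred times longer on the slower link, which is why backbone subnets use the broadest-bandwidth media available.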
From the Keyboard to the Communications Network
As you can probably imagine, there are lots of instructions to follow to send a message across a communications network. Engineers handle the details of transmission; however, a general knowledge of these steps helps you speak intelligently about communication networks. Instructions are grouped into steps called layers. Each layer handles a specific aspect of the transmission. Layers are defined in the Open Systems Interconnection (OSI) model, developed by the International Organization for Standardization and widely used by nearly all networks. The OSI model is organized into seven layers; each layer prepares the message for the next layer in the transmission process. Layers are divided into two sets: the application set and the transport set. The application set prepares the message created by an application (such as Outlook) for transmission over the communication network. The application set also prepares the message when it arrives at the destination network device so the message can be made available to the application. The application itself (Outlook) decides what to do with the message, such as displaying the message on the screen. The transport set focuses on transporting the message over the communications network. The application set consists of three layers, numbered from the highest to the lowest:
–– Layer 7 Application: The application layer prepares the message from the application to be transmitted over the communication network when you select the "send" button in Outlook.
–– Layer 6 Presentation: The presentation layer converts the message into a format that is understood by the other layers in the OSI model.
–– Layer 5 Session: The session layer establishes communication with the receiving network device and maintains communication until the message is delivered.
The transport set consists of four layers.
These are: –– Layer 4 Transport: The transport layer controls the flow of data and determines if there were any transmission errors. The transport layer also integrates data from multiple applications into a single stream of data. –– Layer 3 Network: The network layer determines the way the data is sent over the network (IP address). –– Layer 2 Data: The data layer defines the network type, packet sequencing (the order in which packets are sent), and physical network protocols (rules) to use for the transmission. –– Layer 1 Physical: The physical layer is the hardware that controls the timing of data, connections, and voltages.
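One way to picture the seven layers is as nested envelopes: each layer wraps whatever it receives from the layer above, and the receiving device unwraps in reverse order. A toy Python sketch of that idea (the bracketed "wrappers" are just labels standing in for real protocol headers):

```python
# The seven OSI layers, from the application set (top) to the transport set (bottom).
OSI_LAYERS = [
    "application", "presentation", "session",       # application set (layers 7-5)
    "transport", "network", "data", "physical",     # transport set (layers 4-1)
]

def send(message: str) -> str:
    """Walk a message down the stack; each layer adds its own wrapper."""
    for layer in OSI_LAYERS:
        message = f"[{layer}]{message}[/{layer}]"
    return message

def receive(wire: str) -> str:
    """Walk back up the stack, removing wrappers in reverse order."""
    for layer in reversed(OSI_LAYERS):
        prefix, suffix = f"[{layer}]", f"[/{layer}]"
        assert wire.startswith(prefix) and wire.endswith(suffix)
        wire = wire[len(prefix):-len(suffix)]
    return wire

wire = send("hello")
print(receive(wire))  # hello
```

The outermost wrapper is the physical layer, which is exactly why the physical layer is the last to touch the message on the way out and the first on the way in.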
Chapter 2: Talking Intelligently About Communication Networks
Protocols A protocol is an agreed upon way of doing something. Saying hello when answering the telephone is a protocol for answering the telephone. There are protocols for transmitting over a communication network. Two such protocols are transmission control protocol (TCP) and internet protocol (IP). TCP/IP specifies how to communicate over the internet including data formatting, addressing, transmission, routing, and how the information is received at the destination. TCP/IP defines layers that correspond to some extent with the OSI model. These layers are referred to as protocol stacks. Each protocol defines how to perform an aspect of transmission so that the output of one process can be read as the input to the next process. Software engineers who make software that interacts with communications networks write instructions in the software that follow protocols. That is, the standards body states “here’s how it should work” and software engineers make it work that way. Here are the protocol stacks for TCP/IP: –– Layer 1 Network Interface: The network interface is a combination of the physical and data layers of the OSI model. This manages data exchange between the network and other devices. –– Layer 2 Internet: Internet is similar to the network layer of the OSI model and determines the address of devices on the network. –– Layer 3 Transport: Transport is similar to the transport layer of the OSI model as it is responsible for initiating communications with other network devices. –– Layer 4 Application: Application is a combination of the application layer, presentation layer, and the session layer in the OSI model and interfaces with applications that want to send or receive information from the network.
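In practice, a program hands its data to the operating system through a socket, and the TCP/IP stack does the layered work described above. A minimal sketch of two programs exchanging a message over TCP on one machine (the message text is arbitrary):

```python
import socket
import threading

def server(ready: threading.Event, port_holder: list) -> None:
    """A tiny TCP server: accept one connection and echo back what it receives."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))              # port 0: let the OS pick a free port
        port_holder.append(srv.getsockname()[1])
        srv.listen(1)
        ready.set()                             # tell the client we are listening
        conn, _addr = srv.accept()
        with conn:
            data = conn.recv(1024)              # TCP hands us the reassembled bytes
            conn.sendall(b"echo: " + data)

ready, port_holder = threading.Event(), []
threading.Thread(target=server, args=(ready, port_holder), daemon=True).start()
ready.wait()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect(("127.0.0.1", port_holder[0]))
    cli.sendall(b"hello network")
    reply = cli.recv(1024)
print(reply.decode())  # echo: hello network
```

Neither side mentions packets, addresses of routers, or error checks; the protocol stack handles all of that beneath the socket calls.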
Packets Let’s take a closer look at packets to get a better understanding of how they are used to transmit information over the communication network. As you’ll recall, information is transported across the communications network in an electronic envelope called a packet. Don’t confuse a packet with email. They are two different things. An email message—like any information created by an application—is divided into pieces and placed in packets much like how a large order from an office supply vendor is sent in multiple boxes if the order doesn’t fit in one box. The one order is divided into pieces and placed into separate boxes. Boxes are then numbered such as 1 of 3; 2 of 3; and 3 of 3 so the order can be reassembled once delivered. Each packet may be a fixed size measured in bytes, just like a box has a fixed size. A byte is eight binary digits (bits) of 0s and 1s. A packet is also known as a frame, block, cell, or segment depending on the type of network. Each packet passes through
a router (see “The Neighborhood of Networks” previously in this chapter) that directs the packet through the maze of network connections to its destination IP address. This process is referred to as packet switching.
Figure 2.4: A packet is an electronic envelope that has three parts: the header, body, and trailer.
A packet has three parts (Figure 2.4). These are: –– Header: The header contains information about the packet and includes the packet length, the network protocol, destination address, the origination address, synchronization, and the packet number. The packet length is included because some networks have fixed or variable length packets. The network protocol is specified because some networks can use different types of network protocols (rules). Synchronization enables the packet to match the network. And the packet number is the number of the packet in the message (1 of 3, 2 of 3, 3 of 3) that enables the destination network device to reassemble the bodies of packets into the message. –– Body: The body contains the piece of information carried by the packet. Sometimes the body also contains padding if the piece of information is less than the fixed length of the body of the packet. –– Trailer: The trailer contains information that identifies the end of the packet. The trailer may also contain error check information called the cyclic redundancy check (CRC) that determines if part of the packet was lost in transmission. In its simplest form, the check value is based on a count of the 1 bits in the packet: the receiving computer totals the 1s and compares the total to the CRC value. (A real CRC uses a more elaborate calculation, but the idea is the same.) If they match, then there is a likelihood no error occurred. A mismatch indicates an error occurred and a request is made to resend the packet.
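The box-numbering idea can be sketched in a few lines of Python: split a message into fixed-size bodies, give each a header (packet number and count) and a trailer holding an error-check value (Python's zlib.crc32 stands in for the network's CRC), then verify and reassemble:

```python
import zlib

def packetize(message: bytes, body_size: int = 8) -> list:
    """Split a message into numbered packets, each with header, body, and trailer."""
    bodies = [message[i:i + body_size] for i in range(0, len(message), body_size)]
    total = len(bodies)
    return [
        {"header": {"number": n + 1, "of": total, "length": len(body)},
         "body": body,
         "trailer": {"crc": zlib.crc32(body)}}   # error-check value for this body
        for n, body in enumerate(bodies)
    ]

def reassemble(packets: list) -> bytes:
    """Verify each packet's check value, then put the bodies back in order."""
    for p in packets:
        if zlib.crc32(p["body"]) != p["trailer"]["crc"]:
            raise ValueError(f"packet {p['header']['number']} damaged; request resend")
    ordered = sorted(packets, key=lambda p: p["header"]["number"])
    return b"".join(p["body"] for p in ordered)

packets = packetize(b"A short email message, split up.")
print(len(packets))         # 4
print(reassemble(packets))  # b'A short email message, split up.'
```

The sorted-by-number step is the "1 of 3, 2 of 3, 3 of 3" reassembly: packets may arrive out of order, and the numbering puts them back together.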
A Closer Look at Communication Networks Now that you have a general idea of how communication networks work, we’ll take a closer look at communication networks. There are two categories of network technologies. These are: –– Local area network (LAN): A LAN is network technology that connects network devices that are relatively close to each other, such as in the same facility. –– Wide area network (WAN): A WAN is network technology that connects a smaller number of network devices over a larger area, such as connecting two or more facilities together.
The design of a communications network is called a topology. Two of the more common topologies are the star topology and the bus topology. In the star topology each network device sends packets to a central hub (the center of the star) where the packet is routed to its destination network device. In the bus topology using the Ethernet protocol each network device is connected to a central network cable and sends packets along the same central network cable as other network devices. Each network device examines each packet to determine if the packet is addressed to it. If so, the packet is processed. If not, the packet is discarded. Each network device uses carrier-sense multiple access with collision detection (CSMA/CD) to determine if there is traffic on the network cable. The network device "listens" to hear if the network cable is clear so it can transmit the packet and determines if the transmitted packet collided with other packets on the central network cable. If so, the packet is retransmitted. The length of the communication network also affects transmission. The longer the distance the transmission travels, the weaker the signal becomes because there is insufficient power left in the signal to continue. Imagine the wave in a pond of water. The wave is strongest at the site where the rock is thrown into the pond. The wave diminishes greatly the farther it moves away from the site. The same happens on the communications network where the signal is effective for a specific distance determined by the network cable and then dissipates—it runs out of power. For example, the signal over a coaxial cable is effective for 500 meters (about 1,640 feet). In order to extend the effective transmission distance, a repeater is installed on the communication network. A repeater is a device that receives the signal, and then retransmits the signal using its own power. Many repeaters are used to effectively extend the transmission distance of the communication network.
Communication networks are divided into subnets (see "The Neighborhood of Networks" previously in this chapter). A router, as mentioned before, is a network device that connects networks and lets different networks communicate with each other. Earlier in this chapter, a router was referred to as a post office because it routes packets to other network segments. A router is more than a post office since it contains a table (router table) that identifies the shortest pathway to a destination IP address and alternate pathways should the shortest path be blocked. The router also contains protocols for different types of communication networks, enabling the router to route packets to unlike networks. A switch is another network device that connects different network devices on the same network; however, a switch does not connect networks. The internet is a network of networks—the internet is composed of other networks mostly operated by major telephone companies referred to as Tier 1 carriers. Tier 1 carriers create a circuit between two points on the network using an assortment of routers and cables. A circuit is a connection. It is common for a circuit to begin with a coaxial connection from a facility to a fiber optic connection that carries the packet to the Tier 1 carrier's switch. From there, the packet may travel to and from a satellite using microwaves, to a Tier 1 carrier's receiving station, then over a fiber optic connection, to the coaxial cable that leads into the destination facility. The Tier 1 carrier's network is commonly referred to as a packet-switching network because the primary purpose of the network is to redirect packets. A packet-switching network is designed with redundancy using a technique called load balancing to ensure packets reach their destination within an acceptable time period measured in milliseconds. If a delay is detected, the packet-switching network automatically reroutes the packet over a different circuit. Packet-switching networks are also used for telephone calls. The caller's voice is digitized and divided into packets.
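A router table can be pictured as a list of destination prefixes, each paired with a next hop; the router forwards on the most specific match and falls back to a default route. A toy Python sketch (the table entries and hop names are invented):

```python
import ipaddress

# An invented router table: destination prefix -> next hop.
ROUTER_TABLE = {
    "10.1.0.0/16": "circuit-A (shortest path)",
    "10.1.5.0/24": "circuit-B (more specific route)",
    "0.0.0.0/0":   "default gateway (everything else)",
}

def next_hop(destination: str) -> str:
    """Pick the longest (most specific) prefix that contains the address."""
    addr = ipaddress.ip_address(destination)
    matches = [ipaddress.ip_network(p) for p in ROUTER_TABLE
               if addr in ipaddress.ip_network(p)]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTER_TABLE[str(best)]

print(next_hop("10.1.5.20"))  # circuit-B (more specific route)
print(next_hop("10.1.9.9"))   # circuit-A (shortest path)
print(next_hop("8.8.8.8"))    # default gateway (everything else)
```

The "alternate pathway" behavior is simply this lookup repeated against a table from which the blocked route has been removed.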
Network Addresses Each network device has two addresses: the physical address and the logical address. The physical address is called the Media Access Control (MAC) address that is permanently stored on the network device's network interface card (NIC). The MAC address is six bytes. The first three bytes identify the manufacturer of the NIC and the last three bytes are the NIC series number. (Three bytes can hold a value of 16,777,215. Three bytes have 24 bits.) The combination makes the MAC unique. The logical address is the address on the network, which is dependent on the type of network in use. The most commonly used logical address is the Internet Protocol (IP) address. The format of the logical address is determined by the network protocol such as TCP/IP for the internet. An IP address is either a static address or a dynamic address. A static address is a fixed IP address for the network device and doesn't change. A dynamic IP address is assigned by a network program using the Dynamic Host Configuration Protocol (DHCP) each time the network device connects to the network. The assigned IP address is valid until the network device disconnects from the network. (You have a dynamic IP address when you connect to your internet service provider at home.) An IP address is divided into four bytes—each can be a number from 0 to 255. Some numbers are reserved by the Internet Assigned Numbers Authority (IANA) for special network functions. Each subnet is assigned an IP address. Subnet addresses are grouped into classes identified by letters that are reserved for subnets. Classes A, B, and C ranges are commonly assigned to a subnet. Classes D and E are used for special purposes. The IANA assigns a block of IP addresses to businesses and government agencies. Each business and government agency then reassigns IP addresses to network devices and subnets within their organization. Most consumers connect to the internet through an internet service provider (ISP).
An ISP is a business that is assigned a block of IP addresses. The ISP assigns a dynamic IP address to the consumer’s computing device when it is connected to the ISP service.
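The six-byte split of a MAC address is easy to see by taking one apart (the address below is a made-up example; in reality the first three bytes would be looked up in the IEEE's manufacturer registry):

```python
def split_mac(mac: str) -> dict:
    """Split a MAC address into its manufacturer and serial-number halves."""
    parts = mac.split(":")
    assert len(parts) == 6, "a MAC address is six bytes"
    return {
        "manufacturer": ":".join(parts[:3]),  # bytes assigned to the NIC maker
        "serial": ":".join(parts[3:]),        # the maker's own series number
    }

print(split_mac("00:1A:2B:3C:4D:5E"))
# {'manufacturer': '00:1A:2B', 'serial': '3C:4D:5E'}

# Three bytes = 24 bits, so each half has 2**24 possible values.
print(2 ** 24 - 1)  # 16777215, the 16,777,215 mentioned above
```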
Home computers are usually connected to a home router, either using a cable or through a wireless connection to form a home network. Each computing device on the home network has its own IP address that identifies the computing device. Each computing device connects to the internet through the router. The dynamic IP address assigned by the ISP is assigned to the router. The router receives packets from computing devices on the home network and forwards those packets to the ISP. Packets received by the router are forwarded to the appropriate computing device on the home network. A subnet’s IP address has two parts: one part identifies the subnet and the other part identifies the network device (node) on the subnet. An IP address is divided into four sections (255.255.255.255). Each section has 8 bits (combinations of 0s and 1s). The parts of the subnet IP address are created by grouping the first, second, and/or third sections to identify the subnet. The fourth section is always used to identify the node on the subnet. A node is a computing device on the network. A subnet mask is used by the TCP/IP protocol to determine if a computing device is on the local subnet or on a remote network. Subnet masks are usually represented as a decimal value. Zeros are used to mask the node portion of the IP address. 255 masks the subnet portion of the IP address. For example, a subnet mask of 255.0.0.0 indicates a network with one subnet and many computing devices. In contrast, a 255.255.255.0 subnet mask has a network with many subnets and few computing devices.
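Python's ipaddress module applies a subnet mask for you, which makes the split between the subnet part and the node part easy to inspect (the addresses below are invented private-network examples):

```python
import ipaddress

# 192.168.1.10 with mask 255.255.255.0: the first three sections name the
# subnet, and the fourth names the node on that subnet.
iface = ipaddress.ip_interface("192.168.1.10/255.255.255.0")
print(iface.network)  # 192.168.1.0/24  (the subnet)
print(iface.ip)       # 192.168.1.10    (this node)

def same_subnet(a: str, b: str, mask: str) -> bool:
    """True if both addresses fall inside the same subnet under this mask."""
    net_a = ipaddress.ip_interface(f"{a}/{mask}").network
    net_b = ipaddress.ip_interface(f"{b}/{mask}").network
    return net_a == net_b

print(same_subnet("192.168.1.10", "192.168.1.77", "255.255.255.0"))  # True: local
print(same_subnet("192.168.1.10", "192.168.2.77", "255.255.255.0"))  # False: remote
```

This local-or-remote test is exactly what TCP/IP uses the subnet mask for: a local destination is delivered directly, a remote one goes to the router.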
The Internet The internet is more than Google, Facebook, Amazon, and other popular websites. It is a global network of networks operated by telecommunications companies connected by routers. Telecommunications companies have backbone networks divided into regions. Each region has a point of presence (POP) where the network can be accessed, referred to as a network access point (NAP). Each backbone network connects to other backbone networks, enabling the transmission of packets over the internet to any NAP throughout the world. A web server is the primary network device on the internet. A server is a computer that serves up files; a web server serves up web pages and related files (i.e., video, pictures, audio) using the Hypertext Transfer Protocol (HTTP). Collectively this forms the world wide web (WWW) that shares information over the internet. There are two methods for sharing information over the internet: using the File Transfer Protocol (FTP) to request a file from a server, and using a browser. When using FTP, a network device sends a specially formatted request to a server asking that the server send it a specific file. The server then sends the file to the network device if this device has rights to receive the file.
FTP FTP enables you to send or receive files on your computer. There are software programs available that put a nice "user interface" on FTP to presumably make it simpler to use. However, you do not have to download a software program to use FTP as it is built into your operating system, the software that runs your computer. In this case, we will explore FTP using the command prompt in Windows by following these steps.
1. Select the Search Windows icon (the magnifying glass, toward the bottom-left of your screen).
2. Type Command Prompt.
3. Select the Command Prompt link to open the Command Prompt screen. The command prompt screen is the way old-timers used to use the computer, by typing in commands. Today, programmers and network engineers still regularly use the command prompt to do all sorts of tasks by using commands or writing scripts, and using little programs to do some amazing things. The trick is in knowing the commands to do so.
4. Type FTP at the prompt (>) and press Enter. You are now in FTP and all you have to do is know the commands to use it.
There are plenty of resources on the net and videos about how to use FTP commands to connect to a remote network device. However, network administrators tend to deactivate FTP capabilities for security reasons. Still, it is a simple and useful example of using the command line.
The most common way to access information over the internet is by using a browser. A browser is an application that enables you to request a web page from a web server by using the website name. Alternatively, you can enter the IP address of the web server that contains the website. The website name is called the Uniform Resource Locator (URL). You recognize it as www followed by a unique name and extension such as www.jimkeogh.com. The extensions such as .com, .edu, and .gov are referred to as top-level domains. The website name must be unique within the top-level domain. That is, there can be only one www.jimkeogh.com. The request made by the network device flows through an internet service provider (ISP). An ISP is an organization that has a network connected to a telecommunications carrier's backbone network. The website name is looked up in a domain name server (DNS) that has a list of all website names and the corresponding IP address of the web server that hosts the websites. The DNS returns the IP address to the ISP if the website name is found. The ISP then sends the request to the web server's IP address.
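You can trigger the same kind of name-to-address lookup a browser performs by asking the operating system's resolver directly. The name "localhost" resolves on any machine without contacting a DNS server; a public website name would need an internet connection:

```python
import socket

# "localhost" is a name every machine can resolve locally, without a DNS server.
print(socket.gethostbyname("localhost"))  # 127.0.0.1

# With an internet connection, a public name works the same way, for example:
#   socket.gethostbyname("www.jimkeogh.com")  ->  that web server's IP address
```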
Web Server A web server is a network device (a computer) that stores websites and files associated with the website such as video and pictures. Each web server is assigned a unique IP address on the internet. A web server has two communication connections, called ports, each identified by a port number. Port 21 is used for FTP communication and port 80 is used for website requests. These port numbers are standard for enabling requests to be sent to the appropriate port. Once connected to the port, the web server transmits the requested data using the corresponding protocol: FTP or HTTP.
–– FTP requests are authenticated based on the sender of the request's right to access the requested file. Once authenticated, a copy of the file is transmitted to the sender over port 21. –– HTTP requests are not normally authenticated. The web server transmits a copy of the index.html file—commonly known as the home page—to the sender over port 80. The content of index.html is written in hypertext markup language (HTML). HTML consists of text that is to be displayed on the screen and standardized instructions (a set of codes known as markup) that tell the browser how to display the text. In addition to text, there can be references to graphics, videos, audio, and other files that are displayed on the webpage. Links to other files, called hyperlinks, can be embedded in the webpage and appear highlighted. Selecting a hyperlink causes the browser to make another request from the web server. Here is a very simple example of a webpage. Useful webpages are far more complex than this example. There are many YouTube videos that can help you explore HTML and learn how to create a website. Also visit www.w3schools.com, which is a good source of website commands. –– Text within less-than (<) and greater-than (>) symbols is a tag—a markup command—that tells the browser how to display the webpage. HTML commands don't appear on the webpage. –– This webpage begins with the <!DOCTYPE html> tag, indicating that this is an HTML file. –– The <html> tag indicates where markup commands begin and the </html> tag indicates where commands end. –– The <body> tag tells the browser where the body of the webpage begins and the </body> tag indicates where the body ends. –– The <h1> tag tells the browser that the following text should be displayed as the largest headline on the webpage. The browser decides the font and size of the text. Text between the <h1> tag and the </h1> tag appears on the webpage as a heading. –– Text between the <p> and </p> tags appears as a paragraph on the webpage. The browser determines the font and size of text in a paragraph. So here is an instance of a very simple little "webpage" in HTML:

<!DOCTYPE html>
<html>
<body>

<h1>My First Heading</h1>
<p>My first paragraph.</p>

</body>
</html>
When the browser displays the HTML, it might look like this:

My First Heading

My first paragraph.
Remote Login A frustration all of us experience is trying to explain to the help desk what is wrong with software running on our computer. For many, the frustration is short-lived when the help desk tech remotely logs into the computer and sees the problem. This is made possible over the network through a process called remote login. Both the help desk computer and your computer must be running a desktop-sharing software program. The help desk computer sends packets containing keystrokes and mouse clicks and your computer treats them as if they were entered on your computer. Your computer responds by sending the help desk computer packets that contain the screen images that appear on your computer. The help desk tech has the look and feel of using your computer. Although there are commercially available remote access software packages such as RemotePC or TeamViewer, remote login software called telnet is usually available as part of the computer's operating system. It can be run by typing at the command line prompt: telnet followed by the IP address of the remote computer. Once an ID and password are entered—usually required—the command line prompt of the remote computer is displayed. Although telnet is available on most networked devices, sometimes telnet is deactivated as a security precaution to prevent hackers from gaining access to the computer. The command prompt is a holdover from the days before computers used icons and other graphics to interact with programs. The screen shows a flashing cursor prompting you to type in a command at the prompt (the > symbol). The command is processed and results are displayed as text on the screen, along with another flashing cursor waiting for the next command.
A Behind-the-Scenes Look at Email There is more to email than Gmail, Outlook, and other commonly used email applications. Those applications, referred to as email clients, are tools for creating and formatting an email message into a format that is necessary to send the message over the internet. Here's a look at what happens behind the scenes. There are two email servers that are involved in sending and receiving an email. These are the Simple Mail Transfer Protocol (SMTP) server and the Post Office Protocol 3 (POP3) server. The email application sends the message to the SMTP server, which in turn sends the email to the destination email server called the POP3 server. Each email server has an IP address and a port connection that connects to the network. The SMTP server sends the email message using its port 25 and the POP3 server receives incoming emails on port 110.
SMTP The email has a destination address such as
myname@myserver.com. The SMTP server receives the email on port 25 from the email client, then decomposes the destination address into two parts: the name of the person receiving the email (myname) and the domain name of the POP3 server (myserver.com). The SMTP server uses the DNS to retrieve the IP address of the domain myserver.com. The SMTP server then sends the email to the POP3 server associated with the corresponding IP address. The POP3 server reads the destination address and places the email into the person's email box. If the SMTP server is unable to connect to the IP address, the email is placed in the send mail queue. Every 15 minutes for four hours, the SMTP server resends the email. If the email still isn't delivered, the email client is notified of the delay, and after five days the email is returned to the email client as undeliverable. The email client reformats the email message into a format that is understood by the SMTP server. The format includes text commands that tell the SMTP server what to do. Table 2.3 contains a sample of these commands. As an end user, you, of course, will not be using these commands.

Table 2.3: A sample of SMTP commands

Command      Description
HELO         Introduction
EHLO         Introduction and requesting extended mode
MAIL FROM    Sender address
RCPT TO      Destination address
DATA         The first three lines are TO, FROM, Subject, followed by the body of the email message
RSET         Reset
QUIT         End session
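The commands in Table 2.3 are simply lines of text sent over the connection. A sketch of the client's half of a minimal SMTP conversation (the addresses and host name are invented, and a real client also waits for a numbered reply from the server after each line):

```python
def smtp_conversation(sender: str, recipient: str, subject: str, body: str) -> list:
    """Build the client's lines for one message, using the Table 2.3 commands."""
    return [
        "HELO client.example",       # introduce ourselves (hypothetical host name)
        f"MAIL FROM:<{sender}>",     # sender address
        f"RCPT TO:<{recipient}>",    # destination address
        "DATA",                      # what follows is the message itself
        f"To: {recipient}",
        f"From: {sender}",
        f"Subject: {subject}",
        "",
        body,
        ".",                         # a line holding a lone dot ends the DATA section
        "QUIT",                      # end session
    ]

for line in smtp_conversation("myname@myserver.com", "you@example.com",
                              "Hello", "Just testing SMTP."):
    print(line)
```

Python's standard smtplib module performs this exchange (plus the reply handling) for you when you call its sendmail method.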
POP3 The email client is software on the computer used to send or receive messages. The email client connects to the POP3 server to retrieve emails for a specific account using the user ID and password. Once the login is authenticated, the email client requests that emails be transmitted from the POP3 server to the email client. Table 2.4 contains a sample of POP3 commands. The email client requests a list of emails in the user’s email account and then displays the list on the screen. The user selects an email to open and another request is made by the email client to the POP3 server to retrieve that specific email. The email client can also request the first several lines of the email
to preview on the screen. The user can select an email to be deleted from the user's email account.

Table 2.4: A sample of POP3 commands

Command    Description
USER       User ID
PASS       Password
QUIT       End session
LIST       Retrieve a list of emails and their size
RETR       Retrieve a specific email based on the position of the email on the list of emails
DELE       Delete a specific email based on the position of the email on the list of emails
TOP        Retrieve the first X lines of a specific email
Internet Message Access Protocol Server There is another type of email server that uses the Internet Message Access Protocol (IMAP) and handles both incoming and outgoing emails on port 143. The IMAP server enables users to organize emails into folders that appear on the email application screen and on the IMAP server, enabling the email to be read by another email client. You can log onto IMAP from two email clients at the same time and read the same email on both email clients. In contrast, the POP3 server doesn't allow you to read the same email on multiple email clients. Searching for an email occurs on the IMAP server and not on the email client as is the case when using a POP3 server. However, the email client must be connected to the IMAP server to read emails unless the email client copies all emails to the local computing device. This local copy is called a cache (the use of software or hardware to store often used data so that future requests for that data can be served faster) and it enables the email client to access emails even when the computing device is not connected to the network.
Attachment Large items—a lengthy document, photos, and other large files—can be inserted into an email. This is referred to as embedding and makes the email unnecessarily long. An alternative is to attach the large file to the email. The email client (e.g., Outlook) shows the attachment as a line on the screen that is a link to the separate file that contains the attachment. The attachment is sent separately and stored separately from the email.
When the email/attachment reaches the destination, the email client displays the email showing the link to the attachment. Selecting the link causes the email client to request the attachment file from the email server. Once received, the attachment is displayed using the appropriate application.
The Intranet The intranet is similar to the internet except that the intranet is private within an organization. The intranet has a web server that contains webpages and an email server that facilitates sending and receiving emails. When you open your browser on your work computer, the browser typically requests that the intranet web server send the index.html file—this is the intranet home page that contains announcements and links to other resources such as applications that are accessible within your organization. Information is shared within the organization across the intranet using TCP/IP protocols, the same that are used to share information over the internet. The organization uses the intranet to access the internet using a proxy server. A proxy server is a network device that is the gateway between the intranet and the internet. The proxy server is usually the only network device on the intranet that is also on the internet. The proxy server is the only network device to have an IP address that is known to the public. Your organization’s domain name mycompany.com is linked to the IP address of the proxy server. The proxy server usually contains your organization’s home page and other webpages that are available to the public. All requests (including emails) from the public are sent through the proxy server. Requests to access the internet from within the organization are sent over the intranet to the proxy server, then sent over the internet. The destination internet web server knows that the request is coming from the proxy server. The network device within the organization that made the request is not identified. Requested files are sent to the proxy server. The proxy server then forwards the requested files to the appropriate network device within the organization. The same process is used with emails. 
The proxy server also contains a program called a firewall that monitors the content of all outgoing/incoming requests to ensure that the communication adheres to the organization's policies. You might have noticed the workings of the firewall if your request to access Facebook was blocked. Your organization probably has a policy that prohibits access to Facebook using the organization's intranet. However, your online marketing team may need to access Facebook as part of their job. Their access is not blocked by the firewall. This is possible because firewall applications can be set to apply restrictions to specific intranet IP addresses. Likewise, the firewall can block incoming traffic from the internet, such as spam, malware, and requests from websites that the organization's policy deems inappropriate, which are then isolated from the intranet.
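The per-address exceptions described above boil down to a rule lookup: block a site for everyone except the intranet addresses on its exception list. A toy Python sketch (the sites, addresses, and policy are invented):

```python
# Invented policy: sites blocked for everyone except listed intranet addresses.
BLOCKED_SITES = {
    "facebook.com": {"10.0.5.12", "10.0.5.13"},  # only the marketing team's PCs
    "badsite.example": set(),                    # blocked for everyone
}

def allowed(intranet_ip: str, site: str) -> bool:
    """Apply the firewall policy to one outgoing request."""
    if site not in BLOCKED_SITES:
        return True                               # not on the blocklist: let it through
    return intranet_ip in BLOCKED_SITES[site]     # blocked unless on the exception list

print(allowed("10.0.7.99", "degruyter.com"))  # True: unrestricted site
print(allowed("10.0.7.99", "facebook.com"))   # False: policy blocks it
print(allowed("10.0.5.12", "facebook.com"))   # True: marketing-team exception
```

Real firewalls match on far more than a site name (ports, protocols, content), but the exception logic has the same shape.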
Wi-Fi Wi-Fi is a term used to describe a wireless connection to a network. The network within the organization is created using various types of cables to connect network devices. Cables are laid through ceilings, walls, and floors that involve many hours of labor and disruption to the work environment. At times, hard-wiring the network is not cost-effective. An alternative is to install a Wi-Fi network that reduces the need to hard-wire network devices. A Wi-Fi network is created using Wi-Fi hotspots. A Wi-Fi hotspot is a network device that contains a transmitter and is connected via cable to the network. A Wi-Fi hotspot has a coverage area of approximately 100 feet within which it can reliably receive or send a signal. Transmission becomes less reliable the further the mobile device is away from the coverage area. The network connection works, and then it doesn't work. Some organizations experience dead spots when implementing a Wi-Fi network. A dead spot is an area that is outside a Wi-Fi hotspot coverage area. There is no network connection. The solution is to install another Wi-Fi hotspot that is hardwired to the network; however, running cables might not be economically feasible. Installing a repeater is an alternative. A repeater is a type of Wi-Fi hotspot that is not connected to the network via cable. Instead, a repeater receives a signal from within its coverage area, and then retransmits the signal to the next Wi-Fi hotspot. Wi-Fi transmission uses the 802.11 network protocol to govern how packets are transmitted. Wi-Fi transmission is conducted at 2.4 GHz or 5 GHz radio frequency band. A GHz is 1 billion cycles (waves) per second. There is a range of frequencies within the radio frequency band. A Wi-Fi network device may transmit on one set frequency to transmit packets; however, there are two drawbacks. Radio transmission is vulnerable to interference by nearby frequencies that disrupt and slow down transmission because packets need to be resent. 
The use of a single frequency also might cause a traffic jam as packets queue to be sent over it. Frequency hopping is an alternative that avoids the drawbacks of transmitting on a single frequency. Frequency hopping requires a transmitter to transmit using a set of frequencies, hopping from one to another, thereby reducing the effects of interference and making multiple frequencies available to other transmitters.

Bluetooth

Bluetooth is a protocol that enables computing devices and electronic devices to connect with each other using a low-powered radio frequency transmitter operating at 2.45 GHz. The signal is weak, requiring the sending and receiving devices to be within a 32-foot radius of each other. This arrangement is referred to as a personal-area network (PAN) or a piconet. When two Bluetooth devices are within range of each other, both devices automatically adjust their transmissions so as not to interfere with each other.
Chapter 2: Talking Intelligently About Communication Networks
Cell Phone Technology

Cell phone technology is used by mobile communication devices—smartphones, tablets—for email, texting, interacting with apps, and for phone calls. Mobile communication devices use radio waves to transmit and receive data over the cell phone network, which is also connected to the telecommunications carrier's larger network—the same network that connects most communication devices to the internet. The telecommunications carrier divides its coverage area into hexagonal grids of roughly 10 square miles each; such a grid is called a cell. Each cell has a base station that sends and receives transmissions from mobile communication devices within the cell. Each cell has about 800 radio frequencies that can be used by mobile communication devices within the cell. Two frequencies are logically grouped into a duplex channel. One duplex channel is used per call: one frequency to transmit and the other to receive data. The transmission range is relatively short and requires low power, a factor that helps designers and engineers reduce the size of mobile communication devices. The short transmission range also enables other cells to use the same frequencies without experiencing interference. A base station is connected to the Mobile Telephone Switching Office (MTSO) of the telecommunications company. The MTSO forwards data to and from the landline telephone system.
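The pairing of frequencies into per-call duplex channels can be sketched as follows. This is a hypothetical illustration: the `BaseStation` class, its method names, and the frequency values are invented, and a real base station assigns channels through its signaling protocols rather than anything this simple.

```python
# Hypothetical sketch: a base station pairing two of its ~800 radio
# frequencies into a duplex channel (one to transmit, one to receive).
class BaseStation:
    def __init__(self, frequencies):
        self.free = list(frequencies)   # frequencies not currently in use

    def open_duplex_channel(self):
        """Reserve one frequency to transmit and one to receive for a call."""
        if len(self.free) < 2:
            raise RuntimeError("no duplex channel available in this cell")
        transmit = self.free.pop()
        receive = self.free.pop()
        return (transmit, receive)

    def close_duplex_channel(self, channel):
        """Return both frequencies to the pool when the call ends."""
        self.free.extend(channel)

station = BaseStation(frequencies=range(800))
call = station.open_duplex_channel()
print(call)   # a pair of frequency numbers reserved for this call
```

Because the pool is per cell, neighboring cells can hand out the very same numbers without interfering, as the text describes.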
Transmission

As the mobile communications device moves about, it automatically monitors nearby cells. Each base station transmits a System Identification Code (SID) over a control channel. The mobile communications device displays a signal icon indicating the signal strength; a "no service available" message is displayed if no SID is received. A SID is also programmed into the mobile communications device by the telecommunications carrier, enabling the carrier to identify the device. Once connected, the mobile communication device sends the base station a registration request announcing that the device is active. The base station forwards the registration request to the MTSO so the MTSO knows that the device is ready to receive incoming calls. The MTSO also sends a message over the control channel telling the device which two frequencies to use for transmission. Communication between the base station and the mobile device is made using the control channel. The mobile computing device then begins transmission. As the cell phone signal weakens, the base station hands off the mobile computing device's transmission to a neighboring base station, enabling transmission to continue seamlessly. A process called roaming occurs if the SID programmed into the cell phone does not match the SID of a base station. Roaming means that the mobile communication device is outside its own telecommunications carrier's coverage. Another telecommunications carrier's base station receives the transmission and sends the SID to its MTSO. That MTSO contacts the MTSO of the device's own carrier to confirm the SID. Once confirmed, the receiving base station's MTSO processes the transmission. Most mobile communication devices display a message indicating that roaming has occurred and that roaming charges apply. You can set your smartphone to disallow roaming so that you do not get hit with extra charges. Besides the SID, there are two other codes in the mobile communication device: the Mobile Identification Number (MIN), which identifies the device based on its cell phone number, and the Electronic Serial Number (ESN), which is entered by the manufacturer to identify the device. The combination of MIN and ESN uniquely identifies the mobile communication device.

Generations: The Gs

Mobile networks are commonly identified by generation, such as 3G and 4G; the number refers to the generation of the mobile network. First-generation (1G) mobile networks used analog technology that was prone to interference. Second-generation (2G) mobile networks used digital technology; besides reducing interference in transmission, digital technology enabled more channels (bandwidth) for sending and receiving communications. Third-generation (3G) mobile networks further increased bandwidth and transmission speed, enabling the transmission of multimedia. Fourth-generation (4G) mobile networks further increased transmission capacity to deliver HD TV, teleconferencing, and other applications that were traditionally reserved for desktop computing devices, though primarily for receiving rather than sending: one-way communication. Fifth-generation (5G) mobile networks enable reliable, high-quality, two-way video transmission that can be used for telemedicine, training, and collaborative teamwork from anywhere.
Virtual Private Network (VPN)

Although the internet provides a remote connection, it does so over a public network that can be hacked, which is why organizations tend to use a virtual private network (VPN) to access the organization's network from remote locations. An organization could run cables from the organization's network to remote locations to create a private network that only the organization can access; of course, this is not economically feasible. The alternative is to create a dedicated electronic pathway to remote locations using the telecommunications carrier's network, referred to as a virtual private network since only the organization can use the dedicated electronic pathway. The VPN, sometimes referred to as a tunnel, is secure in that only those authorized can access it to send data, audio, and video. Two common uses of the VPN are to connect a remote computing device to a network, referred to as remote access, and to connect together two networks, referred
to as site-to-site. Remote access requires that the computer have a program called a VPN client, such as Cisco AnyConnect Secure Mobility Client. The VPN client authenticates the user, connects the remote computer to the network, and then facilitates transmission. Once connected, the remote computing device can access any program, data, and resources available on the network. A site-to-site connection works similarly, except that the VPN client runs on a network device that connects the remote network to the main network. Network devices on the remote network send packets to the VPN client, which then facilitates communication with the main network, giving remote locations the feeling that they are on the same network. The VPN can even connect two different kinds of networks, each using a different protocol. It is common for organizations to send encrypted packets over a VPN. However, some organizations choose to use a trusted VPN: instead of encrypting packets itself, the organization relies on the telecommunications carrier to protect packets along the network. Encrypting data yourself is the best approach when using a VPN. A mobile virtual private network is used to connect mobile computing devices to a network using a VPN. A conventional VPN uses a fixed IP address when creating the connection; a mobile VPN instead uses a fixed identifier for the computing device. The mobile VPN software handles the necessary network connection management while the computing device moves between cells. Mobile VPN is widely used by law enforcement and service organizations to ensure a reliable, secure, private network connection.

Consumer VPN

There is a trend for the general public to subscribe to a VPN service to protect their online communication, especially on publicly available Wi-Fi hotspots. A VPN service enables the customer to access the internet anonymously; all transmissions are encrypted. However, some internet sites block VPN services to enforce geo-restrictions.
A geo-restriction limits access to an internet site based on the visitor's geographic location. For example, some internet sites located outside the United States are not accessible from computing devices based in the United States.
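A geo-restriction check can be sketched as a lookup from the visitor's IP address to a country. This is a minimal, hypothetical sketch: the addresses (drawn from documentation-only ranges), the lookup table, and the blocked-country list are all invented, and real sites use commercial geolocation databases rather than a hard-coded table.

```python
# Hypothetical geo-restriction check: map the visitor's IP address to a
# country, then refuse requests from blocked regions. The table and the
# blocked list below are invented for illustration.
IP_TO_COUNTRY = {
    "203.0.113.7": "US",
    "198.51.100.4": "DE",
}

BLOCKED_COUNTRIES = {"US"}   # pretend this site is not viewable from the US

def allow_request(ip_address):
    country = IP_TO_COUNTRY.get(ip_address, "unknown")
    return country not in BLOCKED_COUNTRIES

print(allow_request("203.0.113.7"))    # request appears to come from the US
print(allow_request("198.51.100.4"))   # request appears to come from Germany
```

A consumer VPN defeats this kind of check because the site sees the VPN server's address rather than the subscriber's, which is exactly why some sites block known VPN addresses.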
Network Administration

A team of engineers and technicians installs and maintains a network. Network designers and construction architects begin the process of building the network. The network designer assesses the organization's needs for a network by identifying computers, printers, and other devices that need to be connected to it. Once the needs assessment is complete, the network designer determines the appropriate layout of the network (called a topology), and then references building plans to decide the pathway of network cables and other network devices (i.e.,
routers, Wi-Fi hotspots). The network designer also includes specifications for communication closets. A communication closet is a small room, usually on each floor, where the ends of cables from network devices on the floor are aggregated. The communication closet also contains routers and other network devices used to manage the network. Communication closets are connected via cables to a central communication center. The construction architect works with the network designer to devise a plan for running the network cables and building communication closets. The construction architect determines the most cost-effective means to install the network in the office. Sometimes cables are run through walls or installed in dropped ceilings and raised floors. A construction team that includes electricians follows the network design and construction plans to build the network. Many times, the organization hires a general contractor to execute these plans. The network administrator, sometimes referred to as a systems administrator, oversees the operation of the network once it is installed. The network administrator manages the network. This includes granting and removing access rights to all or part of the network; creating, monitoring, and modifying network segments; monitoring, maintaining, and upgrading network management devices; developing, implementing, and enforcing network security procedures; troubleshooting network problems; and handling customer support and vendor relationships.
Network Operating System

An operating system (OS) is the name given to a group of programs that enable a computing device to work. Windows is an operating system for PCs (see Chapter 3). UNIX is another type of operating system that can be used for different kinds of computing devices, including web servers. Networks also have an operating system, called a network operating system (NOS), that enables the network to work. The network operating system coordinates the activities of network devices across the network to ensure that the network runs without incident. The network operating system typically contains operating system programs and systems programs, sometimes referred to as supportive programs because they are not directly involved in managing the network. You can run the network without systems programs, but not without operating system programs. Operating system programs include the user interface, which is the screen used to interact with the network operating system; job management, which enables multiple network devices to access the network; management of network protocols (the rules for using the network); file services used to manage network-related files; and other essential programs to keep the network running. Depending on the type of network operating system, there will probably be a web server program used to create and manage webpages; an application server used
to share computer applications (e.g., Word) with network devices; a mail server, which is the email program; and database management software used to manage data.
Network Monitoring

A critical responsibility of the network administrator is to monitor network operations. The network operating system contains system software that is used for monitoring; however, other commercial software products are usually acquired to enhance monitoring capabilities. Expert systems are now coming into play, attempting to automate corporate datacenters. Enterprise-level software has taken many tasks, such as upgrading applications, and automated them across the network, saving immense amounts of time and reducing errors.

Flow data: Monitoring the flow of data across the network identifies the path taken by a packet from its origin to its destination and is also used to monitor traffic volumes over network segments.

Packet data: Monitoring packet data is used to inspect the contents of packets to determine how applications are being used and, for security purposes, to look for malware and investigate a security breach. Packets are copied from the network traffic flow, converted into a readable format, and then examined.

Monitoring points: A monitoring point is a location on the network where the network administrator monitors the network flow. In theory, all network access points should be monitored, but this is not practical because there is simply too much data to analyze. Typically, a location where packets merge, such as a proxy server or a link between networks (gateway), is a preferred monitoring location. Additional monitoring points are added to examine access points once suspicion is aroused.

Historic data: Historical data is data captured over time. Real-time monitoring provides a glimpse of current network traffic flows. Analyzing historical network data helps to identify trends, project the effect that events such as seasonal holidays and major organizational meetings may have on the network, and uncover details of an ongoing security breach.
Associated data: Monitoring packets identifies the network devices that sent and received a packet, but not the persons involved. Examining the login information associated with the network session helps to identify the people involved in the communication.
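The flow-data idea above, tallying traffic between origin and destination pairs at a monitoring point, can be sketched in a few lines. The packet records below are invented for illustration; real flow monitoring works from protocols and tools far richer than this.

```python
# Minimal sketch of flow-data monitoring: total the bytes exchanged
# between each origin/destination pair seen at a monitoring point.
from collections import Counter

# Invented packet records, as a monitoring point might summarize them.
packets = [
    {"src": "10.0.0.5", "dst": "10.0.1.9", "bytes": 1400},
    {"src": "10.0.0.5", "dst": "10.0.1.9", "bytes": 900},
    {"src": "10.0.2.7", "dst": "10.0.1.9", "bytes": 60},
]

volume = Counter()
for p in packets:
    volume[(p["src"], p["dst"])] += p["bytes"]

for (src, dst), total in volume.most_common():
    print(f"{src} -> {dst}: {total} bytes")
```

Tallies like these, collected over time, are the raw material of the historical analysis described above: a pair whose volume suddenly spikes is exactly the kind of trend an administrator wants flagged.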
Chapter 3
Talking Intelligently About Computers

Mention "computer" and you probably think of a laptop, desktop, tablet, and even your smartphone—or maybe Alexa (from Amazon), your car, your television, your refrigerator, or your watch. All are computers, or at least have a computer as a major component driving the device. Computers are everywhere today, each different yet basically the same technology. Regardless of size, power, and cost, all computers are a box of switches used to add and subtract. Nothing more. Yet setting those switches in the right positions can make cars drive themselves, identify a face in a crowd, answer any question in a fraction of a second, and remind you of important meetings. Computers make businesses work and open opportunities for businesses to grow. Understanding computer technology helps to cut through the hype and focus on how to tap computer technology to compete in the marketplace. Computers are commonly classified by the type of work performed by the computer. Personal computers are general-purpose computers used by a single user, such as a desktop computer and a laptop computer; the difference is that the laptop is portable. Netbooks are ultra-light, portable computers that have the basics, capable of running the most common applications but without all the power found in a laptop. A workstation is another computer category. A workstation is an enhanced desktop computer designed to perform special tasks besides running basic computer applications such as Office. For example, there are workstations used for graphic design, video editing, and other similarly memory- and processor-intensive work. Some workstations have large screens while others have multiple screens, enabling multiple applications running simultaneously to each have their own screen. Programmers typically have a workstation where one screen is used to write the program and the other screen shows the output of the executing program.
A tablet is still another computer in the form of a screen, but it has no mouse, no disk drive, and, until fairly recently, no keyboard. Applications called apps appear as icons on the screen. The screen is a touch screen: tapping an icon runs the application, and then a finger is used to point, tap, and drag to interact with the application. Tablets have a pop-up on-screen keyboard that can be used to enter text. Today there are multimode tablets—think of these as wannabe laptops—that have a removable keyboard and can be used as a laptop. A smartphone is both a cell phone and a tiny tablet that can be used to make and receive calls, text, and run applications (apps), many of which also run on a desktop, laptop, and tablet. Smartphones have access to the internet and can send and receive email; take, view, and edit digital pictures and video; and use a global positioning system to find your way practically anywhere in the world. Today some of the same capabilities are available in wearable computers such as smartwatches.
DOI 10.1515/9781547400812-003
Larger Computers

While computers are shrinking in size from a desktop computer to a watch, organizations still use larger, powerful computers to manage business operations. The backbone of large organizations is the mainframe computer. A mainframe computer is a relatively large computer housed in a data center that can quickly process very large amounts of information for work such as order management, accounting functions, and distribution management. Other computers throughout the organization use the intranet to connect to the mainframe computer (see Chapter 2). A data center is a location within the organization—often a free-standing building—that contains mainframes and other computers that are central to the organization's operations. Today, as an alternative to investing in more and more hardware, some organizations outsource the data center to a vendor and connect via the cloud (see Chapter 9). A server is another category of computer found in organizations. A server hosts programs and information that are shared by multiple users over a network. Technically any computer (including some tablets) has the basic capacity to be a server—a very slow server, in some cases. However, servers used by organizations are computers that have robust capabilities to meet high-demand usage. They have hardware and programs that are tuned to respond quickly to requests made from other computing devices connected to the network. Here are the more common servers:
–– Database server: A database server is a computer that runs a database management system (DBMS) (see Chapter 5). Think of a database server as an electronic filing cabinet and the DBMS as the librarian who processes requests for information and stores information in the library.
–– Application server: An application server is a computer where applications are stored and distributed when requested by users.
For example, Office is typically stored on the application server rather than on each computer in the organization.
–– Mail server: The mail server is a computer that runs the email service for the organization.
–– Print server: The print server is a computer that manages the printing of documents on printers connected to the network. A document selected for printing is sent over the network to the print server, which sends it to the designated printer.
–– Network server: The network server is a computer that manages the computer network. The network operating system (see Chapter 2) runs on the network server; it grants permission to use the network and routes network traffic to the desired destination.
–– Web server: The web server is a computer that manages the organization's intranet and internet sites.
–– File server: The file server is a computer that enables multiple users to save and use files stored on the file server. Think of a file server as a shared disk drive.
A Box of Switches

Regardless of its category, a computer is a box of switches. This is hard to believe, since there is a tendency to think of computers as near-human brains that can solve complex problems at the speed of light. This perception is more science fiction than reality—at least for now—however, artificial intelligence will soon become embedded in computers, giving humans a run for our money. We'll leave artificial intelligence for a later chapter. For now, let's see how the box of switches we call a computer really works. A switch has two states: off and on. In Chapter 2, you saw how each state represents a binary digit. Off is 0 and on is 1. One switch limits the numbers that can be stored to 0 and 1. Larger numbers can be stored by grouping switches into a set. A set of eight switches, called a byte (8 bits), can store up to 11111111 (all eight switches turned on), which we know as the decimal number 255. A set of two bytes, referred to as 16-bit, can store up to the decimal number 65,535; four bytes, referred to as 32-bit, can store up to 4,294,967,295; and eight bytes, referred to as 64-bit, can store up to 18,446,744,073,709,551,615. The terms 32-bit and 64-bit probably sound familiar, especially if you are in the market for a computer. Advertisements usually describe a computer as having a 32-bit processor or a 64-bit processor. This implies the maximum amount of memory supported by the computer. A 32-bit processor supports a maximum of four gigabytes of memory, and a 64-bit processor supports well over four gigabytes of memory. More about processors and memory and what this means is discussed later in this chapter. For now, simply know that the higher the number of bits, the more data can be stored in memory. One kilobyte of memory is roughly 1,000 bytes; one megabyte of memory is roughly 1,000 kilobytes; and one gigabyte of memory is roughly 1,000 megabytes.

Fractions

Binary digits are whole numbers, not fractions.
You can’t position the switch halfway. If you could, the binary number would be 0.5 but that’s impossible. However, fractional numbers (fractions) are represented using two components referred to as floating-point. The first component contains all the digits in the number—whole and fractional. The second component is a number that implies the position of the separator (period or comma). Let’s say you want to store the number 15.5 using floating-point. The first component is 155. Notice there isn’t a separator between whole and non-integer numbers. The second component is 1 (decimal number). The first component is read 155 then the second component tells you to place the decimal point to the left of the last digit.
Figure 3.1: Conversion from decimal to floating point.
Building Blocks of a Computer Let’s build the box of switches: the computer. We’re not really building a computer from parts; instead we’ll use boxes and lines to represent parts. For now, consider the computer as a city (motherboard) where building blocks—memory (the switches), processor (the computer within the computer), and input/output ports (a way to receive/send information outside the computer) are buildings in the city (Figure 3.2). The screen, keyboard, mouse, printer, and the network are in neighboring communities connected by cables or Wi-Fi to the town through an input/output port.
Source: From Pixabay under Creative Commons CC0 license Figure 3.2: Consider the motherboard a city, with components on the motherboard as buildings.
The motherboard (the city) is the biggest circuit board inside the computer. Memory comes as computer chips—many chips—that look like long, black bugs with many legs attached to the motherboard. The processor, referred to as the central processing unit (CPU), is a big rectangle attached to the motherboard. Input/output ports are usually on the outside of the computer, although some computers also have input/output ports as slots attached to the motherboard where you can insert other circuit boards. Building blocks are connected by a street called an external bus. A bus is a set of thin copper lines on the circuit board—not a bus you board at a bus stop. Each copper line is used to transmit an electrical signal that is encoded with a 0 or 1—a binary digit. The bus is defined by the number of copper lines: a 32-bit bus has 32 copper lines that can transmit 32 bits at a time, and a 64-bit bus has 64 copper lines that can transmit 64 bits at a time. Every city has a rhythm of activity, visible using time-lapse photography. There is little traffic in the wee hours of the morning; then the pace picks up at sunrise and slows down in the evening. The computer too has a steady rhythm, set by a clock called an oscillator that creates pulses of electricity. Electricity comes from the power outlet and enters the computer's power supply, which converts incoming alternating current (AC) to direct current (DC) used to power the other building blocks inside the computer, including the clock. (No conversion is necessary if the computer runs on a battery, since the battery supplies direct current.) The rate of pulses generated by the clock is the speed at which zeros and ones travel along the external bus. Speed is measured in gigahertz (GHz)—billions of pulses per second. Don't confuse this clock with the system clock, which is the real clock in the computer that displays the time. Memory consists of electronic switches—a whole bunch of switches—used to store numbers. Each byte of memory has an address on the bus. Instructions (a computer program) tell the processor what numbers to store in specific memory addresses and when to retrieve those numbers for processing. Other instructions tell the processor how to process those numbers. Think of the processor as a computer within the computer because it has its own building blocks—memory, an internal bus, and an internal clock—that function similarly to those of the computer itself (Figure 3.3). The processor has an oscillator (clock) that creates pulses of electricity that move bits along the internal bus. The processor also has its own memory, referred to as registers, used to temporarily store numbers during processing. Yet for all its power, the processor only adds and subtracts numbers, but it does so in a way that enables the computer to make decisions based on instructions written by programmers. This might seem magical, but it isn't. Let's say that you want to compare two names. You'll recall from Chapter 2 that the letters of a name are represented by numbers in the Unicode Standard, so both names are really numbers in the computer, stored as a series of binary digits. The processor determines whether they are the same name by subtracting the numbers. If the difference is zero, then the names are the same. If the difference is not zero, then they are different names (Figure 3.4).
Source: From Pixabay under Creative Commons CC0 license Figure 3.3: The processor is a computer within the computer and has similar components as the computer itself.
Figure 3.4: The processor uses subtraction to decide if two data elements are the same.
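The subtraction comparison in Figure 3.4 can be sketched by packing each name's character codes into one large number and subtracting. The `encode` helper and its packing width are invented for illustration; a real processor compares fixed-width chunks of memory rather than arbitrarily large numbers, but the zero-difference test is the same.

```python
# Sketch of comparison by subtraction: turn each name into one number
# built from its character codes, then check whether the difference is 0.
def encode(name):
    number = 0
    for character in name:
        number = number * 65536 + ord(character)  # pack each code point
    return number

def same_name(a, b):
    return encode(a) - encode(b) == 0   # a difference of zero means equal

print(same_name("Anne", "Anne"))   # identical names subtract to zero
print(same_name("Anne", "Amber"))  # different names leave a nonzero difference
```

This is exactly the decision-making the text describes: the hardware never "reads" the names, it only subtracts numbers and checks for zero.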
Input/output ports can be considered a way to extend the external bus, enabling you to add additional building blocks to the computer (keyboard, mouse, monitor, and printer). Each input/output port has an address on the external bus used by the processor to retrieve/send zeros and ones to the external device. The external device usually has its own building blocks—memory and processor—and instructions on how to process the incoming and outgoing numbers.
The Central Processing Unit (CPU)

The processor is the computer component that performs all the operations that make your computer work. Some like to think of the processor as the brains of the computer, but really it consists of circuits that perform addition and subtraction, which are the basis for the complex calculations that enable the computer to outpace humans in processing information and making decisions. The processor is the computer within the computer because it has its own processor, memory, internal bus, internal clock, and input/output ports. The processor within the processor is called the arithmetic logic unit (ALU), which is a circuit that performs arithmetic and logic operations. Memory, called a register, is used to temporarily hold numbers that are used for processing. The internal bus is used to move bits within the processor. The internal clock generates pulses that move bits throughout the processor. The input/output ports are circuits that connect the processor to the external bus on the motherboard.
The first processors had circuitry that performed two processes: addition and subtraction. Multiplication can be performed by repeated addition; likewise, division can be performed by repeated subtraction. You could say that the processor "knows" how to add and subtract numbers. All that is needed is to "tell" the processor to "add" and provide it with two numbers. Addition and subtraction are considered part of the processor's instruction set. An instruction set is the set of instructions that is understood by the processor. The programmer doesn't have to tell the processor how to do it—only to do it. A programmer writes instructions (a program) that tell the processor which processes to perform, line by line. Programs are written using one of many programming languages that are translated by other programs into the commands the processor understands—the processor's instruction set. More on programming in Chapter 4; for now, assume that the programmer writes the program in a programming language, the translation targets the processor's instruction set, and the programmer generally does not have to worry about that step. The instructions supplied by the programmer are stored in memory outside the processor. The processor retrieves and processes each instruction in sequence; this is referred to as executing the program. An instruction travels from its memory location over the external bus into the processor's input port and is stored temporarily in a register before the ALU executes it. The result of the executed instruction is sent through the processor's output port to a memory location outside the processor. This complete process is commonly known as the instruction cycle (Figure 3.5). Today the speed of computers is so fast that millions of instructions are generally executed without any noticeable delay.
Figure 3.5: The instruction cycle is the process used by a processor to execute an instruction.
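The repeated-addition and repeated-subtraction idea mentioned above can be sketched directly. This illustration assumes nonnegative whole numbers, and the function names are invented; a modern ALU has dedicated multiply and divide circuitry, but early processors worked essentially this way.

```python
# Multiplication built from repeated addition, and division from
# repeated subtraction, as early processors performed them.
def multiply(a, b):
    total = 0
    for _ in range(b):   # add a to itself b times
        total = total + a
    return total

def divide(a, b):
    quotient = 0
    while a >= b:        # subtract b until less than b remains
        a = a - b
        quotient = quotient + 1
    return quotient      # whatever is left in a is the remainder

print(multiply(6, 7))   # 42
print(divide(42, 7))    # 6
```

Notice that only addition, subtraction, and a zero-style comparison are used, which is all the text claims the earliest instruction sets provided.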
Let’s consider a program that draws a box. In the early days of personal computers, the programmer had to write explicated instructions to tell the processor how to draw a box—where to place the dots that form each line segment because the processor “knew” only how to add and subtract. Today’s processors have a robust instruction set—they “know” how to do a lot of things without requiring lots of instructions. Instead of writing lots of instructions to form lines of a box, the programmer writes one instruction “telling” the processor to draw a box. The programmer gives the processor the starting point of the box and the lengths of each line, which are called parameters. Circuitry in the processor contains step-by-step instructions on how to
Building Blocks of a Computer
69
draw a box. This means that there are fewer times the processor needs to go outside the processor to retrieve instructions, resulting in faster processing time.
The Process Control Block

The process control block is the area of the processor that keeps track of all the processing occurring within the computer. Each process requires a process identification number, the memory address (the place on the motherboard that contains the current instruction), and the data required by the process. The process identification number uniquely identifies each process within the computer. The process control block also tracks the contents of registers within the processor, the electronic files that are opened and used for processing, the status of input/output devices (whether or not they are available), and the priority of processes (which processes are more important than others). For example, instructions to print a document usually have a lower priority than instructions to write the document in Word.

The process control block works differently on computers that have one processor than on computers that have multiple processors. On a one-processor computer, the process control block works asymmetrically, dividing the different processes up for the single processor to accomplish. On computers with multiple processors, it works symmetrically, balancing the workload across all processors.

It is the process control block that manages processing so that you feel multiple programs are running simultaneously on a one-processor computer—because not all computers actually run two or more programs at the same time. In reality, the process control block on a one-processor computer suspends one process (printing) while processing an instruction from another process (an incoming email message). A working process control block does this seamlessly, without anyone noticing these slight delays in processing. Sometimes, however, too many processes are running—more than the process control block can handle—a condition known as thrashing. You might see this when your computer slows to a crawl.
This problem is solved by displaying the Task Manager in Windows (Ctrl-Alt-Delete) and closing some of the processes. Alternatively, rebooting your computer resets the process control block and also fixes this problem.
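The suspend-and-resume juggling described above can be sketched as a toy round-robin scheduler in Python. The process names and instruction counts are invented, and real scheduling is far more involved:

```python
# Toy round-robin scheduling: one "processor" executes one instruction of a
# process, suspends it, and moves on to the next process in line.
from collections import deque

def round_robin(processes):
    """processes: list of (name, instructions_remaining). Returns execution log."""
    queue = deque(processes)
    log = []
    while queue:
        name, remaining = queue.popleft()     # all other processes stay suspended
        log.append(name)                      # execute one instruction of this one
        if remaining - 1 > 0:
            queue.append((name, remaining - 1))  # put it back at the end of the line
    return log

log = round_robin([("print", 2), ("email", 1), ("word", 3)])
print(log)  # ['print', 'email', 'word', 'print', 'word', 'word']
```

Each process gets a turn in rotation, which is why the delays are too brief for anyone to notice.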
Memory Management

Think of memory as one huge block of switches. As part of the booting process, the process control block starts at the first memory location, loads the operating system, and then loads the programs required to control hardware. The remaining
memory is then logically organized into blocks. Each block, called a page, has a fixed size with a starting and ending memory address. Each application that runs on the computer is assigned to a memory block.

Your computer has a fixed amount of memory. You'll see a message stating "insufficient memory" to run a process if all memory blocks are in use by other programs. Rarely is this message displayed today, because the process control block uses the hard disk as extended memory called virtual memory. The contents of some blocks of memory are moved to the hard disk temporarily and replaced with the next program. The process control block selects for swapping out the memory blocks that contain lower-priority programs, based on how frequently their instructions are being processed. For example, you may have Outlook running in the background while writing a report in Word. If memory gets tight, the process control block might decide to move the memory block containing Outlook to the hard disk because you are not sending or receiving emails at the moment.

The process control block also decides what information should be stored in the cache and what should be stored in main memory. The cache is memory located near the processor that holds frequently needed instructions and data. It takes less time for the processor to access cache memory than main memory because the cache is physically closer to the processor.
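The swap-out decision can be sketched as a toy model in Python. The two-block RAM, the program names, and the least-recently-used rule standing in for "lower priority based on frequency of use" are all simplifying assumptions for illustration:

```python
# Toy virtual memory: when RAM is full, the block whose program ran least
# recently is swapped out to the hard disk.

RAM_BLOCKS = 2                 # pretend RAM holds only two program blocks
ram = {}                       # program -> tick when its instructions last ran
disk = set()                   # programs swapped out to virtual memory

def run_program(name, tick):
    if name not in ram and len(ram) >= RAM_BLOCKS:
        victim = min(ram, key=ram.get)   # least recently used block
        del ram[victim]
        disk.add(victim)                 # move its block to the hard disk
    disk.discard(name)                   # swap back in if it was on disk
    ram[name] = tick                     # record the most recent use

for tick, program in enumerate(["Word", "Outlook", "Word", "Browser"]):
    run_program(program, tick)

print(sorted(ram))   # ['Browser', 'Word']
print(sorted(disk))  # ['Outlook']
```

Outlook, idle while you type in Word, is the block that gets moved to disk when the Browser needs memory—exactly the scenario described above.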
Device Management

Think of a device as something that isn't built into the computer, such as a disk drive, DVD player, mouse, keyboard, screen, or printer. Even a flash drive is considered a device. Each device is independent of the computer and has its own circuitry, processor, and program that make it work. In a sense, the computer doesn't know how the device works and the device doesn't know how the computer works.

A device is connected to a computer using an input/output port, sometimes referred to as an expansion port, which is really a direct electronic connection to the computer's motherboard. A device has circuitry that communicates with the computer. Sometimes this circuitry resides inside the device, such as with an external disk drive, and connects to the computer by wire (a USB port). Other times the circuitry is on a circuit board that plugs directly into the motherboard, such as a video card for the monitor. Many devices today connect to the USB port rather than being plugged into the motherboard.

There is usually a language problem between a computer and a device: the computer doesn't speak the language understood by the device, and the device doesn't speak the language understood by the computer. This becomes a problem when you want to use the device. For example, the processor sends a document to the printer connected to the USB port (input/output port) knowing nothing except that a printer is connected to the port. That is, the processor doesn't know the instructions
to tell the printer how to print the document. Those instructions are known only to the printer. A program called a device driver bridges this language gap by translating processor instructions into instructions understood by the device. Each device has a device driver provided by the device manufacturer. This is why you might see the message "device driver loaded" when you plug a new device into a computer that has a plug-and-play option, which saves you the trouble of finding and loading the device driver. When a new device is plugged into the computer, the operating system loads the device driver into memory, either from the operating system itself or from the internet. Computers that don't offer the plug-and-play feature require you to load the driver, either from a CD supplied by the device manufacturer or from the manufacturer's website, before you can use the device.

Each input/output port has an address on the motherboard, similar to a memory address. Instructions contained in a program tell the processor when to send or receive information from a specific input/output port address. The request is then sent to the input/output port, where the device driver translates the request into instructions understood by the device.

The computer manages devices in a number of ways. First, it keeps track of devices that are attached to the computer. For example, you won't be able to print a document unless a printer is connected to the computer, either by wire or wirelessly. The computer detects the printer connection. Printing options within programs running on your computer may be deactivated if a printer is not detected, and attempts to print will likely cause a message to be displayed stating that no printer is connected to the computer. Communications with a device continue while the computer is running. This is how the computer "knows" whether or not the printer is ready for printing.
The printer, for example, may send the computer a message that it is out of paper, causing the "out of paper" message to be displayed on the screen.

The computer also manages information going to a device so that the device doesn't become overwhelmed and the processor isn't left waiting for the device to process a request. This happens when printing. The computer creates and manages a print queue. When you select print, the document is sent to memory called a print buffer and a print request is entered into a print queue. In turn, each print request on the print queue is processed by sending the corresponding document to the input/output port connected to the printer. You can view the print queue on the screen to see the status of print requests and manage printing (see Figure 3.6).
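The print buffer and queue described above behave like a first-in, first-out queue, which can be sketched in Python. The document names and status messages are invented for the example:

```python
# Toy print queue: requests wait in a buffer and are sent to the printer
# port one at a time, in the order they were requested.
from collections import deque

print_queue = deque()                     # pending print requests

def request_print(document):
    print_queue.append(document)          # document enters the print buffer

def printer_process_one():
    if print_queue:
        document = print_queue.popleft()  # first request in, first printed
        return f"sent to printer port: {document}"
    return "printer idle"

request_print("report.docx")
request_print("invoice.pdf")
print(printer_process_one())  # sent to printer port: report.docx
print(printer_process_one())  # sent to printer port: invoice.pdf
print(printer_process_one())  # printer idle
```

Because requests leave the queue in arrival order, the processor can hand off a document and move on, rather than wait for the slow printer.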
Figure 3.6: Here is a print queue that you can display on your computer by selecting the printer from the Devices and Printers menu in Windows.
Types of Ports

Here are common types of input/output ports.
–– Universal serial bus (USB): The USB port is the most widely used port today and can connect practically any device to the computer. Data travels over the USB cable serially—one bit at a time—compared with the bus inside the computer, which transmits several bits simultaneously. The USB port uses the computer's power source to power the transmission, which is why you can use the USB port to charge your smartphone. There are different versions of the USB port, designated by a version number: version 2.0 transfers information at 480 megabits per second (Mbps), while version 3.0 transfers information at 5 gigabits per second (Gbps).
–– High-definition multimedia interface (HDMI): The HDMI port transmits audio and video signals to devices such as Blu-ray players, high-definition televisions, and high-definition DVD players.
–– Modem port: The modem port connects an internal modem to a network device, such as a router used to connect the computer to the internet.
–– ExpressCard: The ExpressCard slot connects the computer to a network such as the one at work, commonly referred to as an intranet.
–– Thunderbolt: The Thunderbolt port connects up to six devices in a daisy-chain—a way of connecting each device to the next, with only one of those devices connected to the computer. Transmission from the computer to the devices flows through each device.
–– Peripheral component interconnect (PCI): The PCI port is a slot on the motherboard used to directly connect the circuitry of a device, commonly referred to as a card, such as a video card.
–– Memory card reader: The memory card reader port accepts memory cards from devices such as digital cameras so that pictures can be transferred directly to the computer. This port can typically read a variety of memory card types.
–– External serial advanced technology attachment (external SATA): The external SATA port is used to transfer data between the computer and external mass storage devices such as external hard disks.
Measuring the Processor

The processor is a key selling point of every computer, just as an engine is a selling point of a car. Processors can be measured by the millions of instructions the processor can execute per second (MIPS), and computers by throughput—how quickly instructions actually get processed, which also depends on factors outside the processor that influence processing speed.

Let's say that the size of an instruction is 16 bits, the external bus can carry 8 bits, and the processor reads 8 bits at a time. The act of retrieving and reading an instruction is called a fetch, and the number of bits read at a time by a processor is referred to as a word. In this example, the processor needs two fetches for each instruction. Now suppose the external bus carries 16 bits and the processor fetches 16 bits at a time. Only one fetch is needed to read an instruction, giving this processor a higher throughput than the first one (Figure 3.7).
Figure 3.7: The number of bits carried by the bus should be at least the same number of bits that a processor can read.
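The fetch arithmetic behind Figure 3.7 can be checked directly. This is a simple calculation, not a model of any particular processor:

```python
# How many fetches are needed to read one instruction, given the
# instruction size and the width of the external bus (both in bits)?
import math

def fetches_needed(instruction_bits, bus_bits):
    return math.ceil(instruction_bits / bus_bits)

print(fetches_needed(16, 8))    # 2 fetches: 8-bit bus, 16-bit instruction
print(fetches_needed(16, 16))   # 1 fetch:  16-bit bus matches the instruction
```

Halving the number of fetches per instruction roughly doubles how quickly instructions can be read from memory, which is the throughput difference the figure illustrates.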
Older computers at times had a mismatch between the processor and the external bus that slowed throughput: the processor could fetch 16 bits but the external bus carried only 8 bits, so in a sense the processor had to wait. Today's processors execute billions of instructions per second and are usually waiting for other components of the computer to catch up, making this measurement a moot point. Likewise, computers are designed for maximum throughput, with internal clock speeds measured in gigahertz (1 gigahertz is 1,000,000,000 cycles per second).

Processors are commonly described by the term core. A core is another term for a processor. You'll see computers with dual, quad, and even more cores. Each core is a processor—that is, the computer's processor contains multiple processors, each of which can work independently. This means the computer can perform the multiple simultaneous calculations required to display streaming video, computer games, and other information-intensive applications.

You'll also see the size of the processor's cache as another measurement of a processor. Recall that cache is memory located close to the processor that is used for temporary storage of data. Cache also refers to the amount of internal memory within the processor, also known as registers. The larger the cache, the fewer times the processor needs to fetch information from computer memory. This is critical when the processor performs repetitive tasks and when it multitasks, since the data remains within the processor.
The instruction set is another critical measurement of a processor, especially when programs require high-definition graphics. Consider that a picture is composed of picture elements (pixels), and each pixel's color is represented by 24 bits (3 bytes). High-definition (HD) video is composed of 1,080 lines, and each line is made up of 1,920 pixels. And that's just one screen. The screen refreshes 120 times per second (called the refresh rate), and some screens refresh 240 times per second. Don't get lost in the numbers. You only need to know that there are lots of bytes that need to be processed every second. In not-so-old computers, the processor's instruction set didn't have instructions for processing HD video, which meant chips outside the processor had to pre-process the information before sending it to the processor. The latest and greatest of today's processors have an instruction set that can process HD itself. This factor becomes important if you need to select a workstation that will be used to create and edit graphics and video.

Overdriving the Engine (Processor)

You've probably heard that you can decrease fuel cost by buying a car that has an overdrive transmission. An overdrive transmission has a gear that lets the wheels turn faster than the engine. Some of today's processors do practically the same thing, referred to as turbo boost—getting more speed from the processor than its clock normally provides. The processor speed is determined by the oscillation of the processor's internal clock. Turbo boost, a form of overclocking the internal clock, allows the processor to run faster than the rated internal clock speed. It is used when only one or two cores are needed, such as when a complex task needs to be executed immediately.
(Kind of) Doing More Than One Thing at the Same Time

A processor reads an instruction, executes it, then moves on to the next instruction. It does one thing at a time. However, computers seem to do multiple things simultaneously, such as receiving emails and printing a document, all while you are writing a report in Word. There is a bit of sleight-of-hand magic going on that gives the appearance of running multiple programs at the same time.

We tend to look at each program as a separate entity. The processor "sees" an instruction and doesn't care which program sent it; the operating system keeps programs organized. (More about the operating system later in this chapter.) Each program sends the processor a sequence of instructions called a thread. The processor executes an instruction, then reads from the next thread, usually from a different program. Instructions are typically executed so quickly that there is no delay in running any program, giving us the feeling that the processor is processing multiple instructions at the same time when it is really processing one instruction at a time.

Another aspect of this magic is that we work slower than the processor. When typing a report, you pause between keystrokes. The processor could be waiting for
you, but it is really "looking" for other instructions to process, such as checking the network for incoming emails or sending your last report to the printer. The processor is multitasking. Even a pause of a fraction of a second is enough time for the processor to process many instructions. Today's computers use hyper-threading, which lets a single core work on more than one thread at a time, and they are truly multiprocessing because of the multiple cores (processors) in the computer. While yesterday's computer had one processor that used magic to give the appearance of multiprocessing, today's computers actually have multiple processors, each simultaneously processing instructions without the need for magic.
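The keystroke-pause idea can be sketched as a toy timeline in Python. The tick numbers and background tasks are invented for illustration; a real processor fits millions of instructions into each pause:

```python
# Toy timeline: the "processor" handles a keystroke when one arrives and
# fills the idle ticks in between with background work.

keystrokes_at = {0, 5, 10}          # clock ticks when you press a key
background = ["check email", "send page to printer", "check email"]

timeline = []
for tick in range(12):
    if tick in keystrokes_at:
        timeline.append((tick, "handle keystroke"))
    elif background:
        timeline.append((tick, background.pop(0)))  # use the idle tick

print(timeline)
```

All the background work finishes inside the gaps between keystrokes, which is why nothing you do ever seems to wait.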
How the Processor Makes Decisions

You don't really need to understand the finer points of how the processor makes decisions, but if you're curious, here are the basics. Although the processor is at times considered the brain of the computer, it isn't a brain. Engineers designed the processor's circuitry to perform Boolean logic—true or false—which is a form of algebra. The value 1 is true and the value 0 is false. The processor has specially designed circuits called gates that compare values and determine if a logical condition is met. There are seven types of gates that are combined to enable the processor to make complex decisions based on instructions written by a programmer. Here's how those gates work.
–– The AND gate determines if both values are 1. Remember that the values are either 1 or 0, a binary digit. If value A = 1 and value B = 1, then the output of the AND gate is true, represented as 1. If either value A or value B is 0, then the output of the AND gate is false, represented as a 0.
–– The NOT gate has a single input and inverts it. If the input is 1, the output of the NOT gate is false, represented as 0; if the input is 0, the output is true, represented as 1.
–– The OR gate determines if either of the two values is 1. If A = 0 and B = 1, then the output of the OR gate is true, represented by 1; if neither value is 1 (A = 0 and B = 0), then the output of the OR gate is false, represented by 0.
–– The NAND gate combines the NOT gate and the AND gate functions. The only way the output of the NAND gate is false is if both numbers are 1, such as A = 1 and B = 1. If at least one number is 0, such as A = 1 and B = 0, then the NAND gate output is true, represented by a value of 1.
–– The NOR gate combines the NOT gate and the OR gate functions.
The only way the output of the NOR gate is true is if both numbers are 0, such as A = 0 and B = 0. If at least one number is 1, such as A = 0 and B = 1, then the NOR gate output is false.
–– The XOR gate determines if exactly one of the two values is 1, such as A = 0 and B = 1; in that case the output of the XOR gate is true. If both numbers are the same—both 0 (A = 0 and B = 0) or both 1 (A = 1 and B = 1)—then the XOR gate output is false.
–– The XNOR gate requires that both numbers be the same for the output of the XNOR gate to be true, such as A = 0 and B = 0, or A = 1 and B = 1. If the numbers are different, then the output of the XNOR gate is false.

Let's say you are looking for information about an employee in a database. You enter employee number 1425 into the computer application. The computer compares that number to each stored employee number, bit by bit, using XNOR gates (which output true when two bits match). The processor moves to the next employee number if the comparison is false, but stops searching and retrieves the employee's information if the comparison is true. Stop! This explanation is an oversimplification. Searching a database for information is more involved than this: the employee number is a decimal number that must be converted to its binary version, and the comparison checks each binary digit. The search itself is explained in great detail in Chapter 6.
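Using the standard definitions, the seven gates can be expressed as small Python functions over bits, which makes their truth tables easy to verify. (Real gates are transistor circuits; this is just a convenient way to check the logic.)

```python
# The seven logic gates as functions over bits (0 = false, 1 = true).

def AND(a, b):  return 1 if a == 1 and b == 1 else 0   # both inputs 1
def OR(a, b):   return 1 if a == 1 or b == 1 else 0    # at least one input 1
def NOT(a):     return 0 if a == 1 else 1              # single input, inverted
def NAND(a, b): return NOT(AND(a, b))                  # false only for 1, 1
def NOR(a, b):  return NOT(OR(a, b))                   # true only for 0, 0
def XOR(a, b):  return 1 if a != b else 0              # true when inputs differ
def XNOR(a, b): return NOT(XOR(a, b))                  # true when inputs match

print(AND(1, 1), AND(1, 0))    # 1 0
print(NAND(1, 1), NOR(0, 0))   # 0 1
print(XOR(0, 1), XNOR(1, 1))   # 1 1
```

Note how NAND, NOR, and XNOR are literally built by feeding another gate's output through NOT—the same composition the processor's circuitry uses.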
Memory

Computer memory is where various types of information are stored. Each bit of memory is a switch that can be set off or on by electronically "flipping the switch." Flip the switch one way (off) to store the binary digit 0. Flip the switch the other way (on) to store the binary digit 1.

The bit size of the processor determines the number of bytes (8 bits each) of memory that can be accessed at one time. A 32-bit processor accesses 4 bytes of memory at a time and a 64-bit processor accesses 8 bytes at a time. Another way to look at this is that a 64-bit processor running at 1 gigahertz can process 8,000,000,000 bytes per second. That's a lot of memory.

Most memory is temporary, referred to as random access memory (RAM). The contents of RAM are lost when the computer is turned off. RAM is one or more chips on the motherboard identified by the number of bytes on the chip; one of the largest has 128 gigabytes (GB) of memory on one chip. RAM is also identified by its read/write access speed: a speed of 1,600 MHz, for example, means 1,600,000,000 transfer cycles per second. Memory speed becomes important for programs that need to process a lot of data quickly, such as streaming video. For example, workstations used for video editing combine a fast processor speed, external clock speed, and memory speed to ensure the level of performance required to work with video.

RAM is called random because any part of memory can be accessed directly. Unlike with serial access memory (SAM), you don't need to start at the first memory address and read the other memory addresses sequentially to reach a specific memory address.

Another type of memory is read-only memory (ROM). The contents of ROM
remain intact when the computer is turned off. ROM is used to store the basic input/output system (BIOS). The BIOS is a set of relatively small programs containing the instructions that load the operating system and the programs needed to start your computer. You see the BIOS in action each time the computer starts. Initially, the processor retrieves the first instruction from ROM, which tells the processor to load the BIOS into RAM. The processor then executes the BIOS. The BIOS performs the power-on self-test, referred to as POST, to make sure all primary components are working, including a check that read/write operations work for all RAM. The BIOS then begins the boot sequence of instructions, which includes loading the operating system from a permanent storage device (disk) into memory. Once loaded, the operating system takes over the computer's operation.
The RAM Divide

RAM is divided into sections, each designated for a specific operation.
–– System RAM is general memory whose speed is controlled by the bus. The time it takes data to travel from memory to the processor is referred to as the bus cycle and is determined by the number of bits that can travel along the bus at the same time (called the bus width) and the clock speed, commonly referred to as the bus speed.
–– Cache memory is RAM located close to the processor that contains the data most often required by the processor. Cache is relatively small in size and is primarily used to increase throughput by eliminating the need for the processor to wait while data is retrieved from system memory. High-performance processors have cache built into the processor. Static random access memory (SRAM) is the type of RAM chip used primarily for cache because SRAM is extremely fast—but very expensive too, which is why SRAM is not used for system RAM. Depending on the design of the SRAM chip, SRAM can run at the same speed as the processor, referred to as synchronous, or at a slightly different speed than the processor, referred to as asynchronous. For programs that require the fastest throughput possible, synchronous SRAM is important; otherwise, you probably won't notice the difference in response.

There are two general categories of memory: volatile and nonvolatile. Volatile memory loses its data when the power is turned off; nonvolatile memory retains its data when there is no power. Most RAM is volatile memory and ROM is nonvolatile memory. Other nonvolatile memory includes flash drives and smart media cards.

Although the term memory chip is used, computers use memory modules rather than single memory chips. Think of a memory module as a circuit board that contains many bytes of memory. For example, a dual in-line memory module (DIMM) can have 1 gigabyte of memory. There are also small outline dual in-line memory modules (SODIMM) with a capacity of up to 1 gigabyte of memory.
Types of RAM

Here are common types of RAM:
–– Static random access memory (SRAM): SRAM is used for cache memory and does not require constant refreshing.
–– Dynamic random access memory (DRAM): DRAM requires constant refreshing.
–– Fast page mode dynamic random access memory (FPM DRAM): FPM DRAM waits for the process of locating data to be completed before reading the next bit. The transfer rate is approximately 176 megabytes per second (MBps).
–– Extended data-out dynamic random access memory (EDO DRAM): EDO DRAM is similar to FPM DRAM except it does not wait for processing to be complete before reading the next bit. The transfer rate is approximately 264 MBps.
–– Synchronous dynamic random access memory (SDRAM): SDRAM is the most commonly used RAM; it takes advantage of the assumption that the data needed by the processor is in sequence in memory. The transfer rate is approximately 528 MBps.
–– Double data rate synchronous dynamic random access memory (DDR SDRAM): DDR SDRAM is faster still, with a transfer rate of approximately 1,064 MBps.
–– Rambus dynamic random access memory (RDRAM): RDRAM uses a high-speed data bus (the Rambus channel) with a transfer rate of approximately 1,600 MBps.
–– CMOS RAM: CMOS RAM is used to store BIOS settings and maintains its contents with a small battery when the computer is powered down.
–– Synchronous graphics random access memory (SGRAM): SGRAM is used for video memory.
Types of ROM

Here are common types of ROM.
–– Programmable read-only memory (PROM): PROM is a ROM chip that can be programmed once. A blank PROM chip is placed in a programming device that modifies the circuitry in a process called burning, which permanently sets values in the circuitry on the chip. Engineers use PROM for prototyping ROM. That is, they burn a PROM and then install it in a computer to see if the data burned into the
PROM works. If so, then the design used for the PROM is used to create a ROM chip. If not, then the data is changed and the process begins again.
–– Erasable programmable read-only memory (EPROM): Circuitry in EPROM can be changed multiple times using ultraviolet light. EPROM retains its content when the computer is powered down. EPROM requires that the entire content of the chip be erased and then rewritten, and EPROM chips must be removed from the computer to be rewritten.
–– Electrically erasable programmable read-only memory (EEPROM): EEPROM is similar to EPROM except electricity—not ultraviolet light—is used to erase and write the content of the chip. Selected contents can be erased without the need to remove the chip from the computer.
–– Flash memory: Flash memory is a type of EEPROM that erases and writes data in blocks of the chip rather than byte by byte, decreasing the time it takes to change the content of the chip.
Video Memory

A section of RAM is allocated for video memory. Think of video memory as a group of switches that mimic your screen. The image on the screen is composed of picture elements (pixels), much like how tiny dots are used to create an image on a printed page. Each pixel is made up of three sub-elements—one each for red, green, and blue—each represented by a byte, so 3 bytes (24 bits) are needed for each pixel. The value of each byte determines the intensity of that color. Adjusting the intensities collectively creates up to 16.8 million colors.

Video memory is organized identically to the pixels on the screen, from left to right (Figure 3.8). Setting the byte values of the first three memory locations sets the first pixel location on the screen. The length of a row of pixels and the number of rows define the resolution of the screen, and there must be enough video memory bytes to accommodate the screen resolution. A program called a display driver constantly reads video memory, transforming bytes of video memory into pixels to create an image on the screen.
Figure 3.8: Video memory mimics the pixel layout on the screen. The display driver translates the contents of video memory into pixel settings on the screen.
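The video-memory arithmetic can be checked with a few lines of Python, assuming the HD resolution, 3 bytes per pixel, and the 120 Hz refresh rate mentioned earlier:

```python
# How much video memory does one HD screen need, and how many bytes per
# second does the display driver push at a 120 Hz refresh rate?

width, height = 1920, 1080          # HD: pixels per row, number of rows
bytes_per_pixel = 3                 # one byte each for red, green, blue

frame_bytes = width * height * bytes_per_pixel
print(frame_bytes)                  # 6220800 bytes for one full screen

refresh_rate = 120                  # screens drawn per second
print(frame_bytes * refresh_rate)   # 746496000 bytes per second
```

Roughly 6 MB per frame and three-quarters of a gigabyte per second—this is why dedicated video memory and HD-capable instruction sets matter.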
Inside a Memory Chip

A memory chip contains millions of transistors and capacitors. In dynamic RAM, the most commonly used memory chip, a transistor is paired with a capacitor to form a memory cell that stores one bit (Figure 3.9). Well, not really one bit—remember, a bit is a binary digit; we're assigning each binary digit to the state of an electronic circuit. The capacitor stores electrons. The transistor is an electronic switch. When writing data to memory, the transistor either allows electrons to flow into the capacitor or allows electrons to flow out of the capacitor. The value of the bit is 1 when the capacitor contains electrons and 0 when the capacitor is empty.
Figure 3.9: A transistor is paired with a capacitor to form a memory cell that stores one bit of data.
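The leak-and-refresh behavior can be sketched as a toy Python model. The leak factor and sense-amplifier threshold are invented numbers chosen only to show the effect; real cells leak in milliseconds and are refreshed thousands of times per second:

```python
# Toy DRAM cell: the capacitor leaks, so the memory controller must
# refresh it constantly or a stored 1 decays into a 0.

class MemoryCell:
    def __init__(self):
        self.charge = 0.0                      # fraction of a full charge

    def write(self, bit):
        self.charge = 1.0 if bit == 1 else 0.0

    def leak(self):
        self.charge *= 0.8                     # electrons leak away over time

    def read(self):
        return 1 if self.charge >= 0.5 else 0  # sense-amplifier threshold

    def refresh(self):
        self.write(self.read())                # rewrite the cell's value

cell = MemoryCell()
cell.write(1)
cell.leak(); cell.refresh()    # the controller refreshes in time
print(cell.read())             # 1: the value survives

cell.write(1)
for _ in range(4):             # no refresh: charge decays to about 0.41
    cell.leak()
print(cell.read())             # 0: the stored 1 has been lost
```

The `refresh` method is the whole trick: read the bit while the charge is still above the threshold and write it back at full strength—exactly what makes dynamic RAM "dynamic."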
One problem is that electrons leak from the capacitor, which can leave the capacitor empty in a few milliseconds. This means a 1-bit value can change to a 0-bit value in less than the blink of an eye. Engineers overcome this problem with another circuit called a memory controller that rewrites the memory cells thousands of times per second. This is referred to as refreshing, which is why the term "dynamic" is used when referring to dynamic RAM.

Circuitry in a memory chip is laid out in a grid of columns and rows, and each intersection is a memory cell. Columns are referred to as bitlines and rows as wordlines; the intersection of a bitline and a wordline is the address of the memory cell. An electrical charge is sent to a column, activating the transistor, and an electrical charge sent to the corresponding row sets the state of the capacitor when data is written to memory. When reading memory, circuitry called a sense-amplifier measures the charge in the desired capacitor. If the capacitor is at 50 percent charge or more, the sense-amplifier sends a signal to the processor indicating that the bit value is 1; a charge of less than 50 percent is reported as a bit value of 0. All intersections reset to off when the computer powers down. The time to read and write a memory cell is measured in nanoseconds (billionths of a second); you'll see this as a rating for memory chips.

ROM uses a diode to make the connection in the grid. A diode is an electronic component that allows current to flow in one direction once the current reaches a threshold—like pushing a light switch hard enough to make the connection that turns on the lights. Values in ROM are set when the chip is created, which is why the contents of ROM remain unchanged when the computer powers down.
Memory Errors: Few and Far Between

Memory chips seem to never fail, but that's not necessarily true. The manufacturing process used to make memory chips has a narrow tolerance. Tolerance is the acceptable amount by which a measurement can differ from the design specification. That is, if the memory chip still works even though it is a little off the specification, it is considered "within tolerance." Memory chips that fall outside the tolerance never make it into a computer.

Memory in your computer—and your computer itself—will fail at some point and under some conditions. The risk is small, but it could happen. Computer and component manufacturers specify the mean failure rate for their product: the average amount of usage time before the product is likely to fail. This is one consideration when deciding which product to purchase. The mean failure rate for most electronic products is very low for normal operations because of the narrow tolerance of electronic components.

The memory controller tests each memory cell each time the device is powered up. Any errors are noted and the memory controller makes sure that the process never
Chapter 3: Talking Intelligently About Computers
uses memory cells that failed the startup test. Memory chips could fail during operation, but the risk is low and has little impact because the failure will be noted the next time the computer starts up. However, more protection is required for computers used for mission-critical applications, such as those used by banks and ecommerce websites. These computers may not be restarted frequently, and a memory chip failure could cause loss of important data. Mission-critical computers use the error-correction code (ECC) form of error checking: several extra bits of memory are used to detect errors, and an error is fixed as soon as it is detected, without restarting the computer.

Cache Memory and the Pizza Connection

Cache memory, simply referred to as cache, is a technique of keeping items you need frequently close to your work area rather than in the backroom. Say that you operate a pizza restaurant. During busy periods, you need pizza boxes near the oven rather than in the back so you can quickly access them. This is considered having a cache of pizza boxes. Nearly all your pizza boxes are in the back, yet a relatively small number of boxes are in the cache. The size of the cache is critical to the successful operation of the store, since you must keep those pizza pies coming. Let's say that you have room for twenty pizza boxes by the oven. The size of the cache is twenty. The more room you have (and the larger the cache size), the less time is spent going to the back for more pizza boxes and the faster you can process the pies. The same is true about cache memory, except bytes of information, rather than pizza boxes, are stored in the cache.
BIOS and Starting the Computer

When starting the computer (booting), the BIOS takes over to check that everything is working properly and loads the programs that run your computer. The BIOS works together with the complementary metal-oxide-semiconductor (CMOS), a memory chip that stores the BIOS settings. The CMOS is powered by a small battery on the motherboard, or built into the chip itself, that lasts for ten years or more. If the battery dies, the saved settings are lost; a new battery and reconfigured BIOS settings are needed to get your computer running normally again.

First, the BIOS determines whether the boot is starting from scratch (a cold boot) or is in response to a problem that occurred while the computer was running (a reboot, or warm boot). During a cold boot, the BIOS determines whether each RAM address and the input/output ports are working properly, and makes sure that peripheral components (such as the keyboard and mouse) are connected. The BIOS causes a beep and displays an error message on the screen if there is a hardware problem. If components pass the BIOS test, then some details of your system, such as the processor and memory, are displayed on the screen.

The BIOS continues the boot process by looking for the operating system on the first permanent storage device encountered. Usually this is the hard disk but could be
a CD left in a CD drive. If the operating system is located, then the BIOS loads the operating system into memory—otherwise, an error message is displayed on the screen.
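The search the BIOS performs can be pictured as a loop over storage devices. A sketch with invented device names and contents:

```python
# Sketch of the boot order described above: the BIOS checks each
# storage device in turn and loads the first operating system it finds.

boot_order = ["cd_drive", "hard_disk"]   # devices checked in this order
devices = {
    "cd_drive":  None,                   # no CD left in the drive
    "hard_disk": "Windows 10",           # operating system found here
}

def boot():
    for device in boot_order:
        os_found = devices[device]
        if os_found is not None:
            return f"Loading {os_found} into memory"
    return "Error: no operating system found"

print(boot())  # Loading Windows 10 into memory
```

If a bootable CD were left in the drive, the loop would stop there first, which is exactly why an old disc in the drive can change what your computer boots.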
Changing BIOS Options

You have the opportunity to change BIOS options once the BIOS loads into memory. Commonly available options include resetting the system clock to the proper date and time, activating or deactivating plug-and-play, and auto-detecting, enabling, or disabling peripherals such as the keyboard and mouse. Plug-and-play is a feature that enables the operating system to locate a new device attached to the computer and install the appropriate programs that enable it to work—all behind the scenes. Two other settings are a password to access the computer and the computer's power management features, which set the idle time period before power management kicks in. Play it safe and leave changing BIOS options to the IT technicians if a change is needed. For all intents and purposes, the default settings are sufficient for good performance.
The Operating System

An operating system is a group of programs that enables the computer to do many things. Think of the computer as a box of switches; the operating system transforms those switches into what we know as a computer. You probably know operating systems such as Windows, Mac OS, Linux, or Unix, but there are countless other operating systems for specialized computing devices, such as real-time control systems that control manufacturing equipment or Android for your cell phone. Operating systems are similar in that each organizes how programs run, controls hardware devices connected to the computer, and manages how users interact with the computer. Yet each is different in the way it accomplishes these tasks.

The operating system essentially runs the computer. Here are some things that the operating system manages:
–– The processor: The operating system tells the processor what instructions to process.
–– Memory: The operating system logically organizes memory for multiple uses, such as for the screen and for printing.
–– Input/Output: The operating system controls sending information to the screen and the printer, and receiving information from the keyboard, mouse, and touch screen.
–– Programs: The operating system loads, organizes, and runs programs, enabling the computer to perform complex decisions.
The heart of the operating system is a program called a kernel. The kernel controls the operation of the computer, managing hardware and program requests to have the processor execute instructions. The kernel translates instructions in a program into data and instructions that the processor understands (using the instruction set).

The operating system also contains other programs, such as a media player, fax/scanner programs, and a notepad, that enhance the user experience. One such program is called a shell. The shell is the interface between you and the computer. It is the screen that opens after the computer boots. Windows and Mac OS have a graphical user interface (GUI) shell, enabling you to drag and drop information using the mouse and to click buttons to interact with the computer. Windows also has the command prompt shell, which requires you to type commands rather than using mouse clicks to interact with the computer. For the IBM PC, a command prompt was the original shell for the DOS operating system, which was used before the GUI came along. The DOS command prompt is still used today by programmers and networking professionals, who can use it to efficiently solve problems such as working on the registry. You can open the command prompt in Windows by typing "command prompt" in the search box, and then selecting "Command Prompt" to open the command prompt window. Here is the command prompt where you enter commands next to the arrow:

Microsoft Windows [Version 10.0.17215.345]
(c) 2018 Microsoft Corporation. All rights reserved.

C:\Users\keogh>

The Unix operating system preceded DOS and is still widely in use today, mainly in the form of Linux, a Unix-like operating system available in several variants. Linux is particularly popular with the open-source (free access) community, and technical and academic professionals often use the command line for its efficiency and flexibility, though GUIs offer many advantages for the less technical user.
The Linux kernel has spawned several offerings, including several distributions of the operating system—Red Hat, Fedora, Debian, and Ubuntu, to name a few—that offer both GUI and command prompt options. They offer different types of shells as well, including the C shell, the Korn shell, the bash shell, and so on, which are all similar but have their differences.

The operating system manages computer components, buffering programs from the details of interacting with the hardware. For example, a programmer writes a program to print information; the operating system makes this happen without the programmer knowing anything about the printer. This enables programs to run on the same type of computer with little or no difficulty, even when there are hardware upgrades.
There are several types of operating systems. These are:
–– Real-time Operating System (RTOS): A real-time operating system is used to control complex machinery and has a skimpy user interface that enables the user to interact with the machine.
–– Single-user, Single Task: A single-user, single task operating system is designed to enable one person to do one thing at a time and is used to run a small handheld computer. These are commonly handheld scanners used to scan products for inventory audits.
–– Single-user, Multitasking: A single-user, multitasking operating system, such as Windows or Mac OS, is the most commonly used operating system on personal computers. It enables one person to do multiple things at the same time on the computer.
–– Multi-user: A multi-user operating system enables many users to access the computer at the same time and is used on mainframe computers and on shared computers called servers. Linux and Unix are multi-user operating systems.

Linux and Unix

Linux and Unix are similar operating systems. Unix came first, in 1969, when it was developed by Ken Thompson, Dennis Ritchie, and other Bell Labs employees. Linux was created by Linus Torvalds in 1991. Both are based on the same general concept: each program of the operating system should do one thing very well, and you can link programs together to perform a complete task. Let's say that you want to sort the contents of a file and then save the sorted file under a different name. There are three steps: read a file, sort a file, and save a file. Each step is a program. Instead of writing steps to read, sort, and save the sorted file, the programmer links the Unix programs that perform each subtask together using a process called piping. Piping sounds technical, but it is simply using the output of one program as the input to another program.
So the read program's output becomes the input to the sort program, and the output of the sort program becomes the input to the program that saves the file.

Both Linux and Unix are robust operating systems that run many of the servers used for the internet and intranets. These operating systems have practically everything you need to create your own intranet on your laptop! And the best aspect is that they can be downloaded at no cost. Stop! Linux and Unix are a bit more involved than Windows and Mac OS, so you may want to let the IT technicians deal with these operating systems.
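Piping can be mimicked with small functions, each standing in for one Unix program. A sketch (the shell one-liner in the comment assumes a file named unsorted.txt):

```python
# Unix-style piping: the output of one small program feeds the next.
# Shell equivalent (assumed file name): sort unsorted.txt > sorted.txt

def read_lines(text):       # stands in for the program that reads a file
    return text.splitlines()

def sort_lines(lines):      # stands in for the sort program
    return sorted(lines)

def join_lines(lines):      # stands in for the program that saves the file
    return "\n".join(lines)

# Piping: each program's output becomes the next program's input.
result = join_lines(sort_lines(read_lines("pear\napple\nmango")))
print(result)  # apple, mango, pear — each on its own line
```

Each function knows nothing about the others; they cooperate only through their inputs and outputs, which is the whole Unix philosophy in miniature.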
Behind the Scenes of Running a Program

Big brother is watching us (kind of) when we use Windows or Mac OS. The opening screen, called a desktop, has icons that represent programs. However, quietly behind the scenes the kernel is "watching" for signals from the processor and other programs running in the background, and from external devices such as the keyboard, mouse, and touch screen, or from the network connection.
The kernel detects when you select an icon on the desktop and then sends an interrupt instruction to the processor, signaling the processor to stop and execute an instruction at a specific memory address. The interrupt instruction likely tells the processor to load the selected program from the disk drive into memory if the program isn't already in memory, and then the processor is told to fetch and execute instructions beginning at that memory address. The result is the display of the program's opening screen—at least, that's what you see. Much happens behind the scenes, such as copying instructions that make up the program into specific memory addresses; copying data such as default settings into specific memory addresses; and copying part of the data that makes up the screen into video memory. Video memory is RAM set aside for data (pixel settings) used to display the image on the screen. And then, the kernel continues surveillance for other requests.
Interrupts

The kernel can tell the processor to stop what it is doing at any time by sending the processor an interrupt signal. There is more to this than simply saying "stop!" There are many kinds of interrupts, such as an instruction to stop printing, a signal that the printer is out of paper, or a request to pause a process and handle a higher-priority instruction. Each interrupt is identified by an ID and a memory address that contains a small program called an interrupt service routine, which is executed in response to the interrupt signal. The interrupt ID and the address of the interrupt service routine are contained in a tiny two-column table called an interrupt vector table that is loaded into memory along with the interrupt service routines when the computer starts up (Figure 3.10).
Figure 3.10: The interrupt vector table contains the interrupt ID and the address of the interrupt service routine.
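The interrupt vector table is, in effect, a lookup from interrupt ID to service routine. A toy sketch, with made-up IDs and handler names (a real table holds memory addresses, not function objects):

```python
# Toy interrupt vector table: interrupt IDs mapped to service routines.

def keyboard_handler():
    return "read key from keyboard buffer"

def printer_out_of_paper_handler():
    return "pause print job"

interrupt_vector_table = {
    0x21: keyboard_handler,              # hypothetical keyboard interrupt ID
    0x2E: printer_out_of_paper_handler,  # hypothetical printer interrupt ID
}

def handle_interrupt(interrupt_id):
    # Look up the service routine for this ID and run it; afterward the
    # processor returns to whatever work it paused.
    routine = interrupt_vector_table[interrupt_id]
    return routine()

print(handle_interrupt(0x21))  # read key from keyboard buffer
```

The lookup structure also makes the attack described below easy to picture: a hacker who swaps an entry in the table silently redirects every matching interrupt to malicious code.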
When the processor receives the interrupt, it stops what it is doing, then retrieves and executes the instruction at the interrupt service routine address. It continues to execute subsequent instructions until the end of the interrupt service routine is reached, at which time the processor returns to the processing that was paused when the interrupt was received.

The interrupt vector table is vulnerable to hackers, who can replace an interrupt service routine with their own routine that runs malicious instructions and then runs the real interrupt service routine so you are unaware of the intrusion. This goes undetected until you notice a delay running the interrupt service routine or until security software identifies the malware.

The Operating System and the Program: A Special Relationship

An operating system such as Windows appears to control everything on the screen. The screen can be filled with one or more programs, each having its own window, depending on how you arrange the screen. The cursor moves from window to window and goes from program to program, and at any moment you can click a button to interact with a program or change the size of a window.

Behind the scenes, the operating system is a messenger, receiving input from the mouse and then identifying the position of the cursor on the screen. This information is broadcast to all programs running on the computer. Each program compares the cursor location with the location of its window on the screen and reacts appropriately if the cursor is within its window. The message is ignored if the cursor is outside its window. Buttons, scroll bars, drop-down lists, and other graphical user interface objects that appear on the screen are nothing more than images. When you click a button, for example, you are simply moving the cursor over the image of the button and pressing the button on the mouse.
The operating system reads the cursor position and identifies which mouse button was pressed, then broadcasts this information to all programs. The cursor position identifies not only the window of the program but also the object (or button) within the window. The corresponding program changes the image to make it appear that the button on the screen is pressed, then executes the corresponding action. Once that action is executed, the program waits—as do all running programs—to read the next message from the operating system.

Programmers must write instructions within the program to respond to any message sent by the operating system. For example, when you move a window on the screen, the operating system sends multiple messages that require each program to refresh its screen as the window moves over other windows. Programs do all the work, while the operating system reads input signals from the mouse, keyboard, and other input devices and then broadcasts the information to all programs.
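Each program's "is the cursor in my window?" check is a simple bounds test. A sketch with two invented windows:

```python
# The OS broadcasts the cursor position; each program checks whether
# the click falls inside its own window. Window names and positions
# are made up for illustration.

windows = {
    "editor":  {"x": 0,   "y": 0, "width": 400, "height": 300},
    "browser": {"x": 400, "y": 0, "width": 400, "height": 300},
}

def hit_test(cursor_x, cursor_y):
    """Return which program's window (if any) contains the cursor."""
    for program, w in windows.items():
        if (w["x"] <= cursor_x < w["x"] + w["width"]
                and w["y"] <= cursor_y < w["y"] + w["height"]):
            return program
    return None   # cursor is outside every window: programs ignore it

print(hit_test(50, 50))    # editor
print(hit_test(450, 120))  # browser
```

In a real GUI, each program performs its own test against the broadcast message rather than the OS deciding for them, which is exactly the division of labor the text describes.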
The Keyboard

The keyboard may or may not be part of your computer. A keyboard is built into a laptop but is a separate device on a desktop. A separate keyboard is called a peripheral and is attached to the computer through an input/output port such as a USB port. The first time the keyboard is connected, the operating system needs to load a device driver. The operating system does this automatically if the computer uses plug-and-play—otherwise, the
operating system has to load the device driver from a disk provided by the keyboard manufacturer, or retrieve the device driver from the manufacturer's website or from a reputable alternative.

The keyboard has characters that appear on the screen, called printable characters, and characters that are non-printable, such as the space bar and the return key. There is more going on than meets the eye when you press a key on the keyboard. Keys on the keyboard are identified using a key matrix. Think of a matrix as a grid (Figure 3.11). The location of the key on the grid identifies the key. Each key has a value and a state. The state is either up or down (off or on). The key's value is compared to a character map to identify the actual character on the key. Think of the character map as a look-up table.
Figure 3.11: Keys on the keyboard are identified using a key matrix. Each key's value and state are sent to the program by the operating system.
There are three states of a key on the keyboard: the key can be pressed, released, or held down. When you press down a key, the state of the key (key pressed) and the value of the key are temporarily stored in keyboard memory, referred to as a keyboard buffer, then sent to the operating system for processing. The program that is running determines the action that the processor takes when the key is pressed. For example, when a printable character key is pressed on the keyboard, a word processor displays a lowercase character on the screen at the cursor position, then moves the cursor to the next cursor position. However, if the key is held down, then the word processor repeats printing the character and moving the cursor position until the key is released. The state of the key and the value of the key are removed from the keyboard buffer once the keystrokes are processed.

Actually, a little more is going on. When the key is pressed, the operating system reads instructions from the program to determine how the program wants to handle
the keystroke. The operating system then sends an interrupt to the processor to execute the program instructions, which in this case convert the character into pixel data and move the pixel data to the current position in video memory.

The keyboard buffer plays an important role in using control keys. A control key such as the Shift key is stored in the keyboard buffer, and the keystrokes are not processed until one or more additional keys are pressed. Pressing the Shift key followed by the "J" key causes a capital "J" to be displayed at the current cursor position and moves the cursor one position to the right. Other control keys cause the operating system to do something other than display a character. For example, with the Windows operating system you can restart the computer (warm boot) as one option when the Ctrl-Alt-Delete keys are pressed simultaneously. The combination of Ctrl-Alt-Delete keys is used on a Windows computer to display a menu where you can choose to perform a number of utility functions: lock the computer, switch users, log off the computer, change a password, or start the task manager to explore applications and processes running on the computer.

Programmers have a great deal of control over how keystrokes are interpreted. For example, keyboards have programmable keys called function keys, usually found at the top of the keyboard. A programmer can have a program do practically anything when a specific function key is pressed. Likewise, some operating systems such as Windows enable users to create shortcuts to frequently performed tasks and assign them to function keys.
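The keyboard buffer's handling of the Shift key can be sketched as follows. This is a simplified model; a real buffer also records key-up and key-held states:

```python
# Toy keyboard buffer: a pending Shift waits in the buffer and is
# combined with the next printable key when the buffer is processed.

keyboard_buffer = []

def press(key):
    keyboard_buffer.append(key)

def process_buffer():
    """Turn buffered keystrokes into output characters."""
    output = []
    shift_held = False
    for key in keyboard_buffer:
        if key == "SHIFT":
            shift_held = True          # wait for the next key
        else:
            output.append(key.upper() if shift_held else key)
            shift_held = False
    keyboard_buffer.clear()            # keystrokes removed once processed
    return "".join(output)

press("SHIFT"); press("j"); press("i"); press("m")
print(process_buffer())  # Jim
```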
The Mouse

The mouse is a pointing device that enables you to move the cursor by moving the mouse. The program positions the cursor at a location on the screen. The mouse sends the operating system a signal to move the cursor along the Y-axis (up/down) and the X-axis (side to side) based on the movement of the mouse.
–– A ball mouse has a ball and two wheels: one for the Y-axis and one for the X-axis. Each wheel has plastic spokes that break a light beam as it turns. A microchip inside the mouse counts the number of times the light beam is broken and sends a signal to the operating system, which moves the cursor a corresponding distance.
–– An optical mouse shines a light from a light-emitting diode (LED) onto the desktop. The light is reflected into a photoelectric cell at the bottom of the mouse. A microchip in the mouse detects changes in the pattern of reflected light and translates the pattern into a signal sent to the operating system. The operating system moves the cursor in the corresponding direction.

The mouse usually has one or two buttons. The microchip in the mouse detects when a button is pressed, when it is released, and when it is held down. Each of these states is sent to the operating system, which notifies the program. The
program usually has instructions to respond to each state. Sometimes the mouse has a scroll wheel. The microchip in the mouse detects the direction of the wheel and sends a corresponding signal to the operating system. The operating system notifies the program and the program usually sends instructions to the processor to scroll the screen.
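The beam-break counting in a ball mouse reduces to a small conversion: counts in, cursor movement out. A sketch in which the counts-per-pixel scaling is invented:

```python
# Sketch: the mouse microchip counts light-beam interruptions per
# wheel; the operating system converts the counts into cursor movement.

COUNTS_PER_PIXEL = 2    # assumed: two beam breaks move the cursor one pixel

cursor = [100, 100]     # current cursor position (x, y)

def mouse_moved(x_breaks, y_breaks):
    """Update the cursor position from the wheel counts."""
    cursor[0] += x_breaks // COUNTS_PER_PIXEL
    cursor[1] += y_breaks // COUNTS_PER_PIXEL
    return tuple(cursor)

print(mouse_moved(8, 4))  # (104, 102)
```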
Touchscreens

There are five types of touchscreens (Figure 3.12).
–– Resistive Touchscreen: A resistive touchscreen, commonly used in supermarket checkouts, enables you to tap the screen to enter a selection. The screen is composed of two thin, flexible layers of metal separated by a gap. Each layer has a small electric current running through it. When you touch the screen, the top layer touches the bottom layer, interrupting the current. A microchip in the touchscreen detects the point of contact and sends the operating system the coordinates of your finger. Actually, anything can be used to press the screen, not only your finger. You have probably noticed that the image on these screens is not as clear as your computer screen. This is because the image is behind the layers.
–– Capacitive Touchscreen: The capacitive touchscreen has a transparent electrode layer on top of a glass panel, covered by a protective layer. The image appears on the glass. There are sensors in each corner of the screen. When you touch the screen with your finger, any exposed part of your body, or an electrically charged stylus, a small electric charge moves from the screen to your finger. Sensors detect the location where the charge is transferred and then send the corresponding coordinates to the operating system, which in turn notifies the program.
–– Projected Capacitive Touchscreen (PCAP): PCAP touchscreens work similarly to the capacitive touchscreen but have several advantages. First, PCAP touchscreens work even when a person is wearing surgical gloves or thin cotton gloves. These touchscreens will also recognize multiple touches simultaneously.
–– Infrared Touchscreen: An infrared touchscreen uses an invisible grid of infrared emitters and receivers along the sides of the screen. Breaking the beams (Y-axis and X-axis) with anything identifies the touch point on the screen. Coordinates of the touch point are sent to the operating system and in turn to the program.
–– Surface Acoustic Wave (SAW) Touchscreen: The SAW touchscreen uses an invisible grid of ultrasonic waves generated by circuits, called transducers, installed on the edges of the screen. Touching the screen with anything absorbs the ultrasonic wave at that location, and the detected touch point is sent to the operating system, then to the program.
Figure 3.12: There are five types of touch screens, each enabling you to interact with a program by using your finger.
Permanent Secondary Storage

Memory is considered primary storage. Secondary storage devices are disk drives, flash drives, and CD drives that retain data after the computer is powered down. All permanent secondary storage devices connect to the computer's motherboard through an input/output port. These devices are also referred to as removable storage because the device can be removed from the computer. Even a disk drive inside the computer can be removed.

The hard drive, commonly called a disk drive, is still a common secondary storage device. It uses a combination of electronic and mechanical technology to write information to, and read information from, a set of platters. The disk drive contains multiple disks, each coated with iron oxide. Each disk is logically organized into tracks that form concentric rings. Each track is divided into sectors, which is where data is stored. A mechanical arm moves read/write magnetic heads above each disk as the disks rotate at approximately 10,000 revolutions per minute (rpm). There are two read/write magnetic heads on the arm: one for the top of the disk and the other for the bottom, so both sides of the disk can be read and written.
The way data is organized on a hard disk is called a file system. The New Technology File System (NTFS) is the file system commonly used by Windows. The disk can be divided into partitions, with each partition having its own file system; however, most disks have one partition and one operating system.

A disk must be formatted before it can be used to store data. Formatting is a process used by the file system to organize the disk. Initially, each track and sector is tested to determine whether the sector can successfully store data. The file system makes note of sectors that fail the test and avoids using those sectors when writing data to the disk. The file system then creates all the necessary file system files to manage the disk. The initial sector on the disk is called the boot sector and contains technical information about the file system. A master file table stored on the disk contains information about files, such as the file name, file type, timestamp, and access control, along with the track and sector that contain the first byte of the file. The file system also has metafiles that contain the transaction log, the root directory, and information on available space on the disk. The transaction log is a file that lists changes made to the file system, such as changing a file name or access to a file. The root directory is a list of files and folders that are on the disk.

File folders and file names that are displayed on the screen have little to do with how files are actually stored on the disk, because these are logically grouped in the master file table to make them easy to read on the screen. File names point to the location of the file on the disk.
Saving Data to a Disk

When the processor receives instructions to save a file, the processor sends a stream of data through the input/output port connected to the disk drive. Behind the scenes, the file system takes pieces of the streaming data and writes those pieces to the next available track/sector based on information in the master file table. The file name and other information about the file are saved in the master file table along with the address of the sector that contains the first piece of the file.

When the disk is opened on the screen, you see the list of files from the master file table. Selecting a file causes the file system to read data from the sector that contains the first piece of the file. The last piece of data in that sector contains either the location of the sector that contains the next piece of the file or a code that indicates the end of the file. The file system sends data from sectors containing the file to the processor for processing until the end-of-file indicator is encountered.
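Reading a file by following sector-to-sector links can be sketched as a toy file system. The sector numbers, the -1 end-of-file code, and the file contents here are all invented:

```python
# Toy disk: each sector holds a piece of a file plus the number of the
# sector that holds the next piece; -1 marks the end of the file.

disk = {
    7:  ("Need-to", 12),    # (data, next sector)
    12: ("-Know",   30),
    30: ("!",       -1),    # -1 = end-of-file marker
}

master_file_table = {"title.txt": 7}   # file name -> first sector

def read_file(name):
    data, sector = "", master_file_table[name]
    while sector != -1:                # follow the chain of sectors
        piece, sector = disk[sector]
        data += piece
    return data

print(read_file("title.txt"))  # Need-to-Know!
```

Notice that the sectors need not be adjacent (7, 12, 30); the chain of pointers is what keeps the file together, which is also why fragmentation, discussed below, slows a mechanical drive down.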
Important Measurements of a Disk Drive

There are three important ways to measure a disk drive. These are:
–– Capacity: Capacity is the number of bytes that can be stored on the drive. A word of caution: some manufacturers use half of the capacity as a backup disk, automatically making a backup copy of files. This means only half the advertised capacity is available for your files.
–– Data Rate: Data rate is the number of bytes per second that the drive can send to the processor. Disk drives with a relatively high data rate are costly. Most desktop computers don't require a high data rate.
–– Seek Time: Seek time is the time in milliseconds the disk drive takes to find data on the disk. A fast seek time is usually offered at a premium price and is not required for most desktop computers.
Deleting a File

If you have ever moved a file to the trash, you probably know that the file isn't deleted. Placing a file in the trash tells the file system to mark the file as deleted in the master file table. That is, the file name and other information about the file won't appear in directories displayed on the screen. You can always go to the trash to see and retrieve deleted files. When you retrieve a deleted file, the file system unmarks the file as deleted, and it appears again in directories displayed on the screen.

When you empty the trash, files in the trash are deleted, but not necessarily. Emptying the trash causes the file system to remove the file name from the trash and mark the sectors used by the file as available. Data remains in each of the file's sectors until the sector is used for another file. Forensic tools can sniff through sectors, trying to reassemble the file. The sooner the sniffing occurs after the trash is emptied, the better the chance of recovering all or pieces of the file. The longer the wait, the greater the chance that some or all of those sectors will be overwritten by new files.

Disk cleaning programs are used to permanently delete files from a disk. A disk cleaning program rewrites all sectors, setting them to zero, so no data can be retrieved. Organizations use a disk cleaning program before disposing of a disk to ensure no data can be retrieved.
Disk Fragmentation

Disk fragmentation can occur over time when many files are deleted and new files are written to the disk. In the ideal world, sectors used for a file are relatively close to each other to reduce the time it takes for the read/write head to move to a sector. In the real world, the file system tries to use sectors that are close to each other, but may have
Chapter 3: Talking Intelligently About Computers
no choice but to use distant sectors. This is referred to as disk fragmentation, where pieces of a file are in sectors located all over the disk. Some file systems will periodically defragment the disk by rewriting files to closer sectors. This is like periodically reorganizing stacks of papers on your desk. File systems also have programs that you can manually run to defragment your disk when there is a decrease in response rate from the disk drive.

File Compression

You probably are aware of zip files. They are used when an attachment is too large to send in an email. Windows also has an option to “send to” a compressed (zip) folder. A zip file is a compressed file where duplicate information is stored as a type of shorthand. Take a look at this sentence:
Mary and Bob took the car to the car wash before Mary and Bob parked the car under the car port.
–– “Mary” appears twice.
–– “Bob” appears twice.
–– “car” appears four times.
–– “the” appears four times.
–– “and” appears twice.
Conceptually, an abbreviation can be used for each subsequent appearance of the word, reducing the amount of space needed to store the sentence. Instead of saving all letters of the word, you save the abbreviation. A program can unzip the file, replacing the abbreviations with the actual word when the file is displayed.
Technically, Bob, Mary, and other words really don’t exist in the file. They are represented by 0s and 1s, as you learned in Chapter 2. The zip program looks for a series of 0s and then replaces all but the first 0 with an abbreviation that indicates the number of 0s that were removed. For example, if there is a series of five 0s, the program removes four of them and indicates that four 0s were removed. They are restored when the file is unzipped. The same happens with a series of 1s.
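The shorthand idea can be sketched as run-length encoding, a toy cousin of what zip programs do (real zip files use the more sophisticated DEFLATE algorithm, and this sketch assumes every run is shorter than ten characters so each count fits in one digit):

```python
def compress(text):
    """Collapse runs of repeated characters into character + count."""
    out = []
    i = 0
    while i < len(text):
        run = 1
        while i + run < len(text) and text[i + run] == text[i]:
            run += 1
        out.append(text[i] + str(run))   # e.g. five 0s become "05"
        i += run
    return "".join(out)

def decompress(encoded):
    """Expand each character + single-digit count back into a run."""
    out = []
    for i in range(0, len(encoded), 2):
        char, count = encoded[i], int(encoded[i + 1])
        out.append(char * count)
    return "".join(out)

bits = "0000011100"
packed = compress(bits)
print(packed)                          # 051302
print(decompress(packed) == bits)      # True
```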
Compact Disc (CD) and Digital Versatile Disc (DVD)

A compact disc (CD) is referred to as optical storage because a laser in a CD drive reads bits of data encoded on the CD. The surface of a CD is a mirror that contains a spiral of data in the form of microscopic bumps. The reflection from the laser is detected by optoelectronic circuits when the CD is read. The optoelectronic circuits are able to detect the difference between the reflection from a microscopic bump and no bump. These differences are recognized as 0s and 1s.
Some CDs are recordable, called CD-Rs (write once), and others are rewritable, called CD-RWs (rewrite multiple times). The CD-R has a layer of dye on the disc. The recordable drive uses a laser to heat tiny spots on the CD, causing the dye to darken. When read, the optoelectronic circuits consider the darkened area as a microscopic bump. CD-R discs are not rewritable.
CD-RW discs are rewritable because of the material used on the surface of the disc. The CD-RW drive has three temperature settings for the laser. At one temperature, tiny areas of the surface become very reflective, and at another temperature, the surface becomes dull. A third temperature is used to read reflections from the surface. Reflections (1) and dullness (0) are treated as bits by the optoelectronic circuits. Digital versatile discs (DVDs) work on similar principles.
Flash Storage

Flash memory is referred to as a flash drive or solid state storage, which is used like a disk drive rather than computer memory. Flash memory is faster than a disk drive because there are no moving parts. A disk drive uses a mechanically-driven arm to position the read/write head over the area of the disk that contains the information. The mechanical movement slows the response time compared with flash memory, which reads/writes data at the speed of electricity.
Monitors

You cannot run a computer without a monitor. This is true even for large mainframe computers that use at least one monitor and keyboard to interact with the computer. A monitor is an output device, although some touchscreen monitors work double-duty as an input device too. The most commonly used monitors use liquid crystal display (LCD) technology.
Monitors are measured by their resolution—the number of pixels that make up the image. The higher the number of pixels, the higher the monitor’s resolution and the sharper the image appears on the screen. A monitor’s resolution is described by two numbers: the number of pixels on each row (horizontal axis) and the number of pixels in each column (vertical axis). It is written as “row x column,” such as 1280 x 1024, which is 1,280 pixels per row and 1,024 pixels per column. The highest resolution is 2560 x 2048, but that’s likely to change in the future as monitors increase in size.
The aspect ratio is another measure of a monitor. The aspect ratio is the ratio of the width and height of the screen written as “width:height,” such as 4:3, the width being greater than the height. No matter what measurement is used, the aspect ratio needs to be the same. You can use 4 inches:3 inches or 4 feet:3 feet and the image will look proportional.
You probably noticed the impact of the aspect ratio in two familiar situations. The first is when you adjust an image in PowerPoint or any graphics program. If you grab and drag the corner of the image, it scales up or down proportionally. However, dragging a side of the image causes the image to become distorted. This is because you changed the aspect ratio of the image.
You probably also noticed the aspect ratio when watching a movie in a wider-screen format than your television. This is referred to as letterbox or cinematic. It looks as if the top and bottom of the screen were chopped off. What you are seeing is a difference in aspect ratios. The standard TV aspect ratio is 4:3 and the movie aspect ratio is about 21:9. It’s a mismatch, so the broadcaster has the choice of chopping off the left and right sides of the movie, or shrinking the movie, leaving top and bottom black margins. You probably don’t notice this problem if you use a new TV or computer monitor, since they are widescreen with an aspect ratio of 16:9, which makes the letterbox less noticeable.
The screen size of a monitor can be deceiving at times. The screen is called the projection surface, where the image appears. The screen size is measured diagonally from the inside beveled edge of the monitor (Figure 3.13). The more common screen sizes range from 12 inches to 17 inches but can go well beyond 20 inches. Don’t assume that you will see a better image on a larger screen. A sharp image depends on the screen size and the resolution. You’ll need to have high resolution for large screen sizes to maintain a sharp image. Low resolution on a large screen causes the image to be fuzzy.
Figure 3.13: The screen size is measured on a diagonal from inside the beveled edge of the monitor.
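The broadcaster’s letterbox arithmetic described above can be sketched in a few lines of Python (the function name is made up for illustration):

```python
def letterbox(screen_w, screen_h, movie_aspect):
    """Fit a wide movie onto a screen by scaling to full width,
    leaving black bars (margins) above and below."""
    scaled_h = round(screen_w / movie_aspect)  # height after scaling to full width
    bar = (screen_h - scaled_h) // 2           # black margin at top and bottom
    return scaled_h, bar

# A 1920 x 1080 (16:9) screen showing a 21:9 movie:
h, bar = letterbox(1920, 1080, 21 / 9)
print(h, bar)  # 823 128  (the movie fills 823 rows; 128-pixel bars remain)
```

A 16:9 movie on the same screen yields a bar height of zero, which is why widescreen monitors make the letterbox much less noticeable.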
The number of colors that can be displayed by a monitor depends on the color depth of the graphics circuit board, referred to as the graphics adapter (sometimes called the video adapter), which connects the monitor to the motherboard. Color depth is the number of bits used to represent a pixel. Today, a 24-bit color depth is most commonly used to produce true color. High color depth may seem impressive; however, we can only recognize so many distinct colors. A 24-bit color depth actually produces about 16.8 million colors, but most people can only recognize about 10 million colors. Colors are created by changing the intensity of red, green, and blue elements on the screen to create one pixel. The intensity of each element is determined by the value of one byte of video memory. The highest bit color depth available today is 32-bit, which is used for animation and digital video. A 32-bit color depth produces 16.8 million colors; however, the additional bits are used to control the lighting on the image, much like how photographers adjust the lighting to make a photograph look realistic.
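The byte-per-element scheme can be sketched in Python: one byte each for red, green, and blue, packed into a single 24-bit value (the function names are made up for illustration):

```python
def pack_rgb(r, g, b):
    """Pack one byte per color channel into a single 24-bit pixel value."""
    return (r << 16) | (g << 8) | b

def unpack_rgb(pixel):
    """Recover the red, green, and blue bytes from a packed pixel."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF

print(2 ** 24)                     # 16777216 combinations, about 16.8 million
print(hex(pack_rgb(255, 165, 0)))  # 0xffa500 (an orange)
print(unpack_rgb(0xFFA500))        # (255, 165, 0)
```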
Inside Look at LCD Monitors

The LCD monitor is a sandwich. The bread is polarized glass called a substrate, and the meat is a liquid crystal material. Light called a backlight shines through the bottom polarized glass. Electrical current controlled by the video adapter changes the state of the liquid crystal material to allow some light to pass through the liquid crystal material to the top polarized glass to create an image on the screen.
The polarized glass contains a matrix of thin film transistors (TFT) and capacitors. This is referred to as active matrix technology. Each intersection of columns and rows of the matrix is a pixel. As seen earlier in this chapter, a transistor is a switch that when turned on allows electrons to enter the capacitor, turning the pixel on. The screen is black at first because none of the capacitors contain electrons. Transistors are activated and deactivated when a program is running to change images on the display, causing capacitors to be charged. Capacitors begin to leak electrons almost immediately after being charged, which is why the video adapter resets the charge many times a second in a process called the refresh cycle.
Each LCD monitor has a native resolution, which is the best possible resolution for the monitor. You can change the resolution settings, however, and the image will be scaled and may not fit the entire screen. It will look different than normal. A common complaint IT techs hear when the resolution increases beyond the native resolution is that more information appears on the screen but it is smaller and at times difficult to read. There is nothing the IT tech can do to fix this problem.
The brightness of the LCD monitor can be adjusted based on what you are watching. For example, watching sports is greatly improved with a bright image on the screen. The brightness is referred to as luminance and is measured in nits. A nit is one candela per square meter (cd/m²). The monitor is rated by a brightness range.
A range of 250 cd/m² to 500 cd/m² is ideal if the monitor is being used for normal work (250 cd/m²) and to watch sports or movies (500 cd/m²).
Another important measurement of an LCD monitor is the contrast ratio. The contrast ratio is the difference between bright whites and dark blacks, which makes the whiter colors stand out more from the darkest colors on the screen. A common contrast ratio is 450:1, which means that the brightest white is 450 times brighter than the darkest black. Although there are monitors with higher contrast ratios, any ratio greater than 600:1 is not noticeable.
You probably noticed that looking at the screen from the side distorts the image. The angle at which you view the image is called the viewing angle. Each monitor has a viewing angle defined by the manufacturer within which the image can be viewed without distortion. Not all viewing angles are measured the same, so it is best to test the viewing angle of the monitor.
The response rate is a factor considered when evaluating an LCD monitor. The response rate is the speed at which pixels are changed on the screen. Look for a fast response rate if you plan to view videos on the monitor. A slow response rate can cause a ghosting effect when images move quickly on the screen.
Landscape, Portrait, or Multiple Monitors

Most of us use one monitor in the landscape position, where the width is longer than the height. However, sometimes changing the monitor to portrait, where the height is longer than the width, makes working more efficient. LCD monitors are adjustable, enabling the monitor to be rotated into the portrait position. This might look strange but it is functional, especially if you are writing long documents and you don’t want to scroll down the page. Although the monitor can be rotated to the portrait position, you’ll still need to change the image to portrait unless the monitor detects the rotation and tells the operating system about it. The CTRL + ALT + arrow keys can be used to change the image on the monitor from landscape to portrait and back in Windows.
Another effective approach to using an LCD monitor is to connect multiple monitors to the computer. This is like extending your desktop to other desks. Operating systems such as Windows treat all monitors connected to the computer as one monitor when the display setting is changed. Let’s say you have three monitors attached to the computer. Initially the window containing the program appears in the center monitor. You can drag the window off the screen to the left and the window leaves the center monitor and appears on the left monitor. Similarly, dragging the window off the screen to the right causes the window to appear on the right monitor.
Computer programmers tend to use at least two monitors: one rotated to the portrait position, which is used to display long lines of program instructions, and the other in landscape to display the output of the program when the program executes. The programmer can then move from screen to screen to change the program and then interact with the program when it executes.
Chapter 4
About Computer Applications

Systems, applications, and programs are all too confusing at times, especially when technicians seem to use these terms interchangeably when speaking about getting the computer to do what you want it to do. Let’s clear up any confusion and set the record straight.
A system is a way of doing something, such as a plan for winning at poker. It seems everyone has a system—some work and some don’t. What’s important about a system is that it may or may not require a computer. You don’t need a computer to have a system for winning at poker. Likewise, you don’t need a computer for an accounting system; there was a time when accounting was done with paper and pencil. It is common to refer to a computer as a system. It is a system of components that collectively process instructions and data.
An application is a program or a set of programs that tell the processor how to do related tasks. A systems analyst automates a system by replicating the logic of the system in an application that runs on a computer. You need a computer to run an application, but not to use a system. Later in this chapter you’ll see how a system defined in pseudo code is translated into a program.
Consider Word. Word is a word processing application that enables you to write a document; save a document; retrieve a document; and print a document, among other tasks. Each task can be considered a program within the Word application.
There are computer applications for everything—some you see, and some you don’t see. Office tools such as Word and Excel are applications. Your organization’s accounting and payroll systems are applications. There are applications that fly an airplane and make your car work. You don’t see them and you don’t need to enter information because sensors send all the information that’s needed to the application.
Still another type of application decides how to work without any human instructions, such as IBM’s Watson, which learned how to beat human champions at Jeopardy. We tend to identify an application by what it does. Here are a few applications that you’re sure to recognize.
–– A transaction processing application processes orders. Think of the point-of-sale (POS) system at the supermarket checkout.
–– A business intelligence application is used to help managers make sense of data collected by other applications.
–– A decision support application helps managers make decisions.
–– An expert system provides expert advice without the expert.
–– An office automation application improves workflow.
Applications are also categorized by how information is processed. Many of today’s applications process while you wait. This is called real-time processing. You’ve seen
this at the supermarket checkout. Other applications are less time-sensitive, such as payroll, which can be processed without anyone waiting. Similarly, your credit card company will send you a bill with all your transactions in a month (in a batch), rather than have you make a payment upon each transaction. This is referred to as batch processing—information is collected in a batch and then processed at one time.
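The distinction can be sketched in a few lines of Python (the function names and data are made up; real transaction systems are far more involved):

```python
def process_realtime(transaction):
    """Real-time: handle each item the moment it arrives."""
    return f"processed {transaction} immediately"

def process_batch(transactions):
    """Batch: collect items, then process them all at once, later."""
    return [f"processed {t}" for t in transactions]

print(process_realtime("sale #1"))                    # the shopper is waiting
print(process_batch(["payroll: Ann", "payroll: Bob"]))  # nobody is waiting
```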
Application Architecture

Technically, a programmer could write all the instructions for an application—all the programs that make up the application—in one gigantic source code file. Source code refers to instructions written by a programmer (the programming code for the program). The processor has no difficulty executing thousands of instructions. However, putting it all in one file is impractical. No programmer wants to try to understand how the application works by reading a huge source code file, and so it is a nightmare to maintain the application. There are better ways of organizing an application, which are referred to as application architecture.
Applications are divided into major components: parts that run on the user’s computer (the client side) and parts that work outside the user’s computer (the server side). The client side is responsible for the user interface, collecting and displaying data on the screen, and selecting reports to be printed. A good application design results in a minimum of instructions stored on the user’s computer. This avoids the hassle of having to update all the computers when there is a change in the application’s user interface.
Newer applications are web-based, where the client side of the application is a webpage located on the organization’s intranet web server. The webpage is called using the browser. This is referred to as a thin client. There is no need to install any part of the application on the user’s computer because the client side of the application is retrieved from the web server each time the user runs the application. In contrast, older applications required part of the application to be stored on the user’s computer, referred to as a fat client. Every computer in the organization needs updating whenever the user interface of the application changes, which is a nightmare for technicians.
The server side of the application contains all programs that handle processing.
Although “server side” implies one server, typically there is more than one server, each of them running one or more components of the application. Each component is placed in the appropriate location in the architecture to deliver the most effective response time to users.
Modern application architecture is divided into five components, referred to as layers (Figure 4.1). These are:
–– Presentation Layer: The presentation layer is the user interface.
–– Presentation Logic Layer: The presentation logic layer is the part of the application that generates the presentation layer.
–– Application Logic Layer: The application logic layer is where logic, rules, and data are processed.
–– Data Manipulation Layer: The data manipulation layer contains instructions that interact with the database.
–– Data Layer: The data layer is where the data resides.
Figure 4.1: Application architecture layers.
Tier Architecture

The application’s architecture is also organized into tiers, each of which is identified by a number. A tier is a logical grouping of major tasks. An example will clear up any confusion. Let’s say an application has the presentation layer and the presentation logic layer both residing on the client side. The remaining layers reside on the server side, on one server. This is referred to as a two-tier architecture. One tier has the presentation layer programs on all user computers, and the other tier has all the processing and data on one server.
A three-tier architecture has three divisions, each on a different computer/server. The user interface on the user’s computer (Tier 1) connects to the application server
(Tier 2) for processing. The application server connects to the database server (Tier 3) to retrieve, store, and manipulate data. Here are the components of the three-tier architecture (Figure 4.2):
–– Tier 1: Tier 1 is the client layer that contains the user interface.
–– Tier 2: Tier 2 is the business layer on the application server that contains processing instructions.
–– Tier 3: Tier 3 is the database layer and contains the database management system (DBMS) (see Chapter 5) and the database on the database server.
Figure 4.2: A three-tier architecture divides the application into three divisions.
A four-tier architecture, referred to as the internet architecture, has four divisions. The browser (Tier 1) calls the application’s webpage from the web server (Tier 2) and then uses the application’s webpage to interact with the application. The web server (Tier 2) connects to the application server (Tier 3) to process information, and the application server connects to the database server (Tier 4) to retrieve, store, and manipulate data. Components of the four-tier architecture are (Figure 4.3):
–– Tier 1: Tier 1 is the client that accesses the application using a browser.
–– Tier 2: Tier 2 is the web server that sends the client a webpage.
–– Tier 3: Tier 3 is the application server. The application server contains the application.
–– Tier 4: Tier 4 is the database server where the DBMS and database reside.
Figure 4.3: A four-tier architecture is used for web-based applications.
Robust applications use a multi-tier architecture where business processes are divided into common tasks that can be used by multiple applications. For example, credit card processing is a common task used by several applications within an organization (such as an online store or retail store). Rather than duplicating instructions to process credit cards in all applications, there is one credit card processing program that is accessed over the network by any application that requires credit card processing (Figure 4.4).
Figure 4.4: A multi-tier architecture divides business processes into programs accessible by other applications.
There are a number of advantages to the multi-tier architecture. First, instructions for the common tasks reside in one location. Changes need to be made in one place, not in dozens of applications, and changes automatically take effect throughout the organization immediately. Another advantage is accountability—one programmer maintains the program. This is a prime security benefit too. Other programmers need to know how to call the program and read the results of the program. They don’t need to know how the process works.
Multi-tier architecture is well suited for transactional applications. A transactional application processes a transaction, such as registering students for a course, placing an order, or generating a bank transfer. There is a beginning (processing) and an end, and then the application moves on to the next transaction. A multi-tier architecture enables designers to reconfigure the architecture to accommodate high-volume transactions. This is referred to as scalability.
Let’s say that management throughout the organization accesses sales transactions. Requests for information are processed on the same servers that are also
processing new sales transactions. Each is a transaction processed by one program typically using one database. If demand increases—more managers request information and/or there is an increase in sales transactions—the transaction process slows, creating a poor response time; managers are waiting for information and customers are waiting for orders to be processed. There are just too many transactions to process.
The application, however, can be redesigned so that there are duplicate databases and additional transaction programs. When a sale occurs, a transaction program saves the data to a database. Immediately, the data is copied to three other databases. Managers who require access to sales transactions use one of the three databases to retrieve data with no impact on sales transactions. The designer is said to have scaled the application, improving response time by increasing the number of databases (Figure 4.5).
Figure 4.5: A multi-tier architecture can be reconfigured easily to adjust to increased transactions.
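The scaling idea can be mimicked in a small Python sketch (the class and its methods are hypothetical illustrations; real systems rely on a database’s built-in replication):

```python
import itertools

class ScaledStore:
    """Writes go to the primary store and are copied to replicas;
    reads are spread across the replicas so reports don't slow writes."""

    def __init__(self, replica_count=3):
        self.primary = []
        self.replicas = [[] for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, transaction):
        self.primary.append(transaction)
        for replica in self.replicas:       # copy to each replica immediately
            replica.append(transaction)

    def read_all(self):
        # Round-robin: each report request hits a different replica.
        return list(self.replicas[next(self._next_replica)])

store = ScaledStore()
store.write({"order": 1, "amount": 19.99})
store.write({"order": 2, "amount": 5.00})
print(len(store.read_all()))  # 2
```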
Inside an Application

As you saw in Chapter 1, an application begins with an idea that is transformed into specifications by an application designer and systems analyst. A programmer translates specifications into programs that bring the idea to reality. The systems analyst breaks down the idea into workflows and related processes. A workflow is a high-level description of how a task is performed. A process is a detailed description of how a task is performed. The process is described logically using pseudo code. Pseudo
code is the logic required to perform a process, described in a way so that anyone can understand it. Here are a few samples of pseudo code that you should have no problem understanding (Example 4.1).

Example 4.1: Here are examples of pseudo code.
Sequential:
    Prompt the user to enter a user ID
    Prompt the user to enter a password
    Validate the password

IF-THEN-ELSE statements:
    If the medication is due within an hour then
        Display a yellow indicator
    Else If the medication was due an hour or more ago then
        Display a red indicator
    Else
        Display a green indicator

Iteration (loop):
    While the patient is in the hospital
        Determine if medication is due for the patient
    End While

Pseudo code is crucial in developing a program because it describes all logic needed to complete a process. You’ve probably experienced times when a program doesn’t work well. Simply, there is a bug in the program. The bug isn’t a real bug—instead, it is a fault with the logic of the process. Let’s revisit the IF-ELSE statement to see how this works. Here it is again with one change in the logic (Example 4.2). There are instructions on what to do if the medication is due within the hour and there are instructions on what to do if the medication was due an hour or more ago, but then there is a breakdown in the logic. There are no instructions that tell you what to do if neither of these conditions occurs. This is a problem and a bug.

Example 4.2: Here is an example of a bug.
If the medication is due within an hour then
    Display a yellow indicator
Else If the medication was due an hour or more ago then
    Display a red indicator

The systems analyst fleshes out the logic for each process in a program by meeting with stakeholders who describe pieces of the process. The systems analyst then assembles the pieces into the pseudo code to identify the process, filling in gaps in the
logic so there is no bug when the programmer translates pseudo code into instructions using a programming language.
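To see the jump from pseudo code to a real language, here is the corrected IF-THEN-ELSE logic from Example 4.1 translated into Python (the function name and the hours_until_due parameter are made up for this sketch; negative hours mean the dose is overdue):

```python
def medication_indicator(hours_until_due):
    """Mirror the pseudo code: yellow if due within an hour,
    red if it was due an hour or more ago, otherwise green."""
    if 0 <= hours_until_due <= 1:      # due within an hour
        return "yellow"
    elif hours_until_due <= -1:        # was due an hour or more ago
        return "red"
    else:                              # the branch the buggy version omitted
        return "green"

print(medication_indicator(0.5))   # yellow
print(medication_indicator(-2))    # red
print(medication_indicator(4))     # green
```

Note how every condition lands somewhere: the final else is exactly the branch that was missing from the buggy Example 4.2.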
From Pseudo Code to a Program

A programmer is a kind of translator who translates pseudo code into instructions using a programming language that is understood by a processor. Instructions written by the programmer are referred to as source code. The language that the processor understands is called machine code: instructions written as a series of 0s and 1s. This is also referred to as the executable program: the program that runs when you select the program’s icon on your desktop.
A programmer can look at the executable program using a hex editor. A hex editor translates the 0s and 1s into hexadecimal values, commonly referred to as hex. Hexadecimal is a numbering system similar to binary and decimal except that the hexadecimal numbering system has 16 digits compared with 2 digits for binary and 10 for decimal. Instead of seeing an ultra-long string of 0s and 1s by looking at the binary value of machine code, the hex editor displays a shorter representation of machine code using hexadecimal values. It makes machine code slightly easier to read, although rarely will a programmer look at machine code. It is simply too complicated to understand. However, a hex editor is used to view non-executable files such as text files and data files. You’ll learn more about this in Chapter 11 on forensic computing.
Technically the programmer could write the program in machine language, but that doesn’t make sense. Imagine spending your days translating pseudo code into a tedious, very long series of 0s and 1s that are difficult for you to read. You probably wouldn’t last an hour writing the program in machine code, and neither would the programmer. Example 4.3 shows an example of one instruction, written in machine code, that stops the program.

Example 4.3: Here is an example of machine code that tells the processor that the program has ended.
00000000

Assembly language is another programming language that uses English symbols, a kind of abbreviation, to create instructions for the processor. Example 4.4 is an example of assembly language. Although the symbols push, mov, and pop are easy to read, their meanings aren’t clear, or at least not as clear as pseudo code. You really don’t need to know how this example works, but let’s take a close look for the fun of it.
A few basics: In the last chapter you learned that the processor is the computer within the computer and that the processor performs basic math. You also learned that there is a work area inside the processor called the arithmetic logic unit (ALU) and that the processor has its own memory called registers. The processor executes
instructions sequentially. Each instruction gets in line and waits its turn to be executed. The line is called a stack—think of a stack of playing cards. An instruction is placed at the bottom of the stack (deck) and moves up the stack until it is at the top, when it gets executed by the processor. The instruction is removed from the top of the stack once the instruction is executed.
Now, let’s interpret what is happening in this assembly language code. The symbols ax and ds are the names of two registers in the processor. In this example, let’s assume that prior instructions placed values from other memory into the ax register. The push symbol causes the value of the ds register and then the ax register to be placed on the stack. The mov symbol (an instruction already on the stack) is an instruction that tells the processor to move the value of the ax register to the ds register. The two pop instructions remove the reference to the registers from the stack. And the end symbol tells the processor that the program has stopped. Granted, this doesn’t do much, but it gives you insight into how assembly language works.

Example 4.4: Here is an example of assembly language.
push ds
push ax
mov ax,ds
pop ax
pop ds
end

The programmer writes the assembly language program into a text file using a very basic word processing program called a text editor. A text file contains only text, compared with a Microsoft Word document that contains both text and instructions on how to format the text on the screen. The text editor is like Word, but without fancy formatting options. The editor is more like Notepad in Windows than Word.
A program called an assembler reads the text file, converts the assembly language program into machine code, and saves it to a file that can run (execute) on the computer. The operating system (see Chapter 3) recognizes the machine code file (executable file) by a file extension (.exe), or by a setting in the file metadata, depending on the type of operating system. The machine code has instructions that are understood by a specific processor: the processor’s instruction set (see Chapter 3). Both machine code and assembly language are machine-dependent. This means that the program only runs on computers that use the type of processor that understands that particular machine code. A computer with a different type of processor won’t understand that machine code and cannot run the program. Stop!
From Pseudo Code to a Program
It isn’t worth going further with assembly language (assembler) because you’re probably not going to become an assembly language programmer. This example simply provides insight into how a programmer instructs a processor using assembly language. Leave writing assembly language programs to programmers—not many programmers write in assembly language because it is simply too complicated and their program would work only with a specific processor.
High-Level Programming Languages Writing a program that can run on only one type of processor is not practical. There is a much better way to write a program: using a high-level programming language. When we talk about high-level, we are not talking about hard. An assembly language program is written in a low-level programming language. A high-level programming language is used to write instructions for a processor without knowing anything about the processor. You don't need to know about registers or instruction sets to write a program. The focus is on translating the pseudo code logic into instructions that tell the processor how to perform tasks. A high-level programming language contains more English-like words and structures than the symbols used in assembly language. And in some situations, words used in the pseudo code, such as IF-THEN, are the same words used in the high-level programming language. Here is an example of an instruction written in the Beginner's All-purpose Symbolic Instruction Code (BASIC) language (Example 4.5). You don't really need to be a programmer to understand this program. It prints the words "HELLO WORLD" on the screen at the current cursor position. Example 4.5: Here is an example of an instruction written in a high-level programming language that tells the processor to display text on the screen.
PRINT "HELLO WORLD"
A program written using a high-level programming language needs to be translated into machine code by another program called a compiler or an interpreter. These programs need to know all the details about the processor and how to write machine code to execute the instruction written in the high-level programming language. All the programmer needs to know is how to write instructions using the high-level programming language. So, the programmer is shielded from a lot of complexity for most applications they may write. Programs written in a high-level programming language (source code) can usually—but not always—be executed by various processors by compiling the program using a compiler that is designed for a specific processor. The compiler creates the executable program that runs on a specific processor.
Chapter 4: About Computer Applications
There are many high-level programming languages (see sidebar). Some, such as BASIC, are relatively easy to learn but may not have the capabilities required for robust programs and applications. An organization's management information systems (MIS) department frequently recommends that programmers use a specific high-level programming language so programs are uniform throughout the organization. That is, all of their programmers "speak" the same programming language and can be assigned to work on any program within the organization.
Popular High-Level Programming Languages
–– Java: Java is a popular high-level programming language because a Java application can run on practically any computer without having to change the source code. This is referred to as write once, run anywhere (WORA)—the source code doesn't have to be compiled for different processors.
–– Python: Python is a high-level programming language used for developing web applications, among many other uses.
–– C: The C high-level programming language is a robust programming language that is used to write general applications and programming tools such as compilers.
–– Ruby: Ruby is a high-level programming language used for developing web applications.
–– JavaScript: JavaScript is a high-level programming language used to add interactive elements to a website. Instructions are interpreted without the need to compile the program's code.
–– C#: The C# (C-sharp) high-level programming language is designed similarly to Java and is used to develop applications that run on Microsoft operating systems using Microsoft's .NET framework.
–– PHP: PHP is used to develop apps and websites that process lots of data.
–– Objective-C: Objective-C is a high-level programming language used to develop apps that run on Apple's operating systems.
–– SQL: Structured Query Language (SQL) is used to create and interact with a database and is used to develop the database interaction portion of a database application.
Breaking Down a Program into Parts Now that you have an idea of instructions that make up a program, let’s take a closer look at the structure of a program. Technically, the processor doesn’t care how a program is structured. There can be thousands of instructions listed using single-spacing and small type, and the processor simply executes one instruction at a time. However, programmers would find such a list of instructions a nightmare to read. Therefore, instructions in a program are grouped by functionality: by major tasks. Each group is called a function and has a unique name. Depending on the program, there might be a function to prompt the user to enter an ID/password, another function to authenticate the password, and still another function to handle ID/password errors. Organizing a program into functions makes it easy for the programmer to read instructions and helps the programmer focus on one major task (or function) at a time.
The body of the program contains instructions that call each function by name as needed. Think of calling someone on the phone as a program. You need to turn on the phone (smartphone); locate the person's phone number; dial the number; hold a conversation; and disconnect the call. Each of these is a major task—a function—that needs to be performed in order to call someone. Each of these major tasks has subtasks, or instructions. There are subtasks to turn on the phone. Once those are completed, you locate the person's phone number. When all subtasks are completed, then the major task is completed. You then call the next major task (function). Each function is organized into four parts: input, processing, output, and feedback (Figure 4.6). Input is the part of the function that receives data necessary for processing. For example, an ID/password combination is data needed for the function to authenticate a user. The processing part of the function contains instructions to do the work—authenticate the ID/password. The output part is the result of processing—the ID/password is valid or invalid. The feedback part is data returned to the instruction that called the function. Feedback might simply indicate that the function completed without error, completed with error, or returned information related to processing, such as an indication of whether the ID/password is valid.
Figure 4.6: Each function is organized into four parts: input, processing, output, and feedback.
Function at Work Here is an example of a simple function that adds two numbers (Example 4.6). This might look confusing at first, but hang in there. It isn't difficult to understand. This is written in the C programming language, but the same concepts are used in other high-level programming languages. Example 4.6: Here is an example of defining and calling a function.
main()
{
    int result;
    result = add(5, 10);
}

int add(int A, int B)
{
    int X;
    X = A + B;
    return (X);
}

The body of the program is called main() and its instructions are within the open and closed French braces ({}). The function is called add() and is defined outside the body of the program. Instructions for the function are contained within its own French braces. The function requires two integers as input. They can be any integer values sent to the function when the function is called. In this example, the integer values are 5 and 10 and are entered within parentheses next to the name of the function in the body of the program. Each is separated by a comma. The comma tells the compiler where one value ends and the other begins. Let's look at the function. This is called the function definition because it explicitly tells the processor what to do when the function is called. Notice that the name of the function also has parentheses. Instead of values, the parentheses contain placeholders for the values. These are referred to as parameters. Individually, these are called variables. You probably remember using variables in your algebra class. Each variable has a name (A and B) and an indicator of the type of value that the variable represents. In this example, each variable is a placeholder for an integer (int). Instructions that execute when the function is called are written in the body of the function. The first instruction defines another variable (X), which is a placeholder for an integer and will represent the sum of the addition calculation. Next is the meat of the function. This is the instruction that adds the two values sent from the main program. The sum is assigned to the variable X. The last instruction tells the processor to return the value of X to the instruction in the main body of the program that called the function. Notice that int is used to the left of the name of the function. This tells the processor that the return value is an integer. Now let's go back to the body of the program.
The first instruction defines a placeholder for an integer and names it result. The next instruction calls the add() function. The return value of the add() function is represented by the variable result. If this were a real program, something would be done with the value of the result variable, such as displaying the sum on the screen. Here's what happens when the program executes. The values 5 and 10 are assigned to variables A and B respectively. Every time the processor "sees" A, it really "sees" 5, and every time the processor "sees" B, it really "sees" 10. After calculating the sum, the value of variable X, which is 15, is returned to the second instruction in the body of the program and is assigned to the variable result. If this were a real program, there would likely be instructions in the body of the program that evaluate the value returned by the function. This is the feedback mechanism. If an error occurred, the function would return a value indicating the type of error. The instruction in the body of the program would then react to the error. Variables and Memory Let's get down to what's happening in the program. An instruction that defines a variable, such as int result, tells the processor to set aside memory sufficient to store an integer. In the machine code, the compiler identifies that memory by a specific memory address. The programmer identifies that memory address using the name of the variable. When the program calls the add() function, the processor places 5 and 10 into their respective memory addresses. When the processor "sees" the addition instruction, the processor fetches each value from memory; adds those values; then places the sum in the memory address that corresponds to variable X. When the processor returns to the instruction that called the function, it copies the value from the memory address of variable X and places that value in the memory address that corresponds to the variable result. The processor then releases all memory addresses that are represented by variables defined in the function. Those values no longer exist once the last instruction in the function executes, and the memory addresses can be used for other purposes.
Decisions, Decisions, Decisions The processor is constantly asked to make decisions. The pseudo code identifies a decision point in a process. A decision point is like a fork in the road where the process can go in one of two directions depending on a condition. The systems analyst identifies the condition and how to make the decision in the pseudo code. The programmer then translates this logic into instructions that tell the processor how to make the decision. The next example is a modification of the previous addition example to show the instructions that tell the processor how to make a decision (Example 4.7). Notice that the body of the program stays almost unchanged except that the function name is changed from add() to grading() and the values are changed from 5 and 10 to 60 and 80. The grading() function determines if the student's final grade is passing. Example 4.7: Here is an example of how a function makes a decision.
main()
{
    int result;
    result = grading(60, 80);
}

int grading(int A, int B)
{
    int X;
    float grade;
    grade = (A + B) / 2.0;
    if (grade >= 70)
    {
        X = 1;
    }
    else
    {
        X = 0;
    }
    return (X);
}

Now let's take a look at the grading() function itself. The grading() function is similar to the add() function. Both have two placeholders for the input values—the variables A and B. And both functions return an integer value to the instruction in the body of the program that calls the function. More on the return value in a moment. The first two instructions in the grading() function tell the processor to reserve two places in memory: one to store an integer and the other a float. You'll recall from Chapter 3 that an integer is a whole number (no decimal values), and a float—short for floating point—is a whole number and a partial number; it can hold a decimal value. The processor identifies these memory locations by their memory addresses. The programmer identifies them by the variables X and grade. Next, the grading() function averages the two grades sent by the instruction in the body of the program that called the function. You'll remember from your math class that the calculation contained within parentheses is performed first. Here, the processor is told to add the two grades and then divide by 2 to come up with the average grade. The average grade may be a whole number or may contain a decimal value, which is why the programmer told the processor to reserve space in memory that can hold both a whole number and a decimal value (float). Now comes the decision with the IF-ELSE instruction. The IF instruction contains parentheses, within which is the condition that the processor is going to test. The processor is asked to determine if the average grade is equal to or greater than 70, which is passing. If so, then the processor executes the instructions within the French braces, which assign 1 to the variable X. If the condition fails, then the processor is told to follow the instructions within the French braces of the ELSE instruction.
Here, the value 0 is assigned to the variable X. The last instruction in the body of the function returns the value of X to the instruction that called the function. The feedback—value of X—is either true (1) or false (0). The value returned by the grading() function is assigned to the variable result in the body of the program. Nothing happens except that the program ends, but
in a real program there would be other instructions that tell the processor what to do if the student passes and what to do if the student fails.
Feeling Loopy There are times when a process needs to be repeated multiple times. The programmer can repeatedly write those instructions or write the instructions once and tell the processor to continue to execute those instructions until a specific condition is met. This is called a loop. This example shows a loop used to do nothing except count (Example 4.8). Let's take a closer look at how this works. You'll notice that this example is similar to the previous example in that it has a body of the program and one function. The body of the program contains the instruction to call the pause() function, which simply pauses execution of the program for a specific length of time. The length of time can be adjusted by changing the value of the integer sent to the pause() function. The value 100 is used here; however, the programmer might later change the value if the pause is too short or too long. Example 4.8: Here is an example of how a function uses a loop.
main()
{
    int result;
    result = pause(100);
}

int pause(int A)
{
    int i = 0;
    while (i < A)
    {
        i = i + 1;
    }
    return (0);
}

Now let's look at the pause() function. The pause() function simply counts, and while it is counting, the rest of the program waits until the counting is completed. The pause() function continues counting until the value of A is reached. The letter A is the name we assigned to a memory address. The value placed in that memory address is 100, which is the value placed in parentheses when the pause() function is called in the
main() portion of the program. Next, the value 0 is assigned to the letter i. The letter i is the name we assigned to a different memory address. Next is the while loop. Most high-level computer languages have a variety of loops, each doing practically the same thing but a little differently. The while loop tells the processor to continue to execute the instructions within the French braces as long as the value of variable i is less than the value of variable A. The only instruction in the while loop increments the value of variable i. That is, it counts. Let's see how this works. Remember that the value of A is 100 and the first line of the function sets the value of i to 0. When the processor first executes the while() loop, it compares the value of i with the value of A. Is i less than A? If yes, the instructions within the body of the while() loop are executed. If not, then the processor executes the instruction that follows the closing French brace of the loop, which is return(). Each time the processor executes the instruction within the loop, the value of i is increased by 1. Once the value of i reaches 100, the processor stops executing the instructions within the while() loop and continues by executing the return() instruction. Notice that the return() instruction sends the value 0 back to the instruction that calls the function. This indicates that the function ended without an error. There really isn't a chance that the function will fail, which is why the return value is always 0 (success). Nothing is done with the return value in the body of the program.
Sharing Parts of Programs Programs and applications can contain millions of lines of instructions called code, each giving the processor explicit instructions to perform at precisely the appropriate moment. And those instructions are executed flawlessly time and again. Every nuance on the screen requires instructions. Hovering over an icon on the menu changes hundreds of pixels, each requiring an instruction. Hundreds of hours are needed to write and test a program before it becomes a working part of an application. Programmers cut down on the number of instructions that need to be written by placing frequently required instructions in a function. The function is called any time within the program when those instructions need to execute. Another way programmers economize is by creating a library of functions. A library is a file that contains many functions, typically written by many programmers—it is a way of sharing functions. Think of this library as a library of how-to books written by many authors. You know you need to perform a task. You can spend time by trial and error developing the steps necessary to perform the task, or you can go to the library to find how to perform the task. Someone else went through the pain of developing it for you. Programmers do practically the same thing when translating pseudo code to a high-level programming language.
A programmer doesn’t need to know or care how a function performs the task. The programmer needs to know the name of the function; data the function needs to work; and how to interpret the value returned by the function. The function takes care of the details of executing the steps needed to perform the task. The programmer calls the function the same way as if the programmer wrote the function. The function is in the library—a separate file—and not in the source code. Therefore, another program is required to combine the library and the source code. That program is called a linker. Here’s how the linker works. Before compiling the program, the programmer makes reference to the library in the source code. This is critical because the compiler looks for the function definition in the source code when compiling the program. An error is displayed if the compiler can’t find it. Reference to the library tells the compiler not to worry because the function definition is in a library that will be linked later to the program. The compiler translates the source code into an object file that is written in machine language. The library too is in the form of an object file having been previously compiled. The linker combines these object files into the executable file. The executable file works just as if the programmer wrote all the instructions (Figure 4.7).
Figure 4.7: The linker is a program that combines the compiled source file with the compiled library.
The Tool Box: Integrated Development Environment Hopefully you’re still hanging in there. There are lots of pieces that programmers need to make for every application. Programmers must be highly organized to develop and assemble these pieces. The use of a toolbox helps to manage this chaos. The toolbox
isn't really a toolbox but an application that enables the programmer to create, edit, compile, link, and execute a program all in one application. The toolbox is referred to as an integrated development environment (IDE). There are many available. A popular one is Microsoft Visual Studio, which uses .NET standards across several programming languages. Microsoft .NET, as we have discussed, is a programming environment used to build, deploy, and run applications and services that use a common language such as ASP.NET. A service is a program that provides features used by a variety of applications. An IDE usually has libraries of functions that come with the application and online help that programmers use to search for functions and learn how to use those functions. Plus, the IDE usually enables the programmer to use the programmer's home-built library with the IDE. Just-in-Time Compiling A compiler translates the high-level programming language into machine code so it can run on the computer. Translation can occur when the program is created or just before the processor executes the instruction, depending on the language used by the programmer. A compiled programming language is a high-level programming language that requires the program to be compiled into an executable file when the program is created. An interpreted programming language is a high-level programming language that is translated into machine language right before the processor executes the instruction. An interpreted programming language requires that the computer running the program also run a program called an interpreter. The interpreter reads each instruction in the source code, translates the instruction into machine code, and then sends the machine code to the processor for processing. One problem: the program cannot run on a computer that doesn't have an interpreter for that language.
For years, a BASIC interpreter was included with the DOS and Windows operating systems so that BASIC was available if you needed it. A compiled program doesn't require an interpreter; however, the compiled program can only run on a computer that can read the executable file. Writing a program using an interpreted programming language eliminates this problem because the computer running the program has the interpreter that translates the program into machine language. Now for the dilemma. The programmer probably doesn't want to distribute the source code for everyone to see and modify, so few programs are written in a purely interpreted programming language. Java and C# provide alternative options to resolve this dilemma. Java and C# are languages whose source code is compiled to bytecode. Bytecode, also known as portable code (or p-code), is a translation of the source code into a structure that can be efficiently translated by a program called the virtual machine. The virtual machine is an interpreter running on the computer that executes the program. However, the virtual machine only interprets bytecode—not source code—into machine language. The source code is not distributed. Java and C# are used to create programs that "are written once and run everywhere." This means that the program doesn't have to be modified or compiled for each processor. You might have seen a message pop up on your computer stating that Java is not installed or that a Java update is available. This message is referring to the Java virtual machine that is usually available on most computers and is used to run Java applications.
Dynamic Libraries
Should you get the current smartphone or wait for the new one to reach the market? This is a common dilemma facing everyone who buys electronic devices these days. You're stuck with the old technology if your timing is off. The same happens to programmers who use libraries of functions. There is always the chance that someone will develop something new that makes your program appear dated because you used an outdated function. This is most noticeable with programs that use a graphical user interface (GUI) such as Windows, because your program might not have the latest features found in other programs. Programmers avoid this problem by using libraries that are linked to the program while the program is running on the computer: the library is linked at the moment the function is called by the program. This is referred to as a dynamic-link library (DLL). The dynamic-link library is usually an element of a graphical user interface operating system such as Windows and is usually updated on the user's computer as part of the operating system updating process. In fact, the Windows API is made up of dynamic-link libraries. Let's say the programmer wants a drop-down list, something all of us use. The programmer calls the corresponding function and sends it any necessary data, such as the items that will appear in the list. The function, however, is not compiled with the program. Instead, the function is linked to the program when the program calls the function as the program is running. The drop-down list is always the latest because Microsoft updates the library on every computer that runs Windows, assuming the user permits regular updating. The programmer doesn't have to worry about updating the program. You might have noticed the slight drawback of using a dynamic library—hesitation when first using a feature of a program. You've probably clicked a frequently used menu item and the program responds instantaneously.
And then there is a subtle delay when selecting an infrequently used menu item. That subtle delay is Windows loading the function from the dynamic library. The delay occurs only the first time the function is called. Afterward the function is already in memory.
A GUI Program Nearly all programs have a graphical user interface that enables users to click their way through the program. Programmers create the program by dragging and dropping objects on the screen using a visual integrated development environment. Objects, as discussed in Chapter 1, may be buttons, scrolls lists, drop-down lists, text boxes, and other familiar graphical items used to interact with the application. Each screen starts off as a blank canvas containing guidelines as rows and columns of dots. These dots serve as guides for positioning objects on the canvas but don’t appear on the final screen. There is a smaller window that contains a toolbox of objects (Figure 4.8). The programmer drags objects from the toolbox into position on the canvas, then adjusts the object’s size to conform to the screen design. There are other tools in the visual IDE that enable the programmer to align objects on the canvas much like you align objects on a slide in PowerPoint.
Figure 4.8: The toolbox contains objects that are dragged and dropped onto the canvas to create a window for the program.
Dragging and dropping an object on the canvas also creates subroutines referred to as subs, another term for functions. There is one subroutine created for each event associated with the object. An event is something that can be done with the object. For example, the user can hover over the button to display a short description of the button. This is considered an event. Instructions to display the description are contained in a subroutine that executes automatically when the mouse cursor hovers over the button. The programmer enters those instructions into the subroutine. Likewise, there are subroutines for when the left mouse button is pressed (Figure 4.9); when the left mouse button is released; when the right mouse button is pressed; and when the right mouse button is released. Events associated with every object on the screen have instructions written by the programmer in a subroutine.
Figure 4.9: The programmer enters instructions into the subroutine that executes when the left mouse button is selected.
There are also subroutines that create the object on the screen; resize the object on the screen when the user changes the screen size; and handle other events that most of us probably don't notice—that is, until the event doesn't happen. Instructions for these utility-type subroutines are usually written automatically for the programmer when the object is placed on the canvas. The programmer needs only to focus on instructions directly related to the program and not housekeeping chores. Not all subroutines are associated with an object. There are subroutines that are automatically called when the program or the screen opens; when the program or screen closes; and subroutines that can be called at any time by instructions in other subroutines. Collectively, this is how a programmer creates a GUI program. Want to Try Your Hand at Writing a GUI Program? Becoming a programmer may not necessarily be in your future, but you can experience what it is like to create a GUI screen by using Excel. Here's how to do it. Open Excel and you should see a Developer tab. If you don't, then you'll need to turn on the Developer option. Google has instructions on how to do this. Select the Developer tab and click the Visual Basic icon to open the Visual Basic for Applications (VBA) GUI integrated development environment. This exists in all Office products and is used to create "scripts" or "macros," often used to perform repetitive tasks that you may use often in your office work. VBA makes this easy by allowing you to use a macro recorder to keep a macro (script) that you create and want to save to use over and over again. Select Insert and then select UserForm to open a blank canvas and the object toolbox (Figure 4.1). Drag and drop a command button from the object toolbox to the blank canvas. You can drag it anywhere to reposition it on the screen. Double-click the command button to display the subroutine where you enter instructions that will execute when the command button is selected. Stop!
Interested in going further? Check out the many online resources in Google and YouTube that show you how to build a VBA program or if you are serious about learning more there are several good books on the topic. If you have the patience and drive to learn VBA programming, then you can automate many of the routine tasks you perform in Excel and other Office products—making it look and feel like a real GUI program. Keep in mind that it isn’t a real GUI program. A VBA program only works on computers that run Office. Other integrated development environments are used by programmers to create GUI programs that don’t need Office.
Web Applications

A web application consists of “kind-of programs” and programs. You learned in Chapter 2 that a program called a browser is used to connect (link) to a website. The initial webpage of the website (called a home page) is usually—but not always—in a
122
Chapter 4: About Computer Applications
file called “index.html.” When you select a website, the browser retrieves the index.html file from the website and then follows instructions in that file. The index.html file, as with other files that contain webpages, has instructions written in hypertext markup language (HTML). A markup language contains instructions and text. Instructions called tags tell the browser how to display the text on the screen—this is the “kind-of program.” Here is an example of a very simple webpage that displays “Hello World!” on the screen (Example 4.9). Hello World is typically the first program that a programmer learns to write in any programming language because it illustrates the basics of writing a program. Example 4.9: Here is an example of HTML instructions used to create a webpage.
<!DOCTYPE html>
<html>
<head>
<title>Welcome</title>
</head>
<body>
Hello World!
</body>
</html>
HTML instructions are enclosed within less-than and greater-than symbols (< >). If you leave these out, then the browser displays the text, not recognizing that it is an instruction. The first instruction, <!DOCTYPE html>, always tells the browser that this is an HTML document. The next instruction, <html>, tells the browser where the document begins. Subsequent instructions define major sections of the document: <head> defines the heading section of the webpage; <body> defines the body of the webpage. Sections begin with the section name and end with the section name that starts with a forward slash (/), such as </head>. There is an assortment of instructions (tags) programmers can insert to further direct the browser on how to display text. You’ll notice that “Welcome” appears between the <title> instructions. This tells the browser that the text should be displayed as the title of the webpage. Exactly how Welcome appears on the webpage is left to the browser. The browser determines the font, style, location, and other details of how a title is displayed. The body section contains the text “Hello World!” and the browser determines how body text appears on the screen.

The index.html file, as with most webpage files, is a text file, similar to a Word document but without all the formatting. Take a look at the contents of your favorite webpage file and see what the browser sees. Here’s how to do it. Bring up a webpage in the browser. Place the mouse cursor on the webpage and right-click the mouse
button to display a pop-up menu. Select View Source and the browser displays the actual instructions for the webpage. There are probably hundreds of lines of instructions—most, if not all, unreadable at a glance. Programmers even have a tough time reading them. Yet these are just some of the instructions—yes, there are more—needed for a professional webpage.
Cascading Style Sheets (CSS)

The Hello World! example is not very impressive. It lacks the pizzazz found on commercial websites. HTML alone lacks instructions that specifically tell the browser how you want the webpage to look. Cascading Style Sheets (CSS) extends HTML and provides instructions that tell the browser exactly how to design and format the webpage. Figure 4.10 shows a webpage that doesn’t use CSS and Figure 4.11 shows the same content with CSS. These are great examples from w3schools.com.
Figure 4.10: Here is a webpage created using HTML, leaving the finer points of the design to the browser.
Figure 4.11: Here is a much more professional webpage design using CSS.
CSS contains instructions that define a style for any HTML tag. The style is defined at the beginning of the HTML document, in the <style> section of the <head>. A programmer ensures that the style is consistent throughout the website by creating an external style sheet. An external style sheet is a file that contains the style definitions for the webpages used on a website. Reference to the file is made in the <head> section of each webpage, as shown here (Example 4.11). The browser retrieves the file
when the link instruction is encountered and then applies the style sheet throughout the webpage. Example 4.11: Here is an example of how CSS is linked into a webpage.
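A webpage that links to an external style sheet might look something like this sketch (the file name styles.css is an assumption, not the book’s original listing):

```html
<!DOCTYPE html>
<html>
<head>
<!-- Tell the browser to fetch the external style sheet and apply it. -->
<link rel="stylesheet" href="styles.css">
</head>
<body>
Hello World!
</body>
</html>
```

The styles.css file would contain style definitions, such as a rule setting the font for the body, that apply to every webpage that links to it.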
JavaScript

HTML is limited, as you’ve seen. It lacks the robust features found in high-level programming languages. Just as CSS enhances HTML by providing instructions on how to give webpages a professional look, JavaScript provides the programming structure that you’ve seen in high-level programming languages. JavaScript is used by programmers to create functions; make decisions based on changing conditions in webpages; and include much of the functionality found in programming languages. JavaScript can be embedded into a webpage using the <script> tag.

The condition statement is (new Date().getHours() < 18). Here’s what it is saying. Call the Date() function to get today’s date and current time. Call the getHours() function to get the hour portion of the date and time. The period links the two functions together; the output of the Date() function becomes the input to the getHours() function. If the hour is less than 18, then the condition is true. The instruction in the IF statement is sure to confuse you. The instruction is “simply” changing the text in the paragraph (<p id="demo">Good Evening!</p>) to “Good day!” if the current hour is less than 18. Notice that the paragraph is uniquely identified as “demo,” which is used in the instruction within the IF statement to identify the paragraph.
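A minimal webpage consistent with this description might look like the following sketch (not the book’s original listing):

```html
<!DOCTYPE html>
<html>
<body>
<p id="demo">Good Evening!</p>
<script>
// If it is before 18:00 (6 p.m.), replace the paragraph text.
if (new Date().getHours() < 18) {
  document.getElementById("demo").innerHTML = "Good day!";
}
</script>
</body>
</html>
```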
Stop! You probably get the point. Going beyond this point is unnecessarily confusing—but you can always visit w3schools.com, Google other sites, and search YouTube; or, if you are serious about learning about it, there are many good books on JavaScript. Example 4.13 shows another example of how JavaScript makes a webpage interactive. Example 4.13: Here is an example of how JavaScript is used to make decisions on a webpage.
<!DOCTYPE html>
<html>
<body>
<button type="button" onclick="myFunction()">Click Me!</button>
<p id="demo">This is a demonstration.</p>
<script>
function myFunction() {
  document.getElementById("demo").innerHTML = "Hello JavaScript!";
}
</script>
</body>
</html>
This example shows how a programmer defines a function called myFunction() using JavaScript. Notice the HTML has instructions to create a button on the webpage labeled Click Me! The button type is “button”—there are other button types, such as “submit.” The statement has an onclick attribute that tells the browser what to do when the button is clicked. In this example, the browser calls myFunction(). Inside myFunction() is an instruction similar to the instruction you saw in the previous example. Here the instruction replaces the text in the “demo” paragraph with “Hello JavaScript!” when the button is clicked. Building a professional webpage and website is much more involved than what is shown in these examples, but they provide a glimpse behind the scenes of how instructions tell the browser how to create a webpage.
Mobile Apps

An app is a relatively small application that runs on a mobile device such as a tablet or smartphone. Some apps run entirely on the mobile device—they don’t need to connect to the outside world; more frequently, however, the app is part of a much larger application that involves linking to databases and other applications running on remote servers. An app usually takes advantage of features that are available on the mobile device to enhance the user experience. You’ve probably seen this happen when you scan a bank check using the mobile device’s camera and use the banking app to deposit the check. The banking app captures your login information from the screen and then connects to the bank’s server using the cell phone network. Once connected, the app sends your login information, a picture of the bank check, and
some other instructions to an application running on the server to process the bank check. The server has programs that receive the data, then contact other applications and databases on the bank’s internal network to complete the processing. Many features available on mobile devices are accessible to an app, such as GPS location, contact lists, other apps on the device, calendars, internet information, and information that many users would prefer to keep confidential. Apps “ask” users to opt in or opt out when the app is installed. Opting in grants access; opting out does not. Apps in the United States are typically set for opt-out. Outside the United States, apps are typically set for opt-in. Some apps are distributed free while others require a small payment to download. Free apps are not really free. There is usually a revenue source for the maker of the app, such as ads, selling the user’s information to third parties, or encouraging the user to purchase products or services. For example, the app may provide only basic features; upgrading to a fully functional app will cost you. Paid apps provide a direct revenue stream to the maker of the app.
Building an App

An app that is built for a specific platform is called a native mobile app; it runs on a specific mobile operating system. For example, the iPhone uses the iOS operating system and runs apps written in the Swift or Objective-C programming languages. The Android operating system is used in many non-iPhone mobile devices and runs apps written in Java. App developers use a software development kit provided by Apple for iOS and Google for Android to build an app. The software development kit (SDK) contains tools and interface elements that enable developers to build and test an app on a computer before loading the app onto a mobile device. The SDK gives the developer the capability to incorporate all the features of the mobile device into the app. Two other SDKs for apps are Xamarin from Microsoft and React Native from Facebook. Xamarin enables developers to build an app that can run on many operating systems; apps there are written in the C# programming language. Facebook’s React Native is similar in concept, except apps are written in JavaScript and run on iOS and Android. Another approach to building an app is to create a web app using HTML, CSS, and JavaScript. The web app is downloaded similarly to other apps but must run in the mobile device’s WebView (browser). An app has three layers. These are:
–– Presentation layer: The presentation layer contains instructions on displaying the user interface and handling user interactions.
–– Application layer: The application layer contains instructions that process user requests and interacts with features of the mobile device. –– Data layer: The data layer has instructions that process data, interacting with databases on a remote server.
Creating a Dynamic Webpage

A web-based application can change the content of webpages based on current needs. Let’s say you want information from a database. You’ll learn more about databases in Chapter 5. For now, think of a database as a filing cabinet. A webpage prompts you to enter a request for data and then calls a program on the server side to process the request. The server-side program then embeds the data in a new webpage. This is referred to as a dynamic webpage because its content is changed by a program. A static webpage, such as the home page, contains instructions that don’t change. PHP is a programming language commonly used to create programs on the server side of an application. Originally, PHP was an abbreviation for “personal home page,” but now it’s simply referred to as a hypertext preprocessor. A preprocessor is a program that modifies data for another program. A hypertext preprocessor modifies HTML. Here’s a glimpse of how PHP and HTML work together to create a webpage on the fly. Example 4.14 displays a button on the webpage using the HTML <input> tag. The tag is described with its type, onclick, and value attributes. You learned about these in the JavaScript section of this chapter. In this example, the onclick attribute tells the browser what to do when the button is selected—call the program “HelloWorld.php” from the web server. The rest of this example you’ve seen previously in Example 4.9. Example 4.14:
<!DOCTYPE html>
<html>
<head>
<title>Welcome</title>
</head>
<body>
<input type="button" onclick="window.location='HelloWorld.php'" value="Click Me!">
</body>
</html>
The PHP program is shown in Example 4.15. Does it look somewhat familiar? It should, because these instructions dynamically generate the same webpage as in Example 4.9. The PHP program opens with the familiar <html> instruction. The body of the PHP program contains PHP instructions (enclosed between the <?php and ?> markers), which, in this example, send (echo) to the browser the HTML that the browser interprets as instructions to display a webpage. Example 4.15:
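A sketch of a PHP file matching this description might look like the following (the exact markup is an assumption, not the book’s original listing):

```php
<html>
<head><title>Welcome</title></head>
<body>
<?php
// The echo instruction sends text to the browser, which treats it
// as ordinary HTML and displays it on the webpage.
echo "Hello World!";
?>
</body>
</html>
```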
Stop! You don’t need to learn more about how PHP is used to handle communications between webpages and applications on the server side unless you want to dabble in web programming—though it is something worth trying someday. You can find information about PHP programming on Google and YouTube or in your favorite bookstore.
Testing the Application

Before an application is used by an organization, programmers and other technicians follow a rigorous testing process to ensure that the application fulfills the design specifications and meets expectations. The testing process is thorough: each piece of the application is tested on its own, then all the pieces are tested together, and then the entire application is tested alongside all the other applications used by the organization, to be sure there are no undesired consequences when the new application goes live. Here is the testing process—some steps are behind the scenes and others involve stakeholders verifying that expectations are met:

Unit testing: A unit is a small functioning piece of the application—a program or a portion of the program (a function). Typically, the programmer who develops the unit also tests the unit to be sure that it meets the specification.
Integration testing: Integration testing assembles units and then tests them to make sure that they work together.

Functional testing: Functional testing focuses on a particular functionality of the application, such as processing credit card payments.

System testing: System testing places the application in a simulated production environment that contains the hardware, operating systems, and other applications used by the organization. The goal is to simulate how the new application will work once it is installed in the production environment.

Stress testing: Stress testing determines the limits of the application. The application is placed in the simulated production environment. All applications are gradually cranked up until the new application nearly crashes. This lets the programmers know the limits of the new application. Let’s say the application processes orders. A stress test determines the number of orders that can be processed before response time slows to a crawl.

Performance testing: Performance testing determines if the application’s response is acceptable in less stressful situations.

Usability testing: Usability testing determines if the application is user-friendly.

Acceptance testing: Acceptance testing is the test-drive by the project sponsor. In a sense, the project sponsor “buys” the application once the application passes the acceptance test.

Regression testing: Regression testing is retesting the application after the programmer fixes a problem, to make sure that the fix hasn’t caused problems with other pieces of the application or other components in the production environment.

Beta testing: Beta testing is letting stakeholders use a pre-released version of the application to identify how the final application will work under real-life conditions. This is commonly used for commercial software, such as Windows, where there might be hundreds of thousands of users of the application.
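To make unit testing concrete, here is a tiny sketch in JavaScript (one of the languages discussed in this chapter); the function, the prices, and the expected total are invented for illustration:

```javascript
// The "unit" under test: a function that totals the prices in an order.
function orderTotal(prices) {
  return prices.reduce((sum, price) => sum + price, 0);
}

// A minimal unit test: give the unit known input and check the output
// against the result the specification says to expect.
function testOrderTotal() {
  const total = orderTotal([10, 20, 12.5]);
  if (total !== 42.5) {
    throw new Error("Expected 42.5 but got " + total);
  }
  console.log("orderTotal passed");
}

testOrderTotal();
```

In practice, programmers use testing frameworks to run hundreds of tests like this automatically, but the idea is the same: known input, expected output.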
Environments

You might hear technicians use the terms “test environment,” “training environment,” “staging environment,” “production environment,” and an assortment of other kinds of environments. Here’s what they are talking about. An environment is a collection of computers, servers, networks, operating systems, and applications. Here are the more common environments.
–– The production environment is the environment used to run the applications required to operate the organization.
–– The staging environment is identical to the production environment with one exception: it also contains the new application, which undergoes last-minute testing there before it is released into the production environment.
–– The training environment contains only the new application and is used for training stakeholders on how to use the application.
–– The testing environment is used by programmers to test their portion of the application.
White-Box, Gray-Box, and Black-Box Testing

You know from experience that proofreading your own reports is not a good idea because you are likely to overlook errors. The same is true for programmers when it comes to testing a program. The programmer is simply too close to the program to conduct an objective test. Programmers overcome this problem by using three methods of testing, called white-box, gray-box, and black-box testing.
–– White-box testing is when the tester knows how the instructions should work.
–– Black-box testing is when the tester enters information and evaluates the output of the application but knows nothing about the internal process.
–– Gray-box testing is when the tester has limited knowledge of how the instructions should work, sufficient to test a potential breakdown (a bug) in the logic.
Chapter 5
Data and Databases

Practically every application stores data of some kind that is used for a variety of purposes. The application you use to make an online purchase stores data. Each time you visit a website, a small amount of data called a cookie is stored on your computer. Your smartphone has all sorts of data about you stored on the smartphone—and possibly on a remote computer. Any application that stores data can be called a database application. Let’s begin by exploring data and information. These terms are used interchangeably, but they’re not the same. Data is part of information. By itself, data is meaningless, but when associated with related data, collectively there is meaning: information. Does it seem like we’re splitting hairs? Not really, because the difference is critical to every database application. A database application breaks information into data that is stored and then reassembled into the information needed to run an organization. Here’s an example that will clarify this point. You realize that “Bob” is a first name, but there are millions of Bobs in the world. Is Bob data or is Bob information? Whether or not Bob is information is a bit tricky. Let’s say you’re at a party in your home. Your friend Bob is there. No other Bob is at the party. The name “Bob” is data, but you have other data—you recognize Bob’s face and voice and he’s the only Bob in the room—that collectively becomes (meaningful) information. The name “Bob” combined with other data identifies your friend. At work there are three colleagues named “Bob” and two have the last name “Smith.” You receive an email signed “Bob Smith.” Is the signature data or information? Bob and Smith are each data. Since there are three people at work with the same first name and two with the same last name, the signature is data: meaningless. You need additional data to identify Bob Smith as information.
Storing Data

You have your own data storage system—a filing cabinet organized with file folders, or the archaeological method where there are stacks of paper on your desk with the oldest on the bottom of the pile. Hopefully you also have an electronic filing system where you organize your project files into folders on your computer. You store Excel workbooks, Word documents, PowerPoint presentations, and other documents related to the project in the same folder. Your storage method probably works fine for you, but it might be challenging for anyone else who needed to locate a particular piece of information on your computer. They need to know which folder and which document contains the information
DOI 10.1515/9781547400812- 005
they need. In fact, you may face the same challenge and find yourself opening and closing many files to locate the information. Windows has a limited search feature that will do some of the legwork for you. Enter search criteria and Windows searches through multiple folders and documents, displaying the documents that contain the search criteria. Selecting a returned document opens the document, where you’ll find the search criteria within the context of the document. Windows search isn’t perfect. It simply locates the search criteria in a file. If you’re looking for “automobile” and you search for “car,” Windows won’t find “automobile.” It isn’t that smart. Also, Windows will return the name of every document that contains the search criteria—even documents where the search criteria aren’t really discussed in detail. Say that you have a folder with 100 files and you’re looking for all references made to “automobile.” Windows might return 45 references in 20 documents. Windows narrows the search—it doesn’t necessarily find what you are looking for. You still need to review all 45 references manually. However, this is much better than manually searching all 100 files looking for the search criteria, where there is a good chance that you will overlook some instance. With a Windows search you at least know that these are the only instances of the search criteria.

A Look into Text Files

Like most of us, you probably take for granted all the effort it takes to display text in any type of Office application. You type characters, change fonts, underline, bold—the creative options are nearly endless. However, there is more happening than meets the eye. A text file—one that usually has the file extension “.txt”—contains just the characters entered into the text editor, such as Notepad. At the end of every line—when you press Return—the text editor automatically enters two non-printable characters called a carriage return and a line feed.
Go back to the days of typewriters—you might have seen them used on television or in the movies—to understand what this means. The carriage return tells the program to move to the beginning of the line, like moving the typewriter carriage. The line feed tells the program to move to the next line, like rolling the typewriter carriage to move the paper up one line. Some older text editors place another non-printable character at the end of the file—called the end-of-file marker—that tells the text editor to stop loading the text. A word processor is a fancy text editor that produces a text file that also contains formatting instructions—all those creative ways to display text. Other Office products have some of the same capabilities as a word processor, and then some. The Rich Text Format (RTF) is a formatting language that is used to format a document so that the formatting can be used by different word processors. All the formatting is described in the file, which is why a Word document is relatively large even when you type only a few characters into it. Try this eye-opening exercise to see what’s really in a Word document.
1. Open Notepad.
2. Select File.
3. Select Open.
4. Change from Text Documents to All Files.
5. Select a file that you created in a word processor.
6. You’ll see the text of the document and all the formatting commands that you won’t see when the document is displayed in Word.
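You can see these two non-printable characters for yourself with a few lines of JavaScript (run, for example, with Node.js); the sample text is just an illustration:

```javascript
// A line of text as a text editor might store it: the visible characters
// followed by a carriage return (code 13) and a line feed (code 10).
const line = "Hi!\r\n";

// Print each character's numeric code.
const codes = [...line].map(ch => ch.charCodeAt(0));
console.log(codes); // [72, 105, 33, 13, 10]
```

The last two numbers are the carriage return and line feed, invisible on the screen but present in the file.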
A better approach to storing data is to use a database. A database is like a filing cabinet where data is stored logically, making it easy and quick to retrieve. Notice that a database contains data. Related data—information—is stored logically in the database. For example, a customer’s name and address (information) is broken down into data (first name, last name, street, city, state, postal code). The data is then stored in the database. Again, it may seem like we’re splitting hairs. Picture an Excel workbook that is organized into worksheets (Figure 5.1). The workbook is the database, and each worksheet is like a table in the database. A worksheet may contain information about customers. That worksheet is organized into columns and rows. Each column contains data identified by the column heading—first name, last name, street, city, state, and postal code. Each row contains data about one customer—Bob, Smith, 121 Any Street, Any City, Any State, 07777. Each cell contains one piece of data for a particular customer: Bob. Keep this picture of a workbook in mind as you learn about databases. You might have heard technicians mention the terms “fields” and “records” when talking about a database. A field is another term for a column and a record is another term for a row.
Figure 5.1: A database and tables that make up the database are similar in concept to an Excel workbook and its tables.
File? Database? What’s the difference? Think of a file as a document that contains a continuous flow of characters and words that are in readable form; you can read each paragraph and so can Windows when it searches the document. However, you
and Windows need to read through each line of the document to find a specific piece of information. In comparison, a database application breaks down information into data and stores the data in a database. Data in a database is easily searchable and also easily assembled into information. However, a database is in a less readable form than a Word document—you and Windows might find reading data in a database challenging, but it is easy for the database application to read.
Memory and Disk

There is a tendency to assume data is stored on a disk or some permanent storage device (see Chapter 3). In reality, data is stored both in memory and on a permanent storage device. For example, the small piece of data stored by websites on your computer—a cookie—is stored in a file on your hard disk, but it is also loaded into memory for quick access. The cookie contains identifiable information such as the last time you visited the website or the last item viewed. Databases are also stored both on disk and in memory. The amount of data stored in memory depends on the nature of the database application. Some database applications store the most frequently used data in memory, while more robust database applications—such as those used by online retailers—store the entire database in memory for fast access and to avoid delays caused by accessing the disk drive (see Chapter 3). Requests for information are made to the database located in memory. Changes to the database in memory are then saved to the database residing on the disk drive.
Developing a Database Application

Creating a database requires careful analysis and planning that begins with a review of workflows (Chapter 1). Databases today have such a wide range of functionality that they may or may not need to be programmed, depending on the needs of the client. The database application contains instructions for the processor on how to capture, store, retrieve, and display data. The database designer translates the pseudocode, or whatever other technique is used, into a database design referred to as a database schema. The database administrator then uses the database schema to create and maintain the database. Let’s take a look at how a database designer identifies data for the database. The objective is to identify all the information associated with a workflow. The technique for doing this is called entity analysis. The database designer identifies each entity by name, which is referred to as an entity type. An entity is a person, place, thing, or event (see Chapter 4)—such as customer, order, product, and other familiar names common to many workflows.
Next, the database designer identifies the characteristics of each entity, referred to as entity attributes. You recognize these as the customer number, customer name, and customer address for the customer entity. However, the name of an entity attribute probably doesn’t give the full picture of the data associated with the entity. Yes, the process has a customer number, but you don’t know what a customer number looks like. The same is true of a customer name and customer address. The database designer needs to see real data—a real customer number, a real customer name, and a real customer address. These are referred to as entity instances. The database designer then knows the type and size of the data, much like you need to know column sizes in an Excel worksheet. Here are some entity attributes for a customer (entity type):
–– Customer Number
–– Customer First Name
–– Customer Middle Name
–– Customer Last Name
–– Street Address 1
–– Street Address 2
–– City
–– State
–– Postal Code
Here are instances of the entity attributes for a customer:
–– 1234-567
–– Bob
–– Jeffrey
–– Smith
–– 121 Smith Avenue
–– None
–– Any City
–– Any State
–– 070001

A Closer Look at Data

The importance of seeing actual data (an instance) is to be able to define the data in the database. Data is defined by type and size, as explored in Chapter 4 when instructions told the processor to reserve memory sufficient to hold a specific type of data. Likewise, the database designer needs to know how much space to reserve in the database for the type of data. Seeing real data helps the database designer identify the type and size of the data. Data is broadly defined into two categories: structured data and unstructured data. Structured data are numbers, text, and dates that can easily be stored in a database
(customer first and last names). Unstructured data are images, video, and audio—rather large files that really don’t fit in a database. Imagine the size of a database that stores videos of your favorite YouTube clips—it would be too large to manage. There is a workaround, however. A reference to the file that contains the unstructured data is stored in the database. You’ve probably seen programs that display a picture of a person along with the person’s name, address, and other pertinent information about the person. That information is stored in the database. Only a reference to the picture is stored in the database. The reference tells the application where to retrieve the photo. You might have seen something like this when using PowerPoint, where a reference to a video is embedded into a slide and the video itself is elsewhere on the disk. The presentation tells PowerPoint to go to the file that contains the video. A drawback of using unstructured data is that the reference may not be accurate. The reference contains the path to the video file. However, the path may be valid for your computer but not for the computer running your PowerPoint presentation. An error message is displayed when PowerPoint can’t find the video—usually in the middle of the presentation.
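The distinction can be sketched in a few lines of JavaScript (the field names and file path are invented for illustration):

```javascript
// Structured data: small, regular values that fit neatly into
// database columns of a defined type and size.
const customer = {
  customerNumber: "1234-567",
  firstName: "Bob",
  lastName: "Smith",
  postalCode: "070001"
};

// Unstructured data: the photo itself stays in a file outside the
// database; only a reference (the path) is stored with the record.
const customerPhoto = {
  customerNumber: "1234-567",
  photoReference: "photos/1234-567.jpg"
};

console.log(customer.firstName + " " + customer.lastName); // Bob Smith
console.log(customerPhoto.photoReference);
```

If the file behind photoReference moves or is deleted, the reference breaks—exactly the PowerPoint problem described above.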
More About Structured Data

Structured data is defined by the type of data, referred to as a data type, that is used to reserve a specific amount of space in the database for the data. Table 5.1 contains commonly used data types for databases; there are more data types available. You’ll recognize a few familiar terms from Chapter 4 in Table 5.1. Other terms might be a little confusing. Let’s clear up any confusion.

Table 5.1: Commonly used data types in a database.

CHARACTER(n)          Character data (a fixed number of characters)
VARCHAR(n)            Character data (a varying number of characters, up to a maximum)
CLOB                  Character large object (very large amounts of text)
BINARY                Non-character data such as images
BOOLEAN               True/False
INTEGER, INTEGER(p)   Whole numbers
DECIMAL(p,s)          Whole and partial (decimal) numbers
FLOAT                 Whole and partial (decimal) numbers
DATE                  Date data
TIME                  Time data
TIMESTAMP             Date and time data
The first two data types are CHARACTER and VARCHAR. These reserve space in the database to store characters—the letters, numbers, and symbols on the keyboard. CHARACTER tells the database exactly how many characters will be stored, and VARCHAR sets a maximum number of characters to be stored. Notice there is a data type called
CLOB—character large object. This is used for a large amount of text—more than 4 gigabytes—such as a book. Usually this amount of text is stored separately from the database in its own file, and a reference to the file is stored in the database. When the text is needed, the database application finds the reference in the database and then looks elsewhere for the file that contains the text.
The terms precision and scale are used in a few data types. Precision refers to the total number of digits in the number and scale refers to the number of digits to the right of the decimal point. Data types that reserve space for integers either have a fixed precision (SMALLINT, INTEGER, BIGINT) or let the database designer define the precision, as in INTEGER(p). The DECIMAL data type requires both precision and scale since it contains both a whole and a partial number.
The terms binary and character are referenced in a few data types, and these can undoubtedly be confusing. You'll remember from Chapter 2 that binary is a numbering system consisting of two digits: 0 and 1. These are used to represent data in memory, on a disk, and in a database. All data is stored as a series of binary digits (bits), and this is where the confusion lies. A program can read data as binary values (0 and 1) or interpret the binary values as representing characters—letters, numbers, and symbols on the keyboard. Using a character data type tells the program to read data as characters. Using a binary data type tells the program to read data as 0s and 1s. Usually data at the beginning of the file (called metadata) helps the program interpret the rest of the file (i.e., picture, video, audio). Although binary data can be stored in a database using the BINARY data type, many binary objects are too large to be stored in the database and are stored in a file outside the database. A reference to the file is stored in the database.
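The character-versus-binary distinction can be seen in a few lines of Python (any language behaves similarly): the very same bytes can be interpreted as characters or displayed as the raw bits actually stored.

```python
# The same bytes can be read as characters or as raw binary digits.
data = b"Hi"                    # two bytes, as they sit on disk

as_text = data.decode("ascii")  # character interpretation
as_bits = " ".join(f"{byte:08b}" for byte in data)  # binary interpretation
# as_text is "Hi"; as_bits is "01001000 01101001"
```

A character data type tells the program to use the first interpretation; a binary data type tells it to use the second.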
In addition to defining data by data type, the database designer further describes behaviors of the data. "Behaviors of data" probably sounds a little strange, so let's clarify.
–– NOT NULL: Some data, such as a customer number, is required at all times. Database designers call this behavior NOT NULL, meaning that the data can never be without a value (NULL).
–– Auto generated: Data that contains the date and time can be filled in automatically using the current date and time from the computer. This ensures accuracy and reduces the need for the user to enter the date and time. The date and time can always be overridden if necessary. This is referred to as an auto generated value.
–– Default value: Based on the workflow, the database designer can determine the starting value for some data. This is referred to as a default value. You probably have seen this when making an online purchase where the shipping address defaults to the billing address. You can override the default value, but there is a high probability you won't need to since the default value is probably the correct value.
–– Validate: Ensuring the accuracy of data before data is stored in a database is critical to every workflow because business decisions will be made based on the
Chapter 5: Data and Databases
data. Some data is automatically validated based on the data type. You can't store anything but a date in a date data type. Likewise, you can't store a character in an integer data type. The database designer can define more elaborate validation rules for data in the database design. A validation rule consists of instructions that describe the validation process. For example, a customer number for a new customer must not have been assigned to an existing customer. The validation rule describes the instructions to determine if the new customer number already exists. The validation rule automatically executes when a customer number is assigned to the customer number data.
–– Valid range of values: A less elaborate but still very useful validation method is to determine if the value assigned to the data makes business sense. Let's say there has never been a time when a customer ordered more than 50 widgets per order. Any order of more than 50 widgets would be unusual and even suspicious. Someone might have incorrectly entered the quantity. A valid range of values can be established that will validate selected data such as the quantity data. In this example, the valid range is from 1 to 50. A value greater than 50 may trigger a request to confirm the quantity.

Table 5.2: Here are data types that are commonly used for databases.

Data type      Description
CHARACTER(n)   Fixed length character string where n is the fixed length
VARCHAR(n)     Variable length character string where n is the maximum length
CLOB           Character large object; stores more than 4 GB of character data
BINARY(n)      Fixed length binary string where n is the fixed length
BLOB           Binary large object; stores binary data
BOOLEAN        True or False values
VARBINARY(n)   Variable length binary string where n is the maximum length
INTEGER(p)     Integer numerical where p is the precision
SMALLINT       Integer numerical with a precision of 5
INTEGER        Integer numerical with a precision of 10
BIGINT         Integer numerical with a precision of 19
DECIMAL(p,s)   Decimal with a precision of p and a scale of s. Example: DECIMAL(6,1) means the number has six digits, five of which are to the left of the decimal and one to the right of the decimal.
DATE           Year, month, and day values
TIME           Hour, minute, and second values
TIMESTAMP      Year, month, day, hour, minute, and second values
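The behaviors described above—NOT NULL, a default value, and a valid range—can be sketched in SQL. The example below uses SQLite through Python (SQLite spells a range rule as a CHECK constraint; the table and column names are illustrative, not from the book's workflow):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# NOT NULL, a default value, and a 1-to-50 valid range for Quantity.
db.execute("""
    CREATE TABLE OrderTB (
        OrderNumber INTEGER NOT NULL,
        Quantity    INTEGER NOT NULL CHECK (Quantity BETWEEN 1 AND 50),
        Status      TEXT DEFAULT 'open'
    )
""")
db.execute("INSERT INTO OrderTB (OrderNumber, Quantity) VALUES (1, 25)")
status = db.execute("SELECT Status FROM OrderTB").fetchone()[0]  # default filled in

# An out-of-range quantity is rejected by the DBMS itself.
try:
    db.execute("INSERT INTO OrderTB (OrderNumber, Quantity) VALUES (2, 75)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Note that a production system might prefer to flag a suspicious quantity for confirmation rather than reject it outright, as the text suggests; a hard CHECK constraint is the simplest version of the idea.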
Database Management System (DBMS)

A database management system (DBMS) is an application that is used to create and manage a database. Think of a DBMS as a file clerk working in your organization's file room. Information is sent to the file clerk, who then files the information using some form of filing system. You really don't care how the information is stored, as long as the file clerk can retrieve the information when you request it. The same is true with a DBMS.
There are two major types of database used by business today. Relational databases use a tabular structure that is quite successful for structured queries. In addition, there are NoSQL databases that are not structured and so have been found to be effective for mining big data applications. The relational database administrator tells the DBMS the type of data (data type) that will be stored and the relationship of the data (tables). The DBMS determines how the data is physically stored. The database administrator and programmers who develop database applications send requests for information to the DBMS using a language that is understood by the DBMS—usually Structured Query Language (SQL). You'll learn more about SQL later in this chapter. The DBMS then searches the database and sends back the requested information.
There are many DBMS available, but the most commonly used are MySQL, Oracle Database, SQL Server by Microsoft, DB2 by IBM, and Sybase. Chances are pretty good that your organization uses one of these systems. A DBMS usually runs on a database server connected to your organization's intranet. As you'll recall from Chapter 4, a server is a dedicated computer that shares applications and data with other computers throughout the organization. A database server is dedicated to running the DBMS and storing databases. If you use Office Professional, you might use Microsoft Access, which is left off the DBMS list above.
Microsoft Access is a DBMS suited for smaller, single-computer database applications but lacks the robust capabilities needed for an organization-wide database application. Microsoft Access is a great application to use if you want to dabble in designing your own database—give it a try. And if you want to get serious about designing a database, then you might try MySQL (Google MySQL for the download link). MySQL is a free downloadable DBMS that you can run on your computer.
Data Modeling

The database designer develops two designs of the database. The first is called the logical database design, where data is organized into a logical arrangement (Figure 5.2). This is the result of entity analysis, where data is organized into customer information, order information, and similar groupings based on workflow analysis. For more information on entity analysis, see "Developing a Database Application" earlier in this chapter.
Figure 5.2: The logical database design identifies information that is used in a workflow.
The second is called the physical database design. This is where the logical arrangement of data is rearranged into tables of the database (Figure 5.3). Many times, a table and entity match, such as when a customer information table contains entity attributes about a customer. However, there are times when the database designer needs to break the link between entity and table in order to develop an efficient way to store data.
Figure 5.3: The physical database design requires that you identify column names, the data type(s) of the data stored in the column, whether or not the column can remain empty, and rules to be applied to the column.
A critical aspect of the physical database design is to make sure that each row is uniquely identified. Imagine having several customers named Bob Smith. It would be a nightmare trying to find the right Bob Smith without a unique way to identify him. The database designer uniquely identifies a row by a primary key. Think of a primary key as the key to finding the right Bob Smith. The term primary key might be new to you, but the concept is familiar. Your Social Security number, your employee number, and your driver's license number are all ways to uniquely identify you. These are primary keys of tables that contain your information. Yes, we're all a number in someone else's database.
Sometimes there is an obvious primary key, as with your Social Security number. Other times, an application generates the unique identifier—such as assigning a new customer a customer number—especially when we want to preserve the confidentiality of the Social Security number. Still other times the application generates a unique but meaningless number simply to provide a way to uniquely identify each row in the table. The number is used internally by the database.
Ideally, the value of one column is the primary key: the Social Security number. However, the database designer could assemble values of multiple columns to create a primary key. This is referred to as a concatenated primary key. For example, combining a person's first and last name with the person's address has a high probability of uniquely identifying the person. Of course, there is always a chance that two people
with the same name live at the same address and are in the same database. In this case, the name and address would not uniquely identify the person's row in the table.
Some large merchants use a customer's phone number as a way to uniquely identify the customer. This works if every customer has a cell phone, but there are some customers who still use landlines available to anyone living in the home. Two partners may each have an account with the merchant and have the same phone number identifying the account if both use a landline. This doesn't matter much when the cashier recalls the customer information by asking the customer for the telephone number. Both accounts are displayed, and further inquiry by the cashier easily identifies the customer. A cell phone number is better, but in both situations those numbers can change at any time, causing chaos for the merchant.

Database Models

There are different ways of organizing data in a database, which are referred to as database models. Here are some different types of database models.
–– Hierarchical Database Model: The hierarchical database model is considered the first database model, developed in the 1960s by IBM. The hierarchical database model is a tree-like structure where each branch is a record (row). Records are associated with other records in a parent/child relationship similar to tree limbs. A disadvantage is that each child has one parent—as a result, data is duplicated throughout the database, making it challenging to update data; you must update each location of the data.
–– Network Database Model: The network database model organizes data so that a record (row) can have multiple parents, which is a more natural association between entities but leads to a cobweb design. The network database model was popular in the early 1980s, when the fast processors it required were only available on mainframe computers.
–– Relational Database Model: The relational database model provides a structure for linking together rows of different tables using a primary key. There is no parent/child relationship. The relational database model is commonly used today because of the need for flexibility when retrieving data. The application needs only to access tables that contain needed information. No other table needs to be accessed. Furthermore, processor speed has come a long way from the 1980s and can link together many tables without influencing response time.
–– Object-Oriented Database Model: The object-oriented database model organizes information into objects as opposed to tables in the relational database model. An object can be a person, place, thing, or event. An object has both data and behaviors (functionality). Data describes the object and behaviors contain instructions that manipulate the object. You've seen the use of the object-oriented model in animation films such as Cars. Each car is an object that has data that describes the car and behaviors that are like functions that manipulate the car. Collectively, this is an object. The instance of the car object, such as Lightning McQueen, has data that describes attributes of Lightning McQueen and behaviors that tell the process how Lightning McQueen should act on the screen. A commonly used object-oriented database is InterSystems Caché, which is used by popular electronic healthcare records applications. More on object-oriented design can be found in Chapter 8.
–– XML Database Model: The XML database model is used to store whole documents. Chapter 4 explains HTML and details how it is a markup language that tells the browser how to display text. XML stands for Extensible Markup Language. XML is also a markup language, but it describes text within an XML document. Notice there is no mention of a browser. You can make up your own XML tags to describe text—no one else will know what you're talking about unless you share the
definition of the XML tags with them. More on XML is featured in Chapter 9. For now, all you need to know is that some industries have agreed on a set of XML tags that are used for sharing documents. For example, publishers create an XML document that describes a book. The document is electronically transmitted to Amazon, which has an application that extracts specific information (title, subtitle, author) using the agreed-upon XML tags to display on its website. XML documents can be stored in an XML database, but the database structure lacks the flexibility seen in a relational database.
Relational Database

A relational database is a database model that organizes data into logical groups called tables, similar to an Excel worksheet. A table contains like information, such as customers' names and addresses. Each row contains one customer's name and address; this is referred to as an instance of a customer. There can be many tables, each containing data about a specific entity: customer or order.
In a relational database, two tables can be linked together using a value that is common to both tables. The values create a relationship between the tables. Let's take a look at how this happens. There are two tables: customer and order. Customers are uniquely identified by customer number, which is referred to as the primary key of the customer table. Each order is uniquely identified by an order number, which is its primary key. Every order is placed by a customer. Rather than storing the customer's information in every order, customer information is stored in the customer table and order information is stored in the order table. The customer number of the customer who placed the order is also stored in the order table. When the application wants to display the order, it retrieves information for the order from the order table, and then uses the customer number in the order table to relate the order with the corresponding customer information in the customer table. Once tables are linked with the customer number, the application treats the order table and the customer table as one table.
The unique value that identifies each customer is the customer number. In the customer table this is referred to as the primary key. The customer number in the order table is referred to as a foreign key, which is a primary key of another table. A relational database has the following rules:
–– Each table must have a unique name.
–– Each column in the table must have a unique name.
–– Each row must be uniquely identified (primary key).
–– The order of columns and rows is irrelevant.
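The primary-key/foreign-key link just described can be sketched with two small tables in SQLite via Python (the table names follow the chapter's naming convention; the customer and order numbers are made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE CustomerTB (
        CustomerNumber INTEGER PRIMARY KEY,
        CustomerName   TEXT NOT NULL
    );
    CREATE TABLE OrderTB (
        OrderNumber    INTEGER PRIMARY KEY,
        CustomerNumber INTEGER NOT NULL,   -- foreign key
        FOREIGN KEY (CustomerNumber) REFERENCES CustomerTB (CustomerNumber)
    );
    INSERT INTO CustomerTB VALUES (101, 'Bob Smith');
    INSERT INTO OrderTB VALUES (5001, 101);
""")
# Linking the two tables on the shared customer number lets the application
# treat them as one table.
row = db.execute("""
    SELECT o.OrderNumber, c.CustomerName
    FROM OrderTB o JOIN CustomerTB c
      ON o.CustomerNumber = c.CustomerNumber
""").fetchone()
```

The query returns the order number together with the customer's name, even though the name is stored only once, in the customer table.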
The Relational Database Advantage

Imagine for a moment that orders are stored as Word documents. Let's say that a customer places 100 orders throughout the year, each having the customer's name and address and a contact person. This year, the contact person is replaced. Last year's orders refer to a contact person who no longer exists. You need to update the contact person's information 100 times. There must be a better way.
Imagine the amount of space taken up by the customer's name and address. Now multiply it by the number of orders placed by the customer, and multiply that by the number of times orders are replicated (backed up). And multiply that number by the number of customers. For a large organization, this is a lot of space. There must be a better way.
And there is a better way: storing orders in a relational database. A key design element is to remove duplicate data from the relational database and store information in one place. Customer information is stored in a customer table and is referenced anywhere that customer information is needed. Any change, such as information about the contact person, is made in one place and automatically made available wherever the contact person information is needed.
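The change-it-once advantage can be demonstrated in a few lines (again a SQLite sketch with illustrative names): one UPDATE to the customer table is immediately visible from every order.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE CustomerTB (CustomerNumber INTEGER PRIMARY KEY,
                             ContactPerson  TEXT);
    CREATE TABLE OrderTB    (OrderNumber    INTEGER PRIMARY KEY,
                             CustomerNumber INTEGER);
    INSERT INTO CustomerTB VALUES (101, 'Pat Jones');
    INSERT INTO OrderTB VALUES (5001, 101);
    INSERT INTO OrderTB VALUES (5002, 101);
""")
# The contact person is replaced: ONE update, not one per order.
db.execute(
    "UPDATE CustomerTB SET ContactPerson = 'Lee Chan' "
    "WHERE CustomerNumber = 101"
)
# Every order now sees the new contact person through the link.
contacts = [r[0] for r in db.execute("""
    SELECT c.ContactPerson
    FROM OrderTB o JOIN CustomerTB c
      ON o.CustomerNumber = c.CustomerNumber
""")]
```

Had the contact person been copied into every order, the update would have touched one row per order instead of one row total.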
Referential Integrity

Success of a relational database depends on the integrity of foreign keys. As long as customer information exists in the customer table, the database application can recreate an order by accessing order information from the order table and then linking the order to the customer information in the customer table. This is called referential integrity: the reference to the customer information exists. However, the order cannot be recreated if the customer was deleted from the customer table because they are no longer considered a customer.
Referential integrity must be maintained in a relational database. This is not to say that information in a relational database cannot be deleted. It can, as long as referential integrity rules are followed. However, sometimes deleting information from the database doesn't make sense because all history of a transaction can be lost. Here are the deletion rules.
Restrict Delete Rule: The restrict delete rule states that no parent information in a parent table (row) can be deleted unless all corresponding rows in children tables are also deleted. The customer is the parent and the orders are children—an order cannot exist without a customer.
Cascade Delete Rule: The cascade delete rule states that parent information and corresponding children information must be deleted automatically when the parent information is deleted.
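The cascade delete rule can be tried out directly; in SQL it is spelled ON DELETE CASCADE on the foreign key. Here is a SQLite sketch (SQLite only enforces foreign keys when the pragma is switched on; names are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FK rules only when asked
db.executescript("""
    CREATE TABLE CustomerTB (CustomerNumber INTEGER PRIMARY KEY);
    CREATE TABLE OrderTB (
        OrderNumber    INTEGER PRIMARY KEY,
        -- ON DELETE RESTRICT would instead refuse to delete the parent.
        CustomerNumber INTEGER REFERENCES CustomerTB ON DELETE CASCADE
    );
    INSERT INTO CustomerTB VALUES (101);
    INSERT INTO OrderTB VALUES (5001, 101);
""")
# Deleting the parent (customer) automatically deletes the children (orders).
db.execute("DELETE FROM CustomerTB WHERE CustomerNumber = 101")
orders_left = db.execute("SELECT COUNT(*) FROM OrderTB").fetchone()[0]
```

This is exactly the history-loss trade-off the text warns about: the cascade keeps referential integrity, but the order records are gone.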
Relational Database Design

Entity analysis of the workflow by the database designer identifies entities and data associated with the entity. The database designer maps entities to the physical database to create the database schema (Figure 5.4). Think of the database schema as the layout of the database. For the most part, each entity has its own table. However, there are times when this doesn't make sense, especially when there can be many values for the same data.
Figure 5.4: Each entity can usually be translated into a table where its attributes are columns except when the attribute can be calculated, such as totals.
For example, a customer typically purchases multiple products on the same order. You would expect to find each item in the order table. However, there is a problem. Each product has a quantity, price, and product number. And there can be an endless number of products purchased on the same order. You simply can’t fit all this information neatly in a table that contains rows and columns. You would simply run out of space. The database designer avoids this problem by creating another table—an order products table—that contains products purchased on the same order. Database designers take their time to identify all conflicts that may arise in the design of the database because it is challenging to change the database design once the database is in production.
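The order-products solution described above can be sketched as a separate table holding one row per product on an order (a SQLite sketch; product numbers, quantities, and prices are made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE OrderTB (OrderNumber INTEGER PRIMARY KEY);
    -- One row per product purchased on an order; an order can have any
    -- number of these rows, so the order table never runs out of columns.
    CREATE TABLE OrderProductTB (
        OrderNumber   INTEGER,
        ProductNumber INTEGER,
        Quantity      INTEGER,
        Price         REAL
    );
    INSERT INTO OrderTB VALUES (5001);
    INSERT INTO OrderProductTB VALUES (5001, 1, 2, 9.99);
    INSERT INTO OrderProductTB VALUES (5001, 2, 1, 4.50);
""")
items = db.execute(
    "SELECT COUNT(*) FROM OrderProductTB WHERE OrderNumber = 5001"
).fetchone()[0]
```

Adding a third product to order 5001 is just another row in OrderProductTB; no change to the table design is needed.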
Normalization

The database designer has the somewhat daunting task of removing duplicate data when forming the logical database design. This process is called normalization. You don't need to normalize data, but seeing how it is done gives you an insight into the database designer's effort to streamline the database. The goal is to minimize data redundancy in the database to prevent having to modify the same data located in several tables and to reduce the size of the database. Figure 5.5 illustrates the results of normalization. Here is the mapping of complex relationships into tables.
Figure 5.5: Normalization removes duplicate data in the database. Here, customer information appears only in the customer table. Only the customer number appears in both the customer table and the order table.
The normalization process follows rules called normalization forms. These are: –– First normal form: Remove multi-valued attributes by placing them in their own tables, such as customer name and address. –– Second normal form: A primary key must identify groups of non-key attributes. A customer number uniquely identifies customer information. –– Third normal form: The primary key must not be determined by another attribute. Simply, the value of the primary key cannot change. Database designers try to avoid over-normalizing data, which results in too many tables and too much processing time to link together tables.
Index

The database designer can request that a column be indexed. An index is similar to the index of a book: it is used to quickly search for information in a table. For example, the customer last name and first name collectively are ideal columns to be indexed—index keys—since they will frequently be the search criteria. You've probably used something like this when trying to locate an employee in a database when you don't remember their employee number. The search brings up everyone that matches your search criteria along with other information that helps to identify the person.
The index is a separate table that contains two columns. One column contains the indexed value—for example, a primary key such as the customer number that uniquely identifies each customer. The other column contains the number of the row in the corresponding table that contains that value. When searching the database for a specific customer number, the index is searched. When the customer number is found, the corresponding row number in the second column is used to quickly find the customer's information in the table.
The database designer requests that a column be indexed, but it is up to the DBMS to decide whether or not to index the column. While indexes are useful for finding information quickly, there is a downside too. Each index needs to be revised each time the data changes. The DBMS determines if it is beneficial to create the index. Many times, it is quicker to create the index at the time of the request rather than to update the index each time the information changes.
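Requesting an index is a single SQL statement. A SQLite sketch (following the chapter's INX naming convention; the data is made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE CustomerTB "
    "(CustomerNumber INTEGER, LastName TEXT, FirstName TEXT)"
)
# Request an index on the columns most often used as search criteria.
db.execute("CREATE INDEX CustomerNameINX ON CustomerTB (LastName, FirstName)")
db.execute("INSERT INTO CustomerTB VALUES (101, 'Smith', 'Bob')")

# Searches on last/first name can now use the index instead of scanning
# every row of the table.
found = db.execute(
    "SELECT CustomerNumber FROM CustomerTB "
    "WHERE LastName = 'Smith' AND FirstName = 'Bob'"
).fetchone()[0]
```

The query works with or without the index; the index only changes how quickly the DBMS can find the row, which is why the DBMS is free to decide how and when to maintain it.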
Structured Query Language (SQL)

The systems analyst defined the workflow in pseudo code. The database designer identified data required for the workflow in the logical database design. And the database administrator created a physical database design. Now it is time to build the actual database and tables, and to enter and retrieve data.
Think of the DBMS as a file clerk who is responsible for all the information in the organization. The file clerk takes care of all the details. All you need to know is how to communicate with the file clerk. Interacting with the DBMS is practically the same. The DBMS deals with the details of how data is stored and retrieved. You need to speak the language to interact with the DBMS, and that language is called Structured Query Language (SQL).
The database administrator needs to be a master of SQL in order to create databases and tables and to manipulate data stored in databases. Programmers who write database applications also need to be versed in SQL to write instructions within the application to interact with the DBMS. This is referred to as embedded SQL because the SQL statements are incorporated into instructions within the application.
You probably won't need to master SQL to access data in a database because nearly all your interactions with the database are through a database application. You request information on the screen, click a button, and the database application sends instructions in SQL to the DBMS to retrieve the information for you. However, SQL is a relatively straightforward language to learn, and knowing the basics gives you insights into how the database administrator and programmers will respond to your request for information. You can even dabble with SQL using the MySQL DBMS on your own computer. Try the SQL statements in the remainder of this chapter and you'll surprise yourself with how well you develop your own database. Let's explore SQL.
A DBMS usually has an interactive user interface referred to as an SQL console to directly communicate with the DBMS without the use of a database application. For example, MySQL Workbench is the interactive user interface for the MySQL DBMS, which is downloadable at no charge, as is MySQL DBMS (Google MySQL for the link). This is where you enter SQL statements.
SQL and Standards

The keystone to technology is standards. Technology works fine if everyone follows the same standards. NASA lost a Mars orbiter because of a breakdown in communication that prevented navigation information from transferring between the Mars Climate Orbiter spacecraft team at Lockheed Martin and the flight team at NASA's Jet Propulsion Laboratory. Lockheed Martin engineers used US customary units and the NASA team used the metric system for spacecraft operations. They used different standards.
You might be wondering why this is relevant to SQL and DBMS. There is a standard SQL and a not-so-standard SQL. Most DBMS recognize standard SQL, but there are exceptions. This is not because of a goof or simple stubbornness, but primarily due to the evolution of standards. In the early 1970s, IBM developed the first DBMS along with SQL. The marketplace had growing demands that were fulfilled by entrepreneurs such as Larry Ellison and associates, who developed the Oracle DBMS. With each new DBMS came additions to SQL that satisfied a particular need in the marketplace. As a result, there became dialects of SQL; some SQL statements were understood only by particular DBMS. Each dialect had its own words for the same functionality.
Standards come about when engineers realize the need to make products uniform. All DBMS should understand the same SQL statements so that organizations won't need to rewrite SQL statements in their database applications if they switch to a different DBMS. There are organizations that establish standards—the most well-known is the International Organization for Standardization (ISO). In the United States, there is the American National Standards Institute (ANSI) and in Europe there is the European Committee for Standardization (CEN). Engineers propose standards to the standards body. In the case of SQL, they propose a word(s) and a definition for the word, such as CREATE DATABASE to create a database.
Each DBMS manufacturer makes sure that its DBMS recognizes the word(s) and performs the desired functionality. The standards body doesn't care how it's done as long as the functionality conforms to the standards. DBMS manufacturers each campaign for their word(s) to be adopted as the standard so they don't have to change their product. There can be a lot of politics involved in creating a standard.
Changes in technology occur quickly as new technology is introduced to meet demands of the marketplace. However, standardizing on changes takes time for the standards boards to meet and agree on standards. It is for this reason—and probably some stubbornness by a few DBMS manufacturers—that there still remain some DBMS that don't understand standard SQL. The manufacturer has features that are not yet standard, or they don't feel there is a benefit to conforming to the standards.
SQL Basics

SQL is like any language—it has a vocabulary and a structure. Vocabulary consists of words that, when placed in a specific structure, form a statement. One or multiple statements are used to create a message to the DBMS referred to as a query. Let's write a query that creates a database (Example 5.1). This query is probably self-explanatory. CREATE DATABASE tells the DBMS to create a database. What follows is the name of the database: MyFirstDB. Notice that the statement ends with a semicolon—think of this like a period used at the end of a sentence.
Example 5.1:
CREATE DATABASE MyFirstDB;

Also notice a mixture of uppercase and lowercase. The DBMS doesn't care what case is used because SQL is not case-sensitive. A mixture of cases is traditionally used to make it easy for us to read. This is referred to as a naming convention. Usually an organization adopts a naming convention for all applications and databases to avoid any confusion when working with data. Here are a few tricks used by some programmers and database administrators to make queries easy to read.
–– SQL words—words that must be spelled correctly—are in all uppercase. Non-SQL words—names of databases and tables—are words that you create. Any combination of letters can be used, even to form nonsensical words. However, it is important to use descriptive word(s) such as CustomerTB or OrderDB so there is no doubt about the information being referenced.
–– Capitalize the first letter of each word of the non-SQL words. This makes it easy to read concatenated words such as MyFirstDB. SQL doesn't use spaces within non-SQL words, so it is common to concatenate words in order to convey a meaning. This is referred to as CamelCase because the capital letters resemble a camel's hump.
–– An abbreviation is used to further identify the non-SQL word(s), such as DB for database, TB for table, and INX for index. Reading the name CustomerTB infers that this is the name of the customer table.

Elements of SQL

SQL has three elements. These are:
–– Data definition language (DDL): DDL consists of commands used to define a database and tables.
–– Data manipulation language (DML): DML consists of commands used to maintain the database and to query a database.
–– Data control language (DCL): DCL consists of commands that control database functions.
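One statement from each element makes the three-way split concrete. This sketch embeds the SQL in Python using SQLite (which, being a single-user library, has no DCL; GRANT and REVOKE are the usual DCL statements in server DBMS such as MySQL or Oracle):

```python
import sqlite3

db = sqlite3.connect(":memory:")

# DDL: define a table.
db.execute(
    "CREATE TABLE CustomerTB (CustomerNumber INTEGER, CustomerLastName TEXT)"
)

# DML: maintain the database (INSERT) and query it (SELECT).
db.execute("INSERT INTO CustomerTB VALUES (101, 'Smith')")
last_name = db.execute(
    "SELECT CustomerLastName FROM CustomerTB WHERE CustomerNumber = 101"
).fetchone()[0]

# DCL statements such as GRANT and REVOKE control who may run the
# statements above; SQLite simply does not implement them.
```

This is also what "embedded SQL" looks like in practice: SQL statements incorporated into the instructions of an application.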
Creating a Table
Tables can be added to a database once the database is created by using the CREATE TABLE statement (Example 5.2). Remember, a database table is similar to a worksheet in an Excel workbook. However, a database table requires that each column be explicitly defined when the table is created. Columns are defined within the parentheses of the CREATE TABLE statement. Each column is defined by a name and data type followed by a comma. The data types are described in Table 5.1. Example 5.2 defines the customer table. Notice that
Chapter 5: Data and Databases
nearly all columns are designated as NOT NULL. You'll remember from earlier in this chapter that NOT NULL means there must always be a value entered in the column. That is, there can't be a customer without a customer number, name, and address. The last line in the CREATE TABLE statement is PRIMARY KEY. A primary key is a value that uniquely identifies each row in a table. Here, the DBMS is told to use the CustomerNumber column as the primary key since this uniquely identifies each customer. Here are a few more rules. No spaces are permitted in the database name, table name, or column names. Each column name must be unique within the table, and a semicolon must follow the closing parenthesis to indicate the end of the statement. Example 5.2 can be considered a query and if entered into a DBMS console would create a table called CustomerTB in the current database. Database administrators typically combine multiple SQL statements within the same query, and each statement is executed when the DBMS processes the query. For example, a query may have statements that create all tables for the database rather than creating them one at a time. Example 5.2:
CREATE TABLE CustomerTB (
    CustomerNumber INTEGER NOT NULL,
    CustomerFirstName VARCHAR (30) NOT NULL,
    CustomerMiddleName VARCHAR (60),
    CustomerLastName VARCHAR (60) NOT NULL,
    Street VARCHAR (60) NOT NULL,
    City VARCHAR (60) NOT NULL,
    State VARCHAR (2) NOT NULL,
    PostalCode VARCHAR (10) NOT NULL,
    PRIMARY KEY (CustomerNumber)
);
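The CREATE TABLE statement for the customer table runs unchanged in SQLite via Python's sqlite3 module (an illustrative assumption; in SQLite the VARCHAR lengths are advisory rather than enforced). After the statement executes, the DBMS records the new table in its catalog:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CustomerTB (
        CustomerNumber INTEGER NOT NULL,
        CustomerFirstName VARCHAR (30) NOT NULL,
        CustomerMiddleName VARCHAR (60),
        CustomerLastName VARCHAR (60) NOT NULL,
        Street VARCHAR (60) NOT NULL,
        City VARCHAR (60) NOT NULL,
        State VARCHAR (2) NOT NULL,
        PostalCode VARCHAR (10) NOT NULL,
        PRIMARY KEY (CustomerNumber)
    );
""")

# SQLite's catalog table, sqlite_master, now lists the new table.
row = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table';").fetchone()
print(row)  # ('CustomerTB',)
```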
Automatically Validating Data
A DBMS can be told to validate data before placing the data into a column in the database. This is referred to as a constraint on a column. A constraint can be any logical expression that has a true or false result. The expression is defined using the SQL CHECK clause within the CREATE TABLE statement. An SQL clause is like a dependent clause in an English sentence that expresses a partial thought but cannot stand alone. Example 5.3 creates a small invoice table. The gross amount column contains the total invoice amount. The net amount column contains the amount that is charged to the customer and reflects the discount based on the contract with the customer. Each
customer has a different discount—however, the discount is never more than 50% of the gross amount. The DBMS is told to determine whether the net amount is greater than 50% of the gross amount using the expression in the CHECK clause. As long as the expression is true and the discount isn't greater than 50%, the row is entered into the table; otherwise, an error is returned by the DBMS. Example 5.3:
CREATE TABLE InvoiceTB (
    CustomerNumber INTEGER NOT NULL,
    GrossAmount NUMERIC NOT NULL,
    NetAmount NUMERIC NOT NULL,
    CHECK (NetAmount > (GrossAmount * .5))
);
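The CHECK clause behaves the same way in SQLite, so it is easy to watch the DBMS accept one row and reject another (the error class, sqlite3.IntegrityError, is SQLite-specific; other DBMSs report constraint violations differently):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE InvoiceTB (
        CustomerNumber INTEGER NOT NULL,
        GrossAmount NUMERIC NOT NULL,
        NetAmount NUMERIC NOT NULL,
        CHECK (NetAmount > (GrossAmount * .5))
    );
""")

# A 40% discount passes the check: net 60 is more than half of gross 100.
conn.execute("INSERT INTO InvoiceTB VALUES (12345, 100, 60);")

# A 60% discount fails the CHECK, so the DBMS rejects the row.
rejected = False
try:
    conn.execute("INSERT INTO InvoiceTB VALUES (12345, 100, 40);")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```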
Modifying the Database
SQL enables you to fix errors by adding or dropping a column or table. However, doing so can also affect the integrity of the database because data is also removed. Database administrators typically make modifications during testing, before any real data is entered into the database. Example 5.4 adds a second address column to the customer table. ALTER TABLE identifies the table and ADD defines the new column. Example 5.5 removes the second address column from the customer table using DROP followed by the name of the column. The column that is dropped cannot be the column used as the primary key for the table; otherwise relationships with other tables become broken (see "Referential Integrity" earlier in this chapter). The entire table can be removed, as shown in Example 5.6. The database itself can also be removed, as shown in Example 5.7. Example 5.4:
ALTER TABLE CustomerTB ADD Address2 VARCHAR (60);

Example 5.5:
ALTER TABLE CustomerTB DROP Address2;

Example 5.6:
DROP TABLE CustomerTB;
Example 5.7:
DROP DATABASE MyFirstDB;
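Of these statements, ALTER TABLE ... ADD is the most portable and can be tried in SQLite (an illustrative assumption; SQLite spells it ADD COLUMN, has no DROP DATABASE, and only newer versions support DROP COLUMN):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerTB (CustomerNumber INTEGER NOT NULL);")

# Example 5.4's change: add a second address column to the table.
conn.execute("ALTER TABLE CustomerTB ADD COLUMN Address2 VARCHAR (60);")

# PRAGMA table_info lists one row per column; index 1 is the name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(CustomerTB);")]
print(columns)  # ['CustomerNumber', 'Address2']
```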
Creating an Index
Previously in this chapter you learned that an index is used by the DBMS to locate data quickly in the database, much like you use the index of a book to find information. An index is like a two-column table where one column contains data and the other column identifies the row in the table that contains that data. The index is kept in sort order (numerical or alphabetical) and is used automatically if the DBMS decides that the index will speed up the search for the data. The DBMS decides whether or not to use the index based on the search criteria and the way the DBMS organizes the database. You can explicitly tell the DBMS to create an index using the CREATE INDEX and ON clauses. As shown in Example 5.8, an index called CustomerNumberINX is created for the customer table and uses the customer number column as the value of the index. Example 5.8:
CREATE INDEX CustomerNumberINX ON CustomerTB (CustomerNumber);

More than one column can be used to create the index. This is referred to as a clustered index. A common clustered index is used for the customer name and is used to search for a customer when the customer number is unknown. Example 5.9 shows how a clustered index is created. This is similar to creating an index, with the difference being that more than one column name is used in the ON clause, each separated by a comma. The name of the index should indicate the nature of the index. Here it is called CustomerLastNameFirstNameINX. This refers to the customer table using last and first names. Notice that the last name, then the first name, is specified. This is important because the search begins with the last name, then the first name. There is a lower probability that there are multiple customers with the same last name, so searching stops after the value of the last name changes. The DBMS then returns rows that contain the last and first names that match the search criteria. Example 5.10 shows how to remove an index. Example 5.9:
CREATE INDEX CustomerLastNameFirstNameINX
ON CustomerTB (CustomerLastName, CustomerFirstName);

Example 5.10:
DROP INDEX CustomerLastNameFirstNameINX;
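Both index statements and DROP INDEX run as-is in SQLite (used here as a stand-in for a production DBMS); the catalog confirms what exists before and after the drop:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerTB (CustomerNumber INTEGER, "
             "CustomerFirstName VARCHAR (30), CustomerLastName VARCHAR (60));")

conn.execute("CREATE INDEX CustomerNumberINX ON CustomerTB (CustomerNumber);")
conn.execute("CREATE INDEX CustomerLastNameFirstNameINX "
             "ON CustomerTB (CustomerLastName, CustomerFirstName);")

# The catalog lists both indexes.
indexes = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index';"))
print(indexes)  # ['CustomerLastNameFirstNameINX', 'CustomerNumberINX']

# DROP INDEX removes one again, as in Example 5.10.
conn.execute("DROP INDEX CustomerLastNameFirstNameINX;")
remaining = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index';")]
print(remaining)  # ['CustomerNumberINX']
```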
Insert Data
Entering data into a table is straightforward using the INSERT INTO statement (Example 5.11). The name of the table is identified and then the VALUES clause is used to specify data for each column. The DBMS presumes that the order of the data matches the order of columns in the table because column names are not specified in the statement. Each value is separated by a comma. Character data is enclosed within single quotation marks. No quotation marks are necessary for numeric data such as the customer number. Example 5.12 demonstrates how to insert data into specific columns in the table. Column names are placed within the first set of parentheses, followed by the VALUES clause. The DBMS places data into the corresponding column. You can change the order of the columns without impacting the data as long as the column names and the data values are in the same sequence. Example 5.11:
INSERT INTO CustomerTB
VALUES (12345, 'Mary', 'Margret', 'Jones', '555 Maple Street', 'Any City', 'NJ', '07660');

Example 5.12:
INSERT INTO CustomerTB (CustomerNumber, CustomerFirstName, CustomerLastName, Street, City, State, PostalCode)
VALUES (12345, 'Mary', 'Jones', '555 Maple Street', 'Any City', 'NJ', '07660');

Retrieving Information
The SELECT statement is used to retrieve data from a table. The SELECT statement lists the column names of the data that you want returned by the DBMS. The FROM clause specifies the table that contains the data. Example 5.13 retrieves all columns and all rows from the customer table. Only include the column names of data that you want retrieved. You don't need to retrieve all columns.
Example 5.13:
SELECT CustomerNumber, CustomerFirstName, CustomerLastName, Street, City, State, PostalCode
FROM CustomerTB;

The WHERE clause is used in the SELECT statement to retrieve specific rows of data from the table. The WHERE clause requires a conditional expression that returns a true or false value. Notice that the semicolon follows the WHERE clause rather than the FROM clause. You can see this in Example 5.14. The DBMS is asked to return the customer number and customer name of all rows from the customer table where the value of the customer last name column is equal to 'Jones.' Multiple conditions can be used in the WHERE clause to further fine-tune the search. Example 5.15 asks the DBMS to return rows where the customer last name is 'Jones' and the customer first name is 'Mary.' The OR operator can be used in place of the AND operator to indicate that data should be returned if either condition is true. Comparison operators can be used in the conditional expression of the WHERE clause. These are particularly useful when working with numeric values. Table 5.3 contains comparison operators that can be used with the WHERE clause. Example 5.14:
SELECT CustomerNumber, CustomerFirstName, CustomerLastName
FROM CustomerTB
WHERE CustomerLastName = 'Jones';

Example 5.15:
SELECT CustomerNumber, CustomerFirstName, CustomerLastName
FROM CustomerTB
WHERE CustomerLastName = 'Jones'
AND CustomerFirstName = 'Mary';

Table 5.3: Comparison operators

Operator    Description
=           Equal to
<           Less than
>           Greater than
<=          Less than or equal to
>=          Greater than or equal to
<>          Not equal to
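Putting INSERT and SELECT together in SQLite (an illustrative stand-in, with the customer table trimmed to three columns) shows the WHERE clause filtering rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerTB (CustomerNumber INTEGER, "
             "CustomerFirstName VARCHAR (30), CustomerLastName VARCHAR (60));")
conn.execute("INSERT INTO CustomerTB VALUES (12345, 'Mary', 'Jones');")
conn.execute("INSERT INTO CustomerTB VALUES (12346, 'Bob', 'Smith');")

# Only the Jones row satisfies the WHERE clause.
rows = conn.execute(
    "SELECT CustomerNumber, CustomerFirstName, CustomerLastName "
    "FROM CustomerTB WHERE CustomerLastName = 'Jones';").fetchall()
print(rows)  # [(12345, 'Mary', 'Jones')]
```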
Structured Query Language (SQL)
157
Pattern Matching
There are times when you might be unsure of the search value—you have a partial value but not the whole value. You can use pattern matching in the search criteria to ask the DBMS to return rows that contain the partial value. Pattern matching uses a wildcard character in place of the unknown character(s). The wildcard characters are:
–– Underscore (_): This represents any single character. For example, 'J_n_s' returns any value that begins with J, ends with s, and has an n as the third character.
–– Percent (%): This represents zero to any number of characters. For example, 'Jo%' returns any value that begins with the letters 'Jo.'
Pattern matching is used with the LIKE or NOT LIKE operators. The LIKE operator tells the DBMS to return rows with values like the value in the pattern matching expression. The NOT LIKE operator tells the DBMS to return rows that have values unlike the pattern matching expression. The following example (Example 5.16) tells the DBMS to return rows where the customer last name begins with 'Jo.' Example 5.16:
SELECT CustomerNumber, CustomerFirstName, CustomerLastName
FROM CustomerTB
WHERE CustomerLastName LIKE 'Jo%';
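Both wildcards can be watched in SQLite (a sketch with made-up rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerTB "
             "(CustomerNumber INTEGER, CustomerLastName VARCHAR (60));")
conn.executemany("INSERT INTO CustomerTB VALUES (?, ?);",
                 [(1, 'Jones'), (2, 'Johnson'), (3, 'Smith')])

# % matches any run of characters, so 'Jo%' catches Jones and Johnson.
like = conn.execute(
    "SELECT CustomerLastName FROM CustomerTB "
    "WHERE CustomerLastName LIKE 'Jo%' ORDER BY CustomerNumber;").fetchall()
print(like)  # [('Jones',), ('Johnson',)]

# _ matches exactly one character, so 'J_n_s' catches only Jones.
underscore = conn.execute(
    "SELECT CustomerLastName FROM CustomerTB "
    "WHERE CustomerLastName LIKE 'J_n_s';").fetchall()
print(underscore)  # [('Jones',)]
```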
Searching for Ranges
The BETWEEN clause is used to return rows whose values fall within a specified range of values. For example, let's say you want a list of customers who placed an order between 1/1/2020 and 12/31/2020. Here's the expression in the WHERE clause:

WHERE OrderDate BETWEEN '1/1/2020' AND '12/31/2020';

Likewise, the NOT clause can be used to exclude a range from the rows returned by the DBMS, as shown here:

WHERE OrderDate NOT BETWEEN '1/1/2020' AND '12/31/2020';

There are occasions when you want to search for a set of values rather than a range of values. For example, you may want rows of customers who live in a specific set of postal codes. You can specify a set as criteria for the WHERE clause by using the IN clause. The IN clause contains the set of values to be searched. Here is the WHERE clause that searches for a set of postal codes:
WHERE PostalCode IN ('07552', '07660');
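A sketch of both clauses in SQLite follows. One assumption worth noting is the date format: 'M/D/YYYY' strings do not compare in calendar order as text, so the sketch stores dates as ISO 'YYYY-MM-DD' strings, which do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderTB (OrderNumber INTEGER, "
             "OrderDate VARCHAR (10), PostalCode VARCHAR (10));")
conn.executemany("INSERT INTO OrderTB VALUES (?, ?, ?);", [
    (1, '2019-06-15', '07552'),
    (2, '2020-03-01', '07660'),
    (3, '2021-01-05', '07030')])

# BETWEEN keeps rows inside the range (inclusive at both ends).
in_2020 = conn.execute(
    "SELECT OrderNumber FROM OrderTB "
    "WHERE OrderDate BETWEEN '2020-01-01' AND '2020-12-31';").fetchall()
print(in_2020)  # [(2,)]

# IN keeps rows whose value belongs to the listed set.
zips = conn.execute(
    "SELECT OrderNumber FROM OrderTB "
    "WHERE PostalCode IN ('07552', '07660') ORDER BY OrderNumber;").fetchall()
print(zips)  # [(1,), (2,)]
```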
Changing Values
Values in a column can be changed by using the UPDATE statement. The UPDATE statement has two clauses. The SET clause specifies the column name and the new data value, and the WHERE clause specifies the row that is being updated. Existing values are overwritten and cannot be restored if overwritten in error. In Example 5.17 the customer has a new address. The DBMS is told to overwrite the existing values of the address columns. Notice that the WHERE clause identifies the row that is being updated. Excluding the WHERE clause causes the DBMS to update the specified columns of all rows with the values specified in the SET clause. Example 5.17:
UPDATE CustomerTB
SET Street = '777 Some Street', City = 'Some City', State = 'NJ', PostalCode = '07777'
WHERE CustomerNumber = 12345;
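The role of the WHERE clause in an UPDATE is easy to demonstrate in SQLite (a sketch with a two-column table): only the matching row is overwritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerTB "
             "(CustomerNumber INTEGER, City VARCHAR (60));")
conn.executemany("INSERT INTO CustomerTB VALUES (?, ?);",
                 [(12345, 'Any City'), (12346, 'Other City')])

# Without the WHERE clause, every row's City would be overwritten.
conn.execute("UPDATE CustomerTB SET City = 'Some City' "
             "WHERE CustomerNumber = 12345;")

rows = conn.execute("SELECT CustomerNumber, City FROM CustomerTB "
                    "ORDER BY CustomerNumber;").fetchall()
print(rows)  # [(12345, 'Some City'), (12346, 'Other City')]
```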
Deleting a Row
A row can be removed from a table by using the DELETE FROM statement and specifying the table name. The WHERE clause is used to identify the row(s). Example 5.18 deletes all orders associated with the customer number 12345. Excluding the WHERE clause causes all rows in the table to be deleted. There is no way to restore a deleted row. Example 5.18:
DELETE FROM OrderTB WHERE CustomerNumber = 12345;
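The same pattern in SQLite shows the WHERE clause limiting the deletion to one customer's orders (a sketch; the order table is trimmed to two columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderTB "
             "(OrderNumber INTEGER, CustomerNumber INTEGER);")
conn.executemany("INSERT INTO OrderTB VALUES (?, ?);",
                 [(1, 12345), (2, 12345), (3, 99999)])

# Remove every order placed by customer 12345; the other row survives.
conn.execute("DELETE FROM OrderTB WHERE CustomerNumber = 12345;")

remaining = conn.execute("SELECT OrderNumber FROM OrderTB;").fetchall()
print(remaining)  # [(3,)]
```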
Calculating Values in a Column
Earlier in this chapter you learned that the database designer excludes data that can be calculated from the database because it can be calculated whenever the data is required. Here's how the calculation is done. SQL has built-in calculations referred to as column functions that instruct the DBMS to perform various tasks. Column functions can be used in a SELECT statement to return the calculated value. Table 5.4 contains commonly used column functions.
Table 5.4: Commonly used column functions

Calculate the sum of values:
SELECT SUM(NetAmount) FROM CustomerInvoiceTB;

Return the lowest value:
SELECT MIN(NetAmount) FROM CustomerInvoiceTB;

Count the number of rows that contain values in the specified column:
SELECT COUNT(NetAmount) FROM CustomerInvoiceTB;

Return the highest value:
SELECT MAX(NetAmount) FROM CustomerInvoiceTB;

Count the number of rows regardless of values:
SELECT COUNT(*) FROM CustomerInvoiceTB;

Return the average value:
SELECT AVG(NetAmount) FROM CustomerInvoiceTB;
Multiple column functions can be used in a SELECT statement to return different calculations by placing each column function in the SELECT statement separated by a comma. This is illustrated in Example 5.19, where the DBMS calculates and returns the minimum, maximum, and average values of the NetAmount column in the customer invoice table. Example 5.19:
SELECT MIN(NetAmount), MAX(NetAmount), AVG(NetAmount)
FROM CustomerInvoiceTB;

Column functions can also be used in the WHERE clause as part of the logical expression in the selection criteria. In Example 5.20 the DBMS calculates the average net amount, compares the average net amount to the value in the NetAmount column, and returns the customer number for customers whose net amount is greater than the average net amount. Example 5.20:
SELECT CustomerNumber
FROM CustomerInvoiceTB
WHERE NetAmount > (SELECT AVG(NetAmount) FROM CustomerInvoiceTB);
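Both examples can be run in SQLite on a small invoice table (a sketch). One portability assumption is worth flagging: most DBMSs will not accept AVG() written directly in a WHERE clause, so the average is computed in a subquery below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerInvoiceTB "
             "(CustomerNumber INTEGER, NetAmount NUMERIC);")
conn.executemany("INSERT INTO CustomerInvoiceTB VALUES (?, ?);",
                 [(1, 100), (2, 200), (3, 600)])

# Several column functions in one SELECT, as in Example 5.19.
stats = conn.execute("SELECT MIN(NetAmount), MAX(NetAmount), AVG(NetAmount) "
                     "FROM CustomerInvoiceTB;").fetchone()
print(stats)  # (100, 600, 300.0)

# Example 5.20 written portably: the average comes from a subquery.
above_avg = conn.execute(
    "SELECT CustomerNumber FROM CustomerInvoiceTB "
    "WHERE NetAmount > (SELECT AVG(NetAmount) FROM CustomerInvoiceTB);"
).fetchall()
print(above_avg)  # [(3,)]
```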
Remove Duplicates
The DISTINCT clause is used to remove duplicate values from the result returned by the DBMS. Let's say you are in a healthcare facility and you want to identify practitioners who have patients currently admitted to the facility. Each practitioner is likely to have more than one patient. The patient table has a practitioner identification column. You can ask the DBMS to return the practitioner identification for those patients. However, the returned list of practitioner identifications will likely have duplicates. You can eliminate duplicates by using the DISTINCT clause in the query. This is shown in Example 5.21. Example 5.21:
SELECT DISTINCT PractitionerID FROM PatientTB;

The DISTINCT clause can also be used in column functions. In Example 5.22, the DISTINCT clause is used to count the number of practitioners, excluding duplicates. Example 5.22:
SELECT COUNT(DISTINCT PractitionerID) FROM PatientTB;
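With practitioner 10 appearing twice, SQLite (standing in for the facility's DBMS) shows DISTINCT collapsing the duplicates in both forms:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PatientTB "
             "(PatientID INTEGER, PractitionerID INTEGER);")
conn.executemany("INSERT INTO PatientTB VALUES (?, ?);",
                 [(1, 10), (2, 10), (3, 20)])

# Example 5.21: duplicates removed from the returned list.
ids = conn.execute("SELECT DISTINCT PractitionerID FROM PatientTB "
                   "ORDER BY PractitionerID;").fetchall()
print(ids)  # [(10,), (20,)]

# Example 5.22: each practitioner counted once.
count = conn.execute("SELECT COUNT(DISTINCT PractitionerID) "
                     "FROM PatientTB;").fetchone()[0]
print(count)  # 2
```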
Organizing Data
The DBMS can organize data returned from a query by using the GROUP BY clause. The GROUP BY clause is used to organize returned values by the column specified in this clause. Let's say that you want to see a list of products purchased by customers, organized by customer. The GROUP BY clause will do this for you, as shown in Example 5.23. Example 5.23:
SELECT CustomerNumber, ProductNumber, ProductDescription
FROM OrderTB
GROUP BY CustomerNumber, ProductNumber, ProductDescription;

The GROUP BY clause can also be used with the COMPUTE clause and a column function to calculate by grouped values. The COMPUTE clause tells the DBMS to calculate a column based on a value in another column. The customer invoice table may have multiple entries for customers who have placed multiple orders. In Example 5.24 the GROUP BY clause combined with the COMPUTE clause is used to total the net amount for each customer and group the results by customer number.
Example 5.24:
SELECT CustomerNumber
FROM CustomerInvoiceTB
GROUP BY CustomerNumber
COMPUTE SUM (NetAmount) BY CustomerNumber;

You can limit the rows returned by the GROUP BY clause by using the HAVING clause. This might sound a little confusing, but think of the HAVING clause as the WHERE clause for groups, enabling you to specify a search condition. Example 5.25 shows how this is done. Here, only customers whose total NetAmount is greater than 10,000 will be included in the result. Example 5.25:

SELECT CustomerNumber
FROM CustomerInvoiceTB
GROUP BY CustomerNumber
HAVING SUM(NetAmount) > 10000;

The DBMS can place results in a sort order by using the ORDER BY clause. The ORDER BY clause specifies a column that will be used to sort the data. Let's say that you want to display customer information sorted by customer last name. Example 5.26 shows how this is done. The ORDER BY clause can be used in conjunction with the GROUP BY clause to sort values by group. Example 5.26:
SELECT CustomerNumber, CustomerFirstName, CustomerMiddleName, CustomerLastName
FROM CustomerTB
ORDER BY CustomerLastName;
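COMPUTE is specific to some older DBMS products (Sybase and early SQL Server), so the sketch below totals by customer the portable way, with SUM() and GROUP BY, and then filters the groups with HAVING; SQLite is again an illustrative assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerInvoiceTB "
             "(CustomerNumber INTEGER, NetAmount NUMERIC);")
conn.executemany("INSERT INTO CustomerInvoiceTB VALUES (?, ?);",
                 [(1, 6000), (1, 5000), (2, 3000)])

# One total per customer: GROUP BY collapses each customer's rows.
totals = conn.execute(
    "SELECT CustomerNumber, SUM(NetAmount) FROM CustomerInvoiceTB "
    "GROUP BY CustomerNumber ORDER BY CustomerNumber;").fetchall()
print(totals)  # [(1, 11000), (2, 3000)]

# HAVING filters whole groups after they are totaled.
big = conn.execute(
    "SELECT CustomerNumber FROM CustomerInvoiceTB "
    "GROUP BY CustomerNumber HAVING SUM(NetAmount) > 10000;").fetchall()
print(big)  # [(1,)]
```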
Joining Together Tables
As you learned earlier in this chapter, data in a database is organized into multiple tables based on the normalization rules to reduce the amount of duplicate data stored in the database. Every row in each table is uniquely identified by a value in one or more columns called a primary key. The value of a primary key is also stored in a column of another table, referred to as a foreign key, and is used to link the two tables together in what is called a join. Sound confusing? Let's see how this is done.
The customer table contains information about customers. The order table contains information about orders placed by customers. Rather than repeating the customer information in the order table, the customer information is placed in its own table, identified by customer number. The customer number is the primary key of the customer table because each customer has a unique customer number. The orders in the order table are identified by an order number that is also a unique number and is the primary key of the order table. The order table also has the customer number of the customer who placed the order. This is the same customer number that exists in the customer table. The customer table and the order table can be linked together (joined) by telling the DBMS to match customer numbers in both tables. There will always be a match because an order doesn't exist unless there is a customer. The DBMS joins together tables based on the data in columns, not by the column name. Column names can be different. As long as the data in the columns is the same, the DBMS can join the tables. A join is created in a query by specifying the names of the tables in the FROM clause and using a logical expression in the WHERE clause that tells the DBMS which columns to join together. Take a look at Example 5.27. The WHERE clause creates the join by specifying the condition for the join: the customer number in both tables must match. Let's see what's happening here. Example 5.27:
SELECT CustomerTB.CustomerNumber, CustomerFirstName, CustomerLastName, OrderNumber, OrderDescription
FROM CustomerTB, OrderTB
WHERE CustomerTB.CustomerNumber = OrderTB.CustomerNumber;

The SELECT statement contains the column names of data that needs to be returned by the DBMS. Notice that the customer number and customer name are from the customer table, and the order number and order description are from the order table. The DBMS knows which tables contain the data because column names are unique to each table. The only duplicate column name is CustomerNumber; its data is the same in both tables, but the DBMS must be explicitly told which CustomerNumber column to use wherever the name appears. The FROM clause contains the names of both tables separated by a comma. The WHERE clause creates the link. Notice that the table name and the column name are both used for each table. The table name is specified first, then the column name, separated by a period. You must reference the table name when using a column name only if the column name appears in both tables. A common problem is that the table name is long and takes up too much room when referencing the table name in the query. The workaround is to declare an alias
for the table name in the FROM clause. Think of an alias as an abbreviation. An alias can consist of one or more letters that are indicative of the table name. The alias is declared when you specify the table name in the FROM clause by placing the alias after the table name, separated by a space. Example 5.28 shows how this is done. The alias is then used in place of the table name throughout the query. Example 5.28:
SELECT CT.CustomerNumber, CustomerFirstName, CustomerLastName, OrderNumber, OrderDescription
FROM CustomerTB CT, OrderTB OT
WHERE CT.CustomerNumber = OT.CustomerNumber;
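The aliased join runs in SQLite too (sketch tables trimmed to the columns the join needs); note that the shared CustomerNumber column must be qualified, or the DBMS reports it as ambiguous:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerTB "
             "(CustomerNumber INTEGER, CustomerLastName VARCHAR (60));")
conn.execute("CREATE TABLE OrderTB (OrderNumber INTEGER, "
             "CustomerNumber INTEGER, OrderDescription VARCHAR (60));")
conn.execute("INSERT INTO CustomerTB VALUES (12345, 'Jones');")
conn.execute("INSERT INTO OrderTB VALUES (1, 12345, 'Widgets');")

# CT and OT are aliases; CustomerNumber appears in both tables,
# so it is qualified wherever it is used.
rows = conn.execute(
    "SELECT CT.CustomerNumber, CT.CustomerLastName, "
    "OT.OrderNumber, OT.OrderDescription "
    "FROM CustomerTB CT, OrderTB OT "
    "WHERE CT.CustomerNumber = OT.CustomerNumber;").fetchall()
print(rows)  # [(12345, 'Jones', 1, 'Widgets')]
```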
Writing SQL
Writing SQL is a skill worth developing, especially for programmers, database designers, database administrators, and super users. A database designer designs the structure of the database, and a database administrator builds and manages the database. A super user is someone in the business unit who knows how to use technology more in depth than other business users. A super user who knows how to write SQL can write reports and manipulate data without waiting for help from technicians.
Chapter 6
Talking Intelligently About Cybersecurity

Mention cybersecurity, and images of government agents in undisclosed buildings—or that twelve-year-old sitting in her room on a laptop—trying to break into your computer flash through your mind. Or the countless number of annoying popups telling you it's time to change your password on all your different accounts. Your password is too easy. Try again! Your password must have at least one capital letter, a number, eight characters, and a punctuation mark. Try again! You've used this password already. Try again! You can't use the name of the organization or any part of the organization's address. Try again! You can't use the name of your college, the name of your town, your name, or names of any relatives. Try again! Enough already! But no. Each time you log in with your acceptable password, you still might not be trusted and are asked to further authenticate yourself. The organization calls you using the phone number that is on file. You'll need to answer the call and press any key on the phone's keypad to verify you are you. Or the organization sends you an authentication code via email that needs to be entered along with your ID and password. Then there is the real insult. Access to your favorite website is blocked on your work computer. To make matters worse, the organization's cybersecurity department deactivates USB ports, making your flash drives—even for work—useless, and it won't reactivate them no matter how much doing so interferes with business operations. Welcome to the new world of cybersecurity.
Challenges of Cybersecurity
Is the cybersecurity threat real or perceived? You need only see the latest newscast to realize that cybersecurity breaches are real and can have a dramatic effect on your personal life, business, and governmental operations. However, there is always lingering doubt, especially if you've never been the victim of a cybersecurity attack. Even with the defensive strategies imposed by the cybersecurity department of your organization, there is no guarantee that computing devices are fully protected from a cybersecurity attack. Here's why. Consider the number of programs that are used on a computing device. First there is the operating system (see Chapter 3), which is really a set of programs. Then there is the network operating system that controls transmission over the network (see Chapter 2). This too has a set of programs. There are applications and database management systems too. DOI 10.1515/9781547400812-006
Unseen programs are contained in computer chips built into computing devices, referred to as embedded processors. There are two types of embedded processors: the microprocessor and the microcontroller, also referred to as a programmable logic controller (PLC) (Figure 6.1). A microprocessor is the computer within a computer, as discussed in Chapter 3. A microcontroller is a computing chip that controls electronic devices such as televisions, appliances, automobiles, and industrial equipment such as power plants and factories. The logic of some of these programs, called firmware, is etched into an integrated chip. An integrated chip is a set of electronic circuits on a semiconductor material, usually made from silicon.
Figure 6.1: A microprocessor (left) and a programmable logic controller (right) control computing devices and industrial equipment.
Network communications (i.e., intranet and internet) are controlled by a series of programs working at each of the seven layers of the OSI model standard (see Chapter 2). These are (Figure 6.2):
–– Layer 7 Application: The application layer prepares the message from the application to be transmitted over the communication network—selecting the send button in Outlook, for example.
–– Layer 6 Presentation: The presentation layer converts the message into a format that is understood by other layers in the OSI model.
–– Layer 5 Session: The session layer establishes communication with the receiving network device and maintains communication until the message is delivered.
–– Layer 4 Transport: The transport layer controls the flow of data and determines if there were any transmission errors. The transport layer also integrates data from multiple applications into a single stream of data.
–– Layer 3 Network: The network layer determines the way the data is sent over the network (IP address).
–– Layer 2 Data Link: The data link layer defines the network type, packet sequencing (the order in which packets are sent), and physical network protocols (rules) to use for the transmission.
–– Layer 1 Physical: The physical layer is the hardware that controls the timing of data, connections, and voltages.
Figure 6.2: Each layer of the OSI model can be a target of a hacker in an effort to interfere with data transmission.
Imagine for a moment the hundreds of thousands of lines of instructions in these programs, independently designed and written by a countless number of systems analysts, engineers, and programmers. Each program works perfectly (well, close) and delivers the expected result, but not every program is secured from a cyberattack because there are subtle gaps in instructions that can be exploited. Sophisticated hackers go to extremes to identify these vulnerabilities in operating systems, microprocessors, database management systems, applications, and security defense strategies. A vulnerability is exploited until the gap in the program is fixed and the fix is distributed to computing devices. However, not all computing devices are upgraded frequently, exposing them to a cyberattack, and firmware is generally not subject to update. Of particular concern are the security measures taken by device manufacturers and manufacturers of operating systems, because their programs are distributed to hundreds of thousands, or millions, of customers. For example, a malicious program that infects a programmable logic controller has the capability of controlling critical operations in manufacturing plants and the electrical grid. Likewise, a malicious program that infects a manufacturer's operating system product, such as Windows, can gain control over every computing device that uses the operating system. One successful attack could gain control of millions of industrial equipment devices and hundreds of millions of computing devices. The probability of securing every computing device against the exploitation of faults in programs directly or indirectly used by each computing device is zero. The best that can be accomplished is to fix vulnerabilities once they are identified, a little like closing
the barn door after the horse has left. And so the challenges of the cybersecurity department continue, as do the frustrations of changing passwords and conforming to other annoying cybersecurity requirements.
Cybersecurity Audit
Before cybersecurity defensive measures are taken, an organization undertakes a cybersecurity audit, conducted by the cybersecurity department or by an outside cybersecurity auditing firm brought in to perform an objective audit. A cybersecurity audit is a process that reviews an organization's cybersecurity risk, both electronic and physical. Both electronic and manual processes involved in recording, storing, and retrieving internal and external information are examined in a cybersecurity audit. This includes emails, database applications, and websites—any interaction that exchanges information is carefully analyzed to assess risks to the organization. The cybersecurity audit ensures the integrity of information by identifying both the possibility and the probability that someone might tamper with the organization's information. The goal of the cybersecurity audit is to verify that information is consistently valid, is processed correctly, and is delivered accurately to the staff. The cybersecurity audit begins with a review of the policies and procedures that govern how information is maintained and accessed. Policies should clearly state that only staff who require information are authorized to access it, that they may access only the portion of the information needed to perform their duties, and that staff must keep the information confidential. No one can share information with another person unless that information is needed to perform their duties. The cybersecurity audit also identifies electronic controls used to access the information and to detect unauthorized attempts to access the information. For example, cybersecurity controls built into the application, database, or network verify that a login ID has access to information. Failed attempts to log in are electronically noted in a log and reported to the cybersecurity department for investigation.
Database Access Points
Auditors identify database access points. A database access point is an entrance into the database. Typically the organization's information is stored electronically in a database located on a database server in the data center. A goal of a cybersecurity audit is to reduce the number of database access points, diminishing the threat of a database breach (Figure 6.3).
Figure 6.3: A data access point is where a program can access information that is stored in a database.
Here there are at least four database access points:
1. The database: A technician using a program may be able to retrieve data directly from the database.
2. The database management system (DBMS) (see Chapter 5): The DBMS responds to queries that can be sent by any program or written interactively by a technician.
3. The database application: The application sends queries to the DBMS, which accesses the information from the database.
4. The network: Packets (see Chapter 2) containing information can be accessed as they travel over the network.

A Little Secret
There is a common risk facing many organizations that inadvertently defeats top-notch cybersecurity defenses: humans. Shareholders, banks, regulatory authorities, and other stakeholders make decisions based on reports provided by executives that describe the state of the organization. The presumption is that information contained in the report is generated from the organization's secured databases. But is it? Reports contain information that is assembled from the organization's secured databases using Office products such as Excel and Word. Although information from the database can be electronically transferred into Excel, it is likely that someone manually enters the information into Excel, where the information is then analyzed, massaged, and reported. The information is easily manipulated by executives, analysts, and executive assistants—practically anyone who has access to the Excel, Word, and PowerPoint files. Here's another potential security breach. Information generated by the secured database is probably stored locally on unsecured computing devices (smartphones, tablets, laptops, flash drives) and distributed in printed reports, PDF files, presentations, and other unsecured electronic means. Each of these is a data access point and a high risk for breach.
There is no need to use sophisticated techniques to break into the database when all you need to do is look in the trash (called dumpster diving) at the end of the day or pick up an executive’s mobile computing device to gain access to the information. Security breaches also happen when executives travel. Executives tend to forget that competing firms provide the same accommodations to their executives when traveling on business. They use the same VIP lounge waiting to board the plane; sit in the same business class on the plane; and sit in the same hotel lobbies, bars, and restaurants. When an executive opens their computing devices to work on presentations or analyze data, the information might be in plain sight of the competition.
Chapter 6: Talking Intelligently About Cybersecurity
Physical Access

Data servers, application servers, and other computing devices used to store and access the organization's information are housed in a secured location accessible through an authentication system. An authentication system controls access to secured facilities. You've seen these: electronically readable secured ID cards and biometric readers, such as a fingerprint reader, used to gain access to the facility.

An authentication system is not foolproof. For example, access is granted to any electronically readable secured ID card authorized to enter the area, regardless of the person holding the card. Likewise, biometric readers typically have sensitivity controls that can be adjusted from a 100% match to less than a 50% match, depending on the risk tolerance of the organization. Rarely does a fingerprint match perfectly to the biometric data on file. Dirt on the finger or on the glass, or the positioning of the finger on the glass, may interfere with a perfect match. Therefore, less than perfect is acceptable.

An organization needs to determine whether the system should identify or authenticate an employee. Identify or authenticate? Confused? There is a subtle difference. Identification is less accurate than authentication. For example, the readable secured ID card is used to identify a person. Of course, the card can be held by a person other than the employee. A biometric iris scan is used to authenticate an employee. Only one person will match the iris measurement on file. There is no doubt that this is the employee.

A highly secured area uses either a layered cybersecurity system or a multimodal cybersecurity system. A layered cybersecurity system uses biometric scanning and a non-biometric method (such as an ID and password) together to authenticate an employee. In a multimodal cybersecurity system, more than one biometric scanning method is used.
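The sensitivity control described above reduces, at its core, to comparing a similarity score against a configurable threshold. The following is a minimal sketch for illustration only; the 0-to-1 scoring scale and the threshold values are assumptions, not any vendor's actual matching algorithm.

```python
def biometric_match(similarity: float, threshold: float = 0.9) -> bool:
    """Accept a scan when its similarity score meets the configured threshold."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    return similarity >= threshold

# A strict policy (high threshold) rejects imperfect scans;
# a lenient policy (lower threshold) tolerates dirt or poor finger placement.
print(biometric_match(0.95, threshold=0.9))  # True
print(biometric_match(0.80, threshold=0.9))  # False
```

Lowering the threshold admits more legitimate users with dirty fingers, but also raises the chance of accepting the wrong person, which is exactly the risk-tolerance trade-off the text describes.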
Biometrics

Biometrics measures your physical and/or behavioral characteristics. The measurements—not the actual image or recording—are stored in a central database accessed by authentication applications. However, there are some biometric systems where measurements are stored on a smart card. There is no central repository of measurements. Instead, the authentication system compares your biometric measurements to the measurements on your smart card.

Don't confuse biometrics with images stored for human recognition. Images such as your picture are stored and used on photo ID cards and displayed on a computer screen to help another person recognize you. However, a computer can't recognize you from the picture. The computer uses biometric measurements taken of the image to identify you.
An advantage of biometrics is that you’ll never forget or have to change your password because there isn’t a password. Access is granted by measuring physical characteristics of parts of your body, such as your fingerprints, face, irises, or even your veins. You can also be identified by your behavioral characteristics such as your handwriting, typing rhythm, and voice, which are difficult to copy (Figure 6.4).
Figure 6.4: Biometrics measures characteristics of unique parts of the body to authenticate a person.
Handwriting

Handwriting measurements involve more than the shape of each character and how characters are connected. The system also records the speed and rhythm of the writing and the pressure placed on the writing device. Also measured are the sequences of dotting i's and crossing t's. These measurements are nearly impossible to forge. A handwriting authentication system usually has a touch-sensitive screen and a smart pen that measures the angle at which the pen is held and the pressure placed on the pen. Initially you are asked to write a specific text three times. Each time, measurements are recorded and stored. When you sign in to gain access, the same measurements are taken and compared to the average measurements on file. The handwriting authentication system isn't looking for an exact match because you probably never write the exact same way twice. Instead, a reasonably close match is accepted.
Hands and Fingers

Hands and fingers have unique characteristics that are recorded by a hand and finger geometry reader. The hand and finger geometry reader is a digital camera and a lighting device. The reader has pegs used to align your fingers on the flat surface. One or more pictures of your hand are taken. The reader measures the length, width, thickness, and curvature of your hand and fingers, and then these measurements are stored in a central database. Hands and fingers have less distinctive characteristics than other biometric measurements. Furthermore, hands change as people age, which is why this biometric method is less accurate than other methods, and measurements need to be updated regularly to maintain their accuracy.
Voice Prints

A voice print records characteristics of how you speak specific words. The recording is in the form of a spectrogram. A spectrogram is a graph that shows the frequencies of the voice over time. Sounds created when you speak change the shape of the graph. The spectrogram also shows acoustical qualities of your voice such as pitch, duration, intensity, and timbre. Voice prints are particularly useful in situations when a person is not physically present, such as saying your password to access your voice mail. A major weakness in this system is that the voice can be previously recorded and played back to the system. Some voice print systems try to detect sound characteristics associated with a recording.
Iris Scanning

The iris is the flat, colored ring behind the cornea, the transparent layer forming the front of the eye. The iris is visible but protected, so it doesn't change with age and remains unchanged even after eye surgery. Furthermore, contact lenses and eyeglasses don't interfere with measuring the iris. An iris scanner consists of a digital camera that uses visible and near-infrared light to take a high-contrast picture of the iris. The pupil becomes black, enabling the iris-scanning program to clearly identify the iris. The eyes need to be positioned between three and ten inches from the camera. The iris-scanning program measures the center of the pupil, the edges of the pupil and iris, and the eyelids and eyelashes. More than 200 points of reference are measured and translated into a unique code that is stored. The probability of two people having the same measurements is very slim.
Fingerprints

Tiny ridges on the finger form a pattern of ridges and valleys called a fingerprint. Slight differences in a fingerprint are used to uniquely identify a finger. A fingerprint scanner captures the image of a fingerprint, which is measured by a fingerprint scanning program using approximately seventy points of reference.

The optical fingerprint scanner uses a digital camera to capture the image of the fingerprint. Ridges appear dark and valleys appear light. The optical fingerprint scanner takes a small sample of the image to determine if the image is too dark or too light based on the average pixel darkness. If the image is out of range, then the image is rejected. The exposure time is then adjusted before another image is taken. Once the image is clear, the fingerprint scanner determines how sharp the image is. This is called the image definition. The fingerprint scanner samples several horizontal and vertical lines across the image. Lines running perpendicular to ridges will be comprised of very dark pixels and very light pixels. The image's measurements are compared to the measurements of fingerprints on file. A capacitive fingerprint scanner measures ridges and valleys using electrical current rather than light to generate the image, measuring the voltage output to determine if a characteristic is a ridge or a valley.

Fingerprint scanners compare features of a fingerprint called minutiae. Focus is on the points where ridge lines end or where ridges split, referred to as bifurcations. These ridge endings and bifurcations form the distinctive features used for matching. The fingerprint scanner measures the relative positions of minutiae. A match occurs when a sufficient number of minutiae patterns are found in both fingerprints. The same technique is used to compare palm prints.
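Minutiae comparison can be sketched as counting scanned points that land near reference points of the same type. The coordinates, distance tolerance, and required-match threshold below are invented for illustration; real matchers are far more sophisticated.

```python
def count_matching_minutiae(scan, reference, tolerance=2.0):
    """Count scanned minutiae that fall within `tolerance` of a reference
    minutia of the same type. Each minutia is an (x, y, kind) tuple, where
    kind is 'ending' or 'bifurcation'."""
    matches = 0
    for x1, y1, kind1 in scan:
        for x2, y2, kind2 in reference:
            distance = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
            if kind1 == kind2 and distance <= tolerance:
                matches += 1
                break  # each scanned point matches at most one reference point
    return matches

def fingerprints_match(scan, reference, required=12):
    """Declare a match when enough minutiae agree; the threshold is policy-dependent."""
    return count_matching_minutiae(scan, reference) >= required

reference = [(10.0, 10.0, "ending"), (20.0, 5.0, "bifurcation")]
scan = [(10.5, 10.2, "ending"), (50.0, 50.0, "ending")]
print(count_matching_minutiae(scan, reference))  # 1
```

Because the scanner never captures the exact same image twice, the comparison accepts points that are merely close, and the overall match is declared when a sufficient number of minutiae agree rather than all of them.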
Veins

A vein scanner measures the location and dimensions of veins in a hand, which are unique to each person. Although you can't see your veins even if you hold your hand in front of a bright light, a vein scanner produces an image of veins using near-infrared light to identify a pattern of veins. Near-infrared light is the same light used in remote controls. Typically, veins in the hand are used by the vein scanner; however, veins in a finger or wrist can also be used. When a person places a hand—either front or back—on the vein scanner, a digital camera takes a picture of the hand. Veins appear black, as hemoglobin in the blood absorbs the light. Hemoglobin is a component of red blood cells. Measurements are taken of the vein structure and are compared to measurements on file.
Facial

Facial scanners measure landmarks on the face. Each landmark is referred to as a nodal point. There are approximately eighty nodal points that are measured. These include the distance between the eyes, the width of the nose, the depth of the eye sockets, the shape of the cheekbones, and the length of the jaw line. The measurement of each nodal point is used to create a numerical code called a faceprint, which is stored in a database.

Facial scanning is less accurate if the face is scanned using two-dimensional technology in a non-controlled environment, where lighting differences can throw off the measurement of the nodal points. Many facial scans use three-dimensional technology in a controlled environment, giving facial scanners the capability of recognizing a face even in profile. Once the face is detected, the facial scanning program determines the alignment—the head's position, size, and pose. The system then measures each nodal point to generate the faceprint. The faceprint is then matched to other faceprints in the database.

Privacy Issues

Biometric scanners can pick out the bad guys in a crowd. Of course, this assumes that the picture of the bad guy was taken with the latest facial recognition system and the faceprint is in the database used to find the bad guy. The rapid development of biometric systems has brought science fiction to reality. However, facial scanners are more likely to help investigators after a crime is committed than to help prevent a crime. Picking a face out of a crowd—not just a bad guy's—is a slippery slope because, in theory, any face can be linked to a wealth of information that has already been collected about a person. Hollywood depicts in movies and TV shows a massive database of information on everyone, linked to our Social Security number or other unique identifiers including our biometric print. But is this myth or reality?
Biometric facial measurements are collected by many states when you take your driver's license picture. At least twenty-six states permit law enforcement agencies to run a facial recognition search against driver's license and ID photos in their databases. Major cities throughout the world claim to use real-time facial recognition using cameras on the streets. So there is a database that contains your facial biometric measurements and the personal information found on a driver's license.

Increasingly, states require fingerprint-based background checks for positions—including volunteer positions—that involve working with children. Fingerprints are measured and compared with fingerprint measurements in a criminal database. Are these fingerprints and application information destroyed? What happens to our biometric information is a question few of us ask. This is especially concerning when dealing with private organizations, which often require you to sign a release that may give the organization the right to use your biometric information as needed without your knowledge. Technically, the biometric information—along with other information you provide—can be linked with information purchased by the organization, such as credit information, that collectively creates your profile. For example, credit card information tells where, when, and what your purchase was. There are concerns about how government agencies can use biometric data, but the use of biometric data by private organizations cannot be overlooked. For example, the New York Times used
Amazon’s Rekognition, a face matching service, to identify attendees at a royal ceremony. An issue is whether or not facial recognition systems used to unlock a smartphone can be used by the smartphone provider and other apps for commercial purposes. Google Clips is a camera with facial recognition capabilities built-in and available for anyone to use. The goal is to be able to search your photo albums for specific photos. Microsoft offers technology that provides facial recognition, including identifying hair color and emotions, and is used by Uber to verify the identities of their drivers. Microsoft has urged Congress to regulate the use of facial recognition to prevent exploitation of personal information. The European Union prohibits collecting biometric data for facial recognition without the user’s consent.
Password

And, yes, the ID and password. ID and password combinations are the most commonly used method to protect information from unauthorized access. However, the protection is only as good as the character pattern used to create the password, known as the password strength. The challenge is to pick a password that is not easily guessed yet is easy for the staff to remember. The organization sets a password policy, striking a balance between the need to protect information and the staff's frustration level using passwords. A perfect password contains randomly generated uppercase/lowercase letters, numbers, and punctuation marks, making it nearly impossible to guess but impractical for you to remember. There is a compromise where the password is required to have:

–– At least one uppercase character
–– At least one number
–– At least one punctuation mark
–– No commonly used names such as the user's name, user ID, organization name, names of relatives of the user, commonly used abbreviations, and names of commonly used items

The reason for using a combination of uppercase/lowercase letters, numbers, and punctuation marks is pure statistics. Probability is the statistically measured likelihood that an event will occur. There are ten possible digits for each number used in the password. The probability of picking the correct digit is one in ten, written as 1:10. There are fifty-two possible letters (uppercase/lowercase) for each letter in the password. The probability of picking the correct letter is 1:52. There are nineteen punctuation marks on the keyboard. The probability of picking the correct punctuation mark is 1:19. Here are the probabilities of guessing an eight-character password when these elements are combined. You can calculate these yourself by raising the number of possible characters to the eighth power:
Lowercase letters only (26): 1:208,827,064,576
Uppercase/lowercase letters (52): 1:53,459,728,531,456
Uppercase/lowercase letters and numbers (62): 1:218,340,105,584,896
Uppercase/lowercase letters, numbers, and punctuation (81): 1:1,853,020,188,851,841

No person is going to attempt to guess a password except for the obvious choices such as your ID, your name, the organization name, and similar names that are easy to remember. And the default password can't be overlooked. Cybersecurity auditors often find "admin" as the administrative password because technicians never changed the default password.

While no person is going to guess a strong password, a password cracking program will attempt to. Password cracking software such as Brutus, RainbowCrack, and Wfuzz uses the brute force method of guessing a password. One password cracking program can try 275 passwords per second. When a cluster of password cracking programs is on the attack, they can try up to 350 billion passwords per second. A cluster is a group of computers running the same password cracking software and attacking the same target. Now, the likelihood that someone or some organization will spend this effort to guess your password is probably slim, depending on your role in your organization and the nature of your organization. The Pentagon's computers are a high-value target compared to a local restaurant. And this is a factor that must be considered when setting the password policy.

An internal attack is more realistic than an external attack for many organizations. An employee may acquire the password of a colleague to access confidential information. And that colleague is likely to comply with the password policy—except the password is in the top desk drawer, under the keyboard, or written on a note attached to the computing device.
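The odds in the table come from raising the number of possible characters to the password length, which is easy to verify yourself:

```python
def guess_odds(alphabet_size: int, length: int = 8) -> int:
    """Number of possible passwords: the alphabet size raised to the length."""
    return alphabet_size ** length

print(guess_odds(26))  # lowercase only:           208827064576
print(guess_odds(52))  # upper/lowercase letters:  53459728531456
print(guess_odds(62))  # letters and digits:       218340105584896
```

Each character position multiplies the total by the full alphabet size, which is why adding even a few punctuation marks to the allowed character set enlarges the search space enormously.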
The cybersecurity auditors will also require that the password be changed at least quarterly, or more frequently depending on the nature of the information accessible through the ID/password. The password policy must also specify the number of login attempts allowed before the user is locked out and required to contact the help desk to unlock the computing device. A failed attempt may indicate that the employee forgot the password, or it may indicate that an intruder attempted to gain access to the system. The number of failed attempts allowed before the ID is locked out depends on the tolerance of the organization. Some IDs are locked out after two attempts; many are locked out after three or four attempts. It is more likely that the user forgot the ID/password than that an intruder attempted to access the computing device, because intruders—except for casual intruders—are aware of the lockout policy and make no more than two attempts, then wait a day or so before making another two attempts.
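A minimal sketch of such a lockout policy, assuming a three-attempt threshold and a help-desk unlock step (the return values and class design are invented for illustration):

```python
class LoginGuard:
    """Lock an account after too many consecutive failed login attempts."""

    def __init__(self, max_attempts: int = 3):
        self.max_attempts = max_attempts
        self.failures = {}   # user ID -> consecutive failure count
        self.locked = set()  # user IDs awaiting a help desk unlock

    def record_failure(self, user_id: str) -> str:
        if user_id in self.locked:
            return "locked"
        self.failures[user_id] = self.failures.get(user_id, 0) + 1
        if self.failures[user_id] >= self.max_attempts:
            self.locked.add(user_id)  # only the help desk can unlock
            return "locked"
        return "retry"

    def record_success(self, user_id: str) -> str:
        if user_id in self.locked:
            return "locked"  # a correct password doesn't bypass a lockout
        self.failures.pop(user_id, None)  # success resets the counter
        return "ok"
```

Note that a successful login resets the failure counter, which is why a patient intruder who stays under the threshold and waits between attempts can evade this control.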
Access Rights

Access rights are another method used to secure information. Each employee is assigned an ID and password. The ID is assigned to one or multiple security groups. A security group is a logical grouping of identification numbers that have the same access to computing devices, data, and/or facilities. For example, all sales representatives need access to the order entry system, the accounts payable system, and the accounts receivable system. Rather than grant each sales representative access to each system, a security group called sales representative is created. Each sales representative's ID is assigned to the sales representative security group, and the group is given access to these systems. A new sales representative is granted access by assigning that ID to the sales representative security group. Removing access is as easy as removing the ID from the security group (Figure 6.5).
Figure 6.5: The IDs of each member of the sales staff are assigned to the sales representative group. The sales representative group is given access to applications.
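The group arrangement in Figure 6.5 can be sketched with simple sets; the IDs and application names below are hypothetical:

```python
# Hypothetical group memberships and group-to-application grants.
group_members = {
    "sales_representative": {"ID1", "ID2", "ID3"},
}
group_access = {
    "sales_representative": {"order entry", "accounts payable", "accounts receivable"},
}

def can_access(user_id: str, application: str) -> bool:
    """A user may access an application if any of their groups grants it."""
    return any(
        user_id in members and application in group_access.get(group, set())
        for group, members in group_members.items()
    )

# Granting a new hire access is a single group assignment:
group_members["sales_representative"].add("ID4")
# Revoking access is just as easy:
group_members["sales_representative"].discard("ID1")
```

The access decision never mentions an individual user, which is the point: administrators manage one group grant instead of one grant per employee per system.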
Real or Perceived Threat

The battle between intruders seeking to gain unauthorized access to information and the defense strategies implemented to protect access to information is never-ending. The overriding question: Is the threat real or perceived? How many computing devices in the organization are actually targets of an attack? It is difficult to answer this question. Anti-malware programs continually scan computing devices for viruses, key loggers, and other malware that seek to compromise the computing device. Suspicious malware is then isolated, reported to the MIS department, and usually sent to the manufacturer of the anti-malware program for analysis. The MIS department probably has a count of suspicious malware caught by anti-malware programs. However, anti-malware programs trap known malware. New malware is always being developed and may go undetected until it attacks the computing device. A common way to gain fraudulent access to a computing device is to spoof the user into giving up the ID/password. For example, the user may receive a call purporting to be from the MIS department asking for the ID/password to access the user's computing device to fix a problem. Other spoofing
sends an email that appears to be from an official source asking the user to click on a link. The link connects to malware that will then be installed on the computing device, or to an official-looking website asking for personal information including the user's ID/password. Threats are real, but it is difficult to determine how many are actually targeting your organization and specific employees. The presumption is that the organization is a target and the necessary defensive strategies must be taken to protect the organization.
Disabling Services

The cybersecurity auditors look to determine if certain obvious risks are mitigated. An operating system has a central program that operates the computing device, called a kernel, and ancillary programs sometimes referred to as services that enhance the operations of the kernel (see Chapter 3). There are three services that intruders can use to access a computing device behind the scenes without any notice. These services are:

Telnet: Remote login software called telnet is usually available as a service with most operating systems. An intruder uses telnet on the intruder's computing device by first entering the target computing device's internet protocol (IP) address (see Chapter 2). The intruder's telnet service tries to connect to the target's telnet service. Both devices must be connected to the same network for this to work. The telnet service was created long before today's internet as a way for academics to share information on each other's computing devices. Everyone was trusted. The login ID was the person's email address and the password was the word "anonymous." There was no validation. The target computer simply placed the email address into a log and gave remote access to the computing device. It is as if the remote user is sitting in front of the target's computer, free to access anything. This is unheard of today, but the telnet service is still a service of many operating systems. The cybersecurity auditors expect to find the telnet service disabled.

File transfer protocol (FTP): The FTP service is used to transfer files between computing devices that are connected to the same network. The intruder enters the target's IP address into the intruder's FTP service running on the intruder's computing device. Once a connection is made, the intruder logs into the target's computing device and then sends a specifically formatted request to have a specific file copied to the intruder's computing device.
Similar to the telnet service, the FTP service is usually available today as a service on many operating systems. The cybersecurity auditors expect to find the FTP service disabled.

File sharing: File sharing is a service used to share information among computing devices. It was implemented long ago, before file servers were used to store shared files. The file sharing service still exists as a service of some operating systems. An intruder using a computing device can gain access to local storage on the target's computing device by using the file sharing service. The cybersecurity auditors expect
to find the file sharing service disabled to prevent someone from accessing a computing device's drive.
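An auditor's check for these legacy services can be approximated by probing their well-known TCP ports (21 for FTP, 23 for telnet). This is a sketch, not an audit tool: it assumes a scan of the local machine, and whether anything is listening depends entirely on that machine's configuration.

```python
import socket

# Well-known ports for the legacy services an auditor checks.
LEGACY_SERVICES = {"ftp": 21, "telnet": 23}

def service_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            return sock.connect_ex((host, port)) == 0
        except OSError:
            return False  # unreachable host, timeout, etc.

def audit_legacy_services(host: str = "127.0.0.1") -> dict:
    """Report which legacy services appear enabled; auditors expect all False."""
    return {name: service_listening(host, port) for name, port in LEGACY_SERVICES.items()}
```

A port that accepts connections does not prove the classic service is running, and a closed port does not prove the service binary is removed, which is why real audits also inspect the operating system's service configuration.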
Proxy Server

A proxy server is an intermediary between the computing devices within the organization and internet websites, and it is used to give employees access to the internet with minimal cybersecurity risk to the organization. Here's what happens when you search the internet from within your organization. The website name is entered in your browser and then the home page of the website appears. More happens behind the scenes. The request for the website goes to the organization's proxy server. The proxy server forwards the request to the web server that hosts the website. The home page of the website is returned to the proxy server. The proxy server then forwards the home page to your browser.

When you request the home page from a website, the website stores the IP address of your computing device. At home, you connect to the internet through an internet service provider. The internet service provider is assigned a group of IP addresses. One of the IP addresses is assigned to you each time you connect to the internet. The website stores that IP address and uses it to send information (the home page) to your computing device.

It is different at work. Organizations have a fixed IP address that is always associated with a specific computing device on the internet, called the proxy server. The organization's internal computing devices have IP addresses on the organization's intranet that the organization provides, but those IP addresses are hidden from the outside website by the proxy server. The website knows the IP address of the proxy server, but has no idea (unless additional information is provided through the home page) which computing device within the organization made the request. The proxy server knows that the request came from your computing device. Your computing device never directly interacts with the internet (Figure 6.6).
Figure 6.6: The proxy server links internal computing devices to the internet and websites outside of the organization.
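Python's standard library can illustrate how a client is pointed at a proxy. The proxy address below is hypothetical; in practice the MIS department configures it for you, often through system or browser settings.

```python
import urllib.request

# Hypothetical proxy address for illustration only.
proxy = urllib.request.ProxyHandler({
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
})
opener = urllib.request.build_opener(proxy)

# Every request made through this opener goes to the proxy server, which
# forwards it to the destination website on the client's behalf.
# opener.open("http://www.example.com")  # not executed here: needs a live proxy
```

From the destination website's point of view, every request arrives from the proxy's fixed IP address, which is exactly the hiding effect described above.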
Demilitarized Zone (DMZ)

The organization is divided into two zones:

Security Zone: The security zone contains computing devices, networks, databases, and applications that are protected by cybersecurity defenses.

Demilitarized Zone (DMZ): The demilitarized zone contains devices that are not on the intranet, such as the organization's website and possibly the cell phones employees use for work.

Requests from outside the organization are received by the proxy server. If the request is for unsecured information available in the DMZ, then the proxy server fulfills the request with information from the unsecured computing device.
Firewall

A firewall is a program that acts like a filter between an organization's intranet and the internet. Information going from and coming into the organization's proxy server is examined by the firewall. The firewall then applies rules created by the MIS department to restrict the flow of information. Those rules are frequently acquired from a vendor who ensures that restrictions are updated regularly.

Let's take a closer look. Information travels across the intranet and internet in electronic envelopes called packets (see Chapter 2). Packets destined for outside the organization pass through the firewall. The firewall assembles packets and compares the content to filtering rules. Based on the comparison, the packet is forwarded for processing or is rejected, causing an error message to be displayed on the browser of the computing device that generated the packet. For example, employees could be prohibited from accessing YouTube from computing devices within the organization. The firewall is set to reject packets destined for YouTube, identified by the YouTube IP address or the YouTube URL (www.youtube.com). Filtering rules may allow certain employees access but not others, based on the employee's login ID. Additionally, filtering rules may permit access to YouTube but restrict links to certain videos on the YouTube website. The firewall is also programmed to stop incoming messages from entering the organization's intranet. For example, certain IP addresses from untrusted sources, such as those from potential spammers, are blocked by the firewall before reaching the organization's intranet.
Firewall Controls Traffic Flow

There are two ways in which a firewall controls the flow of packets. These are:
Packet filtering: Packet filtering compares the content of each packet with the filtering rules to determine if the packet should be passed along or rejected.

Stateful inspection: Stateful inspection compares key elements of a packet—not the entire packet—to filtering rules to determine whether to reject or pass along the packet.
Configuring a Firewall

Filtering rules can allow or disallow the following transmissions:

–– Internet protocol (IP) address: The firewall can block certain incoming and outgoing IP addresses or a range of IP addresses.
–– Domain names: A domain name is a name that corresponds to an IP address. Domain names can be blocked.
–– Protocols: A protocol is a standard way of doing something, such as transmitting packets. There are a variety of communication protocols. A firewall can be configured to block one or more protocols.
–– Ports: A port is an entry point into a server (see Chapter 3). For example, port 80 is for web access and port 21 is for FTP access. The firewall can block access to specific ports.
–– Phrases: A list of words and phrases can be incorporated into the rules of a firewall, enabling the firewall to search through packets looking for matches. If there is a match, then the packet is rejected.
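A toy version of such filtering rules: each rule names a packet field and a blocked value. The rule format and field names are invented for illustration; real firewalls inspect packets at a much lower level and support ranges, protocols, and phrase matching.

```python
# Each rule is a (field, blocked_value) pair; addresses use documentation ranges.
BLOCK_RULES = [
    ("ip", "203.0.113.7"),          # a blocked source address
    ("domain", "www.youtube.com"),  # a blocked destination domain
    ("port", 23),                   # telnet
]

def allow_packet(packet: dict) -> bool:
    """Reject the packet if any field matches a block rule; otherwise pass it."""
    for field, blocked_value in BLOCK_RULES:
        if packet.get(field) == blocked_value:
            return False
    return True

print(allow_packet({"ip": "198.51.100.4", "domain": "www.example.com", "port": 443}))  # True
print(allow_packet({"ip": "198.51.100.4", "domain": "www.youtube.com", "port": 443}))  # False
```

This sketch is a default-allow filter: anything not explicitly blocked passes. Many organizations configure the opposite, a default-deny policy, where only explicitly allowed traffic gets through.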
Breaking Through the Firewall

A firewall is a good defense but not a perfect defense. Here are ways a firewall can be circumvented:

–– Simple Mail Transfer Protocol (SMTP) Session Hijacking: SMTP is a common way email is transmitted over the internet (see Chapter 2). An SMTP session is the active process of transmitting the email. Emails can be redirected by manipulating the SMTP server, making undesired email seem to be coming from a trusted source. The email is received by the firewall with a trusted IP address when in fact the email is coming from an untrusted source who is sending spam or a virus.
–– Backdoors: A backdoor is a little-known feature available in some applications that enables someone to circumvent security precautions. For example, a person is able to run the application without having to enter an ID/password. Some programmers build backdoors into an application to facilitate development and testing.
–– Source Routing: Source routing occurs when a hacker makes packets appear to come from a trusted source by manipulating IP addresses within the packet.
–– Remote Login: A remote login enables a person to log into a computing device from another computing device. For example, a help desk technician uses a remote login to access a computer that is causing trouble for a user. Once remotely logged in, any data transmitted will appear to come from the local computer. Likewise, information received by the local computer can be seen and manipulated by the person who is remotely logged into the local computer.

Filtering rules are commonly established by vendors who actively monitor internet activities, seeking out sources of spam and malware. Once identified, the filtering rules are updated. You too probably have a firewall on your home computing device that is updated by Microsoft or Apple, depending on the computing device. Sometimes the filtering rules exclude information you need. Sometimes the information is filtered without your knowledge. Always check the spam folder in your email program and contact the MIS department once you realize that you're not getting all your expected information. The MIS department sets the filtering rules on your work computer. MIS can change the filter to allow emails that were flagged as spam but are legitimate.
Encryption The best way to protect information is to encrypt it—scramble the information so it is not readable. The process of scrambling information is called encoding. The process of unscrambling encoded information into a readable form is called decoding. The way in which information is encrypted is called an encryption algorithm. One of the easiest—but very weak—ways to encrypt information is to use bit flipping. Remember from Chapter 2 that information is logically represented as a series of binary digits (bits), 0s and 1s. Bit flipping changes all the zeros to ones and ones to zeros. For example, “Jim” is stored as 01001010 01101001 01101101 and is interpreted as “Jim” when displayed on the screen. Reversing each bit results in 10110101 10010110 10010010, which is interpreted as meaningless characters. Decoding uses the same program to flip the bits back to their original state. Strong encryption algorithms use a value called a key that is applied to scramble the information. A key is a code used in a mathematical algorithm to encode and decode the information. The size of the key gives the encryption algorithm its strength. In the 1970s, the Data Encryption Standard (DES) used a 56-bit key—56 binary digits used to encrypt information. There are 70 quadrillion combinations of 56 bits, making it nearly impossible to guess the key. However, increases in computer speed have made guessing the DES key possible and within the realm of probability.
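The bit-flipping scheme described above can be sketched in a few lines of Python. XORing a byte with 11111111 (0xFF) flips every bit, and because flipping twice restores the original, the same function both encodes and decodes:

```python
def flip_bits(data: bytes) -> bytes:
    """Flip every bit: 0s become 1s and 1s become 0s (XOR with 11111111)."""
    return bytes(b ^ 0xFF for b in data)

encoded = flip_bits(b"Jim")    # 10110101 10010110 10010010 -- meaningless characters
decoded = flip_bits(encoded)   # flipping the bits again restores the original
print(decoded)                 # b'Jim'
```

As the chapter notes, this is very weak encryption: anyone who suspects bit flipping can decode the message without any key.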
The DES was replaced by the Advanced Encryption Standard (AES). The AES uses a 128-bit, 192-bit, or 256-bit key, making guessing the key—even by a computer—nearly impossible. Furthermore, Google uses a 2048-bit encryption key for the Secure Sockets Layer (SSL) certification system (see Chapter 2). A popular encryption program you can try is called Pretty Good Privacy (PGP), which can be used to encrypt any type of information.
Classes of Information and the Need For Protection Information falls into four classes used to gauge the type of encryption needed to protect the information. These are: –– Data at rest: Data at rest is protected information stored on a server or other computing device. There are several methods that can be used to encrypt data at rest. These are: ○○ Full disk encryption: Full disk encryption requires that all data on the disk, including the operating system, be encrypted. Prior to installing the operating system in memory, which is called booting, the user is prompted to enter an ID/password. If authorized, the operating system is decrypted and loaded into memory. ○○ Virtual disk encryption: Virtual disk encryption groups files into a logical container. The content of the container is then encrypted. Only an authorized ID/password can gain access to the container and decrypt the contents of the container. Files outside the logical container remain unencrypted. ○○ Volume encryption: Volume encryption is the process of encrypting a section of the storage media called a volume. All files within the volume are encrypted. This is a commonly used encryption method for USB flash drives and hard drives. ○○ File/folder encryption: File/folder encryption is the process of encrypting the contents of a file or the contents of files that are logically organized into a folder on a storage device. –– Data in motion: Data in motion is protected information that is being transmitted over a network. Data can be encrypted at the network layer of the OSI model (see Chapter 2) using commercially available encryption programs. –– Data in use: Data in use is protected information that is being created, retrieved, updated, or deleted. Privacy screens on monitors and restricted positioning of monitors away from public view are the best methods to protect data that is being used by staff. –– Data disposed: Data disposed is protected information that is being destroyed. 
The process of removing obsolete data is called sanitization. Deleting data from media does not remove the data. Data can be reconstructed. The sanitization
process ensures that deleted data cannot be easily recovered. There are three commonly used methods for sanitizing data. These are: ○○ Clear: Clear is a method of overwriting the storage space with non-sensitive data using appropriate software and hardware. ○○ Purge: Purge is a method of degaussing. Degaussing is exposing the media to a strong magnetic field, causing disruption in the recorded data. ○○ Destroy: There are several methods for destroying the media. These are pulverization, melting, incineration, and shredding. If the media is not going to be reused, then the most cost-effective process is to destroy the media.
Categories of Encryption There are two categories of encryption. These are: –– Symmetric key encryption: Symmetric key encryption requires that the computing device receiving the encrypted information use the same key as the computing device that sends the encrypted information. A drawback of using symmetric key encryption is that computing devices receiving the encoded information must be known to the sender. This is problematic over the internet when the two computing devices are unknown to each other. –– Asymmetric key encryption: Asymmetric key encryption is commonly referred to as public key encryption and uses two keys to encode and decode information: the public key and the private key. Let’s say that a remote computing device wants to send you encoded information. The remote computing device receives your computer’s public key and uses the public key to encode the information. The encoded information is then transmitted to your computer. The private key on your computer is used to decode the information. The private key is known only to your computer. The public key cannot be used to decode the information.
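A toy sketch can illustrate the defining property of symmetric key encryption: both sides must share the same key. The cipher below is a simple repeating-key XOR, chosen for readability, not strength; real symmetric ciphers such as AES use far more elaborate mathematics, and asymmetric encryption requires heavier number theory than fits in a short example:

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Encode or decode with the same key -- the hallmark of symmetric encryption."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

shared_key = b"secret99"                      # sender and receiver must both know this
scrambled = xor_cipher(b"Wire $500 today", shared_key)
restored = xor_cipher(scrambled, shared_key)  # applying the same key decodes
print(restored)                               # b'Wire $500 today'
```

Notice the drawback described above: the receiver cannot decode the message without first obtaining `shared_key` from the sender by some secure means.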
Digital Certificates, SSL and TLS You’ve probably used a secured connection on the internet to conduct purchases and banking. Behind the scenes is an encryption process at work. These can include digital certificates, Secure Sockets Layer (SSL), and Transport Layer Security (TLS). A digital certificate is a unique code issued by a known certificate authority certifying that a web server is trusted and that the web server is who it says it is. Each computer is provided with the public key of the other computer that is used to encrypt and decipher information that is transmitted over the internet.
Sensitive information is also transmitted over the internet using the Secure Sockets Layer (SSL) and Transport Layer Security (TLS). SSL is the original internet security protocol and is being replaced by TLS. Both SSL and TLS use digital certificates. When a request is made to transmit secure information, the HTTP protocol changes to HTTPS, indicating that information will be transmitted securely. The web server sends its digital certificate, which contains its public key, to the browser. The digital certificate is then verified for authenticity. The public key is then used to encode information before it is transmitted. A combination of asymmetric (public) key and symmetric key encryption is used to facilitate secure transmission. The public key is used to send the symmetric key to the other computing device. The symmetric key is then used to encode information during the session and is then discarded at the end of the session. The symmetric key is only valid for the session; it is no longer valid once the session is completed since the information has already been decoded.
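Python's standard library exposes this machinery directly. A minimal sketch: a default TLS context is configured to verify the server's digital certificate against trusted certificate authorities and to check the hostname, exactly the checks described above.

```python
import ssl

# A default context enforces the certificate checks described above:
# the server's digital certificate must be signed by a trusted
# certificate authority and must match the server's hostname.
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
print(ctx.check_hostname)                    # True

# Wrapping a TCP socket with
#   ctx.wrap_socket(sock, server_hostname="example.com")
# would then run the TLS handshake: certificate verification, a
# public-key exchange of a symmetric session key, and encryption
# of the rest of the session with that symmetric key.
```

The actual handshake requires a live network connection, so the sketch stops at inspecting the context's defaults.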
Hash Value Another element of encryption is a hash value. A hash value is a value that is computed using a hashing algorithm and an input number, and it is used in public key encryption. As a simplified example, let’s say that the input number to be encoded is 534. The hashing algorithm requires that 534 be multiplied by a number created by the algorithm; let’s say that number is 143, producing the result of 76,362. The value 76,362 is then transmitted. The computing device receiving the transmission knows the hash value and uses it to decode the transmission. Public key encryption uses very large multiplication factors and more complex encryption algorithms, making it extremely unlikely that a computing device using brute force will be able to identify the hash value.
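The chapter's simplified arithmetic can be followed step by step in code. This is a toy mirror of the worked example only, not a real hashing algorithm; real schemes use enormous numbers and operations that cannot be reversed by simple division:

```python
def encode(value: int, factor: int) -> int:
    # The chapter's simplified scheme: multiply by an algorithm-chosen number
    return value * factor

def decode(transmitted: int, factor: int) -> int:
    # The receiver, knowing the factor, reverses the multiplication
    return transmitted // factor

print(encode(534, 143))    # 76362 -- the value that is transmitted
print(decode(76362, 143))  # 534   -- the receiver recovers the input number
```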
Cyclic Redundancy Check and Checksum A cyclic redundancy check (CRC) and checksum are methods used to ensure that transmitted data isn’t corrupted during transmission. CRC and checksum are not used for encryption, although these methods can indicate if the data changed during transmission. Checksum adds together bytes in a packet and places the sum in the packet trailer (see Chapter 2). The bytes in the packet are again added when the packet is received. The sum is compared with the sum in the trailer. If they match, then the data has not changed. If they are different, the data has changed and a request is made to resend the packet.
A cyclic redundancy check uses a concept similar to a checksum, except that polynomial division rather than addition is used to calculate a value from the bytes in the packet.
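Both ideas are easy to demonstrate. The simple additive checksum below follows the description above, keeping the sum to one byte; for the CRC, Python's standard library provides the widely used 32-bit variant, so there is no need to hand-code the polynomial division:

```python
import zlib

def checksum(packet: bytes) -> int:
    """Add the bytes together, keeping the sum to one byte (modulo 256)."""
    return sum(packet) % 256

sent = b"example packet"
trailer = checksum(sent)              # sender places this in the packet trailer
received = b"examp1e packet"          # one byte corrupted in transit
print(checksum(received) == trailer)  # False: the data changed, request a resend

# A CRC applies polynomial division instead of addition:
print(zlib.crc32(received) == zlib.crc32(sent))  # False: corruption detected
```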
Wireless Network Security A wireless network (Wi-Fi) extends the organization’s network without having to install cabling throughout the facility by using radio waves (see Chapter 2). Wi-Fi uses transceivers—commonly called hotspots—placed in key locations that are connected to the network using cables. A transceiver sends and receives radio signals that carry network packets. Any Wi-Fi enabled computing device can receive and send information using the transceiver. Therefore, Wi-Fi transceivers must be secured. There are two common methods used to secure a Wi-Fi connection. These are: –– Wi-Fi Protected Access Version 2 (WPA2): WPA2 uses the Advanced Encryption Standard (AES), which uses a temporary encryption key to encrypt packets transmitted over the Wi-Fi network. AES uses a data encryption method that combines user- and system-generated keys up to 256 bits long and verifies that the encryption key wasn’t altered during transmission. –– Media Access Control (MAC): Every computing device capable of connecting to a network has a unique media access control address that is encoded into the computing device’s network card (see Chapter 2). The Wi-Fi router can be configured to allow only specific MAC addresses access to the Wi-Fi connection. However, a hacker may send a copy of an approved MAC address to the Wi-Fi router to gain access. This is called spoofing (mentioned earlier in this chapter).
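The MAC filtering rule described above is, at heart, an allow-list lookup. The sketch below shows the concept with hypothetical addresses (the MAC values are made up for illustration); a real router performs this check in firmware:

```python
# Hypothetical list of approved MAC addresses configured on the router
ALLOWED_MACS = {"a4:5e:60:1b:2c:3d", "00:1b:44:11:3a:b7"}

def admit(mac: str) -> bool:
    """Admit a device only if its MAC address is on the approved list."""
    return mac.lower() in ALLOWED_MACS

print(admit("A4:5E:60:1B:2C:3D"))  # True  -- an approved device
print(admit("de:ad:be:ef:00:01"))  # False -- an unknown device
# Note the weakness described above: a spoofed copy of an approved
# address would also pass this check.
```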
Bluetooth Security You probably use Bluetooth to go hands-free, connecting a headset to a smartphone. Bluetooth is a protocol that enables two devices to automatically establish a wireless connection using radio waves. Transmission is over a very short distance (see Chapter 2). A Bluetooth connection has the same security risk as a Wi-Fi connection, as radio waves can be intercepted by a Bluetooth device. There are two ways to provide security on a Bluetooth device. These are: –– Non-Discoverable Mode: A Bluetooth device can be placed in non-discoverable mode, preventing other Bluetooth devices from connecting to the device. This is ideal if a user is not going to connect Bluetooth devices to the Bluetooth-enabled device. –– Trusted Devices: A Bluetooth-enabled device can be configured to trust the source of data transmission. A trusted source can be given device-level security, which
is access to the device, and/or service-level security, which is access to specific data on the device. Most smartphones require authentication before giving a Bluetooth device access.
Hacking It is nearly impossible to listen to the news without hearing of a hacking attack where some intruder has gained unauthorized access to a computing device. The intruder can be a clandestine government agency, a corporate spy, a criminal organization, or a bored preteen on a laptop. Regardless, a hacker can steal sensitive information, hijack a computing device, make a computing device go wild, or simply display a message saying, “I was here.” There are two elements to hacking: first, gaining access to a computing device, and then doing something once access is gained. Sometimes gaining access is as easy as asking a person for their ID and password using social engineering. Social engineering is a technique where the hacker manipulates a person into giving the hacker access. You may get a call at work from the “MIS department” telling you they need to quickly upgrade your computing device. The caller sounds legitimate. They call you by your first name and sprinkle the conversation with enough business and technical jargon to avoid any suspicion. At some point in the conversation, they casually ask for your ID and password. Once access is gained, the hacker may search the computing device for information or links to other unsuspecting users so the hacker can pretend to be you when contacting them. More frequently, malware is installed on the computer. Malware is a name given to programs whose sole purpose is to disrupt normal operations of a computing device or even to disrupt normal operations of an organization.
Computer Viruses A computer virus is malware that attaches to a file or application program and quietly waits until the file is opened or the application runs. What happens next depends on the nature of the virus. The virus may block access to the computing device; make the computer act strangely; or run quietly behind the scenes, capturing and sending information to a remote computing device on the internet. Two common forms of computer virus are the Trojan horse and the worm. A Trojan horse is a computer virus that appears to be a trusted file or application program, such as a picture attached to an email or a purported upgrade to an existing application. The unsuspecting user downloads the file or installs the application and then executes it. A Trojan horse is spread when unsuspecting users share the file or
application program with others, enabling the Trojan horse to propagate to other computing devices. A worm is a malicious computer virus that exploits known weaknesses in network security, commonly referred to as security holes. The worm searches for a security hole, copies itself to networked computing devices, and then causes its intended disruption.
Inside a Computer Virus A computer virus is like any computer program in that it contains instructions that tell the processor to perform tasks, except these tasks exploit the computing device. A virus may initially modify the interrupt table of the operating system. As you’ll recall from Chapter 3, the kernel of the operating system is the program that controls the processor. The kernel monitors requests from applications that are running and from devices such as the keyboard. Some requests require the processor to stop what it is doing and process the request. This is referred to as an interrupt. Each interrupt is identified by a code. The kernel uses the interrupt table to know what to do next. The interrupt table contains an interrupt code and a memory address that contains the program that needs to be run to fulfill the request. A computer virus may manipulate the interrupt table by changing the memory address to the memory address that contains the virus. Furthermore, the virus then uses the legitimate memory address at the end of the virus to call the requested program. The user may notice a brief delay before the legitimate program runs—otherwise there may be no indication that the virus is running, especially if the virus is not impeding the computing device. Some viruses “infect” a legitimate program by attaching themselves to programs. The program runs normally when selected by the user and then continues by running the virus. The virus may instruct the processor to send sensitive information generated by the legitimate program to a remote computer on the internet without causing suspicion. The only telltale sign is that the size of the program file increases once it becomes infected. A computer virus can be attached as an executable file with the file extensions of EXE, COM, or VBS. Computer viruses can also be attached as JPG files. Did You Ever Wonder Why It Is Called a Virus? A real virus is a microorganism that infects the cells of living things. 
It takes control of the cell, disrupting the cell’s normal function. A virus is able to evolve quickly and change itself based on the environment, which makes it difficult for scientists to develop medication to combat the virus. The virus will have changed many times before the medication is available to the public. A computer virus is not a microorganism. It is a program written by a programmer (hacker) that has many of the characteristics of a real virus. The computer virus infects a computer—that is, the operating system or an application—where it takes control. Other programmers write virus detection programs (the medicine) to detect and neutralize the computer virus. However, some computer viruses are
programmed to rewrite themselves, making them difficult for virus detection programs to detect. Programmers who develop virus detection programs are always one step behind computer viruses.
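The interrupt-table hijacking described earlier can be sketched with a toy dispatch table in Python. The table maps an interrupt code to the handler to run; the "virus" saves the legitimate address, installs itself in the table, and chains to the real handler so the user notices nothing (all names here are illustrative, not a real operating system API):

```python
# A toy dispatch table standing in for the operating system's interrupt table.
def keyboard_handler():
    return "keystroke processed"

interrupt_table = {0x09: keyboard_handler}  # interrupt code -> handler address

saved_handler = interrupt_table[0x09]       # virus records the legitimate address

def virus():
    # malicious work would happen here, unseen by the user...
    return saved_handler()                  # ...then the real handler runs

interrupt_table[0x09] = virus               # the table now points at the virus

print(interrupt_table[0x09]())              # "keystroke processed" -- looks normal
```

Because the legitimate handler still runs, the only outward sign is the brief delay the chapter mentions.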
Rootkit A rootkit is a stealth program that is difficult to detect because it modifies programs designed to detect computer viruses. A rootkit is a computer virus that modifies administrative tools used to manage the operations of a computing device, causing the administrative tool to make inappropriate decisions. A rootkit focuses on the operating system kernel. The kernel is the portion of the operating system that interacts with the computer hardware, boots the computer, loads programs and files, and directs the central processing unit. The kernel level of the operating system is the most trusted program running on a computing device. A rootkit corrupts the kernel, making the kernel untrustworthy. A rootkit is suspected based on unusual behavior of the computing device. Boot the computing device from a trusted copy of the operating system to determine if the suspected behavior changes. If so, then reinstall the operating system.
Computer Virus Protection Virus detection programs are the best way to protect against viruses entering a computing device. Makers of virus detection software identify the characteristics of known viruses, including size, bit patterns, and how the virus infects a computing device. Once a new computer virus is detected, the virus detection program is updated nearly immediately and distributed electronically to clients, depending on the licensing arrangement. Virus detection programs scan all incoming files for known computer viruses before the file is permitted on a computing device. The computing device is also scanned on a schedule to assess whether a computer virus has gained access. Once detected, the virus is isolated on the computing device, preventing it from accessing the operating system. The virus detection program then removes the virus.
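Signature scanning of this kind reduces to searching files for known bit patterns. A minimal sketch, with entirely hypothetical signatures and virus names (real scanners use far larger databases and faster matching algorithms):

```python
# Hypothetical virus signatures: a known bit pattern mapped to a virus name
SIGNATURES = {
    b"\xde\xad\xbe\xef": "Example.Virus.A",
    b"\x90\x90\x90\x4c": "Example.Worm.B",
}

def scan(file_bytes: bytes) -> list:
    """Return the names of any known signatures found in the file."""
    return [name for pattern, name in SIGNATURES.items() if pattern in file_bytes]

clean = b"ordinary document content"
infected = b"ordinary content\xde\xad\xbe\xefmore content"
print(scan(clean))     # []
print(scan(infected))  # ['Example.Virus.A']
```

This also shows why self-rewriting viruses are hard to catch: if the virus changes its bit pattern, the stored signature no longer matches.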
Macro Virus You might have recorded a macro in Excel to automate interactions with a spreadsheet where a sequence of interactions is stored under a name. When you run the macro, keystrokes are automatically entered by the macro as if you entered them at the keyboard. This saves time when performing the same interactions frequently.
A macro virus is a virus in the form of a macro. Executing the macro causes the application to execute interactions that may be destructive to the contents of the Excel spreadsheet. Damage can be done before the user realizes that the macro virus is activated.
Keylogger You may think only you and your computer know what you type on the keyboard. Well, that is not necessarily accurate. Each keystroke is entered into memory and remains until the Enter key is pressed. The Enter key causes the operating system to interrupt the processor and then process the keystrokes. There are a few exceptions, such as the Escape key or the Ctrl-Alt-Delete keys, which immediately interrupt the processor. An interrupt has a code that is associated with an address in memory containing the program that the processor runs in response to the interrupt. A keylogger is a program that captures and stores all keystrokes in a log. The keylogger rewrites the interrupt table so that keystrokes are sent to the keylogger when the Enter key is pressed. Keystrokes are saved, and then the real program is executed. The keystroke log is then retrieved either over the network or physically copied from the computing device. A hacker then analyzes the log to determine activities on the computing device. The hacker may replicate those activities to gain access to the organization’s applications and information, pretending to be the employee. A lot can be learned from analyzing the key log. Besides identifying the applications and the login information, the key log also identifies patterns of access. No one would become suspicious if the hacker accessed the applications using the same patterns. A keylogger is typically installed clandestinely on the computing device by malware unknowingly downloaded from the internet. Some keyloggers are manually installed on the computing device. Risk of Impersonation You’ve probably received emails from your boss telling you to do something, and you simply comply without giving it a second thought, especially if the request is reasonable. How do you know that your boss sent the email? You probably looked at the return email address and recognized your boss’s writing style. Nothing raises suspicion.
In fact, you might carry out the task without giving your boss any feedback. Clandestine hackers count on you to follow reasonable directives without question. Once the hacker gains information from the key log, the hacker can direct any aspect of the organization. The chance of this occurring is slim because there is little value to the hacker. However, high-value targets are decision makers in government or the military, where orders have far-reaching impact. For example, parts for key military components might be delayed, redirected, or canceled with an email from the right officer. Sentencing of a criminal might be changed if the hacker accesses the right application using the judge’s ID/password after sentencing in the courtroom. Hackers target routine types of orders that won’t attract attention. They won’t redirect the fleet
but may change the order for fuel so that the fuel is delayed, limiting movement of the fleet until the fuel arrives. Correcting the fraudulent order is further complicated because the same method used to give the original order is used to counter it. No one knows whether the order was given by the officer or the hacker because both use the same email or the same application to issue the order.
Denial of Service Nothing is more annoying than listening to a constant busy signal or the message “You are a valued customer. All agents are helping other customers. Someone will be with you shortly.” “Shortly” can feel like hours, leading to a hang-up. This is a problem for you, but it is also a problem for the business whose sales are made over the phone. Now imagine you are an online retailer and no one can access your website to shop and place orders. You are practically out of business until access to your website is restored. Sometimes, technical issues cause the website to be down. Other times, hackers shut down the website using a Denial-of-Service (DoS) attack. The hacker blocks access to the website. A website is hosted on a web server that is connected to the internet. The domain name of the website is associated with the IP address of the corresponding web server. A request for the website’s home page is sent to the IP address, causing the web server to process the request. Each request takes a fraction of a second to process. High-performing web servers can process thousands of requests each second. However, there is a limit that, once reached, causes a backup of requests. Hackers use a program that generates a constant flow of requests to the IP address—and that program typically runs on hundreds and possibly thousands of computers, resulting in a logjam of requests that practically shuts down the online retailer’s website. The web server cannot differentiate between requests from customers and requests from hackers. The proxy server receiving incoming requests is configured to discard packets from IP addresses involved in the DoS attack once those IP addresses are known. A well-designed DoS attack will have installed a program on thousands of unsuspecting computing devices that send these requests at the time of the coordinated attack, making it difficult to identify IP addresses to block.
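The proxy server's discard rule can be sketched as a simple per-address counter: once an address exceeds a request limit, its packets are dropped. This is a conceptual sketch with made-up addresses and a made-up limit, not production rate-limiting logic:

```python
from collections import Counter

REQUEST_LIMIT = 100          # requests allowed per IP address before blocking
request_counts = Counter()
blocked = set()

def accept(ip: str) -> bool:
    """Discard packets from an IP address once it exceeds the limit."""
    if ip in blocked:
        return False
    request_counts[ip] += 1
    if request_counts[ip] > REQUEST_LIMIT:
        blocked.add(ip)
        return False
    return True

for _ in range(150):                 # a flood from one attacking address
    accept("203.0.113.9")
print(accept("203.0.113.9"))         # False: its packets are now discarded
print(accept("198.51.100.7"))        # True: a normal customer is still served
```

A distributed attack defeats exactly this defense: with thousands of source addresses, no single address exceeds the limit.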
Wi-Fi Access Points A wireless access point (WAP) is a computing device that has a transceiver and a program that connects wireless computing devices to the intranet, which in turn communicates to the internet. A WAP constantly broadcasts its existence, popping
up on wireless computing devices that search for WAPs. Each WAP is identified by a name created by the operator of the WAP, which usually implies the name of the organization. Hackers create a fraudulent WAP with a misleading name used to encourage unsuspecting users to access the internet through the fraudulent WAP. The fraudulent WAP name may abbreviate the real name of the WAP or misspell the real WAP name. The fraudulent WAP displays a login webpage that gives the appearance of being the real WAP. Once the unsuspecting user gains access, hackers then monitor all activities on the fraudulent WAP, including websites visited, login IDs and passwords, and any information that crosses over the WAP. Unless a virtual private network is used (see Chapter 2), all information transmitted over the fraudulent WAP is intercepted by the hacker. A fraudulent WAP is a form of phishing. Phishing is a technique used by hackers to pretend to be a trusted site by giving the site and the WAP all the official trappings. A common phishing practice is to give the impression that the hacking organization is the official tech support for a reputable organization. For example, there are technical support numbers on the web that lead you to believe you are calling Microsoft. Calling one connects you to a firm outside the United States whose agent acts like a tech support technician, logging on to your computer with your permission and running a troubleshooting program that “discovers” a virus causing the problem. You are then talked into purchasing a $180 service to immediately remove the virus and fix the problem. There is no virus. They do not represent Microsoft. In fact, they are careful to say that they “support Microsoft products” but are not associated with Microsoft.
Identity Theft Once personal information is intercepted, it can be used to impersonate the unsuspecting person, called identity theft or identity fraud. The stolen identification can be used for social program fraud, credit card fraud, and financial fraud, among other deceptive practices. A patient’s medical benefits identifier is one of the most commonly misappropriated documents. The medical benefits identifier enables the identity thief to receive free medical benefits in the name of the other patient. The healthcare facility delivers services to the identity thief and then files reimbursement claims with the third-party payer. The third-party payer rejects the claim. Further complicating the situation is that the patient records of the identity thief and the patient are commingled, resulting in misleading information. Test results for the identity thief are likely to be different from those of the patient. Treatments might be ordered based on inappropriate test results. Fixing the problem for the victim is an uphill challenge; the victim has to prove that they did not receive the services and incur the expenses associated with the services.
Disputed claims are usually sent to collection rather than court, resulting in a negative impact on the victim’s credit rating.
Cookie Theft A cookie is a small piece of information that is stored on your computing device by a website to uniquely identify the interaction with the website. Cookie theft, sometimes called session hijacking or sidejacking, occurs when the hacker steals a copy of the cookie and then uses the cookie to interact with the website, leading the website to believe it is interacting with you.
Network Sniffing Information travels in electronic envelopes (i.e., packets) along the internet and intranet. A program or a network device called a network sniffer can read each packet without disturbing the flow of the packet. Network sniffers are legitimately used by network engineers to monitor activities on the network. However, hackers can also use a network sniffer to copy packets that flow across the network. Gaining access to the network is always a challenge for hackers, except for Wi-Fi networks, where transmission is over radio waves. A hacker using a Wi-Fi sniffer from the street can capture network packets. The law is unclear on whether or not this is legal if the WAP is open to the public.
Detecting a Hacker There are some telltale signs of hacking that can identify if a computing device has been hacked. –– High outgoing network traffic. Turn on the computing device, then let it settle down. Do nothing. Watch the transmission light on your router. If the light continues to flicker and you’re not doing anything, then some program on your computing device is communicating over the network. –– Increased disk chatter. If you are doing nothing on your computing device and you hear the computer accessing the disk drive, then some program on your computing device is accessing files. Don’t be too alarmed, because some operating systems automatically back up files and perform disk maintenance. –– Examine the firewall log. Your computing device might have a firewall or run anti-virus software, both of which keep a log of requests and the related IP addresses that were blocked. A high number of log entries from the same IP address may indicate a hacker attack.
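The firewall-log check amounts to counting blocked requests per source address. A minimal sketch with a hypothetical log (the IP addresses are illustrative documentation addresses, not real hosts):

```python
from collections import Counter

# Hypothetical blocked-request log: one source IP address per entry
firewall_log = [
    "203.0.113.9", "198.51.100.4", "203.0.113.9", "203.0.113.9",
    "192.0.2.55", "203.0.113.9", "203.0.113.9",
]

counts = Counter(firewall_log)
print(counts.most_common(1))  # [('203.0.113.9', 5)] -- repeated hits worth investigating
```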
Mobile Computing Device Mobile computing devices, including cell phones, pose a vulnerability that, according to a report by Verizon, is overlooked by many organizations. Many organizations find that expediency and increased business performance outweigh security. In particular, organizations tend to permit employees to transmit sensitive data across open public networks and don’t restrict the apps that employees can download onto their corporate mobile computing devices. Apps may access data on the mobile computing device and transmit data to a hacker’s computing device even if the mobile computing device is in lockout mode. Bluetooth is typically set to enabled, alerting nearby hackers that the mobile computing device is ready to receive instructions. Joining public Wi-Fi hotspots risks having transmissions intercepted. Hackers gain access to mobile computing devices by: –– Malware: Malware is an app that contains instructions to transmit data stored on the mobile computing device behind the scenes, even when the device is locked and seems to be turned off. The app, like other apps running on the device, has access to practically everything on the mobile computing device. –– Synchronization: During the synchronization process, when data is updated between a PC and a mobile computing device, malware on either device can jump to the other device. –– Denial of service: Malware can block normal service on the mobile computing device to a point where transmissions are blocked.
Chapter 7
Risk Management and Disaster Recovery

"Planning for a rainy day" is an old expression that holds true for any organization. What happens today is likely to happen tomorrow. You'll arrive at work with the normal traffic hassles. Boot your computer. Read your email. And the rest of the day is rather predictable. Even when things don't go exactly as planned, you handle them routinely because that's relatively normal too.
What if something doesn't go as planned? Not only is your routine upset, but normal operations of the organization are also disrupted. Roads leading to work are impassable. The building is shut down because of a fire. There is a power outage lasting days. A key employee quits. No orders are received because of a system failure. There are an endless number of events that can happen that have a short- and long-term damaging effect on the organization.
Sustainability is the primary goal of every organization. A business model describes how the organization will survive financially when everything goes according to plan. A disaster recovery plan describes how the organization will survive when things don't go according to plan, based on a risk assessment/management plan that identifies all possible and probable events that can negatively affect the organization and how each risk is mitigated. A key element of every disaster recovery plan is how the organization functions when computing devices and applications are no longer available—a disaster that can bring an organization to a screeching halt.
Every organization needs a comprehensive, well-thought-out disaster recovery plan that can be implemented flawlessly in a heartbeat when disaster strikes. Employees, customers, vendors, regulatory authorities, lenders, insurers, and shareholders all expect that a disaster recovery plan be in place to protect their interest in the sustainability of the organization.
This chapter addresses managing risks associated with organizations requiring high availability, such as hospitals and utilities, though much of it applies to an organization of any size. The first half of the chapter discusses the various forms of risk and the second half discusses what can be done to ameliorate those risks.
Disaster

Mention the word disaster and it conjures images of a catastrophic event such as fire, floods, and powerful storms that may disrupt business operations. A wind storm, for example, makes accessing the facility impossible because of downed trees and power lines. Employees are unable to leave the building and other employees are unable to come to work. Some may decide to work from home, but the systems, the network, and databases are all unavailable even from remote locations. And the storm seems to
come out of nowhere with no forewarning and no time to prepare. Failure of heating, ventilation, and air conditioning in the facility; loss of a key employee or employees due to injury on or off the job; a strike at the facility or at a vendor: any of these place the organization in disaster mode.
A disaster to an organization is any event that negatively impacts the operation of the organization. This can be a strike by union workers, a vendor going out of business, or loss of access to the building. A disaster involves anything that severely disrupts normal operations—not limited to floods, fire, and explosion. Anything that can arise from fear, uncertainty, or doubt of continuity, referred to as the FUD factor, can lead to a disaster.
A disaster is categorized in a number of ways. These are general classifications, class of emergency, and the tier system. There are two general classifications of disasters. These are:
–– Natural disasters: A natural disaster is an act of nature such as storms, floods, and earthquakes. Natural disasters cannot be prevented.
–– Human-made disasters: A human-made disaster is an act of humans, such as a failure of the infrastructure resulting in a hazardous material spill. A human-made disaster may be preventable by monitoring and implementing procedures that reduce the likelihood of such an event.
Each general classification is further categorized as a class of emergency. A class of emergency defines the emergency condition by the length of time of the emergency. These are:
–– Class 1: Class 1 is an emergency that lasts a few hours, such as a brief power outage or an injury onsite.
–– Class 2: Class 2 is an emergency that lasts 72 hours or less and is more serious than a Class 1 emergency, such as a contained fire that caused slight damage to the facility.
–– Class 3: Class 3 is an emergency that lasts more than 72 hours and affects one area of the facility, such as the data center.
–– Class 4: Class 4 is an emergency that lasts more than 72 hours and affects the entire facility.
–– Class 5: Class 5 is an emergency that affects the entire community, such as a storm or flooding.
An alternative classification method is the tier system. The tier system separates operational functions into three tiers:
–– Tier 1: Tier 1 consists of functions that need to be operational within the first 72 hours of the disaster.
–– Tier 2: Tier 2 consists of functions that need to be operational by the end of the first week of the disaster.
–– Tier 3: Tier 3 consists of functions that need to be operational by the end of the first month of the disaster.
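The class-of-emergency and tier definitions above can be sketched as simple lookup functions. The 24-hour cutoff standing in for "a few hours" is an assumption, since the chapter gives no exact boundary between Class 1 and Class 2.

```python
def emergency_class(duration_hours, scope):
    """Map an emergency's duration and scope to the Class 1-5 scheme.
    scope is one of: "local" (one area), "facility", "community"."""
    if scope == "community":
        return 5                       # affects the entire community
    if duration_hours <= 24:           # "a few hours" (assumed cutoff)
        return 1
    if duration_hours <= 72:
        return 2
    return 4 if scope == "facility" else 3

def recovery_tier(hours_until_needed):
    """Tier 1: first 72 hours; Tier 2: first week; Tier 3: first month."""
    if hours_until_needed <= 72:
        return 1
    if hours_until_needed <= 7 * 24:
        return 2
    return 3

print(emergency_class(100, "facility"))   # a multi-day facility-wide emergency
print(recovery_tier(48))                  # a function needed within two days
```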
Risk Assessment

An organization faces many risks to its operation. A risk is the possibility of harm or long-term loss as a result of an event. There are obvious risks, such as fire in the facility, and less obvious risks, such as only one employee being able to program the old, dependable order entry application that hasn't been changed in years. The organization comes to a halt should the order entry application fail and that employee quit. There's no one to fix the application.
The organization identifies all risks that might disrupt the operation by conducting a risk assessment. Risk assessment is a process of identifying risks and assessing the magnitude of the potential interruption to the organization should the event occur. Risks are classified as direct risks and indirect risks.
–– Direct risk: A direct risk is an event that directly affects the organization, such as a systems failure, fire in the facility, or a power outage.
–– Indirect risk: An indirect risk is an event that affects another party needed for the sustainability of the organization, such as an employee, customer, or vendor. For example, a fire at a vendor's facility can disrupt supplies to the organization.
The risk assessment must also consider secondary consequences that may occur as a result of an event. A secondary consequence is an obligation of the organization, such as loss to customers when the organization is unable to provide service to them. The disruption might result in contractual penalties, regulatory violations, and potential litigation.
Risk to the environment—drinkable water, power, and heat—should not be overlooked. Loss of water to the facility, for example, prevents flushing toilets and makes the facility uninhabitable. Even if some employees agree to continue working, a lack of flushing toilets is a violation of local health regulations, causing government officials to temporarily declare the facility unsafe and require that the building be evacuated.
There are three questions that should be asked when performing a risk assessment:
–– What is the risk?
–– What is the probability that the risky event will occur?
–– What is the impact to the organization should the risky event occur?
There are several formal methods that can be used to identify risk:
–– Disaster-Based Risk Assessment—focuses on hazards
–– Asset-Based Risk Assessment—focuses on assets
–– Business Impact Analysis—focuses on the business
Disaster-Based Risk Assessment

Disaster-based risk assessment focuses on hazards rather than processes and systems. The goal is to identify all potential hazards; the likelihood that a disaster will occur; the impact to business operations; and whether the hazard can be avoided. The organization may be willing to do nothing to prevent a terrorist attack because there is a low probability that an attack will occur, depending on the type of organization. However, the organization may be willing to invest in a backup power generator to power business operations because there is a greater risk of a power failure.
Each hazard is entered into a weighted list and assessed for the likelihood that the hazard will occur, and a contingency response is planned. A weighted list contains each risk and the probability that the risk will occur.
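A weighted list might look like the following sketch, where each hazard's weight is its probability multiplied by an impact score. The hazards, probabilities, and 1-to-10 impact scores are illustrative numbers, not figures from this chapter.

```python
# Hypothetical hazards with an estimated annual probability and an impact
# score (1 = minor disruption, 10 = business-threatening).
hazards = [
    {"hazard": "power failure",    "probability": 0.30, "impact": 7},
    {"hazard": "terrorist attack", "probability": 0.01, "impact": 10},
    {"hazard": "flood",            "probability": 0.05, "impact": 8},
]

for h in hazards:
    h["weight"] = h["probability"] * h["impact"]   # expected severity

# Highest-weight hazards deserve contingency planning (and budget) first.
for h in sorted(hazards, key=lambda h: h["weight"], reverse=True):
    print(f'{h["hazard"]:16} weight={h["weight"]:.2f}')
```

Note how the ranking mirrors the example in the text: the likely power failure outweighs the improbable attack, even though the attack's impact is higher.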
Asset-Based Risk Assessment

The asset-based risk assessment focuses on identifying assets that are vulnerable to hazards. Assets are people, equipment, information, systems—any person or thing that is necessary to keeping the business operational. List each asset, its location, and hazards that may affect the asset. Assign each hazard a probability of occurrence—the chance that the hazard will happen. The asset list helps identify the vulnerability of each asset, and then focuses attention on how business operations can continue without the asset. A proper risk abatement plan includes developing controls to mitigate those hazards from occurring, and then measuring the effectiveness of each control. A control is a process that reduces the likelihood that the risk will occur.
Business Impact Analysis

A business impact analysis is part of the risk assessment that focuses on the impact a risk has on the operations of the organization. The business impact analysis examines each business process, looking for steps in the process that are at risk of failure. The risk is then evaluated for the probability that disruption might occur. The business impact analysis also determines the effect the risk has on the sustainability of the organization.
Let's take a look at the order entry process. Here are just some of the resources that may be unavailable:
–– Sales representatives
–– Sales assistants
–– Computing devices used to enter orders
–– Electricity to power computing devices
–– Backup power supplies
–– Network cables
–– Network routers (see Chapter 2)
–– The application server that runs the order entry application
–– The database server that runs the order database
–– The database management system
–– The database
–– The data center that houses the application server and database server
Nothing is assumed to work properly during the risk analysis, including backup resources. In a power failure, the backup generator may not work. Here are elements to consider in the business impact analysis:
–– Assess the minimum effort needed to maintain operational levels.
–– Review the impact of disruptions.
–– Identify steps in all processes.
–– Estimate recovery point objectives. There are steps in the recovery process referred to as recovery points. Each recovery point has a goal, referred to as a recovery point objective, such as restarting all computing devices.
–– Assess the needs of direct support departments.
–– Identify gaps in the operations that can fail.
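Checking a process against the resources it depends on can be sketched as a simple set difference. The resource names below are illustrative, drawn from the order entry example above.

```python
# A sketch of checking one business process (order entry) against the
# resources it depends on; the resource names are illustrative.
ORDER_ENTRY_NEEDS = [
    "sales representatives", "computing devices", "electricity",
    "network routers", "application server", "database server", "database",
]

def failing_dependencies(required, available):
    """Return the required resources that are currently unavailable.
    Nothing is assumed to work: 'available' must be verified, not presumed."""
    return [r for r in required if r not in available]

available = {"sales representatives", "computing devices", "electricity",
             "network routers", "application server"}
print(failing_dependencies(ORDER_ENTRY_NEEDS, available))
# any non-empty result is a gap the business impact analysis must address
```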
Legacy Systems

Every organization has an old, reliable application that has been running for decades without a problem, but may also be a hidden minefield for potential disasters. It's so reliable that even the MIS department doesn't give it much thought. Yet the organization would come to a standstill if it stopped working. Since the application hasn't attracted attention, there is a high likelihood that the MIS department would have to hunt down the original program files (see Chapter 3) and, if they could be found, fix the problem. And this assumes that there is a programmer who knows how to fix it. This application is a legacy system.
A legacy system is a system that may be critical to the operations of the organization and that has not been replaced or upgraded for a long time. A legacy system can be a computer-based system, a manual system, or a combination. The system may be totally under the control of the organization or a process provided by a vendor. Legacy systems operate without problems for many years—so much so that managers tend not to properly manage the system. In essence, managers may forget about legacy systems.
Legacy systems become problematic when the system ceases to operate and no one presently on staff is familiar with the details of the system—especially the computerized portions of the system—to fix it. The organization may discover that the
vendor who supplied the system is no longer in business or no one is available to fix the legacy system on short notice, if at all. The technician who was intimately knowledgeable about the system has long since retired, or the computer language used to write the application is no longer taught in school. And the backup systems haven't been tested for years. Bottom line: no one can change or fix the legacy system on short notice, and going back to a manual system is too hard to implement.
It is critical during a risk assessment to identify legacy systems and assess the preparedness to provide adequate support or to decide if there is a need to replace the legacy system. It is also vital that those responsible for the legacy system prove beyond a reasonable doubt that they can repair or modify the system. For example, the MIS manager who oversees the legacy system may identify the programmer responsible for maintaining the system. This is fine, but the risk assessment requires that the programmer display the source code (the instructions written by the programmer) and the necessary tools (the compiler) to convert the source code into an executable program (the program that actually runs on the computer) and then recompile the source code to recreate the executable program (see Chapter 3).
Points of Failure

The initial step in the risk assessment is to identify points of failure within the organization by performing an information technology (IT) audit. A point of failure is an element in the organization's operation that might fail, such as a failure of the local area network that prevents electronic data from being transmitted throughout the facility. When a fail point is identified—a vulnerability to the operations—assess whether steps have been taken to mitigate the risk and reduce the likelihood that the event will occur. Determine the significance of the event to the operations. What are the chances the event will happen, and what impact does that event have on business operations?
An IT audit is a detailed survey of computing devices, programs, operating systems, networks, cables, and anything required for processing, including employees who use a computing device and employees and vendors who support and maintain computing devices. The IT audit is conducted by IT auditors who have a background in all areas of information technology. These are usually former technicians or IT managers who use their knowledge to verify that the organization has addressed points of failure. Their goal is to determine if policies and procedures are in place that adhere to industry standards and mitigate points of failure. IT auditors also verify that policies and procedures are implemented. Findings are reported in an IT audit report that also contains recommendations to mitigate any failures in policies, procedures, and practices.
The IT audit begins with the end point—the results of the process. Managers are interviewed to determine their expectations. The organization's policies and procedures are reviewed, as are regulatory requirements, if any. Employees are observed
as they use the process. Any deviations from management expectations and policies and procedures are noted. IT auditors then follow the process—some call it following the cable because IT auditors practically trace the cable from the computing device.
Identifying points of failure requires tracing the hardware used to access the information. Hardware includes computers, cables, servers, and other computing devices. The trace follows the cable. It starts at the network cable leading from the computing device or the Wi-Fi connection and then traces the cable through walls into the communications closet. A communications closet is typically a small room on the floor where all cables connect to routers and other computing devices, including more cables that transmit data to a central communications hub somewhere in the facility. The central communications hub is connected to the data center or to outside vendors that operate the servers that run applications, database management systems, and databases.
Each computing device is a point of potential failure. Each communications closet or hub is a point of failure. Each cable connection is a point of failure. Any of these could disrupt operations. In the data center, IT auditors examine security, including physical access to the facility; physical access to computing devices; and physical and electronic access to applications, database management systems, and data. Every element of a process is closely examined. Auditors look for facts. Affirmations—taking someone's word—are unacceptable. IT auditors trust employees but verify that what is said is true.
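"Following the cable" can be modeled as enumerating every path from a workstation to the data center: a component that appears on every path is a single point of failure, because no path survives its loss. A small sketch, using a hypothetical two-path topology:

```python
def single_points_of_failure(paths):
    """A component present in every path from desk to data center is a
    single point of failure: if it goes down, no path survives."""
    shared = set(paths[0])
    for path in paths[1:]:
        shared &= set(path)            # keep only components on every path
    return shared

# Hypothetical topology: two redundant runs that still share the
# workstation, the communications closet, and the data center itself.
paths = [
    ["workstation", "cable A", "closet 3F", "hub", "data center"],
    ["workstation", "wifi",    "closet 3F", "backup hub", "data center"],
]
print(sorted(single_points_of_failure(paths)))
# → ['closet 3F', 'data center', 'workstation']
```

The redundant cable run eliminated "cable A" and "hub" as single points of failure, but the shared closet remains one, which is exactly the kind of finding an IT audit report would flag.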
Recovery Point Objective (RPO) and Recovery Time Objective (RTO)

A recovery point objective is the acceptable period of time during which the process is unavailable to the organization. That is, how long the business can live without the process. For example, the recovery point objective is zero for Amazon. Hundreds of thousands of orders will be missed if the process goes down. The risk is losing business. However, business-to-business organizations that sell through a sales staff to other businesses may have a recovery point objective of three hours. Orders are usually for large quantities and arrive at various times throughout the day. Most orders can be written by hand and then later entered into the order entry system. The risk is failure to deliver products within an acceptable time period.
The recovery time objective is the time needed to restore the failed process—how long it takes to recover the operations once a disaster occurs. Let's say the application server running the order entry application crashes. IT requires two hours to replace the application server and restore the order entry application. This is the recovery time objective.
RTO focuses on how long it takes to recover and RPO on how long the organization can operate without the system. The difference indicates the organization's risk exposure. In the ideal world, the RTO should be less than the RPO. In the real world,
this may not be the case and the disaster recovery plan must address how the organization will respond should a point of failure occur.
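Using the chapter's definitions (RPO as how long the business can live without the process, RTO as how long recovery takes), the exposure is simply the difference. Note that many practitioners instead define RPO as the acceptable amount of data loss; the sketch below follows the book's usage.

```python
def exposure_gap(rto_hours, rpo_hours):
    """Per the chapter's definitions: RPO is how long the organization can
    live without the process, RTO is how long recovery takes. A positive
    gap means recovery is slower than the business can tolerate."""
    return rto_hours - rpo_hours

# Order entry: sales staff can write orders by hand for 3 hours (RPO),
# and IT needs 2 hours to replace the application server (RTO).
print(exposure_gap(rto_hours=2, rpo_hours=3))   # → -1: within tolerance

# A process the business cannot live without (RPO 0) with a 2-hour RTO
# leaves a 2-hour exposure the disaster recovery plan must address.
print(exposure_gap(rto_hours=2, rpo_hours=0))   # → 2
```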
Data Synchronization Point

Orders continue to flow through the order entry application and into the database throughout the day. There is always a risk that the database or the computing device running the database will fail. A data synchronization point is a state of the database that was saved prior to the database failure. When a database failure occurs, the database is restored to the data synchronization point.
A disaster recovery plan requires that databases be backed up regularly. Timing of the backup depends on the nature of the application. For example, an online retailer may back up each transaction immediately, while a less data-dependent organization may back up data at the close of each business day. Data entered between data synchronization points is lost should the database fail. Knowing the data synchronization point enables the organization to develop a contingency plan to deal with failures that occur between data synchronization points.
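The effect of a data synchronization point can be sketched as follows: everything entered after the last backup preceding the failure is lost and must be recovered some other way, such as re-entry from handwritten orders. The timestamps and order IDs are illustrative.

```python
# Transactions arrive all day; backups run at fixed synchronization points.
# Anything entered after the last synchronization point is lost on failure.
def lost_transactions(transactions, sync_times, failure_time):
    """transactions: (timestamp, order_id) pairs; sync_times: backup times.
    Returns orders entered after the last backup before the failure."""
    last_sync = max((t for t in sync_times if t <= failure_time), default=0)
    return [order for t, order in transactions
            if last_sync < t <= failure_time]

txns = [(9, "A100"), (11, "A101"), (13, "A102"), (15, "A103")]
print(lost_transactions(txns, sync_times=[10, 14], failure_time=15))
# the order entered after the 14:00 backup would need re-entry
```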
Unseen Fail Points

An unseen fail point is one that is not obvious during the risk assessment, such as the configuration of computing devices. You've probably experienced this hassle when you get a new computing device and need to reset all your favorite settings, the default browser page, and your bookmarks. This is more an inconvenience than a disaster for you. It is different for your organization, because many settings are for security or are needed to properly run applications. Settings of a computing device are based on the policies and procedures of the organization and the interaction of the computing device with the network and other devices. Failure of the computing device usually requires reinstallation and configuration of the device. Addressing unseen fail points requires instructions on how to reinstall and configure the computing device. During a risk assessment, IT auditors watch the recovery of unseen fail points, especially during off hours.
IT auditors ask the following questions. The answers reveal weaknesses in the recovery plan.
–– Who installs the application?
–– Has that person installed the application in the past?
–– Are there special configurations that need to be made to the application?
–– Are there written instructions on how to install and configure the application?
–– Where are those instructions stored?
–– Does the person who is installing the application know where to access the instructions?
–– How would you operate the organization if information necessary to run the business was in your building, and you had to evacuate the building?
–– Can employees function if their belongings (car keys, house keys, and so on) are in the building and the building is evacuated?
–– Who is going to contract with vendors to provide backup locations?
–– Are maps, routes, and accommodation details stored offsite?
–– Who has the contact list of employees, and is that person available during the disaster?
–– Do the critical employees know they are critical to the operation?
–– Who is contacting vendors and customers and telling them about the temporary change in business operations?
Disaster Recovery

Disaster recovery is the process of restoring functionality to the organization during and after the disaster—first returning to partial operations and then, eventually, full operations. The goal of disaster recovery is to get back to normal business. The initial step is to reorganize employees and business operations when a disaster strikes. There are many challenges.
–– How will you communicate with employees, and how will employees communicate with each other?
–– Where will employees go if they can't get back to the office? How will they keep doing their jobs?
–– What systems and business units are crucial to the basic operations of the organization?
–– Will employees be focused on the organization during a disaster? Consider Superstorm Sandy. Employees' homes were destroyed and families uprooted. The organization wasn't their primary focus.
The primary focus is to maintain the minimum sustainable level of services within the organization and to customers. In doing so, the goals include minimizing potential losses—loss of revenue and penalties for breach of contracts with customers. It is important to identify crucial systems and processes, then spend funds to restore those systems.
Disaster Recovery Team

The disaster recovery team is responsible for disaster planning, testing, and enacting the disaster recovery plan should a disaster occur. The team is composed of key employees, administrators, government agencies such as police and fire departments, and outside organizations such as utilities, vendors, transportation firms, and business partners. Each provides a unique perspective on potential disasters and how to prevent, mitigate, and recover from a disaster.
The disaster recovery team begins with a charter. A charter is a written document that authorizes the team to develop and carry out the disaster recovery plan. The charter contains:
–– Mission statement: The mission statement is the goal that describes the purpose of the plan and why the organization decided to create a disaster recovery plan.
–– Scope: The scope is the team's authority—what the team can and cannot do—and the time frame within which to develop the disaster recovery plan.
–– Sponsor: The sponsor is the executive who sponsors the disaster recovery plan.
–– Team: The team leader and team members are designated along with resources—such as a budget—that can be used to develop, test, and enact the disaster recovery plan.
–– Disaster management: The charter must define every aspect of disaster management, including:
○○ What is an emergency?
○○ Who declares an emergency?
○○ Who is the incident commander?
○○ What is the command structure?
○○ Who opens the emergency operation center?
There are two categories of disaster recovery teams:
–– The primary disaster recovery team is led by the recovery manager and consists of key coordinators, each responsible for leading the recovery process for a specific aspect of the business.
–– The secondary disaster recovery team supports the primary disaster recovery team's efforts and focuses on rebuilding the business operation from minimum levels of operations.
Disaster Recovery Plan

Critical to the recovery is the disaster recovery plan. Think of a disaster recovery plan as a cookbook for creating a Thanksgiving dinner. You open the first page and prepare the first recipe. When completed, turn the page and continue with the next recipe. A
disaster recovery plan contains processes to follow for specific disasters. The goal is to return the organization to normal operational levels, with high-priority processes returning immediately and lower-priority processes returning over time.
Getting management buy-in is a challenge for many disaster recovery plans. Support peaks moments after a disaster occurs and then declines in the weeks and months after the event. There is no disaster recovery plan without management support. The key selling point of a disaster recovery plan is the cost benefit—you don't lose business. A lot of time, effort, and money is spent creating and testing a disaster recovery plan without contributing to the bottom line. A disaster recovery plan can be justified by pointing out that:
–– The organization cannot afford business interruption insurance or cannot get adequate insurance coverage.
–– Auditors and regulatory agencies may require a disaster recovery plan, and a disaster recovery plan can give investors peace of mind.
Not all disaster recovery plans are successful, primarily because of common errors that can easily be avoided.
–– Inadequate planning: The presumption is that planning is a straightforward process. In reality, planning is a complex process that requires focus on many details that are overlooked during normal business operations.
–– Incomplete inventory of assets: There isn't a complete and updated list of assets that clearly identifies the location of each asset, its role within the organization, and the requirements to operate it, such as how to configure the asset.
–– Minimizing the recovery effort: The presumption is that staff is available and has the skill set and knowledge on how to recover from a disaster.
–– Invalid assumptions: Recovery and business continuity needs are based on unfounded assumptions rather than on measurements developed during a risk assessment.
There are two major parts of the disaster recovery plan.
These are:
–– Disaster recovery: Disaster recovery focuses on returning the organization to marginal functionality within the first week of the disaster. This is like picking up the pieces and restoring some semblance of order in a chaotic situation.
–– Business continuity: Business continuity is focused on the long term, restoring operations beyond one week following the disaster.
Let's say a storm disrupted power to the organization. Disaster recovery focuses on using battery backup power and the facility's own generator to keep the organization marginally operational. Business continuity focuses on restoring full power.
Elements of a Disaster Recovery Plan

The disaster recovery plan defines disaster recovery control measures. A disaster recovery control measure is a process for managing an element of a disaster. There are three disaster elements specified in a disaster recovery plan. These are:
–– Preventive measures: Preventive measures are processes that prevent a disaster from occurring. For example, placing power lines underground prevents a loss of power due to a storm disrupting power lines.
–– Detective measures: Detective measures are processes that discover that a disaster is occurring, such as the activation of a fire alarm.
–– Corrective measures: Corrective measures are processes that restore functionality to the organization during or after a disaster. For example, an electrical generator provides power to the facility until the main power is restored.
Assumptions

The staff that develop a disaster recovery plan make assumptions about risks that may disrupt the sustainability of the organization. Assumptions are based on the probability that a specific disaster will occur. Probability is determined by evidence that supports the likelihood that the disaster will occur. The staff looks at the experience of the organization and of similar organizations within the region. Government and scientific projections and data are also considered when setting the probability.
A list of potential disasters and their probabilities is generated. The assumption used as the basis for setting the probability is listed for each potential disaster. For example, there might have been small tremors over the years, but no earthquake sufficient to cause structural damage based on a review of one hundred years of data for the area. The assumption is that there will never be a significant earthquake and, therefore, there is no need to include an earthquake in the disaster recovery plan.
An assumption may be reasonable but not necessarily true. Although there hasn't been a significant earthquake in the area, that doesn't mean there couldn't be one in the future. Therefore, it is critical that a disaster recovery plan consider all types of disasters—even those that seem remote based on history. An earthquake can happen and the organization needs to be prepared to recover.
Risk Tolerance

How much risk are you willing to take? The answer depends on the risk. There is risk each time you drive your car. On a clear day the risk is minimal. On a stormy day the risk is moderate because the weather increases the chance of a storm-related accident. You weigh the benefits of driving against the risk when you decide whether or not to drive.
You may drive to work in a storm because you need to get paid. However, you may forgo driving in a storm to go shopping because the risk of becoming involved in an accident outweighs the benefit of shopping. Whether or not you drive depends on your risk tolerance.
The same question must be answered by every organization. How much are you willing to take a chance that the risky event will occur? The answer also depends on the risk tolerance of the organization. An organization's risk tolerance is a factor in the development of the disaster recovery plan. Each risk is identified along with its probability of occurring. The disaster recovery team then decides on an appropriate response to each risk. There are four common responses to a risk:
–– Accept: Accept the risk and do nothing now—deal with it if the risk should materialize.
–– Mitigate: Reduce the risk by doing something that lowers the probability that the risk will occur.
–– Transfer: Transfer the risk to a vendor. The vendor takes on the responsibility of deciding how to respond to the risk. The organization is still exposed to the risk.
–– Avoid: Change the situation to avoid the risk entirely.
Deciding on a response must balance the effort to respond to the risk against the likelihood that the risk will materialize. The effort is usually measured in financial expenditure. How much money is it worth spending now to address the risk? For example, accepting the risk means no expenditures are made unless the risky event occurs. Mitigating the risk means some expense is incurred, such as installing a backup power generator. Transferring the risk also requires expenditure: hiring the vendor or purchasing insurance. Avoiding the risk also requires expenditures that may change how the organization operates.
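The balancing act can be made concrete by comparing, for each response, the upfront cost plus the expected loss that remains. The probabilities, residual probabilities, and dollar figures below are hypothetical numbers for an unauthorized-access risk, not figures from the chapter.

```python
def cheapest_response(impact_cost, responses):
    """responses maps a response name to (upfront_cost, residual_probability).
    Total expected cost = what you spend now + the expected loss that remains."""
    total = {name: cost + residual_p * impact_cost
             for name, (cost, residual_p) in responses.items()}
    return min(total, key=total.get), total

# Hypothetical numbers: a break-in would cost $500,000 and has a 10%
# annual probability if nothing is done.
responses = {
    "accept":   (0,       0.10),  # do nothing, keep the full risk
    "mitigate": (100_000, 0.01),  # build the fence
    "transfer": (30_000,  0.05),  # insure or contract the risk out
}
best, totals = cheapest_response(500_000, responses)
print(best, totals)   # with these numbers, doing nothing is cheapest
```

Change the probability or the impact cost and the ranking shifts, which is exactly why the decision must be an informed one rather than a guess.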
Risk Management

Risk management is the process of managing fail points. The organization has options:
–– It can do nothing and take the risk that it will not fail.
–– It can buy insurance to cover losses should it fail.
–– It can take steps to prevent failure.
–– It can build in redundancy to minimize business interruptions.
Are you going to spend $100,000 for a fence to prevent unauthorized access to your facility? How do you know the risk of unauthorized access is worth more than $100,000? There is no absolute answer. The decision must be an informed one, based on the probability of the disaster occurring and the loss that would be realized should the disaster happen.
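The fence question can be framed with the standard annualized loss expectancy (ALE) formula: expected yearly loss equals loss per incident times expected incidents per year. A minimal sketch, with illustrative figures that are not from the text:

```python
# Annualized loss expectancy: ALE = SLE * ARO, where
#   SLE (single loss expectancy)        = loss per incident
#   ARO (annualized rate of occurrence) = expected incidents per year

def annualized_loss_expectancy(loss_per_incident, incidents_per_year):
    return loss_per_incident * incidents_per_year

# Assumed: a break-in costs $250,000 and happens once every 5 years.
ale_without_fence = annualized_loss_expectancy(250_000, 0.2)  # $50,000/yr

# Assumed: a $100,000 fence lasts 10 years and halves the incident rate.
fence_cost_per_year = 100_000 / 10                            # $10,000/yr
ale_with_fence = annualized_loss_expectancy(250_000, 0.1)     # $25,000/yr

savings = ale_without_fence - ale_with_fence                  # $25,000/yr
worth_building = savings > fence_cost_per_year
print(worth_building)  # True under these assumptions
```

Under these assumed numbers, the fence's $10,000 annualized cost is less than the $25,000 reduction in expected loss, so building it is the informed decision; change the assumptions and the answer may flip.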
Chapter 7: Risk Management and Disaster Recovery
The question that every organization must ask is: is a potential event an inconvenience or a disaster? You can ask a subject matter expert to help answer the question, or you can follow the money. Enough cash in the bank can probably sustain an organization through nearly any disaster. The question is: what is enough cash? The answer depends on the organization. There is a point when cash in the bank will run out. Under normal business conditions, revenue from sales replenishes cash removed from the bank. When revenue stops flowing, there is a cash drain—a disaster. All efforts are focused on restoring the revenue stream to stop the cash drain. If you follow the money and identify all processes involved in maintaining the revenue stream—that is, putting cash in the bank—you'll know whether an event is an inconvenience or a disaster.
Detail Analysis Is Critical
Let's say this is the flow of money into the organization—the old-fashioned way.
–– The mail room staff picks up mail at the post office
–– Mail containing checks is sent to the accounts receivable department
–– An accounts receivable employee opens the envelope and separates out the check
–– The check is endorsed
–– A messenger takes the checks to the bank
–– A bank employee opens the package of checks
–– Checks are sent to a clearing house for processing
–– The bank credits the company's account once the check clears
Now let's take a look at the critical processes and assets.
–– The mail room staff picks up mail at the post office
○○ The employee must awaken
○○ Get dressed
○○ Feel comfortable leaving their family to go to work
○○ Travel safely to work
○○ The office building must be intact and opened
○○ The elevators in the office building must be working
○○ The employee must be able to travel to the post office
○○ Postal employees must have arrived at work
○○ The post office must be open
○○ And so on...
What can go wrong?
–– The employee oversleeps
–– The employee's home is destroyed by the disaster
–– The safety of the employee's family is a higher priority than going to work
–– Transportation is disrupted and the employee is unable to get to work
–– The office building is closed due to an emergency
–– The employee responsible for opening the office building doesn't show up for work
–– There is a power outage, preventing the elevators from working
–– Transportation to the post office is disrupted
–– Postal employees cannot go to work
–– The post office is closed
–– And so on...
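Every step in the chain above is a fail point, and the chain delivers cash to the bank only if all of them succeed. A quick sketch (the per-step reliabilities are invented for illustration) shows how individually reliable steps compound into a noticeably lower end-to-end probability:

```python
from math import prod

# The revenue chain completes only if every step completes; assuming
# independent steps, the end-to-end probability is the product.
steps = {
    "employee gets to work":     0.99,
    "office building is open":   0.99,
    "travel to the post office": 0.98,
    "post office is open":       0.99,
    "bank processes the checks": 0.99,
}

chain_reliability = prod(steps.values())
print(round(chain_reliability, 3))  # 0.941: roughly a 6% chance of disruption
```

Five steps that each work 98–99% of the time still leave about a one-in-seventeen chance that the revenue stream is interrupted, which is why the detail analysis matters.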
Think in very basic terms:
–– You need breathable air, drinkable water, power, and heat. Without one of these, your business cannot operate. What would happen if there were a water main break causing the utility to turn off water to your office building? The office building would be evacuated.
–– You need people to build, sell, and buy your products. Without one of these, your business cannot operate. You can build a warehouse full of products, but no one will buy them if a series of blizzards forces malls to close.
–– You need revenue—money coming into the business. Without a revenue stream, your organization will use cash on hand (savings) to pay expenses. Eventually the organization will run out of cash.
–– Define disaster scenarios and the impact each would have on the organization:
○○ What if a major supplier suddenly went out of business?
○○ What if employees of a major customer went on strike?
Low-Level Focus

The disaster recovery plan focuses on the operational level, where detailed plans clearly define the process to recover from a disaster, so that when a disaster strikes, the staff need only follow the disaster recovery plan. There is no need to assess the situation and then decide how to respond. Assessment and the optimum response are made in advance of the disaster, when there is time to evaluate risks and options. Here are common details that need to be considered in a disaster recovery plan:
–– What is the staffing level needed to maintain minimum functionality?
–– Are staffing levels maintained at the minimum level for all shifts?
–– How will staff arrive during a disaster?
–– Where will staff be stationed in the facility during a disaster?
–– Does the staff have the skillsets necessary to provide minimum functionality?
–– Where will off-duty staff who are not leaving the facility go to sleep, shower, and change clothes?
–– Is there sufficient food available for staff for the duration of the disaster and the first week following the disaster?
–– Will employees be more concerned about the disaster affecting their families and homes than coming to work?
–– Are employees able to come to work if mass transit is not operational?
–– How long will supplies last?
–– How will staff communicate with each other during a disaster? Telephone communication within and external to the facility may be unavailable.
Disaster Recovery Options

There are many options to respond to a disaster—some are better than others. The worst plan is to wait until a disaster strikes and then try to devise viable response options. The military coined the phrase "the fog of war," which describes the mindset that exists during a disaster. Few think clearly in the heat of battle. This is why the military has staff dedicated to anticipating conflicts and devising well thought-out response options for each possible conflict. Disasters—like military conflicts—should be anticipated, and response options should be well defined before a disaster occurs. A response option definition should clearly state what to do, when to do it, how to do it, and how to measure whether it worked. When the disaster occurs, the focus is on identifying the best response option and then following the plan for implementing it.
Be realistic. The disaster recovery plan must provide detailed instructions on every aspect of how employees will do their jobs during and after the disaster. Most important, the disaster recovery plan should consider the impact the disaster has directly on the employees and the employees' families. It is not reasonable to expect that employees will forgo the care and safety of their families to handle the organization's disaster. Think for a moment: if your house were destroyed and your employer's operations were disrupted, what would you focus on first?
The organization's data center is critical to the sustainability of the organization, since it contains applications, database management systems, databases, and related computing and networking devices needed to keep the organization functional. The failure of the data center is a disaster for the organization. Let's take a look at the response options to illustrate how advance planning for a disaster mitigates risks.
–– Hot site: A hot site is a fully operational secondary data center that has all applications, databases, and computing devices found in the primary data center. The hot site is typically located in a different region of the country, isolated from the environmental risks (power, flood) that may affect the primary data center. All data from the primary data center is copied nearly instantaneously to the hot site as data is stored in the primary databases. If a disaster occurs in the primary data center, a switch is activated, directing the organization to the secondary data center. There is no downtime.
–– Warm site: A warm site is a secondary data center, usually located in a different region of the country, that has all the same computing devices as the primary data center; however, applications and databases need to be installed and configured before the warm site can be activated. There is relatively short downtime.
–– Cold site: A cold site is a secondary data center, usually located in a different region of the country, that doesn't have computing devices, applications, or databases. The data center must be practically rebuilt within the cold site. There is a long downtime.
–– Outsource site: An outsource site is when the organization contracts with a vendor to supply data center services. The risk of a data center disaster is transferred to the vendor. The vendor is responsible for anticipating and devising response options to potential disasters.
–– Reciprocal agreement: A reciprocal agreement is an arrangement between companies with similar technology that allows each to use the other's technology during a disaster.
–– Consortium arrangement: A consortium arrangement is an agreement among a group of firms to create a disaster recovery site that can be used by its members.
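One way to reason about these options is to pair each with a rough recovery time and a relative annual cost, then choose the cheapest option whose downtime the business can tolerate. The figures below are illustrative assumptions, not vendor quotes:

```python
# (option, rough hours until operations resume, relative annual cost)
options = [
    ("hot site",  0,   10),  # duplicate data center, no downtime
    ("warm site", 72,   5),  # days to install applications and data
    ("cold site", 720,  2),  # weeks to rebuild the data center
]

def cheapest_viable_option(max_tolerable_downtime_hours):
    # Keep only options that recover fast enough, then pick the cheapest.
    viable = [o for o in options if o[1] <= max_tolerable_downtime_hours]
    return min(viable, key=lambda o: o[2])[0]

print(cheapest_viable_option(4))     # hot site
print(cheapest_viable_option(100))   # warm site
print(cheapest_viable_option(2000))  # cold site
```

The hot site is only worth its premium when tolerable downtime is near zero; as the tolerance grows, the warm and cold sites become the economical choice.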
Each option has its advantages and disadvantages. The hot site has no downtime but is the most expensive, since the organization is practically running a duplicate data center. The warm site costs less to operate. Typically, the organization contracts with a vendor to use its standby data center. However, there will be several days when the organization will have to operate without access to the data center. The cold site has the lowest ongoing cost but can take weeks to become operational.
Outsourcing the data center places a key element of the organization's operation in the hands of a vendor. The contract explicitly states the services that the vendor will provide to the organization. Services not included in the contract will not be provided. The organization usually has little influence on how services are provided unless stated in the contract. The organization's sustainability depends on the sustainability of the vendor. Anything that influences the vendor's operation (strikes, suppliers) also affects the organization. The outsource data center must meet security, regulatory, and compliance requirements. Terms of the contract should include:
–– Contract duration
–– Termination conditions
–– Testing
–– Costs
–– Special security procedures
–– Notification of systems changes
–– Hours of operation during recovery
–– Specific hardware and equipment requirements for recovery
–– Personnel requirements during the recovery process
–– Circumstances constituting an emergency
–– Process to negotiate extension of services
–– Priorities for making the recovery site operational
Selection of the recovery site should address the following factors:
–– Number of available sites
–– Distance between sites and distance for employees to travel to the recovery site
–– Facilities requirements
–– Office supplies
–– Meals
–– Living quarters for recovery employees
–– Postal services
–– Recreational facilities for recovery employees
–– Travel cost
–– Site cost
–– Cost of temporary living for recovery employees
–– The decision to own, rent, or share the recovery site with other organizations
–– Communication requirements
–– Rerouting mail
A data center must be in an area at low risk of natural disasters such as floods, hurricanes, tornadoes, and earthquakes. Likewise, employees of the outsource data center must live in low-risk areas. A data center's service to the organization is dependent on its employees. If employees are personally affected by the disaster, then there is a high risk that the data center will be unable to provide service.
The data center must have a high level of redundancy. If any element of the data center fails, there are two or three elements that can take its place quickly. For example, if a database server fails, a replacement can be fully operational within an hour.
A backup power source is necessary for all facilities. There are two types of backup power sources: battery backup and an on-site generator. The battery backup is used to power certain electrical devices, such as computing devices, for a few hours. The on-site generator is used to power certain electrical devices until the main power source is back online. Only electrical devices that are needed for the highest priorities should be on the backup power system, since limited power may be available during the disaster. Make sure that backup power sources are always in working condition and are sufficient to meet the current needs of the facility. More backup power is required as the organization increases its dependency on applications.
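Sizing the battery backup described above is simple arithmetic: runtime is usable capacity divided by the protected load. The figures and the 80% usable-capacity assumption below are illustrative:

```python
# UPS runtime (hours) = usable battery capacity (kWh) / protected load (kW)
def ups_runtime_hours(capacity_kwh, load_kw, usable_fraction=0.8):
    # Batteries are rarely drained to 0%; assume 80% of capacity is usable.
    return (capacity_kwh * usable_fraction) / load_kw

# Assumed: 40 kWh of batteries feeding a 10 kW critical load.
print(ups_runtime_hours(40, 10))  # 3.2 hours before the generator must take over
```

A calculation like this makes it obvious why only the highest-priority devices belong on the battery backup: every extra kilowatt of load shortens the bridge to the generator.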
Service-Level Agreement

Outsourcing transfers the organization's responsibilities to a vendor. It is important to understand that the organization remains responsible for the service, although the contract with the vendor appears to transfer those responsibilities. Failure of the vendor to provide the service on behalf of the organization does not relieve the organization of the responsibility to provide the service to customers.
The vendor must provide the organization with a service-level agreement (SLA). A service-level agreement contains objective metrics that both the vendor and the organization can use to measure the vendor's performance. The service-level agreement is typically part of the contract with the vendor and contains remedies should the vendor fail to perform to its expectations. The service-level agreement specifies the minimum service that the organization will receive from the vendor.
Metrics used to measure the service depend on the nature of the service. Let's say the vendor provides data center services. A common metric to use is mean time to recovery (MTTR). Mean time to recovery is the average time necessary to restore data center functionality to the organization. If the vendor manufactures computing devices such as servers, the commonly used metric is mean time between failures (MTBF). Mean time between failures is the average time period that the computing device will work before it breaks down. This is important to know when acquiring and managing computing devices. Manufacturers test computing devices under various conditions and simulate extended usage. Test results identify a time range after which the computing device is likely to fail. You should acquire the computing device that meets your specifications and has the longest mean time between failures.
Here are other commonly used metrics:
–– Turnaround time (TAT): The time necessary to complete a specific task.
–– Uptime (UT): The amount of time that the application, computing device, or data center is functioning. For example, a computing device may be unavailable for four hours a week while the MIS department performs maintenance on the device.
–– First call resolution (FCR): The percentage of calls to a help desk that are resolved without the caller calling the help desk again.
–– Time service factor (TSF): The percentage of calls that are answered within a specific time period.
–– Abandonment rate (AR): The percentage of callers whose calls are not answered; a caller who is on a wait queue hangs up.
–– Average speed to answer (ASA): The average number of seconds for the help desk to answer the phone.
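MTTR and MTBF combine into a single availability figure that often appears in a service-level agreement. A minimal sketch with made-up incident data:

```python
def mttr(repair_hours):
    # Mean time to recovery: average of observed repair durations.
    return sum(repair_hours) / len(repair_hours)

def mtbf(operating_hours, failures):
    # Mean time between failures: total running time / number of failures.
    return operating_hours / failures

def availability(mtbf_hours, mttr_hours):
    # Steady-state availability: fraction of time the service is up.
    return mtbf_hours / (mtbf_hours + mttr_hours)

m_between = mtbf(8760, 4)              # four failures in a year of operation
m_repair = mttr([2.0, 1.0, 3.0, 2.0])  # hours to recover from each failure
print(round(availability(m_between, m_repair), 4))  # 0.9991
```

An SLA would typically state the availability target directly (for example, "99.9% measured monthly") along with the remedies if the vendor misses it.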
The IT department and operating units should have an operational-level agreement (OLA). An operational-level agreement is similar in concept to the service-level agreement, except the agreement is between internal entities within the organization. For example, the IT department agrees to respond to a problem with an application within a half hour of a call to the help desk. The response is material, not simply an IT department representative answering the telephone; that is, someone knowledgeable about the system will address the concerns. IT managers can staff and plan according to the operational-level agreement.
Both a service-level agreement and an operational-level agreement focus on outcomes, not how those outcomes are achieved, except for ensuring that methodologies comply with regulatory requirements. The vendor or the IT department may bring in another source to meet the obligation to deliver the outcome.
Disaster Recovery Operations

The organization should create an emergency incident command system (EICS) that takes over operations of the organization during a disaster or emergency. The EICS has a chain of command that enables fast, ongoing assessment of the disaster and the impact the disaster has on operations. The EICS structure enables the emergency incident response team to respond to known problems and to anticipate and mitigate problems that might be forthcoming.
The chain of command structure is documented in a job action sheet. The job action sheet lists each position in the command structure and the corresponding roles and responsibilities. Information about the disaster, including the job action sheet, is shared among operational staff in the emergency incident command site. Each member of the emergency response team can view, assess, and determine the course of action appropriate to the team member's responsibility.
The emergency response is led by the emergency incident commander (IC). The emergency incident commander is the person in charge of the emergency response. All decisions rest with that person, although the emergency incident commander relies heavily on subject matter experts such as the medical team and governmental emergency management.
There are four areas of concern for the emergency incident commander. Each area is called a section and has a section chief who is responsible for addressing issues within the domain of that section. The sections are:
–– Operations: Operations involves maintaining an adequate level of business operations. Included are the organization management, administrative operations, production of services, regulatory and contractual compliance, and customer services.
–– Logistics: Logistics is the management of resources both internal and external to the operations. Logistics involves staffing, supplies, food, receiving and distribution within the facility, and garbage removal—everything necessary to maintain the business and care for the staff during the disaster.
–– Planning: Planning involves the emergency response team anticipating needs and devising a way to meet those needs in advance. This includes developing a disaster recovery plan.
–– Finance: The organization must have funds to pay for ongoing operations and for expenditures that are associated with responding to the emergency, such as overtime cost. Furthermore, the organization must ensure that the incoming revenue stream is not disrupted.
In addition to the four sections, communications must be managed: leadership in the organization must have contact information for staff, customers, and vendors stored offsite so the disaster recovery team can communicate with them from anywhere during a disaster. Highlight staff, vendors, and customers who are critical to the operations.
Emergency Operations Center (EOC)

The emergency operations center is the location in the facility, if feasible, where the disaster is managed. Typically, the emergency operations center is located in a central location within the facility, such as a large conference room or auditorium. The room should be divided into five areas: one for the emergency incident commander and one for each of the four sections. Each section must be clearly identified and always staffed by at least one representative of that section's team. Communication connections should be established for each area, enabling a free flow of communication to the field, if necessary.
The emergency incident commander's section should display the job action sheet on a whiteboard or flip chart so each member of the emergency response team can clearly identify their role. Another whiteboard or flip chart should list the status of operations, preferably by unit and department. The status should include required staffing levels, actual staffing levels, supplies, and other factors required to operate the organization.
Downtime Procedures

Downtime procedures are processes that are enacted when a disaster or emergency occurs. These are well thought-out steps that, if followed, will maintain the functionality of operations. Each downtime procedure is a recovery script that clearly states who does what and when—and specifically what should be done if the downtime procedure doesn't work as planned. Where possible, downtime procedures should be automated, such as having backup power activate automatically when the power fails.
Downtime procedures must reflect any changes in the business process and production systems. Members of the disaster recovery team must review the rationale for changes, approve changes, and then incorporate those changes into the disaster recovery plan.
There are two elements of a downtime procedure: keeping the organization operational during the disaster and recovering once the disaster has passed. For example, sales information is normally recorded electronically in the sales order application. However, the sales order application might be unavailable due to a power outage, so sales information is recorded on paper as part of the downtime procedure. Once the disaster is over, a procedure is necessary to enter that information into the sales order application—otherwise, the sales information database is incomplete. All downtime procedures should be incorporated in the disaster recovery plan.
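The sales-order example amounts to a capture-and-replay pattern: record transactions offline during the outage, then re-enter them once the application returns. Below is a sketch of the recovery half, where `record_sale` and the in-memory `sales_db` are hypothetical stand-ins for the real sales order application:

```python
# Orders captured on paper during the outage, each with an identifier so
# that replaying twice cannot create duplicates (idempotent recovery).
paper_orders = [
    {"order_id": "P-001", "item": "widget", "qty": 3},
    {"order_id": "P-002", "item": "gadget", "qty": 1},
]

sales_db = {}  # hypothetical stand-in for the sales order database

def record_sale(order):
    # Skip orders already replayed so the procedure is safe to rerun.
    if order["order_id"] not in sales_db:
        sales_db[order["order_id"]] = order

def replay_downtime_orders(orders):
    for order in orders:
        record_sale(order)

replay_downtime_orders(paper_orders)
replay_downtime_orders(paper_orders)  # an accidental rerun is harmless
print(len(sales_db))  # 2
```

Making the replay step idempotent matters in practice: recovery itself can be interrupted, and staff should be able to rerun the procedure without corrupting the database.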
Contact Lists

Contact lists are easily overlooked yet are critical to basic business operations. These are lists of employees, vendors, suppliers, and customers. The disaster recovery plan should specify:
–– Who initiates the contact
–– The priority of making contact
–– The method by which contacts are made
–– Instructions to give when contacted
–– Contingencies if a contact cannot be reached
Disaster Drills

A disaster recovery plan is only as good as the testing it receives. Every disaster recovery plan must be fully tested to identify gaps. Does the plan work? The only way to answer that question is to test each scenario as if it had occurred.
Testing the disaster recovery plan is challenging. The test requires reliance on backup procedures and backup systems. Executives must determine an acceptable level of business disruption during the test:
–– Can work stop?
–– Can employees be diverted from their work?
–– Is there time to test?
–– Is there a budget for testing?
–– How much of a disaster do you want to create to ensure that test results are accurate?
Testing a disaster recovery plan is challenging. The World Trade Center had more than 20 million square feet of office space. After 9/11, there was only 10 million square feet of office space available in Manhattan. Businesses affected by 9/11 had limited relocation options.
The organization must hold disaster drills on a regular schedule during the course of the year. Disaster drills should simulate real-life disasters to test the response of the emergency response team. Although drills are scheduled, each drill should be held unannounced. The emergency response team and the facility staff should not be alerted to the drill, since disasters are rarely known in advance. The disaster drill can be segmented. For example, the data center or a portion of the data center can operate on backup power for a few hours to test whether the backup power is sufficient to support the data center.
There are several types of disaster recovery tests:
–– A checklist test is a walkthrough where no work stoppage occurs.
–– A simulation test pretends a disaster occurs and uses utility software to check whether the hardware and software are recoverable. No production stops.
–– A parallel test creates a disaster in a parallel system—no production stops. This is a full interruption test in the non-production system.
–– A recovery production test requires the business to use a hot recovery site.
Backup activities are tested during a disaster drill to ensure that the expected operational function is maintained by using the backup. The disaster recovery plan must be modified if backup activities are unable to support operational levels. A disaster recovery plan that is not tested regularly should not be considered valid, because it has not been validated by scheduled testing.
Here are factors to consider when planning a disaster drill:
–– All employees, including administrators, must participate in the disaster drill.
–– Make exercises realistic enough to tap into employees' emotions.
–– Practice crisis communication with employees, customers, and the outside world—assume that phone lines are down.
–– Each employee should perform their expected role in a disaster during the disaster drill.
–– Be sure that the disaster drill is realistic. A real disaster increases stress on staff. You want to assess how well the staff will perform under the stress of a disaster.
–– Include community services such as police and fire personnel in the disaster drill.
–– The goal is to find weaknesses in the disaster response, not simply to walk through tasks associated with the disaster drill.
–– Make sure staff are trained to perform roles secondary to their primary responsibility (e.g., administrators are able to move food carts from the kitchen to the floors).
–– Make sure employees who evacuate the premises take their belongings with them. They won't be able to go home without car keys, house keys, and other personal belongings.
Chapter 8 Vendor Negotiations and Management

Rarely can you do everything yourself, even with your own team of experts on staff. Projects are too complex, and hiring expert employees is expensive, especially for a project that finishes within a few months. There's no work for those experts once the project is over—no one is giving up a full-time job elsewhere to work on a short-term project. The project team is supplemented by contractors who perform tasks not performed by the core team.
Contractors—referred to as vendors—enter into agreements (contracts) with the organization to perform specific services within a specific time period for a specific amount of money. The service can be tasks that otherwise would be performed by the organization's staff. The service can be to provide an entire application. The service can be to provide equipment.
There is a clear distinction between an employee and a vendor. Tasks performed by an employee are controlled by the organization. The manager determines the task, when the task is performed, and how the task is performed. Every detail of the task is under the manager's control, unless the manager lets the employee handle certain details independently—no manager wants to micromanage. A vendor is different. A vendor is contracted to produce an outcome, such as delivery of a fully operational order entry application. The vendor's contract specifies conditions within which the outcome is to be produced—time frame, cost, and other constraints. The vendor hires employees, engages other vendors (subcontractors), and determines work schedules and work rules. The vendor focuses on details. The organization simply wants results within the constraints specified in the contract.
Bringing a vendor onboard is complex. Success of the project—and sometimes the sustainability of the organization—depends on the vendor delivering the outcome on time and fully functional. Engaging vendors, negotiating with them, and managing them once onboard is critical to success.
Procurement
DOI 10.1515/9781547400812-008

Procurement is the process of acquiring goods and services from one or more vendors. The goal of procurement is to make the acquisition at the best possible cost of ownership, at the right quality, quantity, place, and time, and from the right vendor. A good is an item, such as a computing device to run an order entry system. A service delivers an outcome, such as a vendor installing the order entry application. Think of a service as performing a task and a good as an item.
Cost of ownership is the total cost of acquiring and using a good or service. This includes development, implementation, and maintenance. Development is the "effort" to get the item ready to use. Implementation is the effort to install the item, and maintenance is the ongoing effort to continue to use the item. All costs are estimated prior to choosing a vendor.
For the order entry application, development includes acquiring new computers, upgrading the network, redesigning workflows, and training. Implementation includes the installation and testing of the application. Maintenance includes the ongoing support for the application, which includes hiring specialists to oversee the application. Also included are the cost of electricity, maintenance of the computing devices that run the application, and the maintenance agreement with the vendor.
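Cost of ownership as described above totals with simple arithmetic: development plus implementation plus maintenance over the planned service life. The dollar figures below are illustrative:

```python
# Total cost of ownership = development + implementation +
# annual maintenance over the planned service life.
def total_cost_of_ownership(development, implementation,
                            annual_maintenance, years):
    return development + implementation + annual_maintenance * years

# Assumed: $200,000 to develop, $50,000 to implement, and $40,000/year
# to maintain an order entry application kept for five years.
tco = total_cost_of_ownership(200_000, 50_000, 40_000, 5)
print(tco)  # 450000
```

Notice that maintenance dominates over a long enough service life, which is why ongoing costs belong in the estimate made before choosing a vendor, not after.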
Procurement Process

The procurement process begins when the organization identifies needs by performing a needs assessment. A needs assessment is the process of clearly identifying the desired outcome, such as automating the order entry process. Needs must be specific—in this case, clearly defining "automating" and "order entry processes." The needs assessment is challenging because terms and concepts must generally be defined. The needs assessment must explicitly identify stakeholder expectations and regulatory requirements.
Needs are prioritized, and the highest priorities are assigned to a project manager to meet those needs. The term "project manager" is more of a functional title; the person may have a title other than "project manager." The project manager must be able to express specific needs to vendors in writing. If needs cannot be expressed in writing, then it is not the right time to make the procurement. If the project manager doesn't know what is needed, then how will a vendor know what is needed?
Finding a Vendor

The next step in the procurement process is finding a vendor. The goal is to identify a small group of qualified vendors, each of whom can submit a proposal. A qualified vendor is a vendor who can meet the need within the requirements set forth by the organization. Here are a few common qualifications:
–– Financially solvent: The vendor must have the financial resources to be self-supporting during the project. A vendor is typically paid upon delivery (i.e., successful implementation of the order entry application), although some procurement arrangements may require progress payments. A progress payment is an amount paid by the organization to the vendor at designated milestones in the project. All expenses incurred by the vendor leading up to delivering the need are financed by the vendor. If the vendor lacks financial stability, then the vendor may be unable to pay its expenses and will be unable to complete the project.
–– Existing relationship: A vendor who has successfully met other needs of the organization should be considered for the project if the vendor has the expertise to fulfill the need. The organization and the vendor have a comfortable working relationship with each other.
–– Expertise: The vendor must know how to provide the need. Some needs can be met by many vendors, such as printing training materials. Other needs are highly specialized, such as implementing a specific type of order entry application.
–– Experience: Don't confuse experience with expertise. Expertise is the knowledge to do something; experience is having applied that knowledge many times. A highly desirable vendor is one who has met the exact needs required by the organization many times for other similar organizations. Similarities are based on size, specialty, demographics, revenue source, and other factors that distinguish an organization.
–– General contractor vs. integrated contractor: A general contractor is a vendor who takes on the responsibility to meet the need and will likely hire other vendors, called subcontractors, to perform some or all of the required tasks. An integrated contractor is a vendor who takes on the responsibility to meet the need and will perform all required tasks itself; subcontractors are integrated into the vendor's organization. Both types of vendors can meet the need.
–– Capacity: The vendor must have the capacity to fulfill the need within the time frame. A vendor may be financially solvent and may have the necessary expertise and experience but have too many obligations with other clients. Such a vendor lacks the capacity to take on the project.
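A common way to compare qualified vendors, though not one prescribed by the text, is a weighted scoring matrix: rate each vendor on each qualification, weight the qualifications by importance, and sum. The weights and ratings below are hypothetical:

```python
# Hypothetical weights: how much each qualification matters (sums to 1.0).
weights = {"solvency": 0.2, "expertise": 0.3, "experience": 0.3, "capacity": 0.2}

# Hypothetical 1-5 ratings gathered from reference checks and the RFI.
vendors = {
    "Vendor A": {"solvency": 5, "expertise": 4, "experience": 3, "capacity": 4},
    "Vendor B": {"solvency": 3, "expertise": 5, "experience": 5, "capacity": 2},
}

def weighted_score(ratings):
    # Sum of (weight * rating) across all qualifications.
    return sum(weights[q] * ratings[q] for q in weights)

best = max(vendors, key=lambda v: weighted_score(vendors[v]))
print(best)  # the highest-scoring vendor under these weights
```

The matrix does not make the decision; it makes the trade-offs explicit, such as Vendor B's stronger expertise and experience against its weaker solvency and capacity.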
Once vendor qualifications are defined, the next step is to find qualified vendors. Review current and past vendors who have done business with the organization to determine if they might be qualified for the project. Speak with counterparts in other organizations who have had similar needs to determine vendors considered for their project and the vendor selected for their project. Contact vendors who worked for the organization on similar projects. They may be able to recommend candidates. Professional associations are also good resources for suggesting vendors.
Contacting Vendors

Next, prepare a "request for information" document that can be sent to prospective vendors. The request for information document provides general information about the need and the organization, and requests information about the vendor and the vendor's products/services.
Chapter 8: Vendor Negotiations and Management
A representative of the vendor may follow up personally to begin discussions about the project. Discussion should focus on verifying the vendor's qualifications and providing generalized information about the project. Project specifications will be sent later to selected vendors in a formal document called a request for proposal (RFP), once you are sure that vendors are qualified for the project.

Some vendors may not be interested in the project for a variety of reasons, including that the vendor doesn't feel qualified to meet the need. A vendor may decide that the project is too small or too big. The vendor may also decide to pass because there is not enough profit in the project, or because the vendor doesn't like doing business with the organization.

There are two outcomes from the request for information. First, the project manager learns from vendors about the realities of meeting the need. Initially, administrators and the project manager may think the project is relatively straightforward; the complexities become known after speaking with vendors. The other outcome is weeding out the vendor list, both by vendors who feel the project isn't right for them and by the project manager, who determines that a vendor is not a good candidate for the project.
The Proposal

Once information is received from vendors and vetted, the project manager creates a request for proposal (RFP), sometimes referred to as a request for quotation. The request contains specific information about the project, including expectations, timeline, and constraints. Expectations are specific outcomes. Constraints are limitations within which those outcomes are to be achieved. It is critical to be specific in the request for proposal because this is the information that the vendor will use to propose how the vendor is going to meet the needs. The request for proposal must state a deadline when proposals are due.

The vendor will submit a proposal to the project manager. The proposal is a description of how the vendor is going to meet the need based on information in the request for proposal. The proposal should contain the price to be paid to the vendor; however, some vendors may prefer to omit the price to give the vendor leverage in negotiations.

Don't review proposals until after the deadline specified in the request for proposal. The project manager should assemble a committee of appropriate stakeholders to review all proposals. The review should consider:
–– Completeness: The proposal should address all specifications mentioned in the request for proposal. Be alert: some vendors address some but not all specifications. There are many reasons for submitting a partial proposal. Regardless of the reason, the vendor who submits an incomplete proposal is indicating non-compliance with the organization's request.
–– Specificity: The proposal should address requested specifications in sufficient detail so there can be a comprehensive understanding of the proposal.
–– Restatements: Statements of requirements mentioned in the request for proposal should remain unchanged in the proposal. Some vendors may reword the statement, leading to vagueness in requirements that were clear in the request for proposal. For example, the request for proposal may state that the vendor will provide an order entry system that includes web-based order entry by customers, order management reporting, and an online order tracking system for customers. The proposal might state that the vendor will provide an order entry screen and not mention that it is web-based. Time, price, and other elements of the proposal may not reflect those components.
–– Authorization: The proposal must be submitted under the name of the company representative who has authority to submit a proposal. For example, a sales representative may not have authority to submit binding proposals.
–– Feasibility: The committee must determine if the proposal is feasible. If all but one proposal extends the timeline mentioned in the request for proposal, and the remaining proposal's cost is 30% less than the other proposals, then there is a question of whether it is feasible for that vendor to fulfill the terms of its proposal.
The goal is to select two or three of the most promising vendors for a more thorough review. The review entails personal interviews, sampling the vendors' products/services, visits to the vendors' clients, and other activities that provide detailed information about each vendor.
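One common way to narrow the field is a weighted scorecard built from review criteria like those above. The sketch below is an illustration only; the criteria names, weights, and 1–5 scores are invented assumptions, not a method prescribed in this chapter.

```python
# A hypothetical weighted scorecard for shortlisting vendors.
# Criteria, weights, and 1-5 scores are all illustrative assumptions.

weights = {"completeness": 0.25, "specificity": 0.20,
           "experience": 0.25, "feasibility": 0.30}

proposals = {
    "Vendor A": {"completeness": 5, "specificity": 4, "experience": 3, "feasibility": 4},
    "Vendor B": {"completeness": 3, "specificity": 5, "experience": 5, "feasibility": 3},
    "Vendor C": {"completeness": 4, "specificity": 3, "experience": 4, "feasibility": 5},
}

def score(vendor: str) -> float:
    """Weighted sum of a vendor's criterion scores."""
    return sum(weights[c] * s for c, s in proposals[vendor].items())

ranked = sorted(proposals, key=score, reverse=True)
shortlist = ranked[:2]   # the two or three most promising vendors
print(shortlist)         # ['Vendor C', 'Vendor A']
```

A scorecard does not replace the committee's judgment; it simply makes the basis for the shortlist explicit and auditable.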
Risks of Procurement

On the surface, most or all responsibility and related risks associated with fulfilling the need are shifted from the organization to the vendor. If the vendor is contracted to implement an order entry application, then the vendor is responsible for the implementation, and all risks associated with the implementation are borne by the vendor.

In reality, the organization remains exposed to those risks as well. Although the organization may withhold payment from the vendor for non-delivery, the need remains unfulfilled. Repercussions for not fulfilling the need, such as regulatory obligations, must be handled by the organization. Therefore, the project manager must monitor a vendor carefully during the project to ensure that the vendor is fulfilling the contract and meeting the need. There are common areas where procurement is likely to fail. These are:
–– Unclear statement of work: The request for proposal and the proposal do not specify all the details of the work, and the work therefore appears less complex than it is. What the organization needs is more than what appears in the request for proposal.
–– Overstatement of work: The request for proposal and the proposal are too specific, not giving the vendor sufficient flexibility to deliver what the organization needs rather than what it literally asked for.
–– Communication breakdown: Specifications that are costly or impracticable in the request for proposal are pointed out by the vendor but are not changed and remain in the proposal.
–– Poor diligence: The project manager did not verify everything in the vendor's proposal, resulting in a proposal that favors the vendor over the organization. The vendor is promising everything but is not capable of delivering it.
–– No basis for price: The proposal should specify how the vendor arrived at the total price for the project. This provides a foundation for understanding the pricing of the project. The project manager should have a good understanding of how the vendor arrived at the price. This enables the project manager to estimate the cost of fulfilling the need within the organization. This estimate should be compared with the vendor's proposal to determine whether the vendor's proposal is reasonably priced.
–– One-sided proposal: Either the organization or the vendor has self-serving terms that are not negotiable.
Negotiation

Negotiation is the process of two or more parties reaching an agreement. The negotiation process can be as simple as two people agreeing to meet at a specific time and place or as complex as a labor contract between the organization and a bargaining unit. The project manager will be required to negotiate contracts with vendors to provide various services to the organization. Negotiations with a vendor can be as simple as agreeing on a price to print training material supplied by the organization or as complex as acquiring the rights to use an application.

The complexity of negotiations is determined by the complexity of the product/service that the vendor provides and the impact that the product/service has on the organization. For example, implementing an application involves not only the acquisition and installation of the software but also many other factors. These include:
–– Liability: Success of the business depends on the vendor's application working properly. If the vendor's application malfunctions, then there is a potential loss for the business. It is important to identify whether the vendor is liable and will reimburse the business for losses related to malfunctions of the vendor's application.
–– Data: If information collected by the application is stored in a database maintained by the vendor, then there must be an agreement as to who owns the data.
–– Accessing data: Data collected by the application can be accessed using the application. The organization and the vendor must decide whether the organization can directly access the data from the database, bypassing the application. For instance, it may be important to generate reports that are not available in the application.
–– Transferring data: The organization should determine how data can be transferred to a different application should the organization decide to terminate the agreement with the vendor.
–– Regulatory compliance: Standards required by regulatory authorities are typically incorporated into the application. The organization and the vendor must agree that the vendor will ensure that the application is always compliant with regulatory requirements.
–– Price: Both parties must agree on what is and is not covered in the price of the application. Likewise, there should be agreement on a pricing structure for items not covered under the quoted price. For example, minor upgrades are covered and major upgrades are not. Also included should be guidelines for prices when the agreement is renewed, such as no more than a 10% increase in price.
–– Termination: All agreements end. The goal is to agree on an amicable termination arrangement during negotiations. This covers non-renewal of the agreement, breach of the agreement, and what happens if the vendor is sold, merged, or goes out of business.
Preparing for Negotiations

Complex negotiations require a strategy similar to a chess game, where there is a gambit (opening move), counter moves, testing the opponent in a mid-game strategy, and then developing the winning end-game move. Chess is a zero-sum game—there is a winner and a loser. Negotiation is not a zero-sum game, and there are no ties; instead, each party wins something and loses something. The goal for both sides is to win the elements that are important to them.

For example, the price must reflect a reasonable profit for the vendor. Both sides negotiate to define "reasonable." There is a price point at which the vendor will walk away from the negotiation because there isn't sufficient profit to be made on the project. Likewise, there is a price point at which the organization is unwilling or unable to pay and will walk away from negotiations. If both sides can't agree on a reasonable price, then there is no purchase.
Preparation is the first step in the negotiation process. Preparation will identify:
–– Required factors: Required factors are factors that are not negotiable. If those factors are not in the final agreement, then there is no agreement and the organization will walk away from negotiations. For example, the application must be compliant and remain compliant with regulatory authorities.
–– Negotiable factors: Negotiable factors are factors that are nice to have but will not stop the organization from reaching an agreement with the vendor. For example, the application is fully integrated with other applications; however, the organization will accept a partially integrated system.
–– Not required factors: Not required factors are factors that are unimportant to the organization. For example, membership on the vendor's customer advisory committee is unimportant to the organization.

Preparation will also assess the same factors for the vendor: that is, the required factors, negotiable factors, and not required factors for the vendor. The assessment is not perfect, but it will provide a rationale for developing a negotiation strategy. Know the worth of the project to the vendor and to the organization. Does the vendor need the organization more than the organization needs the vendor? The answer to this question determines which side is negotiating from strength.

Once factors are identified for both sides of the negotiation, the organization focuses on developing a negotiation strategy—a strategy for each stage of negotiations. The organization's negotiating team plots moves. The vendor's proposal is the opening gambit for the vendor. The organization determines a counter proposal and projects the vendor's possible reactions to the counter proposal. The outcome of the planning process is a well-thought-out plan for negotiation. No negotiation strategy is perfect, however. Unanticipated actions by the vendor can throw any well-designed plan into turmoil.
Therefore, the organization must remain flexible throughout the negotiation process.
Negotiation Strategy

Decide your breakpoint. The breakpoint is the minimum set of terms acceptable to the organization, such as the required factors. Next, open with an extreme position that is far above the breakpoint. These are required factors and negotiable factors that may be presented as non-negotiable. The worst that can happen is that the vendor returns with a counter offer.

The difference between the organization's initial offer and the vendor's counter offer is the bargaining range. A settlement will be reached somewhere within this range. Therefore, it is important to make the initial offer as extreme as possible in order to give the organization sufficient room for negotiation. Negotiations should
incrementally narrow the bargaining range. Each increment should be perceived as a major concession, regardless of the actual importance of the factors negotiated.

Identify incentives and disincentives. An incentive is something of value to the vendor that can be offered if the vendor performs certain actions. For example, the vendor will receive a 10% increase over the negotiated price if the project is delivered fully operational six months ahead of schedule, provided, of course, that all specifications are met. A disincentive is something of value to the organization—and a cost to the vendor—that takes effect if the vendor does not perform specific actions. For example, the organization will penalize the vendor 1% of the price for each month past the scheduled deadline. A disincentive typically specifies conditions related to the lack of action, such as the penalty being voided if the organization materially alters the original specifications.
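The incentive and disincentive arithmetic above can be made concrete. The following sketch assumes a hypothetical $500,000 contract with the 10% early-delivery bonus and the 1%-per-month late penalty described in the text; the contract price and function name are illustrative assumptions.

```python
# Hypothetical incentive/disincentive clause applied to a negotiated price.
# 10% bonus if delivered six or more months early; 1% penalty per month late.

def final_payment(base_price: float,
                  months_early: int = 0,
                  months_late: int = 0,
                  early_bonus_rate: float = 0.10,
                  bonus_threshold_months: int = 6,
                  late_penalty_rate: float = 0.01) -> float:
    """Return the payment to the vendor after applying the clause."""
    payment = base_price
    if months_early >= bonus_threshold_months:
        payment += base_price * early_bonus_rate   # incentive earned
    payment -= base_price * late_penalty_rate * months_late  # disincentive
    return payment

# Delivered six months early: the 10% incentive applies.
print(final_payment(500_000, months_early=6))   # 550000.0
# Delivered three months late: a 3% penalty applies.
print(final_payment(500_000, months_late=3))    # 485000.0
```

Putting the clause in a small function like this makes it easy to test the boundary cases (exactly at the bonus threshold, zero months late) before the terms are written into the contract.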
Value

Value is the perception of worth. The perception of value is used in negotiation to create a relatively low-cost incentive to reach the terms of a contract. Let's say that a vendor who supplies computing equipment offers the organization printers at a reasonable cost if the organization contracts with the vendor to supply computers for the entire facility at market value. That is, other vendors can provide the same computers at relatively the same price. The vendor perceives the printers as having relatively low value because the vendor has many printers in inventory. Yet the organization views the printers as having high value because the printers are being acquired at what is perceived to be below market value. The organization should evaluate perceptions of value when deciding to negotiate incentives.
Do the Math

A good negotiator will estimate how much it costs the vendor to provide the product/service being negotiated. Let's say that the vendor is going to charge $8,000 to provide a technician at your business location during the first four days after the vendor's application goes live. Is this a good deal?

One way to assess the value of this support is to estimate the expenses that the vendor incurs by sending the representative to the organization's location: the cost of air travel, car rental, hotel stay, a reasonable amount for meals, transportation to and from the organization's site, and the salary/benefits paid to the employee. Calculating the cost gives a rough estimate of whether or not the $8,000 is reasonable. This also serves as a basis for negotiation. The vendor may be willing to lower
the price because they perceive the value to be relatively low. That is, the employee's compensation is paid whether the employee is onsite or in the office. Therefore, the vendor may be willing to waive the employee's compensation portion of the $8,000 if pushed by the organization.

Superstores, such as Walmart, that offer extremely low prices calculate every cost associated with products purchased from vendors. The superstore knows how much it costs to make the product and leverages this knowledge to negotiate low prices from vendors who supply the product. In essence, the superstore knows the terms that make the deal marginally good for the vendor.
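A back-of-the-envelope estimate like the one described above might look like this. Every cost figure below is an invented assumption; substitute local estimates for travel, lodging, and compensation.

```python
# Rough estimate of the vendor's cost to keep a technician onsite for four
# days, compared against the quoted price. All figures are assumptions.

onsite_costs = {
    "airfare":             600,
    "car_rental":          300,       # four days
    "hotel":               800,       # four nights
    "meals":               300,
    "local_transport":     100,
    "salary_and_benefits": 4 * 700,   # four days of loaded compensation
}

total_cost = sum(onsite_costs.values())
quoted_price = 8_000
margin = quoted_price - total_cost

print(f"Estimated vendor cost: ${total_cost:,}")
print(f"Vendor margin on the ${quoted_price:,} quote: ${margin:,}")
```

If the estimate shows a wide margin, as it does under these assumed figures, the organization has a factual basis for pushing back on the quoted price.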
Payment

The payment schedule is another bargaining point. Depending on the nature of the product, the vendor may be expected to finance the project. Be sure that the vendor is economically sound before engaging the vendor. A vendor should be able to finance the entire project without progress payments; the organization then makes one payment to the vendor once the project is delivered and accepted.

In some projects, the vendor requires an initial payment either at contract signing or when the project begins. This is referred to as a good faith payment. The vendor may also request progress payments throughout the project. A progress payment is a portion of the total payment that the organization makes to the vendor at specific milestones during the project. The remaining payment is made when the organization accepts the delivered project.

Progress payments help the vendor's cash flow during the project. Cash flow reflects the timing of when amounts are actually paid by the organization and received by the vendor. A progress payment replenishes funds that were expended for a segment of the project.

Progress payments must be associated with delivered value. Let's say the vendor is contracted to upgrade computers, servers, and the local area network and to implement an application for the organization. These are the milestones of the project. A progress payment can be made after the computers are upgraded because the vendor delivered value to the organization: the organization has state-of-the-art computers even if nothing more is done on the project. The vendor could walk away from the project and the organization could contract with another vendor to complete the remaining segments.

Progress payments must not be made if the vendor does not deliver value. Let's say that the vendor installed a program that enables the vendor's application to be modified by the organization's programmers. Although this is a project deliverable, it is of no value to the organization.
The program can only be used for the vendor’s application. The program is of no value to the organization should the vendor walk away from the project.
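A milestone-based payment schedule of the kind described above can be sketched as follows. The milestone names, percentage shares, and contract price are hypothetical assumptions chosen to mirror the example in the text.

```python
# Hypothetical milestone-based progress-payment schedule. Each payment is tied
# to a milestone that delivers standalone value; the shares must total 100%.

contract_price = 1_000_000   # illustrative total contract price

milestones = [
    ("Computers upgraded",   0.20),
    ("Servers upgraded",     0.20),
    ("Network upgraded",     0.20),
    ("Application accepted", 0.40),   # final payment on acceptance
]

# Guard against a schedule that over- or under-pays the contract.
assert abs(sum(share for _, share in milestones) - 1.0) < 1e-9

payments = {name: contract_price * share for name, share in milestones}
for name, amount in payments.items():
    print(f"{name:<22} ${amount:>12,.2f}")
```

Weighting the final acceptance milestone most heavily, as in this sketch, keeps the vendor motivated to finish while still replenishing the vendor's cash flow along the way.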
General Contractor

A project may require a number of specialty vendors. For example, one vendor supplies the application, and other vendors upgrade computers, upgrade the network, upgrade servers, and train the staff to use the application. You must decide whether to use a general contractor or deal directly with specialty vendors. A general contractor is a vendor who takes on responsibility for the entire project. The primary benefit of using a general contractor is that the organization interacts with one vendor; there is no need to search for specialty vendors.

There are two major disadvantages of engaging a general contractor. First, the organization has no control over subcontractors. Although the organization typically sets rules for subcontractors, such as granting access to the facility, the general contractor controls the subcontractors. Subcontractors take orders from the general contractor, not the customer. The second disadvantage is price. The general contractor's price includes a charge for overseeing subcontractors plus a charge for each subcontractor. Typically, the price is an aggregate price for the entire project, not broken down by subcontractor.

The organization must weigh the options of acting as its own general contractor and engaging each specialty vendor, or engaging a general contractor. Taking on the role of the general contractor enables the organization to negotiate prices with each specialty vendor and gives the organization control over the specialty vendors. However, this doesn't guarantee a better price than that given by the general contractor. The general contractor may be in a better negotiating position than the organization. For example, the general contractor has leverage because a subcontractor may work on other projects for the general contractor, while the organization has only one project.
Face-to-Face Negotiation

Face-to-face negotiations should occur only when the project manager is prepared for negotiations. Never enter unprepared, regardless of pressure applied by the organization or the vendor. The goal of negotiations is to achieve a contract that sets the terms of how the vendor will provide products/services to the organization. In theory, the vendor wants to do as little as possible to achieve the outcome while receiving the maximum amount of money, and the organization wants to achieve the best outcome while spending as little money as possible. Negotiations bring both parties to realistic terms.

Face-to-face negotiation is more about perceptions, personalities, and salesmanship than about the specifications of the project. Every facet of negotiations influences the outcome, so an objective is to control each facet, such as the site of negotiations. The choice of site itself influences perceptions. For example, an organization is perceived to be negotiating from a position of strength if negotiations
are held on the organization's site, especially if discussion takes place in the executive offices. Knowing this, critical contracts are negotiated at a neutral place, such as a hotel conference room, so neither party has an advantage over the other.

Your behavior sets the tone of negotiations. You want to project a perception of strength, giving the impression that you have negotiated contracts successfully many times in the past, even if this is your first negotiation. Speak louder than usual and greet with a strong handshake. Lead the negotiation process. Say, "Thanks for coming today. Let's begin by reviewing our needs." This opening shows that you are in control and the other party is following your lead.

Your body language communicates your feelings before you say a word; therefore, always maintain strong, professional body language. Sit up straight. Maintain eye contact at all times. Display neutral or friendly facial expressions even if you are worried.

Speak in clear terms and avoid vague ones. Say, "We need an order entry system that is fully integrated with our current applications." Never use "about," "approximately," or "it would be nice to have," because these give each party room to define the terms in their own favor. Don't read too much into the other party's body language. A good negotiator will telegraph misleading messages using facial expressions.
Negotiating Terms

Never be the first to talk price. Price is the value placed on the product/service, and each party has its own valuation. The first party to talk about price sets the approximate value of the product/service. Let's say that the organization is willing to pay $200,000 for the order entry application and the vendor is willing to provide the application for $100,000. If you are the first to mention $200,000, then the vendor is likely to ask for slightly more than $200,000. Before either party speaks about price, the value of the product/service is unknown: it could be $10, $10,000, $100,000, or a million dollars. The first to speak about price sets the magnitude of the value.

At some point, one party breaks down and mentions price. In negotiation theory, the vendor's first price is higher than the eventually agreed-upon price, and the organization's first price is lower than that settled price.

The vendor may use an alternative negotiation strategy that settles for an unrealistically low price. Here's how it works. The vendor's price is based on written specifications provided by the organization. However, the specifications are incomplete: the organization's written request failed to specify all the elements necessary to achieve the desired outcome. The vendor presents an attractive price; contracts are signed and the project begins. At some point during the project, the vendor and the organization realize additional work is necessary. The vendor is now in a position of strength to dictate the price for the additional work. The work is necessary. Another vendor
is unlikely to bid on that work since the project is already underway. It is critical that both parties explicitly state what is and is not included in the price. Likewise, it is critical that the organization take the time to develop a complete set of specifications for the project before asking for proposals.

Hold back items you want until the close of negotiations. This is the time when both parties are usually in a positive mindset, looking beyond negotiations toward starting the project. Introducing one or two items that you want included in the contract at the time of signing places pressure on the other party: they can agree and move ahead with plans, or disagree and start the process all over—finding a vendor, negotiating a contract.

Avoid hostility leading up to and during negotiations. A party may refuse favorable terms because it holds a grudge against the other party to the negotiations.
Terminate Negotiation

Negotiations end either with an agreement of understanding or with no agreement. An agreement of understanding is a written document that contains the key items that are agreed upon and is signed before the parties leave the last negotiation session. It is common for this agreement to be handwritten; it serves as the foundation for attorneys to draw up formal contracts.

If no agreement can be reached after several negotiation sessions, then negotiations are terminated—the breakpoint has been reached—and the organization seeks another vendor. Each party determines its own breakpoint. A breakpoint is reached when a party walks away from negotiations because, in that party's eyes, the other party is being unreasonable and/or not negotiating in good faith.

Negotiating in good faith means that a party is sincerely interested in reaching an agreement. There are times when a party has an ulterior motive for entering into negotiations. For example, an executive in the organization may not be in favor of implementing the application but enters negotiations with a vendor for political purposes, making it appear that the executive is interested. In reality, the executive is not negotiating in good faith and is doing everything to encourage the vendor to terminate negotiations without an agreement.

Large consulting firms that offer a variety of managerial and MIS services may provide the initial service at a very reasonable price. Employees of the consulting firm will prospect for additional work while providing the contracted service. Their goal is to find other workflows not working within the organization and offer to fix them for an additional price.
Conflict Resolution

Negotiations will likely bring disagreements between the organization and the vendor. Disagreements can also occur during the project even though terms are defined in a contract. An effort should be made to resolve conflicts amicably through the conflict resolution process. The conflict resolution process helps the parties focus on the facts in dispute and then rationally develop viable options that will bring the parties to an agreement.

Disputes tend to be disproportionate to the factors in the agreement, resulting in distortions of the situation. Let's say the vendor successfully delivered 99% of the deliverables and 1% was unacceptable, though the vendor believes it complied with the contract. The organization and the vendor disagree over whether that 1% of the deliverables is acceptable. Discussions focus on the unacceptable deliverable rather than reviewing the whole project. Focusing on the negative intensifies the importance of the item in dispute. The primary goal of conflict resolution is to help the parties develop a balanced view of the project.
Fact Finder, Mediator, and Arbitrator

Parties to a dispute should ask a third party to intervene and help resolve the conflict. There are three types of third parties that can intervene in a dispute. These are:
–– Fact-finder: A fact-finder is an unbiased third party whose sole purpose is to review relevant information, including contracts, performance records, industry standards, and other elements that influence the dispute. The fact-finder assembles the facts into a logical flow and presents a list of facts to both parties. The parties use the list of facts as a starting point to renegotiate the conflict.
–– Mediator: A mediator is an unbiased third party who is trained to mediate disputes. The goal of a mediator is to help both parties resolve the conflict. The mediator begins with fact-finding and then interviews each party separately and confidentially to understand the situation and each party's concerns. The mediator then tries to find middle ground and options that are acceptable to both parties. The mediator does not resolve the dispute; both parties resolve the dispute with the help of the mediator.
–– Arbitrator: An arbitrator is also an unbiased third party; however, the arbitrator resolves the dispute—not the disputing parties—in a process called arbitration. An arbitrator is judge and jury. Each party submits facts and its position to the arbitrator either in person, in writing, or both. The arbitrator makes a judgment and the dispute is resolved. There are two types of arbitration: non-binding and binding. Non-binding arbitration occurs when either or both parties can reject the decision of the arbitrator. Binding arbitration occurs when
the decision of the arbitrator legally binds both parties. Both parties must accept the resolution.
Suing

Disputes related to contracts are usually not resolved in court. Although each party to a contract dispute has the right to sue, suing is most often not practical. In addition to the legal costs, it can be years before the parties appear before a judge. The impact on the project is devastating; both parties lose.

Binding arbitration is a preferred alternative to suing in court because the parties can be heard before an arbitrator within weeks rather than years. The arbitrator is likely to be a retired judge or lawyer who applies many of the same practices found in court. The issue can be expedited within a reasonable time period with limited impact on the project.

A contract usually has a clause stating that all contract disputes will be resolved in binding arbitration. Parties who sign a contract that contains the binding arbitration clause are obligated by law to accept the judgment of the arbitrator. Furthermore, the parties agree to waive their right to bring the dispute to court.
The List

A disagreement is typically small, although a small disagreement can stop parties from agreeing. The mediator’s role is to illustrate to both parties exactly what is in disagreement. In this way, the parties may see a balanced view of the issues affecting the project. More often than not, both parties are largely in agreement; it is just that the disagreement overshadows the items that are in agreement. The mediator lists all items in agreement on a legal pad or on a whiteboard. The objective is to make this list visibly long. The list is assembled from fact-finding conducted by the mediator prior to bringing both sides together to review the mediator’s findings. The mediator presents a fact and asks each party if they are in agreement. The mediator knows the answer because the mediator has met individually with the parties and explored the same information. The mediator asks about and lists only items that the mediator knows are in agreement. Beside the list of agreed-upon items, the mediator lists items in disagreement. Again, the mediator knows these items but still asks each party for their opinion and lists each item in the second column of disagreed items. By nature, and sometimes by the mediator’s design, the list of disagreed-upon items is noticeably smaller than the list of agreed-upon items. This illustrates to the parties that they agree more often than they disagree.
Chapter 8: Vendor Negotiations and Management
Next, the mediator examines each item in disagreement and decomposes the item into subcomponents. Subcomponents are entered into two new lists—subcomponents in agreement and subcomponents in disagreement. The list of subcomponents in agreement typically is longer than the list of those in disagreement. The process continues until there are many sets of lists, each dissecting items in disagreement, until the mediator ends with a few items that cannot be dissected further. These become the focus of the mediation. The mediator then asks each party to focus on options to resolve the remaining disagreement. In doing so, the mindset of the parties in the mediation conference changes from adversarial to cooperative. A disagreement related to a situation within the project can likely be resolved because both parties are interested in completing the project. However, a disagreement in deeply rooted values, called a disagreement in principle, is difficult to resolve because the disagreement has little or nothing to do with completing the project.
Stages of Adoption

Disputes sometimes center on the adoption process. When we ask someone to change and adopt something new, the person works through the stages of the adoption process. Each stage must be successfully completed before moving to the next stage, leading to the acceptance of the change. Let’s say the vendor wants to change the project manager’s opinion on how the order entry application should be implemented. A dispute is likely to arise if the vendor does not give the project manager time to work through the stages of adoption. The vendor wants the project manager to make a decision quickly. The project manager is likely to resist, leading to a dispute. There are five stages of adoption. These are:
–– Awareness: In the awareness stage, the project manager becomes aware of a possible solution to a problem involving implementation of the order entry application. The solution is proposed by the vendor.
–– Exploration: In the exploration stage, the project manager takes a superficial look at the solution to determine if it is a possible solution.
–– Examination: In the examination stage, the project manager takes a detailed look at the possible solution to uncover reasons why the solution might fail.
–– Test: In the test stage, the project manager tries the solution under various scenarios to assess whether or not the solution is viable.
–– Adoption: In the adoption stage, the project manager agrees with the vendor that the solution will solve the problem.
In order to avoid disputes, the vendor must give the project manager time to work through each stage of adoption. The vendor can assist by providing the project manager with resources, information, and other facts that help assess each stage.
Contract

A contract is an agreement between two or more persons to do something, or to refrain from doing something, in exchange for something of value called consideration. A person is a legal entity. Individuals can also create a legal entity to limit liabilities by forming a partnership or corporation that has the right to act as a person. Liability is the risk associated with conducting business. For example, the vendor is liable to perform the terms of the contract. Failure to do so exposes the vendor to legal action taken by the organization. There are three common forms of legal entity. These are:
–– Individual: An individual is a legal entity who can conduct business. The individual pays all expenses associated with the business, receives all revenues generated by the business, and is personally liable to fulfill obligations stated in a contract.
–– Partnership: A partnership is formed by two or more individuals who work toward a common goal, such as providing a product/service to earn a profit. Each partner provides value to the partnership in the form of assets (money or property). Partners then share the revenues and profits realized by the partnership. There are two types of partnerships:
○○ General partnership: In a general partnership, each partner is responsible for the liabilities of the partnership. A general partner’s personal assets can be used by creditors to pay the liabilities of the partnership.
○○ Limited liability partnership: A limited liability partnership legally limits a partner’s liability to the partner’s investment in the partnership. Personal assets cannot be used by creditors to pay the liabilities of the partnership.
Each partner participates in managing the business based on the partnership agreement. The partnership agreement specifies the terms of the partnership. One partner may manage the business (the managing partner) while the other partners act as advisors. Alternatively, all partners manage the business.
–– Corporation: A corporation is a legal entity formed by individuals called stockholders and authorized by the government to act as a person. Stockholders provide the corporation with assets—usually money—in exchange for shares of stock. Each share has the right to receive a portion of the profits generated by the corporation. Profits are distributed per share. Stockholders elect a board of directors, which hires individuals to manage the business. A stockholder does not manage the business.
Elements of a Contract

There are two types of contracts. These are:
–– Oral contract: An oral contract is a verbal agreement between two parties that contains all the elements of a contract. Although the terms of the agreement are not written, the contract is enforceable under the law. However, the Statute of Frauds specifies that some contracts, such as those for real estate, must be written.
–– Written contract: A written contract is a document that contains the terms of an agreement between two parties. Contracts between the organization and vendors are typically written contracts.
Every contract must have four elements. An agreement that does not have these four elements is not a contract. These are:
–– Offer: An offer is the promise to do something or refrain from doing something in the future.
–– Consideration: Consideration is something of value promised to the party who makes the offer. The value can be money, a product, or a service. Don’t confuse a gift with consideration. A gift is the transfer of something of value without receiving something (consideration) in return for the transfer.
–– Acceptance: Acceptance is an action that clearly agrees to the offer. Acceptance can be conveyed in words, deeds, or performance in accordance with the terms of the offer. Actions contrary to the offer can be construed as rejecting the offer or making a counteroffer. Acceptance must occur within a reasonable time.
–– Mutuality: Mutuality means that both parties to the contract understood the terms of the contract when the contract was signed.
During negotiations, each party can make an offer. Once an offer is made, the other party can accept it or make a counteroffer. Once a counteroffer is made, the original offer is no longer valid. The party who rejected the offer cannot accept it once a counteroffer has been made. Let’s say that the vendor offers to implement an order entry application that performs a specified functionality for $100,000. The organization proposes a counteroffer of $90,000 for additional functionality, which is rejected by the vendor. At this point, no offer is on the table. The organization cannot say, “OK, I’ll take it for $100,000” and expect the vendor to be obligated to return to the original offer.
Breach of Contract

A breach of contract occurs when one party to the contract fails to perform some or all terms of the contract without legal cause. There are two types of breaches of contract. These are:
–– Material breach: A material breach occurs when the other party receives something substantially different from what is specified in the contract, such as the order entry application not performing as specified in the contract.
–– Minor breach: A minor breach occurs when the other party receives substantially what is specified in the contract but a nonmaterial item is deficient. Let’s say that the contract calls for the vendor to implement the order entry system by a specific date and the application was implemented two weeks late. This is a minor breach unless the due date was material to the organization’s operation, such as the existing application being deactivated by the vendor and the organization being without an order entry application for two weeks.
A breach of contract is not enforceable if there is legal cause to breach the contract, which is commonly referred to as a defense to the breach of contract. Common legal causes are:
–– Illegal: Actions specified in the contract are contrary to law, such as committing a crime.
–– Missing element: The contract lacks one or more of the four elements of a contract.
–– Mutual mistake: The contract contains terms that both parties misunderstood.
–– Unilateral mistake: One party to the contract agreed to what is an obvious mistake, and the other party knew or should have known of the mistake.
–– Fraud: One party obtained the agreement fraudulently.
–– Performance is impossible: A party is unable to perform a requirement of the contract. Let’s say that the vendor is to implement an order entry application by a specific date. The organization is responsible for supplying the necessary computing devices; however, the computing devices are not operational by the due date, preventing the vendor from implementing the order entry application by the contracted date.
–– Lack of capacity: A party to the contract is not of legal age or is mentally incompetent to enter into a contract.
–– Met industry standards: Industry standards are written or unwritten standards of performance within an industry. As long as the vendor meets industry standards—even if those standards are not written into the contract—the vendor has not breached the contract.
Industry Standard

Although the terms of a contract govern the performance of each party to the contract, a vendor is obligated to perform according to industry standards. An industry standard is either a formal rule set forth by a regulatory authority or a professional organization, or a practice widely accepted in the business community. For example, you place an order for lumber 2 inches thick and 4 inches wide (a 2×4). You receive lumber that is 1.5 inches thick and 3.5 inches wide. The lumber company did not breach the contract because it met the industry standard: the milling process trims half an inch from the lumber. An industry standard is either implied or explicit in a contract. Both parties are expected to know the industry standard and understand that meeting the standard is acceptable performance of the contract. An industry standard is explicitly mentioned in a contract to define expectations for the deliverable. For example, the contract stipulates that the order entry application meets regulatory requirements. Regulatory requirements associated with the order entry application then become an enforceable part of the contract terms. Always inquire about industry standards that might apply to the contract during negotiations with the vendor. The vendor may assume there is a mutual understanding that industry standards will apply without explicitly mentioning those standards.
Contract Interpretation

A contract is composed of words and phrases, either spoken or written. Both parties must have a good understanding of the meaning of those words and phrases. Any misunderstanding that is not resolved before the contract is signed can lead to a material or minor breach of the contract and to discord between the organization and the vendor. A contract properly written by an attorney defines key words and phrases at the beginning of the contract to reduce misunderstanding. However, no matter how carefully a contract is written, there is always room for misinterpretation. Each party might honestly arrive at a different interpretation of the same word or phrase. Contract disputes are commonly resolved by an arbitrator or the courts, where the arbitrator or judge determines the meaning of the word or phrase. The decision is based on the entire contract, industry standards, and the ordinary meaning of the word or phrase. The arbitrator or judge initially looks at the parties’ intentions by comparing elements of the contract with the performance of each party. The focus then turns to the customary usage of the term within the community (i.e., a specific business or location).
The Uniform Commercial Code (UCC) in the US

The Uniform Commercial Code is a recommended set of laws that states should adopt to promote uniformity of commercial law throughout the United States. The Uniform Commercial Code was developed by the American Law Institute and the National Conference of Commissioners on Uniform State Laws. The goal is to create a single set of rules for commercial transactions. Rules for commerce are set in law by each state. Issues can arise over commercial transactions that cross state lines. Conflicts can be avoided if each state adopts the Uniform Commercial Code as state law. The Uniform Commercial Code defines rules for the sale of goods, leases, negotiable instruments, banking, letters of credit, auctions, liquidations, title documents, and securities. Furthermore, the Uniform Commercial Code contains recommendations for the formation of a contract, breach of contract, and so on.
United Nations Convention on Contracts for the International Sale of Goods (CISG)

The CISG is a multinational treaty signed by 70 countries that creates a uniform law to govern the international sale of goods. The CISG is similar to the UCC, but there are differences. For example, the CISG does not require a contract for the sale of goods to be in writing. The presumption is that an international sales contract is governed by the CISG unless the parties to the contract expressly exclude the CISG from the terms of the sale. Don’t assume that all countries have adopted the CISG; the CISG is not enforceable unless the country where a party resides has signed the treaty. Also, the CISG usually preempts the UCC: the CISG is adopted by the federal government, the UCC is adopted by each state government, and federal law preempts state law. This means that if a party in the United States enters into a contract for the sale of goods with an international party, the CISG rules are enforceable—not the UCC rules.
Warranty

Products or services provided by a vendor are covered under a warranty. A warranty means that the vendor provides assurance that the organization will receive the desired outcome—for example, that the order entry application will manage orders according to the specification in the contract. If expectations are not met, then the vendor will remedy the situation. There is an implied warranty granted by the vendor even if a warranty is not mentioned in the contract. An implied warranty means that, by the nature of the agreement, the vendor assures that the product/service will meet the promised outcome. For example, the order entry application is expected to manage orders.
A limited warranty is a warranty defined in the contract. For example, the vendor may specify that a computing device has a 30-day warranty. After 30 days, the organization is responsible for parts and labor should the computing device fail. All terms of the warranty are written into the contract. A warranty covers normal wear and tear on the product. If the organization abuses the product by not following the manufacturer’s recommended usage, then the warranty is voided. Likewise, the warranty will not cover acts of nature or malicious destruction. An act of nature is an event that occurs outside of anyone’s control, such as a natural disaster. An extended warranty may be available from the vendor or from an insurer to cover events that are excluded from the conventional warranty. The organization must carefully weigh the likelihood of such events against the expense of the extended warranty.
Remedies

A breach of contract can be remedied by providing compensation to the other party, commonly referred to as damages. There are two types of damages. These are:
–– Compensatory damages: Compensatory damages make the other party whole. That is, the party who breached the contract gives the other party compensation to make up for that party’s actual loss. Let’s say that the order entry application malfunctioned, causing the organization to use a paper backup system. The cost of implementing the paper backup system is considered compensatory damages.
–– Punitive damages: Punitive damages punish the party who breached the contract. Punitive damages are also known as exemplary damages. For example, a vendor who abandons the implementation of the order entry application halfway through the project may be punished by the arbitrator or judge for this intentional act.
Both compensatory and punitive damages can be awarded. Typically, settlements are reached by both parties for less than the awarded amount, as there are additional legal expenses to appeal the decision. Remedies are commonly written into the contract to avoid litigation. Both parties anticipate common reasons for a breach of contract and agree to remedies that will automatically be implemented should a breach occur. For example, the vendor agrees to compensate the organization for data lost due to a malfunction of the application. A specific dollar amount may be specified, or a formula for calculating that amount may be written into the agreement.
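An agreed-upon remedies formula of the kind described above can be sketched as a simple calculation. The per-record rate and the cap below are invented for illustration; a real contract would define its own figures:

```python
def agreed_damages(records_lost: int, rate_per_record: float = 2.50,
                   cap: float = 50_000.0) -> float:
    """Compensatory damages for lost data under a hypothetical remedies
    clause: a fixed dollar amount per lost record, up to an agreed cap."""
    return min(records_lost * rate_per_record, cap)

print(agreed_damages(8_000))   # 20000.0 -- within the cap
print(agreed_damages(40_000))  # 50000.0 -- capped at the agreed maximum
```

Writing the formula into the contract means both parties know the remedy in advance, so a malfunction does not have to end up before an arbitrator or judge.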
Modifying a Contract

A contract can be modified with mutual consent at any time after the contract is signed. Minor modifications are written as an addendum to the existing contract and placed at the end of the contract. Major modifications typically require the reopening of negotiations. Modifications must take into consideration that one or both parties have begun work on the project based on terms in the existing contract. Modifications should not attempt to alter completed or partially completed work unless both parties consider such changes reasonable.
Memorandum of Understanding

A memorandum of understanding is a written agreement that clarifies an understanding between two parties. A memorandum of understanding may or may not be a contract, depending on the structure of the memorandum. If the memorandum has all the elements of a contract, then the memorandum is a contract and is enforceable. A memorandum missing at least one element of a contract is not a contract and therefore is not enforceable. However, the memorandum may be used to show an arbitrator or judge the intentions of both parties.
Contract Termination

A contract terminates when the conditions of the contract are met by both parties, or when both parties mutually agree to terminate the contract prior to meeting the contracted conditions. The contract can be terminated by mutual consent at any time. A contract may contain a termination clause. A termination clause is a portion of the contract that specifies when the contract can be terminated and the terms of how the parties will be compensated, if necessary.
Working the Contract

A contract is a working document during the life of the project and contains all of the items agreed upon between the vendor and the organization. Both parties must manage the project within the terms of the contract. Neither party has the right to deviate from the intent or the letter of the contract. The intent of the contract is the overall purpose of the contract as defined by the words and phrases within the contract. For example, the intent is that the vendor and the organization will implement the application for the organization. The letter of the contract refers to the words and phrases used in the contract. The contract might state that the vendor may substitute materials. The word “may” means that the vendor has the option, not the organization. Both the intent and the letter of the contract must be followed during the project. Failure to do so results in a breach of contract (see “Breach of Contract”) that may lead to termination of the contract and damages. Each party fulfills its obligations specified in the contract according to the party’s interpretation of the contract. Ideally, both parties have the same interpretation. In reality, both parties agree to a reasonable interpretation. That is, the interpretations may differ in detail but are generally the same. For example, the contract may state that the vendor will deliver an item on a specific date. The vendor interprets this as an approximate date, defined as the week of the specified date. The organization interprets this as the specific date; however, the vendor’s interpretation is acceptable because the item was delivered within a reasonable time period.
Reasonableness

Reasonableness plays an important role in the working relationship between the parties. Although a contract is legally binding, a contract is a goal with conditions for reaching the goal. In reality, sometimes those conditions are not practical, which is not known until the project is underway. Therefore, both parties must use reasonableness when confronted by conditions that differ from the conditions specified in the contract. Let’s say that the vendor is late delivering an item based on conditions in the contract. Technically the vendor breached the contract. The organization has a choice: terminate the contract or continue with the contract. Termination may not be the best recourse because it would derail the goal of the contract. Although the delay violated the terms of the contract and frustrated the organization, reasonableness dictates that the contract continues and the delay is accepted. Minor breaches of the contract may be overlooked as long as the project is on track and will eventually produce the desired outcome. Major breaches of the contract need to be analyzed carefully before deciding on a course of action.
Penalty Clause and Performance Incentives

Financial disincentives and incentives can be included in a contract to encourage desirable performance by the vendor. A financial disincentive, commonly referred to as a penalty clause, reduces compensation to the vendor for failure to comply with the terms of the contract. For example, the vendor’s compensation will be reduced by 1% for each month beyond the deadline. The penalty clause is considered an agreed-upon remedy for a breach of contract related to the missed deadline.
A performance incentive, commonly referred to as a performance incentive clause, is additional compensation provided to the vendor for desirable performance as defined by the organization. For example, the vendor’s compensation will increase by 1% for each month ahead of schedule. The value of the performance incentive clause should correspond to a financial benefit to the organization, such as reduced staff overtime cost. If there is no financial benefit, then there should be no performance incentive. That is, if the organization is not saving money or increasing revenue by having the order entry application implemented earlier than the deadline, then there is no rationale for increasing compensation to the vendor for coming in under the deadline.
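The penalty and incentive arithmetic can be sketched as a single adjustment to the base compensation. The 1%-per-month rates and the $100,000 base are illustrative assumptions consistent with the examples in the text, not terms from any real contract:

```python
def adjusted_compensation(base, months_late=0, months_early=0,
                          penalty_rate=0.01, incentive_rate=0.01):
    """Apply a penalty clause or performance incentive clause to the
    vendor's base compensation. Rates are hypothetical 1%-per-month
    figures; a real contract defines its own terms."""
    if months_late and months_early:
        raise ValueError("A deliverable cannot be both late and early")
    adjustment = (months_early * incentive_rate) - (months_late * penalty_rate)
    return base * (1 + adjustment)

# Vendor contracted for $100,000 delivers two months late:
print(adjusted_compensation(100_000, months_late=2))   # 98000.0
# The same vendor delivering one month ahead of schedule:
print(adjusted_compensation(100_000, months_early=1))  # 101000.0
```

Because the rates are written into the contract, the adjustment is an agreed-upon remedy and requires no negotiation when the schedule slips or accelerates.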
The Contract Manager

There should be one person designated as the contract manager, sometimes referred to as the clerk of the works. The contract manager is responsible for ensuring that the terms of the contract are being met during the life of the project. The contract manager monitors deliverables, facilitates the vendor’s work onsite, coordinates resolution of conflicts between the vendor and the organization, and verifies that deliverables meet contract requirements. The contract manager is authorized by the organization to accept deliverables on behalf of the organization. Once the contract manager signs an acceptance memorandum, the vendor’s obligation to perform is completed.
Service-Level Agreement

A service-level agreement (SLA) defines the minimum service that a vendor will provide to the organization. Minimum service is measured in various ways, depending on the nature of the service. The vendor is expected to meet the minimum service levels—otherwise, the organization may consider the contract breached and remedies might be implemented. If the service is for equipment, the minimum service is defined by the mean time between failures (MTBF), the mean time to repair, or the mean time to recovery (MTTR). Mean time between failures is the period during which the equipment is expected to be operational. Equipment manufacturers test equipment to determine when the equipment is likely to fail, and recommend preventive maintenance prior to the projected failure. Mean time to repair is the time that elapses between when the vendor is notified of the failure and when the equipment is operational again. Mean time to recovery is similar to mean time to repair, except that the operational downtime may be caused by equipment failure, software failure, or a combination of factors—not simply equipment failure.
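Together, MTBF and MTTR determine the steady-state availability a vendor can commit to in an SLA, using the standard ratio MTBF / (MTBF + MTTR). A minimal sketch, with made-up hour figures for illustration:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time equipment is up.

    MTBF = mean time between failures (average uptime per failure cycle)
    MTTR = mean time to recovery (average downtime per failure cycle)
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical server: fails on average every 2,000 hours of operation
# and takes 4 hours to bring back online.
print(f"{availability(2000, 4):.4%}")  # 99.8004%
```

An SLA typically states the resulting figure as a percentage (e.g., “99.8% availability”); falling below it is what triggers the agreed remedies.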
Chapter 9 The Importance of Cloud Computing

The cloud refers to software and computing services that run on a remote computer and are available over the internet using a web browser or applications on your computing device. The term cloud is new, but the concept is not; it is a twist on an old practice. Many organizations operate a data center that provides software and computing services throughout the organization over a virtual private network (VPN). Software resides on an application server—not on local computers throughout the organization. Data resides on a database server in the data center. All computing services are provided by the data center. Cloud computing is provided by a vendor such as Amazon Cloud Drive, Microsoft OneDrive, Apple iCloud, Google Drive, Dropbox, Yahoo Mail, or Netflix. Many cloud computing services offer storage for data and applications. Some offer their own applications, such as Microsoft Office and Google Docs, enabling collaboration on projects within and outside the organization. Data and applications are available 24/7 over the internet. The cloud vendor is responsible for maintaining the cloud environment—the data center, servers, network connections, power, and applications. In theory, there is no end to the cloud. The organization can use as much or as little space as necessary—for a fee. The cloud vendor has computing resources ready to meet practically any demand for service. Cloud computing is any pay-per-use service delivered in real time over the internet that extends an organization’s existing capabilities at a fraction of the cost of expanding a data center. Rather than acquiring new servers, networks, and related applications within the organization’s data center, the organization uses a cloud service. The cloud provider is an aggregator of computing services from other vendors, offering one-stop shopping for an organization.
The cloud is seen as a viable alternative to operating a data center. It is estimated that every piece of data—voice, data, and images—transmitted over a public network is at some point in the cloud.
It Can Rain Too

The cloud may seem to be the answer to all the organization’s computing needs. However, there is a downside. Cloud-based applications and data require internet access. No internet connection means no access to applications and data. Furthermore, technical issues affecting the cloud vendor’s data center become the organization’s issues too, because when the cloud goes down, the organization no longer has access to applications and data. Compounding the problem, the organization has no control over rectifying these issues. Unlike with the organization’s own data center, where the organization controls every facet of resolving technical issues, the cloud provider is responsible for fixing the problem. Organizations that place their data and applications in the cloud are at risk of losing control. Data and applications that reside in the cloud vendor’s data center are out of the organization’s control. The organization must trust the cloud vendor to provide adequate security measures to protect the organization’s data. Furthermore, the organization must determine whether it can sustain operations if the cloud vendor denies access to the data and applications.
Governmental Access

An organization’s data is always at risk of being legally accessed by governmental agencies. The government typically gains access by serving legal notice to the organization that holds the data. The organization itself receives such notice if the data is held in the organization’s data center. What happens if the data is held at the cloud vendor’s data center? Must the cloud vendor hand over the data without notifying the organization? These are much-debated questions. An organization can expect privacy unless its data is disclosed to a third party. This is referred to as the third-party doctrine. A cloud vendor providing cloud computing services can be considered a third party; the government may search the organization’s data with the proper legal papers and issue an indefinite gag order to the cloud vendor, preventing the cloud vendor from disclosing that the government searched the data. Cloud vendors are taking steps to address privacy concerns. Microsoft relocated its cloud servers to data centers in Germany and transferred both physical and logical access to cloud data to a data trustee, greatly reducing Microsoft’s own access to customer data. However, the privacy debate continues. Microsoft reported that within an 18-month period, the government made 5,600 legal demands of Microsoft to provide customer data stored on remote servers. Half required Microsoft not to inform the customer of the search indefinitely.
The Cloud and Data Science

Data science, commonly referred to as big data, focuses on making sense of large amounts of data by finding patterns that can be used to develop predictions. Data scientists were long stifled by the technology available to extract, store, process, and analyze huge data sets. The computing power available within an organization lacked the processing capacity, production environment, memory, and storage to effectively study sizable amounts of data.
Data scientists hit an electronic wall. The local computing environment was not scalable. Data grew by orders of magnitude monthly while the organization's computing technology remained stagnant, and resources for big data analysis competed head-to-head with mission-critical applications. They simply ran out of computing resources. Living within the allocated computing technology required big data analysis to be performed in steps, repeatedly loading and unloading data and applications, which resulted in reliability errors and performance degradation. Compounding the challenge were the heavy processing requirements of cleaning the data for analysis and the need to test and retest fine-tuned data models against the massive amount of data.

The cloud radically changed data science by removing the electronic wall that held back the big data revolution now driving machine learning and other eye-opening insights. The cloud offers practically unlimited scalability, using the most powerful computing environments and technology available, all at a cost that most organizations can afford. As data compounds monthly, the organization acquires additional cloud services to store and process it at an incremental cost, without the hassle of investing in new equipment, expanding the data center, and hiring staff.

There are cloud providers whose services are especially designed to manage big data. They have the capability to acquire, clean, store, and share data throughout the organization, and the resources to develop, test, and implement data models based on big data. The cloud enables data scientists to quickly build prototypes without worrying about computing assets. Once proven, the full version of the data model can be implemented in the cloud.
The Cloud Services

Cloud technology is the latest step in an evolution that began with stand-alone computing. Late in the last century, computing devices were connected to servers using client/server architecture. The computing device, called a client, requested services from a remote server over a local area network, known today as an intranet. Services included applications, data, and processing. In client/server architecture, some processing is performed locally on the computing device, while processing required by all clients is performed on one or more common remote servers.

Client/server architecture is referred to as two-tier architecture, with the client as one tier and the server as the second tier. Multiple-tier architectures are commonplace today. For example, a client accesses a remote application and the remote application accesses a database. This is three-tier architecture: client, application, and database.

Client/server architecture has a major disadvantage (Figure 9.1): there are no economies of scale. Investments in new infrastructure and new software licenses are necessary to expand capacity.
Chapter 9: The Importance of Cloud Computing
Figure 9.1: Client-server lacks economies of scale
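The three tiers described above can be sketched in a few lines of Python. All of the names and the order data below are invented for illustration; in a real deployment each tier runs on a separate machine and the calls cross a network.

```python
# Three-tier sketch: client -> application -> database.
# Names and data are illustrative only.

# Tier 3: the database, modeled as a simple in-memory table.
DATABASE = {"order-1001": {"item": "widget", "qty": 3}}

# Tier 2: the application server; all clients share this logic.
def application_lookup(order_id):
    """Fetch an order on behalf of a client."""
    record = DATABASE.get(order_id)
    if record is None:
        return {"status": "not found"}
    return {"status": "ok", "order": record}

# Tier 1: the client, which performs only light local processing
# (formatting) and delegates the real work to the server.
def client_request(order_id):
    response = application_lookup(order_id)
    if response["status"] == "ok":
        order = response["order"]
        return f"{order_id}: {order['qty']} x {order['item']}"
    return f"{order_id}: {response['status']}"

print(client_request("order-1001"))  # order-1001: 3 x widget
print(client_request("order-9999"))  # order-9999: not found
```

The point of the tiers is the division of labor: the client formats, the application tier holds shared logic, and the database tier holds shared state.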
There are three types of services offered by a cloud provider:
1. Software as a Service (SaaS): With SaaS, the cloud provider offers access to applications hosted by the cloud provider, using a web browser as the point of access. The cloud provider is responsible for deploying, managing, and maintaining the applications, and carries the cost of ownership; organizations simply subscribe to the service. Examples are Google Apps, Dropbox, and Salesforce.
2. Platform as a Service (PaaS): With PaaS, the cloud provider offers a platform that can be used to develop and deploy applications. The cloud provider supplies the operating system and the related hardware and network infrastructure on which the organization develops and runs its own applications, so the organization can focus on building them. The cloud provider's tools and scalability enable the organization to respond quickly to changing markets by requesting additional resources. Examples are OpenShift, Heroku, and Google App Engine.
3. Infrastructure as a Service (IaaS): With IaaS, the cloud provider offers the basic infrastructure building blocks that enable the organization to assemble computing resources on demand, in effect building a virtual data center. Components of the virtual data center can be accessed as if they were in the organization's traditional data center, yet there is no need to invest in a data center; the organization simply pays for components as needed. The cloud provider is responsible for management and maintenance of the physical data center, while the organization retains virtual control over servers, storage, and processing. Examples are Exoscale, Navisite, and SoftLayer. IaaS is sometimes referred to as utility computing because it provides a utility-type service to organizations.
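One way to keep the three models straight is to ask which layers of the computing stack the provider manages under each. The layer names below are a common simplification, not a formal standard, and real offerings vary by vendor.

```python
# Who manages which layer under each service model?
# A common simplification -- real offerings vary by vendor.
STACK = ["application", "runtime", "operating system",
         "virtualization", "servers", "storage", "networking"]

# Layers the CLOUD PROVIDER manages under each model;
# the organization manages the rest.
PROVIDER_MANAGED = {
    "SaaS": STACK,        # provider runs everything
    "PaaS": STACK[1:],    # organization supplies the application
    "IaaS": STACK[3:],    # organization supplies app through OS
}

def customer_managed(model):
    """Layers left to the organization under a given model."""
    provided = set(PROVIDER_MANAGED[model])
    return [layer for layer in STACK if layer not in provided]

print(customer_managed("SaaS"))  # []
print(customer_managed("PaaS"))  # ['application']
print(customer_managed("IaaS"))  # ['application', 'runtime', 'operating system']
```

Read top to bottom, the models hand progressively more of the stack back to the organization, which is why IaaS gives the most control and SaaS the least.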
The Private Cloud

A private cloud is very similar to the traditional data center architecture in that services are provided only to entities within the organization; there are no commercial clients. An entity is a division of the organization, sometimes considered an internal client of the group that operates the private cloud. Internal clients don't have control over the cloud environment; control resides with the group that operates the private cloud. That group creates a virtual environment for each internal client from a pool of computing resources and reconfigures those resources to respond to internal clients' needs.

The private cloud can be created in one of three ways. The organization can own and operate the computing resources that create the cloud, the traditional data center environment. The organization can outsource the private cloud to a vendor, where computing resources provided by the vendor are used solely by the organization and not shared with other organizations. A hybrid is another option, referred to as cloud bursting, where primary computing resources are owned and operated by the organization and additional on-demand computing resources are provided by a cloud vendor. Non-sensitive computing assets are moved to the public cloud, freeing private cloud resources for sensitive computing assets.

Private clouds are ideal for organizations that require secured processing and storage because the organization is in total control of security. Communication with the private cloud is conducted over private leased, secured lines with encryption. This offers greater security than the public cloud provides: all computing devices in the private cloud operate behind the organization's firewall, applications and personnel are under the organization's control, and no resources are shared outside the organization.
Private clouds come at a cost because there is one client, the organization, that underwrites the entire operation. Economies of scale are limited to internal clients, compared with the many clients of a public cloud operation. On the other hand, the organization can allocate computing resources quickly since it controls the cloud; using a public cloud may delay allocation because an agreement to use those resources must be reached between the organization and the cloud vendor.
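The cloud-bursting arrangement described above amounts to a simple allocation rule: fill the private cloud first, then burst the overflow to an on-demand public cloud. The capacity numbers and units below are invented for illustration.

```python
# Cloud bursting sketch: the private cloud handles demand up to its
# capacity; anything beyond bursts to an on-demand public cloud.
# Capacity and units are invented for illustration.

PRIVATE_CAPACITY = 100  # compute units the organization owns

def allocate(demand):
    """Split demand between the private cloud and a public-cloud burst."""
    private = min(demand, PRIVATE_CAPACITY)
    burst = max(0, demand - PRIVATE_CAPACITY)
    return {"private": private, "public_burst": burst}

print(allocate(80))   # {'private': 80, 'public_burst': 0}
print(allocate(140))  # {'private': 100, 'public_burst': 40}
```

The organization pays the public-cloud vendor only for the `public_burst` portion, which is the economic appeal of the hybrid arrangement.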
The Public Cloud

The public cloud offers computing resources over the internet to individuals and organizations that do not require the security provided by a private cloud. The public cloud offers computing resources on demand, typically for a monthly fee. These resources can include expensive, sophisticated applications, processing devices, and storage devices that might otherwise be out of the client's financial reach. Access is seamless from anywhere at any time, and clients pay only for the services they need, for as long as they need them.

The public cloud offers economies of scale because expensive cloud infrastructure, computing devices, and applications are shared among many organizations. The public cloud vendor can provide state-of-the-art centralized operations with redundant architectures and environments because costs are leveraged across its client base. Redundancy enables the vendor to balance loads, which provides an expected level of service regardless of demand. Multiple computing devices and cloud operation centers located in multiple states and countries keep the cloud continuously available to all clients, as long as each client has internet access.

The cloud vendor accepts the operational risks associated with the cloud. It ensures that services are available, that applications and operating systems are updated, and that computing resources are maintained to meet the client's and regulatory requirements. Furthermore, the cloud vendor incorporates sophisticated security measures that might be out of reach in a private cloud environment. The cloud vendor also has certified full-time staff with skill sets that may not be economically available to organizations that operate a private cloud.
Hybrid Clouds

A hybrid cloud is a combination of a private cloud and a public cloud. The private cloud is used for sensitive processing and the public cloud for non-sensitive processing. Access to both clouds is seamless: users gain access through a browser-based portal that redirects requests to either the private or the public cloud. A key benefit of a hybrid cloud is that the private cloud can be used to satisfy regulatory requirements for secure processing and storage of data, while the public cloud provides the flexibility to meet growing demands.

There are a number of ways to implement a hybrid cloud. An organization can use two cloud vendors, one for the private cloud and the other for the public cloud. Alternatively, a single cloud vendor can provide a complete service in which the private cloud computing resources are not shared and the public cloud computing resources are shared. Still another option is for the organization to provide the private cloud internally and rely on a vendor for the public cloud. The drawback to implementing an
internal private cloud is limited scalability. The organization would need to acquire more computing resources to expand, whereas a cloud vendor needs only to reallocate existing resources to the private cloud (Figure 9.2).
Figure 9.2: A cloud vendor offers flexibility
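The browser-based portal described above is, at its core, a routing rule: sensitive workloads go to the private cloud, everything else to the public cloud. A minimal sketch, with invented workload labels:

```python
# Hybrid-cloud portal sketch: route each request by data sensitivity.
# The sensitivity labels are invented for illustration.
SENSITIVE = {"payroll", "patient-records", "trade-secrets"}

def route(workload):
    """Direct a workload to the private or the public cloud."""
    return "private" if workload in SENSITIVE else "public"

print(route("payroll"))         # private
print(route("marketing-site"))  # public
```

In practice the classification would come from the organization's data-governance policy rather than a hard-coded set, but the portal's job is the same either way.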
Why Implement a Cloud?

The cloud offers many advantages for an organization that is growing and whose computing resource requirements fluctuate. The cloud provides the operational agility to meet growing demands on a sound economic foundation.
–– Easy to increase computing capacity.
–– Scalability both up and down.
–– A competitive advantage by increasing/decreasing computing capacity as needed without incurring long-term financial obligations.
–– Taking advantage of the latest technology without the burden of acquiring scarce resources.
–– Reduced time to market. Start-up time for a new initiative might require nine months to acquire computing resources. The cloud offers computing resources within days.
–– Disaster recovery. The cloud provider has the computing resources and expertise to handle recovery in a disaster. A cloud provider typically has replicated cloud data centers throughout the United States and outside the country.
–– Frees real estate. The cloud is off the organization's premises. Space used for computing resources can be reallocated for other purposes.
–– No upfront investment in computing resources. The organization pays for computing resources using a subscription model.
–– No maintenance. The cloud vendor takes care of software updates and security patches as part of its core business.
–– No longer an information technology organization. Information technology has become a necessary part of the organization's operation, although it is not the organization's core business. The cloud shifts information technology to a cloud vendor whose core business is the cloud; the cloud vendor's investment in cloud technology is an investment in its core business.
–– Shared resources. The cloud enables the staff to collaborate in real time, increasing productivity.
–– Balanced work schedule. The cloud enables the staff to collaborate from anywhere in real time over the internet. Cloud vendors also offer cloud apps that can be used on mobile computing devices, giving staff access to the organization's computing resources while on the go.
–– Reduces the carbon footprint. Rather than maintaining computing resources that have a large carbon footprint, the organization shares those resources with other organizations in the cloud.
–– Staff focuses on the business. In many organizations, the information technology staff accounts for 25 percent of the employees. Moving to the cloud frees up headcount because fewer information technology staff are required.
–– Hidden security. Staff can work with cloud-based applications that automatically save files to the cloud rather than on a local computing device. Files are never lost, even if the local computing device crashes.
Why Not Use the Cloud?

The cloud is less than a perfect technological solution to computing. Here are common disadvantages:
–– Connectivity. The cloud depends on network operations, internally and externally. Internally, the cloud provider connects its data centers located around the world over a network, the same public network used to connect everyone else, and the organization connects to the cloud over that same network. Any network issue is also a cloud issue.
–– Traffic volume. The public network is a multi-lane highway that can handle very high volumes of traffic. An off-ramp is a narrower roadway to a cloud provider's site. A traffic jam occurs unless the cloud provider manages the load to its sites as demand for its cloud services increases. Failure to do so results in slow response time, which is something an organization doesn't expect from the vendor.
–– Software incompatibility. The presumption is that applications run on all computing devices, which is not necessarily the case. An organization may be using older applications and databases that are not compatible with the computing resources offered by the cloud provider. The organization's custom applications and third-party applications are built using frameworks such as Java, C++, MySQL, and Oracle that require frequent upgrades, both on the computing devices accessing the application and on the computing devices running it. Cloud vendors are noted for installing upgrades faster than the organizations that use their services. Some upgrades require upgrades on both sets of computing devices, and failure to upgrade prevents access to the application. Likewise, some upgrades may not be compatible with an organization's applications, preventing the application from running in the cloud.
–– Support. An advantage of using the cloud is to offload most of the organization's computing responsibilities to the cloud vendor. The presumption is that it is economical for the cloud vendor to hire specialists, since the expense can be allocated across other customers who also require those services. However, legacy applications that run an organization can become problematic, since other organizations may not require the same specialist to maintain the application. The cloud vendor may refuse to accept the application or charge a premium to accept it.
–– Security. Responsibility for providing cybersecurity moves from the organization to the cloud provider. The cloud provider has multiple data centers around the world, each connected to customers and to each other. If a security gap exists in any of the data centers or connections, then it is highly likely that the vendor's entire infrastructure is susceptible to the breach. An organization typically has one or a few data centers, which means fewer points of failure compared with the cloud vendor.
–– Dependency. By switching computing responsibility from the organization to the cloud vendor, the organization's sustainability becomes dependent on the sustainability of the cloud vendor. The organization cannot change cloud vendors quickly. If the relationship between the organization and the cloud vendor breaks down, the organization needs a contingency that enables it to move its cloud business to another cloud vendor with minimal interruption. The breakdown in the relationship may not have anything to do with providing services; for example, the cloud vendor may be taken over by another cloud vendor, which might result in the organization sharing computing resources with competitors.
Mitigating Risk

Although computing risks seem to be offloaded to the cloud provider, the organization remains exposed. However, risk can be mitigated by carefully selecting a cloud provider. Here are the steps that need to be taken (Figure 9.3).
Figure 9.3: Cloud risks to mitigate in an SLA
Encryption. The organization's data must be encrypted at all times. AES-256 encryption is the most desirable because it has never been broken.

Demarcation. The cloud is a multi-tenant environment in which all clients can share resources. Therefore, the cloud provider must demonstrate how the organization's applications and data are segregated from other customers' applications and data. There should be an electronic or physical wall between clients; the cloud provider may, for example, cage in computing devices for each client and require a separate key to unlock each cage.

Data replication. The cloud provider must show the organization how data is replicated and restored as part of the organization's and cloud provider's data recovery plan, should the cloud provider's facilities experience a catastrophe.

Data ownership. Make sure it is clear to the cloud provider that the organization owns the data and the format of the data. The cloud provider is simply providing storage and applications to manipulate the data.
Application ownership. Make sure it is clear who owns the rights to the application. Let's say that the cloud provider licenses a SQL database management system (DBMS), and queries are used to interact with the DBMS. Who owns the queries? This is especially important if the cloud provider's staff writes queries for the organization; the sustainability of the organization may depend on those queries.

Termination. Negotiate terms for terminating the relationship before engaging the cloud provider. Termination terms clearly define who owns what and the process for moving the organization's owned resources to another cloud provider. Furthermore, the terms should clearly explain how applications and data residing on the cloud provider's computing devices will be destroyed after they are moved to another cloud provider. Termination terms also identify the conditions under which the relationship can be terminated. If and when the time comes to move, simply execute the terms of the termination agreement.

Costs. Identify all costs associated with engaging the cloud provider; there should be no surprises or hidden costs. Costs include initial setup costs, ongoing costs, maintenance costs, change costs, and termination costs. Initial setup costs involve expenses to transfer the organization's computing operations to the cloud provider. Ongoing costs are usually included in the monthly fee. Maintenance costs involve routine upgrades to applications and databases. Will the organization be charged a fee for moving data from the cloud to its own facility? Change costs involve non-routine enhancements to the cloud services, such as new applications, new databases, and services not covered in the original agreement. It is important to come to terms about these changes before they occur so there are no surprises at the time of the change.
Termination costs are expenses associated with termination of the agreement, including transferring applications and data to another cloud vendor.

Security. Be sure that the cloud provider upgrades security to meet the organization's requirements. The organization should set minimum security requirements and not waver from them.

Limitations. Ensure that the cloud provider has the computing resources and staff promised in its sales presentation. Trust but verify. The organization is buying experience, and the cloud provider's operation should reflect that experience. Years of operation are not the only criterion to consider; the cloud provider's infrastructure must reflect current technology.

Bandwidth. The cloud provider must have sufficient bandwidth today to meet demand for the next five years. Think of bandwidth as highway lanes. There should be sufficient lanes on the electronic highway, both the off-ramp from the internet and the internal highways, to maintain an acceptable response time. Many cloud providers are in a Catch-22 situation: do they invest in a super-speed infrastructure hoping to attract clients, or build the super-speed infrastructure as they bring on clients? The organization should be looking for a cloud provider that has the financial resources to build a super-speed infrastructure first. How much bandwidth is needed? There are
tools available, such as the Microsoft Assessment and Planning Toolkit, that help an organization assess its needs.

Service-level agreement. The service-level agreement (SLA) defines the relationship between the organization and the cloud provider. It contains expectations, limitations, liabilities, responsibilities, termination, fees, and other understandings that govern the relationship between both parties.
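Before turning to an assessment tool, the bandwidth question above lends itself to back-of-the-envelope arithmetic. The workload numbers below are invented; a real assessment would measure actual traffic.

```python
# Back-of-the-envelope bandwidth estimate. Numbers are invented;
# a real assessment would measure actual traffic.
users = 500                 # concurrent cloud users
mb_per_user_per_hour = 90   # average megabytes each user transfers
peak_factor = 3             # peak traffic relative to average

# Convert megabytes/hour to megabits/second (x8 bits, /3600 seconds).
avg_mbps = users * mb_per_user_per_hour * 8 / 3600
peak_mbps = avg_mbps * peak_factor

print(f"average: {avg_mbps:.0f} Mbps, peak: {peak_mbps:.0f} Mbps")
# average: 100 Mbps, peak: 300 Mbps
```

Sizing for the peak rather than the average is what keeps response time acceptable when demand spikes, which is exactly the off-ramp problem described above.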
The Cloud Life Cycle

The cloud offers many options, from à la carte to full service. The cloud life cycle process helps to decide which options to choose. There are eight steps in the cloud life cycle process.
1. Define the purpose. Decide the organization's requirements first; the cloud can meet a variety of needs once those needs are identified. An organization experiencing a surge can use the cloud to quickly expand its capabilities practically overnight. An organization that hasn't kept pace with technology can use the cloud to become current with the technology needed to operate. Still other organizations use the cloud to expand services to customers. For example, Adobe produces many creative applications, originally selling each product separately; the cloud is now used to give customers access to all of Adobe's creative applications online for one monthly subscription fee.
2. Define the hardware. The cloud vendor offers a variety of hardware to run an organization's applications, data, and computing operations.
3. Define storage services. Storage is the place in the cloud in which applications and data are housed. Vendors offer different services optimized for backing up applications and data or for archiving them.
4. Define the network. Decide on the requirements for communicating with the cloud. Factors to consider are security; the amount of network traffic generated by the organization, such as data, voice, and video; and transfer speeds.
5. Define security. Security factors are authentication, authorization, encryption at rest, and encryption in transit.
6. Define management processes and tools. Management processes and tools give the organization control over its cloud assets. These include monitoring activities, managing applications and data residing in the cloud, and developing and deploying applications to the cloud.
7. Define building and testing requirements. The cloud is more than a remote data center.
The cloud can be the organization's computing environment within which developers build and test applications. Identifying the organization's needs for creating and maintaining applications in the cloud helps in selecting the best vendor and services for the organization.
8. Define analytics. Analytics are used to monitor operations and provide decision support information to assist management in making decisions. Vendors can provide an assortment of analytical tools that deliver instant results and can respond to any query for information. The organization must identify its analytical requirements when selecting a vendor.
Cloud Architecture

The cloud architecture is a service-oriented architecture in which the focus is for the cloud vendor to provide a wealth of services to customers. Each customer picks the services that augment its operations and pays only for the services it uses. The vendor's objective is to identify needs and provide services to meet them, then leverage the costs of development, operations, and maintenance of each service across the customers who subscribe to it.

A key element of the cloud architecture is microservices, which are used to develop an application (Figure 9.4). The microservices concept has been seen elsewhere in computing, such as in the Unix operating system and web services. The basis of microservices is to create self-contained mini-applications, called services, that do one thing very well. Each performs a granular function and can be assembled with other microservices to form an application.
Figure 9.4: Microservices in a cloud architecture
Think of a microservice as an event handler. An event handler is a common structure in a Windows-like operating environment, in which many events are happening at the same time. An event handler is a self-contained function that responds to a specific event. For example, in a Windows-like operating environment, multiple applications appear on the screen. When the user resizes the window of one application, the other applications need to adjust their windows to accommodate the change. Each application has a function, called an event handler, that contains the code to resize its window.

Microservices are like event handlers, except that the microservice lives outside the application and is called in response to events occurring within any application that uses it. Let's say an application needs to process credit card payments. Instead of embedding code that processes credit card payments into the application, and into every other application that needs to do the same, a microservice that processes credit card payments is created and used by all the applications that need to perform this task. Developers need only call the microservice, provide it with the necessary information, and process the data it returns.

Each microservice is developed independently of other microservices to meet the needs of vendors in the cloud community. However, each has an application programming interface (API) that is shared with developers. The API describes the microservice's function; the information needed to perform the function; any codes to turn sub-features of the function on or off; instructions on how to call the microservice; and instructions on how to interpret the values it returns.

APIs, Fintech and Blockchain

The flexibility offered by open APIs and microservices has helped spur rapid growth in the financial technologies (fintech) arena. Companies like PayPal, whose API enables safe payments worldwide, let third-party applications large and small tie into their technology.
Today, hundreds of startups provide services that interface with the customer in new and unique ways, using big data to tap into customer needs and offering useful services that drive innovation in the fintech sector. These startups offer developers best-of-breed technology, saving time and money and supplying expertise often beyond that of a project development team. The fintech area has caught fire, combining with developments in cryptocurrencies that use distributed ledger (blockchain) technology to create alternative currencies such as bitcoin, Ethereum, and others. Perhaps more importantly, blockchain technology is likely to support a new breed of innovations that use the immutability of the blockchain to enable smart contract based applications. Because the blockchain is irreversible by design, these applications do not require expensive and time-consuming third-party support and maintenance.
The microservice is maintained by a development team. Upgrades are usually made without the knowledge of the developers who use the microservice, unless the change affects the API. For example, a change in credit card processing is implemented immediately and brings all applications that use the microservice current; one change occurs instantaneously in many applications.

Furthermore, a microservice may be assembled from other microservices. For example, processing a credit card requires sub-processes such as authorizing access to perform the process; accessing secure information relating to the purchase from a
database; and updating activity logs. Each of these might be a microservice that can be accessed by other applications aside from credit card processing. The idea is that a microservice can be called from anywhere, by any application that is authorized to use it.

There is a tendency to associate microservices with a single vendor, but that is too narrow a view. Keep in mind that the cloud can be a private cloud, a public cloud, or a hybrid of both. An application can be configured to use microservices available on private and public clouds, and on clouds offered by different vendors.

Microservices must have a product owner who is responsible for maintaining the microservices and upgrading them based on feedback from developers. Microservices must also be organized within a library management system, making it easy for developers to locate microservices that can be incorporated into their applications.
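The call-and-return pattern described above can be sketched as follows. The payment microservice, its parameters, and its return fields are all invented for illustration, and in a real deployment the application would call the service over HTTP rather than as a local function.

```python
# Microservice sketch: a self-contained payment service with a small,
# documented API, called by any application that needs it.
# All names, fields, and rules here are invented for illustration.

def charge_card(card_number, amount_cents):
    """Payment microservice.

    API contract (illustrative):
      inputs : card_number (str of 16 digits), amount_cents (int > 0)
      returns: {"approved": bool, "reason": str}
    """
    if amount_cents <= 0:
        return {"approved": False, "reason": "invalid amount"}
    if len(card_number) != 16 or not card_number.isdigit():
        return {"approved": False, "reason": "invalid card number"}
    return {"approved": True, "reason": "ok"}

# The application: it calls the microservice and processes the result,
# rather than embedding payment logic itself.
def checkout(card_number, amount_cents):
    result = charge_card(card_number, amount_cents)
    if result["approved"]:
        return "order placed"
    return f"declined: {result['reason']}"

print(checkout("4111111111111111", 2599))  # order placed
print(checkout("1234", 2599))              # declined: invalid card number
```

Because the validation rules live only inside `charge_card`, a change to them takes effect for every application that calls the service, which is the "one change, many applications" property described above.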
Serverless Computing

Another element of the cloud architecture is serverless computing. When developing and deploying an application, the organization needs to consider the computing resources necessary to run it. Computing resources include various hardware and software components. At times, developers are limited to building an application that can run on existing computing resources. At other times, developers have to estimate the computing resources needed to run the application, and the organization then needs to allocate the finances to acquire those resources. Furthermore, the organization has to allocate computing resources among applications.

The cloud practically eliminates the challenge of building an application to fit the organization's computing environment by giving developers the freedom to design an application without regard to computing resources. In other words, developers and the organization are working with serverless computing: computing with a virtually endless availability of hardware and software to run an application. Yes, applications still require computing resources, including servers. However, the cloud vendor has what appears to be all the computing power an organization would ever require; therefore, it seems as if the cloud is serverless.

The cloud vendor offers computing resources on an as-needed basis. Let's say an application requires heavy data crunching, but only occasionally. The organization pays for the computing resources only for those moments; there is no idle time. The organization no longer needs to acquire the computing power to crunch the data. Computing power is acquired just when it is required, and the acquisition is automatic once the application is configured for the cloud. The operation switches to the needed computing resources behind the scenes.
Developers and the organization focus on building the application using a blend of custom code and microservices without concern over limitations of computing
resources. The cloud environment ensures that the necessary computing resources are available when required by the application. Configuring the application for the cloud takes care of fulfilling its computing requirements.
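In practice, a serverless deployment usually reduces to writing a function that the platform invokes on demand: the provider provisions compute when an event arrives and releases it afterward. The sketch below mimics the common (event, context) handler shape used by function-as-a-service platforms such as AWS Lambda; the payload fields are assumptions for illustration.

```python
def handler(event, context=None):
    """Hypothetical serverless function: crunch a batch of sensor readings.
    The organization pays only for the moments this actually runs."""
    readings = event.get("readings", [])
    if not readings:
        return {"status": "error", "message": "no readings supplied"}
    return {
        "status": "ok",
        "count": len(readings),
        "average": sum(readings) / len(readings),
        "peak": max(readings),
    }
```

Calling `handler({"readings": [3, 9, 6]})` returns the count, average, and peak; between invocations, no compute is reserved at all.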
DevOps

There are many operating scenarios that may be used inside or outside of a relationship with a cloud provider. The methods used may be the most important factor in your decision on a cloud vendor, multiple cloud vendors, or hybrids. How are you to run your sales organization, your backend services such as accounting and finance, your supply chain, your web presence and customer outreach, and your development needs on all of the above? Who provides these services? Development operations (DevOps) is the process used in the cloud to eliminate barriers between application development and the operations that run the applications. DevOps replaces the traditional development and delivery methods that require many processes and staff who typically work in silos that impede the agility required for fast, economical responses to the organization’s demands. This was commonly referred to as the waterfall method, in which one silo passed along the work to the next until the last silo deployed the application. DevOps automates many of the processes required to move an application from development into production. Developers move applications into the cloud using DevOps tools directly. The cloud provider may then manage the process of functional and nonfunctional, unit and iterative testing (continuous testing); version control; configuration management; change management; and other functions necessary to deploy the application. At each stage of implementation, the application is either returned to the previous stage if there are issues or pushed forward in the deployment process. For example, the cloud returns the application to the developers if it fails a test. In doing so, DevOps refocuses the organization on developing the application while the cloud is focused on managing the process.
The DevOps process enables developers to write code and build it into an application, followed by automated testing and then automatic deployment, after which the application is immediately used. The operations portion controls image management, rolling upgrades, security configuration, patch management, and environment configuration and deployment. The DevOps process brings a synergy of development staff and operations staff by forming a uniform process across silos, removing barriers that traditionally exist in the development and operational environments. When the developer figuratively presses the button in DevOps to test an application, automated testing identifies policy issues, coding problems, quality problems, and security issues. Test results are returned to the developer, who
then modifies the code accordingly to address those problems. Before DevOps, the development team and operations team worked relatively independently, resulting in risky deployment of new applications because of a lack of collaboration and synchronization. This led to increased costs and challenges in tracking changes to applications. DevOps enables both teams to work as one team, each looking to produce a quality application. The result is a continual feedback cycle that uses automated DevOps processes to help the team monitor and share information about development. The entire process from development through operations becomes measurable, and any delay clearly highlights the breakdown in the process, thereby making the delay actionable. Key to DevOps is a lean methodology that automates hand-offs between development, operations, and customers. Prior to DevOps, a “customer,” internal or external, enters a ticket for a change to an application—perhaps through the help desk, which is part of operations. The operations team records and sends the request to the development team, who works on the changes. The upgraded application is then sent to the testing team. The testing team needs operations to set up the testing environment. Testing also reviews security requirements, quality control, and compliance with the organization’s policies. Results of testing are then sent to the development team. The application is then modified and returned to testing if changes are necessary. Otherwise, the application is turned over to operations to begin the deployment process. There are too many gaps and hand-offs where details can be overlooked. Furthermore, delays occur because each group only learns about the application when it receives it. DevOps reduces the number of manual hand-offs by making all stakeholders aware of the status of the project, beginning with the initial change request.
Tools are used to automate the process where possible. In some situations the tool performs the process, and in others the tool enables the team to efficiently perform its role. For example, DevOps typically produces real-time reports that help the teams improve the process. These include the change fail rate, which measures the rate at which changes fail to achieve the desired goal; mean time to recover (MTTR), which calculates the average time to recover from a failure; and lead time for change, which is the elapsed time between the time the request for change is received and the time the change is fully implemented. DevOps uses selective automation to optimize the development and operations process. The goal is to automate the process of developing applications and getting the applications deployed so customers can use them. Each phase of the process is tracked automatically and progress is measured objectively, giving feedback to both the development and operations teams, who then improve the process. The DevOps process provides staff with the tools needed to optimize their role in the development and operations process.
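The three report metrics named above can be computed directly from deployment records. A minimal sketch, assuming each change is a record with requested, deployed, failed, and (for failures) recovered fields (the field names are made up for illustration):

```python
from datetime import datetime

def devops_metrics(changes):
    """Compute change fail rate, MTTR (hours), and mean lead time (days)."""
    failures = [c for c in changes if c["failed"]]
    recovery_hours = [
        (c["recovered"] - c["deployed"]).total_seconds() / 3600
        for c in failures if c.get("recovered")
    ]
    return {
        # fraction of changes that failed to achieve the desired goal
        "change_fail_rate": len(failures) / len(changes),
        # average hours from failed deployment to recovery
        "mttr_hours": sum(recovery_hours) / len(recovery_hours) if recovery_hours else 0.0,
        # average days from change request to full implementation
        "mean_lead_time_days": sum(
            (c["deployed"] - c["requested"]).days for c in changes
        ) / len(changes),
    }
```

Feeding this the deployment history of a pilot application gives the teams an objective baseline to improve against.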
It is smart to begin adoption of DevOps with a pilot application that can be used as a proof of concept. This is often done in coordination with a cloud provider. The cloud vendor provides the tools and environment to implement the DevOps process. The pilot application uses a lean development and operations team of approximately ten staff compared with an estimate of thirty staff members for implementing a typical application. The goal is to demonstrate that the concept of DevOps is a viable option for the organization. Aspects of the DevOps process are proven and there is no need for the staff to reinvent it since they can leverage existing solutions. The pilot application also identifies training needs for the developers and the operations staff on how to use the DevOps tool to automate their processes. Once the pilot application has successfully been developed and implemented using the DevOps process, the organization makes a conscious effort to break down silos and bring the entire staff onboard using the DevOps process. In its purest form, all applications going forward must use the DevOps process without exception. Applications should be designed around microservices. Rather than focusing on designing a complete application, developers should be focused on designing microservices that provide functionality that can be utilized by many applications.
The DevOps Maturity Model

Not all applications are suited for the cloud. The DevOps maturity model helps to identify applications that are appropriate for the cloud. The DevOps maturity model is used to categorize applications based on objective criteria that are organized into five levels. These are:
–– Level One: Ad-Hoc Communication. There is no automation, no governance of the process, and no quality standards.
–– Level Two: Controlled Communication and Collaboration. Automation is ad-hoc without a formal automation process. There are no governance standards, and quality management is ad-hoc with no formal quality management plan in place.
–– Level Three: Standard Communication Process. There is a standardized automation process in place and a standardized form of governance over the process. However, there are no quality standards in place.
–– Level Four: Communication Metrics Exist for Improvement. Automation metrics are in place to measure progress in developing and deploying the application against application goals. There are also metrics to measure the effectiveness of governance over the process, and quality metrics are in place to measure improvement in performance.
–– Level Five: Constructive Communication Environment, Tools, and Processes. Optimization methods are in place to maximize throughput, govern the process, and provide continuous quality improvement.
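One way to operationalize the model is a simple classifier that maps an application’s observed practices to a level. The three-descriptor encoding below (one each for automation, governance, and quality) is an assumption; a real assessment would score many more criteria.

```python
# Maturity levels keyed by (automation, governance, quality) practice,
# following the five levels described above. The string encoding is a
# deliberate simplification for illustration.
LEVELS = {
    ("none", "none", "none"): 1,
    ("ad-hoc", "none", "ad-hoc"): 2,
    ("standard", "standard", "none"): 3,
    ("measured", "measured", "measured"): 4,
    ("optimized", "optimized", "optimized"): 5,
}

def maturity_level(automation, governance, quality):
    """Return the DevOps maturity level, or None if the mix of practices
    does not match one of the five defined profiles."""
    return LEVELS.get((automation, governance, quality))
```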
Compliance

Depending on the business, organizations are governed by countless regulations. In the US, healthcare organizations must comply with the Health Insurance Portability and Accountability Act of 1996 (HIPAA), which requires the organization to protect health information. Public corporations must adhere to the Sarbanes–Oxley Act (SOX). Organizations that use information about European citizens must adhere to processes defined in the General Data Protection Regulation (GDPR). Failure to adhere to regulations exposes the organization to fines and possibly litigation. When the organization’s data is in the cloud, it is critical that the cloud provider has the necessary measures in place to ensure that regulatory requirements are met. The organization needs to perform a detailed walkthrough of the processes available in the cloud to verify the degree of compliance required by regulators. In addition to protecting data, the cloud provider must have tools in place that internal auditors and regulatory auditors can use to audit the organization’s data to ensure regulatory compliance. The cloud provider and the organization must make sure data protection is compliant, and both must be able to prove it to regulators.
Cloud Security

The thought of placing the organization’s mission-critical information and applications in an unseen, remote location called the cloud is frightening. All the confidential and innermost data required to run the organization seems to be somewhere in space—obviously not space, but in remote servers owned and operated by the cloud provider. The reality is that the cloud is often more secure than the organization’s own facilities that house data and applications. The cloud provider has the resources and motivation to employ the latest security measures and to ensure that those measures are updated (at times, hourly). Many organizations see security as a necessary evil that is secondary to their business. This attitude usually exposes the organization to potential security faults. “Trust, but verify” is the foundation of using any vendor. Trust that the cloud provider has the best security defenses in place, but also verify this before a cloud provider is engaged. Executives of the organization remain liable if a security breach occurs, even if it occurs in the cloud. Here are some common security threats:
–– Denial of service. Denial of service occurs when services are cut off or in some way limited, often when a hacker floods the cloud’s IP address with more requests than the cloud can process, resulting in decreased response time. The cloud provider must explain how it defends against such an attack.
–– Encryption break-in. Breaking into an encrypted file is difficult—however, older encryption algorithms can be defeated. It is important to ensure that the cloud provider uses the latest encryption algorithms for files at rest and in transit.
–– Physical theft. By now you realize that data and applications don’t reside in a cloud but on servers located in the cloud provider’s data center. Visiting the data center provides the opportunity to assess the physical security policies and practices of the vendor.
–– Ransomware. Ransomware is software that prevents access to applications and data (denial of service) by encrypting them. Only the hacker has the ability to decipher them.
–– Data theft. Employees of the organization and of the cloud provider have access to the organization’s data. Assess what steps are employed by the cloud provider—and within the organization—to prevent such theft.
–– Vulnerability exploitation. Operating systems, applications, and development tools are not perfect when it comes to security. Hackers are aware of this and exploit these vulnerabilities to gain access to information. The cloud provider—and the organization’s applications—must be using the latest products that have removed these vulnerabilities. The old reliable sales management system, for example, may have known vulnerabilities that haven’t been addressed. The cloud vendor may suggest that these be addressed or that the system be replaced with new technology.
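As a flavor of one such defense, request flooding is commonly mitigated with rate limiting. Below is a minimal token-bucket sketch; the rates and the injectable clock are illustrative choices, and production defenses operate at the network edge across many layers.

```python
import time

class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` per
    second. Requests beyond the budget are rejected instead of being
    allowed to overwhelm the service."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # refill tokens for the time elapsed since the last request
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A flood of requests drains the bucket immediately, after which `allow()` returns False until time passes and tokens are refilled.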
Levels of Security

A cloud provider typically has data center facilities in one or more regions, possibly in a region of the United States or in countries outside of the United States. The organization can select the region for its applications and data. Furthermore, the organization can use different regions for specific applications and databases. The organization can add a level of security by encrypting data on the client side, so that only the organization can decipher the data. This is in addition to the encryption provided by the cloud vendor in transit and at rest in the vendor’s facility. Even if data is intercepted, encryption makes it useless to the hacker. Application-level security focuses on preventing unauthorized access to the application. The organization and the cloud provider should have logs that indicate when the application is accessed and the IDs and IP addresses that accessed it. Logs should also record all writing and reading of data with enough detail to trace who had access, or at least what computing device was used. Another important security implementation is for the cloud provider to keep application programming interface (API) logs. The cloud offers microservices that can be accessed from practically anywhere in the cloud. API logs record information about
when the microservice was called and the application that called it. This enables the security staff to trace access back to the application if it was hacked. The cloud provider should also keep data import and export logs to record any large movement of data. Ideally, the cloud has an alert system that calls attention to unusual transfers of data. The security staff can immediately monitor and investigate the activity and possibly halt the transfer. Similar alerts should occur when there have been a set number of failed attempts to access the application or data. Alerts should also sound when access is attempted from an unexpected IP address. Alerts trigger a real-time response to a potential hack. Object-level security is another area to focus on. An object is a collection of data in a database. Security concerns exist at the database level and at the data level. Database-level security centers on access to the database, while data-level security looks at access to specific types of data within the database. In addition to encryption, access to data can be limited through views. Based on authorization, the database management system can assemble virtual tables of data from tables in the database. Platform-level security prevents unauthorized access to computing devices such as computers, network services, application servers, and database servers. Without access, data and applications are secured. It is important to ensure that the cloud provider offers and implements all of these security levels for the organization’s applications and data. Critical to successful cloud security is the organization’s ability to manage security access. As employees are hired, terminated, and transferred into new roles, the organization must modify security access to the organization’s computing resources. Some resources are internal and others are on the cloud.
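The alerting rules described above can be sketched as a scan over access-log events. The event fields, thresholds, and rule set below are assumptions for illustration, not any cloud provider’s actual log format:

```python
def security_alerts(events, failed_threshold=5, export_mb_threshold=500,
                    allowed_ips=frozenset()):
    """Return a list of alert messages raised by a batch of log events."""
    alerts = []
    # rule 1: too many failed attempts to access the application or data
    failures = sum(1 for e in events if e.get("type") == "login_failed")
    if failures >= failed_threshold:
        alerts.append(f"{failures} failed access attempts")
    # rule 2: unusually large movement of data
    exported = sum(e.get("size_mb", 0) for e in events
                   if e.get("type") == "data_export")
    if exported >= export_mb_threshold:
        alerts.append(f"unusually large data transfer: {exported} MB")
    # rule 3: access attempted from an unexpected IP address
    unexpected = {e["ip"] for e in events
                  if allowed_ips and e.get("ip") and e["ip"] not in allowed_ips}
    for ip in sorted(unexpected):
        alerts.append(f"access attempted from unexpected IP {ip}")
    return alerts
```

In a real deployment these rules would run continuously against streaming logs so that the security staff can respond while the activity is still in progress.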
The cloud provider should offer a way for the organization to change security access settings for cloud resources quickly and in coordination with changes to internal security settings. Ideally, changes to the internal security settings should flow automatically to the cloud security settings. An option to consider is acquiring security services from a third party other than the cloud provider. Third-party vendors offer security services across cloud providers. This is a valuable service to consider, since organizations tend to use multiple cloud providers. These vendors have the knowledge to leverage the assets of each cloud provider to the advantage of the organization.
Chapter 10
Decision Support Systems, Data Analysis, and Big Data

In this chapter, we will examine how companies use data to help make important decisions. Today, companies can take advantage of data collection schemes and software to analyze and support their decisions. A decision support system is an application (or, more likely, a set of applications) that analyzes data and presents it as information that is used to make decisions more easily than traditional methods of decision making allow. Decision support systems are just one piece of a puzzle that is used by information technology professionals, analysts, and managers to do their jobs today. A decision support system is most beneficial when complex data must be considered to reach a decision, such as understanding how a consumer makes a decision. Decision support systems have less of an impact on relatively simple decisions, such as setting a departmental budget for the next fiscal year, though in large organizations that may not be the case. Decision support systems have seen rapid changes due to many factors: revolutionary improvements in computing power, data capture, data storage, data categorization, data mining and analysis, machine learning, the internet, the cloud, and in particular the smartphone, all driving changes in the presentation of data and, as a result, vastly improved business intelligence (data presentation) applications. As technology speeds forward, it enables innovation in those organizations that are in a position to take advantage of it. Companies such as Google, Facebook, Salesforce, Amazon, and the major cloud players have realized the importance of gathering data for marketing, utilizing advances in artificial intelligence, data analysis, and other technologies to influence the behavior of the buying public while modeling behavior or systems to gain advantage and acquire customers. Making a decision in today’s business environment is challenging, to say the least.
Consumers demand customized, diverse products and quick delivery. Consumers are less loyal and quick to move to a competitor if demands are not met. Technology has brought down traditional barriers to meeting customer demands. Competition is strong and global. Transactions occur on demand, in real time, 24/7, giving consumers unheard-of power. The complexity of the business environment is compounded by growing government regulation (and deregulation), cyberattacks, outsourcing, and innovation—new products and services that accelerate obsolescence. Decision makers are faced with building an organization that is sustainable and responsive to the marketplace in a business environment that generates both information overload and constant change, for the decision maker and for customers alike. Making decisions is further complicated because the decision-making environment may change continuously. By the time facts are gathered and a decision
is made, the variables affecting the decision may have changed, possibly invalidating the decision. In addition, there is time pressure on the decision maker, forcing the decision maker to settle for a good enough decision rather than an optimal one. In many cases, there is either insufficient or too much information, which can impede the decision-making process. Furthermore, analyzing a problem requires time and money, both of which may be unavailable to the decision maker. Business organizations frequently experience a gap between current performance and desired performance. The gap is caused by market factors that place pressure on decision makers to meet market demands and take advantage of market opportunities, forcing decision makers to develop a real-time response. Decision support systems—using the business pressures-responses-support model—provide the analysis that enables decision makers to make timely, informed decisions. Decision support systems extend the decision maker’s capability to make decisions by using simple and complex computer applications to make sense out of structured, semi-structured, and unstructured decision situations. A decision support system applies various analytics to data from internal and external sources and to output from other computer systems to add to the decision maker’s—and the organization’s—knowledge base. The goal is to transform data and information from other systems into a foundation for decision making. Although we tend to view a decision support system as one system, a decision support system is an umbrella for a number of subsystems. For example, the data management subsystem uses a database management system to extract data from production, finance, and marketing systems—all systems throughout the organization. Requests for data can be made interactively by the decision maker through a management interface or through modeling and other subsystems.
The Decision Process

Decision making is a process of choosing among two or more alternative courses of action in order to achieve a desired goal. Some decisions are referred to informally as double door decisions. This is where making the wrong decision doesn’t have a material impact because the decision maker can always backtrack and make a different decision. For example, taking the wrong fork in the road doesn’t have a material impact on the trip because you can always return to the fork and then travel in the other direction. However, other decisions do have a material impact, and therefore the decision maker must evaluate many options and consult with experts before making the decision. A wrong decision may not be easily reversed. There is always a tradeoff between accuracy and speed when making a decision. A fast decision may be less accurate and have a detrimental effect on the goal. A slow decision may be more accurate, but if too long is taken, the situation may have changed, making the decision inaccurate. Some decisions are
considered “in and out” types of decisions. These decisions are made quickly but can be reversed quickly too, enabling the decision maker to try a different approach to the problem. The reversal has minimal impact on the organization, and decision makers are encouraged to avoid elaborate analyses that delay the decision. For example, an emergency room physician makes a quick decision to stabilize an unstable patient, then sends the patient to the inpatient unit where another physician has the time to diagnose and treat the underlying condition that caused the patient to become unstable. The emergency room physician makes decisions that are fast, effective, and good enough. The other physician makes a decision that is optimal to treat the problem. The decision maker must properly perceive the problem, solution, and goal—otherwise, the effort to reach a decision will be misdirected. This can be challenging because the decision maker needs to sift through the noise to identify the problem. For example, a manager decides to call IT to get help formatting an Excel spreadsheet that contains data that the manager copied from reports generated by the accounting system. Formatting is not the problem. The problem is that the accounting system isn’t generating the report needed by the manager. The decision maker needs to recognize the real problem. Making a decision that addresses the real problem (generating a report from the accounting system) makes the noise (formatting the spreadsheet) go away. The decision maker must also be able to deconstruct the problem into simpler sub-problems. This reduces the complexity of the problem, leading the decision maker to make several small decisions that collectively solve the problem. The decision maker must also have the necessary background and know-how to make the decision. This might involve having the skills to personally make the decision or calling in experts who have the unique capability of making the decision.
Managers use a multistep process to reach a decision.
–– Problem definition: Clearly define the problem.
–– Abstraction: Real-life problems are complex. There is a tradeoff between the cost of analyzing all aspects of the problem and the benefit of doing so. A common practice is to strike a balance by using the abstraction process. The abstraction process simplifies the problem by making reasonable assumptions about elements that influence the problem.
–– Model building: A model is built (by a person with appropriate expertise) to describe the problem as it relates to the real world. The model identifies variables that influence the problem and relationships among those variables. There are different types of models, and each provides a specific kind of result.
–– Identifying options: Options are generated to address the problem.
–– Selecting the best option: Options are compared and the best option is used to solve the problem. The best option considers risk. Risk is the exposure to harm
or loss as the result of the lack of precise knowledge (commonly referred to as uncertainty), and is measured using probability.
–– Change management: Once the best option is selected, the change to solve the problem must be implemented.
Decision support systems facilitate the decision process using a range of information systems and collaboration by providing data management, knowledge management, and analytical support available anytime, anywhere.

(1) Decision Modeling

One of the harder parts of the process in decision modeling is taking a real-world problem and transforming it into the appropriate type of model that can be solved. Decision modeling is analogous to quantitative analysis and is a branch of managerial science (operations research). There are two types of models: deterministic models, for which the data to be used is already known (determined), and probabilistic models. There is a wide array of models used to solve business problems.

(2) Deterministic Models

Deterministic models are used to solve problems where you have the data needed to solve the problem. Generally, in deterministic modeling, you are converting word problems to equations, as you did in high school math. Linear programming involves setting up systems of linear equations (multiple equations) to solve a problem with unique solutions, where the solution may be a maximum or a minimum. For instance, the solution may be to maximize profit or to minimize costs. The possible solutions are where the lines intersect. Integer programming, though similar to linear programming, only allows for solutions that are whole numbers. Integer programming also includes solving problems that involve binary solutions. Nonlinear programming is another modeling approach that, unlike linear programming, allows nonlinear equations to be included and solved.
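The geometric fact that the optimum of a linear program lies where constraint lines intersect can be demonstrated with a small two-variable example: enumerate every intersection point, keep the feasible ones, and take the most profitable. The production numbers below are made up; real problems use a solver such as the simplex method.

```python
from itertools import combinations

# Constraints in the form a*x + b*y <= c. The last two encode x >= 0, y >= 0.
CONSTRAINTS = [(1, 0, 4), (0, 2, 12), (3, 2, 18), (-1, 0, 0), (0, -1, 0)]

def intersection(c1, c2):
    """Solve the two boundary lines a*x + b*y = c by Cramer's rule."""
    (a1, b1, r1), (a2, b2, r2) = c1, c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None                      # parallel lines never intersect
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)

def feasible(point, eps=1e-9):
    x, y = point
    return all(a * x + b * y <= c + eps for a, b, c in CONSTRAINTS)

def maximize_profit(profit=(3, 5)):
    """Return the feasible corner point with the highest profit."""
    corners = [p for c1, c2 in combinations(CONSTRAINTS, 2)
               if (p := intersection(c1, c2)) is not None and feasible(p)]
    return max(corners, key=lambda p: profit[0] * p[0] + profit[1] * p[1])
```

Here `maximize_profit()` returns (2.0, 6.0): make two units of the first product and six of the second, for a profit of 3*2 + 5*6 = 36.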
(3) Probabilistic Models

Quite often when you are trying to model your problem, you run into a situation where you are simply missing one or more variables. In that case, the problem requires a probabilistic model. Examples of probabilistic models include simulations, queuing, and forecasting problems.
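For example, a queuing problem with uncertain arrival and service times can be explored by Monte Carlo simulation. The sketch below models a single-server queue with exponentially distributed times; the rates are illustrative and would normally be estimated from observed data.

```python
import random

def average_wait(arrival_rate=0.8, service_rate=1.0, customers=10000, seed=42):
    """Estimate the average time a customer waits before being served."""
    rng = random.Random(seed)          # fixed seed for a repeatable estimate
    arrival = server_free = total_wait = 0.0
    for _ in range(customers):
        arrival += rng.expovariate(arrival_rate)   # next customer arrives
        start = max(arrival, server_free)          # wait if server is busy
        total_wait += start - arrival
        server_free = start + rng.expovariate(service_rate)
    return total_wait / customers
```

With these rates, queueing theory predicts an average wait of about 4 time units; the simulated estimate converges toward that value as the number of simulated customers grows.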
Business Intelligence

Business intelligence is an umbrella term that describes key information that decision makers use to make decisions. It transforms data into information and knowledge used to make decisions. A business intelligence system provides the decision maker with a dashboard that enables data manipulation, data mining, and analysis of data in an organization’s data warehouse. In addition, a business intelligence system helps the decision maker monitor and analyze the organization’s performance.
Data and Business Analytics

Data is the foundation of a business intelligence system and is collected and electronically stored throughout the organization during normal business operations. Data can logically be organized into a data warehouse. Think of a data warehouse as the electronic organization of all databases in an organization that can be accessed by the business intelligence system. The user interface—usually in the form of a dashboard—can access and manipulate data stored in the data warehouse in an effort to make effective decisions. The business intelligence system monitors actual performance and compares it to performance goals. The result is displayed on the dashboard. Business analytics is a component of a business intelligence system that provides the decision maker with three categories of analytics:
–– Descriptive analytics: Scorecards and reporting that tell the decision maker what happened or is happening.
–– Predictive analytics: Data mining and other tools that tell the decision maker what will happen and why it will happen.
–– Prescriptive analytics: Simulation, optimization, and decision modeling that tell the decision maker what should be done and why it should be done.
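The first two categories can be illustrated in a few lines: a descriptive summary of what happened, and a toy predictive step (a moving-average forecast of the next period). Real predictive analytics uses much richer models; the functions and data here are made up for illustration.

```python
def descriptive_summary(sales):
    """Descriptive analytics: report what happened."""
    return {"total": sum(sales),
            "average": sum(sales) / len(sales),
            "best": max(sales),
            "worst": min(sales)}

def naive_forecast(sales, window=3):
    """A toy predictive step: forecast the next period as the moving
    average of the most recent `window` periods."""
    recent = sales[-window:]
    return sum(recent) / len(recent)
```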
Technology Supports the Decision Maker

Decision making is the process of choosing one option from among two or more options for the purpose of attaining a goal. A decision can be based on principle or pragmatism. For example, we should not conclude a task until it is perfect—that is a matter of principle. However, we may never complete the task. So there is a trade-off. We tend to be pragmatic when making decisions and completing tasks that are good enough to be effective. Effectiveness is defined by a tolerance, determined by the
decision maker before the decision is made. The way decisions are made is a matter of style. Here are competing styles of decision making.
–– Heuristic decision: Heuristic decision making is the most commonly used style; the decision maker focuses on one aspect of a complex problem. This works well in most circumstances but tends to fail as circumstances deviate from the decision maker’s experience.
–– Analytical decision: Analytical decision making is the most time-consuming because it requires the decision maker to identify all factors that can influence the decision and calculate the influence of each under likely scenarios to determine the best possible outcome.
–– Autocratic decision: Autocratic decision making is where the decision maker makes the decision with little or no input from others.
–– Democratic decision: Democratic decision making is where the decision maker consults with others to develop consensus on the decision.
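Analytical decision making, in particular, lends itself to a simple weighted-scoring sketch: rate each option against every influencing factor, weight the factors, and pick the highest total. The factors, weights, and scores below are hypothetical.

```python
def weighted_totals(option_scores, weights):
    """Score each option as the weighted sum of its factor ratings."""
    return {option: sum(scores[f] * weights[f] for f in weights)
            for option, scores in option_scores.items()}

def best_option(option_scores, weights):
    """Analytical decision: pick the option with the highest weighted total."""
    totals = weighted_totals(option_scores, weights)
    return max(totals, key=totals.get)
```

For instance, with weights of 0.5 for cost, 0.3 for speed, and 0.2 for risk, an option rated 8/6/9 on those factors scores 7.6 and beats one rated 6/9/5, which scores 6.7.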
Simon’s Decision-Making Process

Simon’s Decision-Making Process breaks down decision making into four phases.
–– Intelligence phase: The intelligence phase makes assumptions about reality and simplifies the problem, leading to the problem statement. The focus of the intelligence phase is formulating a problem statement. To do this, the problem is often decomposed into simpler sub-problems that are easier to understand. However, there are issues that can impede defining a problem statement.
–– Design phase: Next is the design phase, where a model of reality is created and criteria for selecting alternatives are defined. During the design phase, the model is tested by comparing its outcome to historical results. The design phase results in viable options. There are three types of models used in the design phase.
○○ Normative model: The normative model seeks to optimize the result by choosing the best of all possible options. The assumption is that the goal is to be maximized and that all options—and their resulting consequences—are known.
○○ Heuristic model: A heuristic model is considered a sub-optimization model because it selects the best option from the known options, not from all options. The goal is to reach a good enough solution fast.
○○ Descriptive model: The descriptive model represents things as they are or are believed to be. A descriptive model provides information that may lead to a solution, but it does not provide a solution. Simulation is the most common descriptive model; it allows the decision maker to experiment with options.
Technology Supports the Decision Maker
–– Choice phase: Next is the choice phase, when each option is analyzed and the best option is selected.
–– Implementation phase: The implementation phase is when the chosen option is executed. If the implementation fails, then the process returns to one of the previous phases. If the implementation succeeds, then the best decision was made.
Each phase of Simon's Decision-Making Process is supported by a decision support system. The intelligence phase is supported by artificial neural networks, management information systems, data mining, online analytical processing, expert systems, and enterprise resource planning systems. The design and choice phases are supported by some of the same systems, plus supply chain management systems, executive support systems, and commercially available analytical systems. The implementation phase is supported by knowledge management systems in addition to other decision support systems.
Business Reporting
Many decisions are made with fundamental information about business operations—without the use of sophisticated modeling. A business is measured by data, such as the number of products sold each day, the cost to ship products, and the cost of servicing customers. Data from business operations is recorded by a computer system and stored in a database. A management information system extracts data and presents the data in a report that helps the decision maker make a decision. Reports can be delivered as a PDF file or Word document, or loaded directly into Excel, enabling the decision maker to manipulate the data. Today, business reporting has changed with the introduction of visual analytics. Visual analytics is a decision support system that digests data for the decision maker, focusing the decision maker on outcomes rather than analytics. Data is presented in a way that tells the decision maker what happened and what is currently happening. This is referred to as information visualization. Data is also forecasted, telling the decision maker what will happen and why it will happen. This is referred to as predictive analytics.
Performance Dashboard
A performance dashboard is a common form of visual analytics. Think of a performance dashboard as the dashboard of your car. You know the status of every system of your car by glancing at the dashboard. A performance dashboard does the same for an organization.
Chapter 10: Decision Support Systems, Data Analysis, and Big Data
A performance dashboard monitors key performance indicators, analyzing each indicator to determine if the indicator is on or off target and enabling the decision maker to manage the organization by drilling down to details that are represented by the indicator. A business performance management system is a decision support system that alerts decision makers of impending problems in real time and assists the decision maker with making and implementing a decision. Key performance indicators (KPIs) are at the heart of a business performance management system. Business performance measurements are depicted on a balanced scorecard that consolidates performance management and management methodology into actionable initiatives for the decision maker. A key performance indicator measures performance against a specific organizational goal. There are two types of key performance indicators.
–– Driver key performance indicator: The driver key performance indicator drives the business and is called a leading indicator of performance. For example, the number of sales leads indicates there are potential customers interested in doing business with the organization.
–– Outcome key performance indicator: The outcome key performance indicator is the result of business activities and is referred to as a lagging indicator. For example, revenue lags behind closing a sale.
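The on-target/off-target check a dashboard performs on each KPI can be sketched in a few lines. This is a hypothetical illustration; the function name, the 5% tolerance, and the sample values are invented, not from the text.

```python
# Hypothetical sketch of a dashboard's KPI check: an indicator is "on target"
# when the actual value is within a tolerance band around the target.

def kpi_status(actual, target, tolerance=0.05):
    """Classify a KPI as 'on target', 'above', or 'below' the target."""
    if actual >= target:
        return "on target" if actual <= target * (1 + tolerance) else "above"
    return "on target" if actual >= target * (1 - tolerance) else "below"

# A driver (leading) KPI such as monthly sales leads, checked against a goal:
print(kpi_status(980, 1000))   # within 5% of the target
print(kpi_status(700, 1000))   # well below the target
```

A real dashboard would map these statuses to green, yellow, and red indicators and let the decision maker drill down into the underlying data.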
Models and How Models Are Used to Help Make Decisions
A model is a representation of a segment of real life and is created by initially defining the segment and then identifying data associated with the segment. The value of data normally changes in real life. In the model, data is represented as a label called a variable. It's called a variable because the value of the label changes, just as it does in real life. Next, we set out to determine if there is a relationship among variables. A model is designed to estimate or predict a value based on data. The value estimated by the model is called the dependent variable. It is dependent on values of other data in the model. It is a variable because its value changes. Data is used to estimate or find the dependent variable based on other variables called independent variables. The value of each independent variable is not influenced by other independent variables and is not influenced by the value of the dependent variable. Let's say you want to estimate how well the organization is meeting customer needs (dependent variable). You ask each sales representative for their opinion (independent variable). However, their opinions are biased because they have a vested interest in giving a positive opinion since it is their job to meet their customers' needs. Therefore, sales representatives' opinions are not independent variables. A survey of customers, trends in each customer's sales, or comparisons of products/services
offered by competitors are likely better choices for independent variables since there isn't a bias. There are many types of models. These include optimization models, simulation models, and predictive models. Each helps the decision maker in different phases of the decision-making process. Models are used in static analysis and dynamic analysis.
–– Static analysis: Static analysis takes a snapshot of real life and tries to understand the situation represented by the snapshot. For example, investigators look at all pieces of wreckage to understand how a crash occurred.
–– Dynamic analysis: Dynamic analysis examines the flow of data from a changing situation. For example, the black box in an air crash provides a flow of the data that led up to the crash. Dynamic analysis provides a more realistic view of a situation because you can see patterns developing over time. Static analysis is limited because you don't see what happened before or after the snapshot.
Mathematical Models
Many models used in decision support systems are mathematical models. A mathematical model predicts one value (dependent variable) based on other values (independent variables). For example, profit is a dependent variable that is affected by many independent variables, such as cost, price, competition, and demand. Each independent variable has a limited effect on the dependent variable. That effect is measured by historical mathematical relationships. Over time a natural relationship is established. It is that relationship that enables the model to predict the dependent variable. A decision variable is a factor that the decision maker controls, such as the price of a product. An uncontrollable variable is a factor that the decision maker cannot control, such as competition. There are challenges when developing and using a model. First is the problem of identifying variables that influence the outcome. There are many factors that influence a real-life situation that are not easily identified. A model of that situation is flawed if those factors are not included in the model. Another challenge is to collect the data both to develop and to use the model. A relatively large amount of data for each variable is required to verify the accuracy of the model; however, the data may be difficult to gather electronically and may not be in the appropriate format to be used by the model. Likewise, the same issues with data occur when using the model. There must be an efficient and effective method of collecting data in a format that can be used by the model to predict the future.
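A minimal mathematical model can be sketched as a straight-line fit relating one independent variable to a dependent variable. The scenario below (price predicting units sold) and all data values are invented for illustration; real models use many variables and far more historical data.

```python
# Sketch of a mathematical model: fit a line y = a + b*x from historical data
# relating an independent variable (price) to a dependent variable (demand),
# then use the fitted relationship to predict.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

# Invented historical observations: as price rises, demand falls.
prices = [10, 12, 14, 16, 18]
units  = [200, 180, 160, 140, 120]

a, b = fit_line(prices, units)
predicted = a + b * 15          # predict demand at a new price point
print(round(predicted))         # → 150
```

The coefficient `b` is the measured historical relationship between the two variables; the prediction is only as good as the data used to establish it.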
Further complicating the use of a decision model is the compounding error factor. Let's say that the model contains five variables. Each variable has an error factor, indicated by plus/minus a statistical value. The prediction of the variable might be off by 10%. This might be true for each of the variables. There is a possibility that all variables may be at the extreme of that range—each is 10% off—and therefore have a cumulative error effect on the outcome of the model. And in some situations, multiple decision models may be in use, each assisting in making a decision on one aspect of the problem. Each model and each of its variables could be at the extreme of the error range. Sensitivity analysis is used to assess the impact that changes in input variables have on the output. Here are strategies for reducing the sensitivity of variables.
–– Reduce the number of variables: Fewer variables decrease the cumulative error factor on the output. The model should use only variables that have a major impact on the outcome.
–– Eliminate highly sensitive variables: Variables that have a large error factor should be eliminated from the model (if feasible) to reduce the error of the output.
–– Obtain better estimates of sensitive variables: Variables that have a large sensitivity and are necessary for the model require careful data gathering and analysis. More accurate data produces an acceptable estimate and a lower error factor.
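The compounding-error arithmetic can be made concrete. The sketch below assumes an invented model whose variables combine multiplicatively, purely to show why five 10% errors compound to far more than 10% in the worst case.

```python
# Sketch of the compounding error factor: if each of several multiplicative
# variables lands at the extreme of its +/- error range, the combined error
# is much larger than any single variable's error.

def worst_case_error(error_rates):
    """Worst-case relative error when every variable is off by its full amount."""
    factor = 1.0
    for e in error_rates:
        factor *= (1 + e)
    return factor - 1

# Five variables, each with a +/-10% error factor:
print(round(worst_case_error([0.10] * 5), 3))   # → 0.611, i.e., about 61%
```

This is why reducing the number of variables and obtaining better estimates of the sensitive ones lowers the cumulative error of the model's output.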
Certainty and Uncertainty
Every decision is based on certainty and uncertainty and the risk that the decision will be wrong. Certainty occurs only when the decision maker has all the knowledge and knows all potential outcomes when making the decision. Uncertainty is the lack of knowledge and the presence of unidentified potential outcomes when making the decision. Risk analysis is a process of assessing uncertainty by estimating the effect of the unknown on the decision using probability. For example, there is uncertainty each time you start a car. Will it start or won't it start? There is no guarantee that the car will start each time you try. You can lower the risk that the car won't start by making sure that the engine is maintained according to manufacturer recommendations and ensuring that you always have fuel in the tank. Risk analysis must be performed when making a decision. The decision maker must then determine if the decision is too risky or within the risk tolerance of the decision maker. For example, you may bring your car into the shop for maintenance once a year. You are willing to assume the risk that the car won't start between appointments.
Decision Tree
A decision tree is another tool used by decision makers to graphically represent the logic used to make a decision. A decision tree contains relationships among decision points. A decision point is a step in the decision-making process when a decision has to be made. Based on the decision, the decision maker follows a branch of the decision tree to either the next decision point or to a final decision. Decision trees require that all decision points are identified and that each decision point results in a discrete decision. That is, the decision is either yes or no.
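A decision tree with discrete yes/no decision points maps naturally onto nested data structures. The tree content below (a retail discount policy) is invented for illustration.

```python
# Sketch of a decision tree: each decision point asks a yes/no question,
# each branch leads to another decision point or to a final decision (a leaf).

tree = {
    "question": "Is the customer a repeat buyer?",
    "yes": {
        "question": "Is the order over $100?",
        "yes": "Offer free shipping",
        "no": "Offer 5% discount",
    },
    "no": "Offer welcome coupon",
}

def decide(node, answers):
    """Walk the tree, consuming one yes/no answer per decision point."""
    for answer in answers:
        node = node[answer]
        if isinstance(node, str):      # reached a leaf: a discrete decision
            return node
    return node

print(decide(tree, ["yes", "no"]))     # → Offer 5% discount
```

Because every decision point is discrete, any path through the tree terminates at exactly one decision.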
Search
Search is a decision-making approach that identifies the best solution within a given constraint, such as making a decision within a deadline. There are three search approaches:
–– Optimization: The optimization approach continues the search until the best solution is found.
–– Blind search: The blind search approach is either a full search or a partial search. The full search continues until the best solution is found. A partial search continues until the deadline, and then the best of the available alternatives is selected.
–– Heuristic search: The heuristic search continues until a good-enough solution is reached.
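The partial (blind) search under a deadline can be sketched directly: examine alternatives until time runs out, then settle for the best one seen. The scoring function and option set below are invented placeholders; a real system would evaluate genuine business alternatives.

```python
# Sketch of a partial search: evaluate options until the deadline, then
# return the best alternative found so far.

import time

def partial_search(options, score, deadline_seconds):
    best, best_score = None, float("-inf")
    stop_at = time.monotonic() + deadline_seconds
    for option in options:
        if time.monotonic() >= stop_at:
            break                      # deadline reached: take the best so far
        s = score(option)
        if s > best_score:
            best, best_score = option, s
    return best

# With a generous deadline this behaves like a full search and finds the
# option closest to the (invented) ideal value 42:
print(partial_search(range(1, 100_000), lambda x: -abs(x - 42), 5.0))
```

Shrinking the deadline turns the same code into a genuine partial search, trading solution quality for a timely decision.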
Simulation Model
Simulation is a model that gives the appearance of reality and is used to conduct what-if analysis. A simulation can be used to test options when the situation is too complex for other decision support techniques. The primary disadvantage is that simulation models are costly and cannot guarantee an optimal solution. Furthermore, it is difficult to model reality. The steps of creating a simulation model are:
1. Define the problem
2. Construct the model
3. Test and validate the model
4. Design experiments to evaluate the model
5. Conduct the experiments
6. Evaluate the results
7. Implement the model
Automated Decision Systems and Expert Systems
An automated decision system is a rule-based system that provides a solution to repetitive, structured decisions. For example, an automated decision system can offer a specific discount to non-business travelers if only 70% of the seats on a flight are sold three days prior to departure. An automated decision system uses other decision support systems to formulate rules that determine the best decision. The outcome of an automated decision system is customized to a particular situation or used to standardize decisions throughout an organization. Artificial intelligence is a component of an automated decision system. Artificial intelligence is concerned with symbolic reasoning and problem solving, with the goal of making computers learn from experience and make sense out of ambiguous situations, enabling the computer to respond quickly to a new situation. The goal of artificial intelligence is to mimic human intelligence, which is challenging because human intelligence has a vast capability that isn't fully translated into computer instructions. Human intelligence can:
–– Learn from experience
–– Make sense out of ambiguous situations
–– Quickly adapt to new situations
–– Use reason to solve problems
–– Apply what is learned to manipulate the environment
An artificial intelligence system must also gather knowledge. Knowledge is information that is contextual, relevant, and actionable and is acquired through education or experience. Anything that is learned, perceived, discovered, inferred, or understood is knowledge. Alan Turing, the father of modern computing, defined a test for intelligence, sometimes referred to as the Turing Test for Intelligence. A computer can be considered to be smart only when a human interviewer, "conversing" with both an unseen human being and an unseen computer, cannot determine which is which. Artificial neural networks (ANN) are at the heart of artificial intelligence.
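Before turning to neural networks, the airline rule described earlier can be sketched as a rule-based function. The 70% load factor and three-day window come from the text; the function name and the 15% discount amount are invented placeholders.

```python
# Sketch of an automated decision system's rule: offer a discount to
# non-business travelers when 70% or fewer seats are sold within three
# days of departure.

def discount_offer(seats_sold, seats_total, days_to_departure, business_traveler):
    load_factor = seats_sold / seats_total
    if days_to_departure <= 3 and load_factor <= 0.70 and not business_traveler:
        return 0.15      # hypothetical 15% discount
    return 0.0

# A flight 70% full three days out, quoted to a leisure traveler:
print(discount_offer(126, 180, 3, business_traveler=False))   # → 0.15
```

Because the decision is repetitive and structured, the rule can run automatically for every fare quote without a human decision maker.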
ANN simulates the neurons found in a living nervous system, where each neuron is focused on a task. The network of neurons links together tasks (neurons) to form a process. One such process is a supervised learning process used for self-learning. There are four steps in the supervised learning process. The first step is to compute a temporary output, similar to setting margins on an unfamiliar word processing application. This task produces a result that may set the margins, may be close to setting the margins, or may not have set the margins. The next step is to compare the output with the desired target—were margins set? If not,
then the output is weighted: 1 indicates margins were set; 0 indicates margins were not set; and a value in between indicates whether the output was on the right track to set the margins. Think of the value between 1 and 0 as the probability that the action actually set the margins. The last step is to repeat the process by selecting another element on the application screen, comparing the output with the target, adjusting the weight, and repeating the process. The application learns which tasks achieve the desired results and which tasks don't. An expert system is an application that imitates an expert's knowledge and reasoning to solve problems that are narrow in scope. Expert systems make the knowledge and reasoning of experts widely available, permitting non-experts to solve problems that require expert advice. Key to an expert system is the inference engine. An inference engine applies logical rules to knowledge to produce new knowledge. The inference engine uses either forward chaining or backward chaining to apply rules. Forward chaining starts with known facts and asserts new facts based on the situation. Backward chaining starts with a goal and then works backward to determine how to achieve the goal. Knowledge engineering is also key to an expert system. Knowledge engineering is the process of acquiring knowledge from an expert and converting the knowledge into a knowledge base. Key components of knowledge engineering include:
–– Acquired: First, knowledge must be acquired from the expert and then electronically stored in a format that can be used by an expert system.
–– Validated: The knowledge must then be validated to ensure there are no conflicts among experts.
–– Applied: Next, the logic for applying the knowledge—referred to as reasoning—must be electronically stored, enabling the expert system to apply the knowledge to solve a problem.
–– Justification: Most important, there must be an explanation and justification for applying the knowledge. An expert can tell you why a solution was selected. An expert system must do the same.
Experts deal with uncertainty using probability or beliefs. Probability is a statistical guess based on historical patterns. Beliefs are factors that the expert assumes are true based on the expert's experience, and not based on probability. For example, the expert may tell the owner of a large store to expect a lot of customers the day after Christmas. The expert believes there are many customers who received gift cards for Christmas and will go shopping the day after Christmas; however, the expert has not performed research to support this belief. Uncertainty is represented as a degree of belief, expressed as a measure of belief on a range from -1 (false) to 1 (true). Values within the range from -1 to 1 indicate the degree of belief—the certainty factor. A degree of belief is different from a probability because probabilities must sum to 100%, while degrees of belief do not.
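The forward chaining described earlier can be sketched as a small inference engine: apply rules to known facts until no new facts can be asserted. The rules and facts below are invented examples, not from any real knowledge base.

```python
# Sketch of a forward-chaining inference engine: each rule is a pair of
# (conditions, conclusion); the engine asserts new facts until none remain.

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in facts and all(c in facts for c in conditions):
                facts.add(conclusion)      # assert a new fact
                changed = True
    return facts

rules = [
    ({"has_fever", "has_cough"}, "possible_flu"),
    ({"possible_flu"}, "recommend_rest"),
]

derived = forward_chain({"has_fever", "has_cough"}, rules)
print("recommend_rest" in derived)    # → True
```

Backward chaining would instead start from the goal ("recommend_rest") and work backward through the same rules to find which facts must hold.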
Knowledge Management and Collaborative Systems
Think of an organization as a person. Over the course of a lifetime, each of us gains knowledge. The same is true of an organization, except that knowledge within an organization resides with its people and systems. Knowledge management is the process of collecting, categorizing, and distributing knowledge within the organization. There are two sources of knowledge in knowledge management: the knowledge held by the organization's employees and the knowledge stored in the organization's systems. Employees are referred to as intellectual capital. Knowledge is information in the form of employees' skills and data stored in the organization's systems. An organization learns from the experience of its employees. Experience is transformed into policies, procedures, and best practices that are used by employees to make decisions. Based on the outcomes of decisions, the policies, procedures, and best practices may be modified, continuing the transformation from experience to learning. The knowledge management cycle describes how an organization acquires and shares knowledge among employees. Initially, knowledge is created through the experience of one or a group of employees and is then captured on paper, in Excel, or by other electronic means. This is referred to as tactical knowledge. The value of tactical knowledge is analyzed, refined, and stored. The knowledge is then managed by a decision support system and made available to others throughout the organization. Knowledge management is used to support group work in the form of a collaborative system. Collaborative systems ensure that knowledge is shared in a timely way within the group. Emails, text messaging, video conferencing, audio conferencing, shared calendars, and shared drives are all forms of collaborative systems.
Data Warehousing and Data Mining
An organization has many systems, each collecting and storing information in databases. Organizations attempt to logically assemble subject-oriented databases into a data warehouse, enabling decision support systems to search through all the organization's databases and giving decision makers a breadth of information when making a decision. Each unit of data is non-volatile and relevant to a moment in time; a unit of data does not change once stored in the database. A data warehouse logically contains all the organization's data. A smaller version called a data mart contains a subset of data from the data warehouse. This is referred to as a dependent data mart and is used by a division or department of the organization to make decisions. A data mart can also be independent from the data warehouse and contain strategic information accessible only by a specific business unit. This is called an independent data mart. Searching a data warehouse can identify patterns of information that provide new opportunities for the organization. However, there is an enormous effort to
create and maintain a data warehouse and to integrate all databases throughout the organization into a data warehouse. Each database stores data differently, and those differences must be resolved to create the data warehouse. Furthermore, there is no guarantee that searching the data warehouse will reveal new opportunities. A data warehouse can contain static data or real-time data.
–– Static data: Static data is data that doesn't change frequently, which impedes decisions that must be made on current data.
–– Real-time data: Real-time data is data that is constantly updated. It enables real-time analysis and real-time decision making because the data warehouse is automatically being updated. However, a real-time data warehouse may be cost prohibitive. Furthermore, data may be outdated minutes after the results are received by the decision maker. Corporate leaders must determine the cost benefit of a real-time data warehouse.
Data warehouses fail for a number of reasons. These include:
–– Unrealistic expectations: Executives may naively believe that a data warehouse will produce information that will materially alter the business. In reality, spending time and money developing a data warehouse may not produce any material results.
–– Loading all available data: Only data that might provide meaningful results should be loaded into the data warehouse, not all data within the organization. There must be a rationale for storing data in the data warehouse.
–– Technology-oriented focus: Developing a data warehouse is technologically challenging, and especially appealing to the technologists in the organization. It is a puzzle that technologists like to solve. However, the focus must be user-oriented. The goal is to create tools that the organization can use to meet its goals, not to revolutionize technology.
The data warehouse should be scalable. Scalability is the capability to replicate the design of the data warehouse to accommodate future changes.
The data warehouse can grow in a number of ways. The amount of data can quickly grow. The number of users accessing the data warehouse concurrently will grow. As users become comfortable accessing the data warehouse, users will increase the complexity of queries for information. Further complicating data warehouse operations are real-time data updates and real-time analyses used to make real-time decisions. All require sufficient computing and network power to ensure prompt response time. Scalability ensures that users have the same experience accessing the data warehouse as these changes occur. Data mining is the process of searching a data warehouse—or non-consolidated databases—looking for patterns of data that increase the knowledge of the organization. A pattern is a mathematical relationship among data. Like gold mining or drilling for oil, there is no guarantee that data mining will produce any data patterns that help the organization identify new business opportunities.
Text Analytics, Text Mining, and Sentiment Analysis
Text mining is the semi-automated process that uses text analytics to identify patterns in unstructured information such as emails, Word documents, PDF files, and other free-form text. Text mining identifies patterns. However, the decision maker must review the results to confirm the pattern. For example, text mining is used for filtering spam emails; sometimes the spam is not actually spam, but only a person can determine the difference. Text mining is used in text-rich environments such as academic research articles, financial reports, patient medical charts, customer comments, and legal documents. Text mining is different from data mining. Text mining searches unstructured data, and data mining searches structured data. Think of structured data as a spreadsheet where information is organized into rows and columns. Each column contains the same kind of information, such as a person's first name. Natural language processing is a key to text mining and text analytics. Natural language processing applies understanding to words in an unstructured document, similar to how we read text. Natural language processing is able to interpret the meaning of a text sample sufficiently to filter out most text that does not meet the search criteria. Search engines such as Google use natural language processing to return text that is likely to contextually match the search criteria. The result is usually a good match, although sometimes the wrong text is returned. The text mining process begins with collecting written text, referred to as the corpus. Next, a term-by-document matrix is created that contains the frequency with which terms are used in the collection. Rows correspond to documents in the collection and columns correspond to terms. Then, patterns are extracted from the collection.
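The term-by-document matrix just described can be built in a few lines. The two-document corpus below is invented; real corpora also apply tokenization, stemming, and stop-word removal before counting.

```python
# Sketch of building a term-by-document matrix: rows are documents, columns
# are terms, and each cell holds how often the term appears in the document.

from collections import Counter

corpus = [
    "the product works well",
    "the product failed",
]

# Column order: all distinct terms in the corpus, sorted alphabetically.
terms = sorted({word for doc in corpus for word in doc.split()})
matrix = [[Counter(doc.split())[term] for term in terms] for doc in corpus]

print(terms)
print(matrix)
```

Pattern-extraction algorithms then operate on this matrix rather than on the raw text.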
The text mining and natural language processing combination is the foundation for algorithms that are capable of learning knowledge by “reading” text, which is a goal of artificial intelligence. Sentiment analysis is a form of text analytics that identifies and categorizes a writer’s written opinion as positive, negative, or neutral. Sentiment analysis also looks for emotions such as angry, sad, and happy. Sentiment analysis is typically applied to social media to determine the sentiment of participants to current events, products, or services. There are many challenges to effective sentiment analysis. For example, it must “understand” variations in the English language and must be able to “understand” non-English terms and phrases to determine what is positive, negative, or neutral.
Sentiment analysis is a four-step process. First, the text is retrieved, then words and phrases are evaluated. Each piece of text is then classified as positive, negative, or neutral. Next, the target of the expressed sentiment is identified. A target can be a person, product, or event. Finally, once all pieces of text are evaluated, the results— the number of positive, negative, or neutral pieces of text—are aggregated to produce an overall sentiment of the document. Although sentiment analysis is used for text, it can also be used for speech using a speech recognition application that converts the spoken word into a document.
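The four-step sentiment process can be sketched with a tiny word-list approach. The positive and negative word lists and the sample texts are invented; real sentiment systems use far richer lexicons and must handle negation, sarcasm, and the language variations noted above.

```python
# Sketch of lexicon-based sentiment analysis: evaluate words, classify each
# piece of text, then aggregate the classifications into an overall result.

POSITIVE = {"great", "love", "happy"}
NEGATIVE = {"bad", "hate", "angry"}

def classify(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def aggregate(texts):
    labels = [classify(t) for t in texts]
    return {s: labels.count(s) for s in ("positive", "negative", "neutral")}

print(aggregate(["I love this phone", "bad battery", "arrived Tuesday"]))
```

The aggregated counts correspond to the final step of the process: an overall sentiment for the collection of texts.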
Web Analytics, Web Mining, and Social Analytics
The web is the largest repository of data. Web mining is the process of using web analytics to determine natural relationships in data available on the web, such as text, hyperlinks, and usage of the web. Web mining uses analytics similar to those used in data mining and text mining; it analyzes both structured and unstructured data. There are challenges with web mining. The web is big, complex, and dynamic, making effective web mining at times impractical. A search engine uses web mining and web analytics. A search engine begins with a list of URLs. Web crawler software visits each URL on the list and retrieves and stores information from the webpages associated with each URL. Webpages are indexed and are searched for information and other URLs. The process repeats with each newly found URL. When a search query is received, a query analyzer searches the search engine database for webpages that have content matching the search query. The document matcher/ranker ranks matching webpages and displays webpages in ranked order. Clickstream analysis is the process of collecting, analyzing, and aggregating data about webpages visited and the order in which webpages were visited. Data is obtained by analyzing access logs on the web server and cookies stored in the web browser. The goal is to identify opportunities to increase customer value, improve the flow of a website, and improve data gathering. The results of web analytics are typically available in a graphic format as a dashboard. Social media analytics is the fastest growing form of web mining and web analytics. The goal is to analyze web-based social media outlets to develop insights into specific demographics of the population—very similar to how sentiment analysis analyzes the sentiments of a target demographic. Social media analytics identifies people who influence others by tracking the ripple effect of opinions throughout social media outlets.
The goal is to understand how opinions are influenced.
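The clickstream analysis described above can be sketched as a pass over an access log: record each visitor's page-to-page transitions and count the most common paths. The log entries are invented for illustration; real logs come from the web server and browser cookies.

```python
# Sketch of clickstream analysis: from (visitor, page) events in visit order,
# count page-to-page transitions to see how visitors flow through the site.

from collections import Counter

log = [
    ("v1", "/home"), ("v1", "/products"), ("v1", "/checkout"),
    ("v2", "/home"), ("v2", "/products"),
    ("v3", "/home"), ("v3", "/about"),
]

last_page = {}
transitions = Counter()
for visitor, page in log:
    if visitor in last_page:
        transitions[(last_page[visitor], page)] += 1
    last_page[visitor] = page

print(transitions.most_common(1))   # the most-traveled path through the site
```

A dashboard built on these counts would highlight where visitors drop off, pointing to opportunities to improve the flow of the website.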
Big Data and Analytics
Big data refers to the volume of data—massive amounts of data. For example, a Boeing aircraft generates, processes, and stores 20 terabytes of data each hour of flight. Facebook handles 500 terabytes of data daily, and YouTube processes 1 terabyte of data every 4 minutes. Big data comes from GPS systems, radio-frequency identification (RFID) used for product control, internet-based text documents, internet searches, and military surveillance. Big data is commonly defined by:
–– Volume: The sheer amount of data
–– Variety: The different kinds of data
–– Velocity: The speed at which data is generated and must be processed
–– Veracity: The accuracy of the data
–– Variability: The variations within the data
–– Value: The importance of the data
Big data alone is worthless unless the data can be analyzed. Big data analytics is the process of learning from massive volumes of data. This is challenging since data arrives faster than it can be handled by traditional analytics, systems, hardware, and networks. Furthermore, data from social media, the web, and textual sources is unstructured and can't be quickly integrated into a cohesive data repository. Before embarking on big data analysis, the organization must have a clear business need for the data (value proposition) and a strong commitment from senior management. There must be a strong alignment between business and IT strategy because special infrastructure is required to store, process, and analyze big data, and the right tools and the right people must support it. Big data analytics requires speed. All data must be stored in memory, not on a disk drive. The application that is analyzing the data must be stored close to the data to reduce any delay in accessing the data. A delay of a fraction of a second is too long when compared to the massive volume of data that must be processed.
Furthermore, big data analytics requires multiple processors working together to process the data. Big data analytics is focused on stream analytics because it may not be feasible to store the data; current platforms are limited. Analysis must be performed in real time as the data flows through the system. The application may have to react instantaneously to changes in the data flow, such as in the power-grid industry and in aircraft. Traditional data storage (i.e., a relational database) is likely inadequate because there are large amounts of data stored and organized in different database designs (data schemas). Big data requires schema-on-demand data storage, where the data schema is decided when data is acquired. This enables a variety of data types to be analyzed.
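The core idea of stream analytics, analyzing data as it flows rather than storing it, can be sketched with incremental statistics. The class below and its sensor readings are invented; it shows how a running mean and maximum are maintained with no stored history.

```python
# Sketch of stream analytics: maintain summary statistics over a data stream
# without storing the stream itself, since the data may arrive too fast to
# persist.

class StreamStats:
    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.maximum = float("-inf")

    def update(self, x):
        """Incremental update: only the running summaries are kept."""
        self.count += 1
        self.mean += (x - self.mean) / self.count
        self.maximum = max(self.maximum, x)

stats = StreamStats()
for reading in [10.0, 12.0, 11.0, 15.0]:   # invented sensor readings
    stats.update(reading)

print(stats.mean, stats.maximum)   # → 12.0 15.0
```

A real-time application, such as power-grid monitoring, would check each update against thresholds and react instantly, without ever writing the raw stream to disk.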
Critical success factors for big data analysis are:
–– A clear business need
–– The organization's commitment to a big data strategy
–– A fact-based culture for decision making
–– IT strategy aligned with the business strategy
–– Massively parallel processing (many computing devices working together)
–– The capability to capture, store, and process vast amounts of data in a timely manner
Chapter 11
Forensic Computing and How People Can Be Tracked

Forensic computing is the area of computing that collects and analyzes digital data and reports findings that are used as admissible evidence in a legal proceeding to either prove or disprove an element in a legal action. Digital evidence is any information that is stored or transmitted in a digital form that provides probative information about an action or inaction of a party in a legal action. Probative information is information that tends to prove a purported fact.

The computer forensics investigator gathers digital data at the request of attorneys and the court, which determine whether the digital data is authentic and relevant to the legal action. The investigator uses forensic techniques to collect and analyze the digital data, ensuring that the process does not interject misleading results and that the analysis is based on scientific fact. In this chapter you will learn what actions can legally be taken to access information on your work or home devices and what data can be exposed.

Is your computing device safe? The presumption is that what you do on your computing device is private; however, that's not necessarily true.
–– Your internet activities are stored in your browser history on your computing device. Your internet service provider keeps a log that identifies your computing device and the IP address temporarily assigned to it while you browse the internet. That IP address is stored along with the IP addresses of the websites you visit.
–– Websites you visit store the IP address that you use to access the website, along with other information gathered during your visit that identifies you.
–– Your email provider (e.g., Yahoo, Gmail) retains your emails for a specific length of time.
–– Chats and postings on social websites are also retained for a specific period of time.
–– Friend lists are also stored by social websites.
–– Nothing is private if you use your work computing device, because your employer owns the device and has the right to access it at any time without notifying you. The IT department can access the device over the network even when you think it is locked.

The presumption should be that nothing on your computing device is private, even if you're not surfing the internet. Technically, information on your computing device, and the device itself, can be seized by law enforcement if there is suspicion of a violation of law. The computing device can then be searched by a computer forensics investigator whose job it is to gather evidence, which you will learn about in this chapter.

DOI 10.1515/9781547400812-011
Usually, law enforcement must have a suspicion that your computing device contains evidence related to a violation. The suspicion is presented to a judge in a request for a search warrant. If law enforcement presents sufficient probable cause, the judge will issue a search warrant that clearly defines what law enforcement can search. Regardless of whether you are home, law enforcement agents will enter your home, present the search warrant, and leave with your computing device(s). The device(s) may be returned to you after the legal matter is settled, which could take years. Law enforcement may also execute a search warrant for information about you and the activities performed on your computing device with ISPs, cloud providers, social websites, and other organizations that may hold such information. In these situations, you probably will not be told of these searches.

A search warrant is valid if the request for the warrant is filed in good faith; the information establishing probable cause is reliable; the warrant is issued by a neutral judge; and the warrant specifically states the place to be searched and the items to be seized.

There are exceptions to the search warrant requirement. A search can be conducted if you give permission for the search. The plain view doctrine allows law enforcement agents to search and seize evidence that is clearly in sight without a search warrant. This is usually done to preserve the evidence, referred to as an exigent circumstance. For example, a computing device lying on the seat of a car that is stopped in relation to a burglary can probably be seized, but a search warrant is probably required to search the device. Once the computing device is in the hands of law enforcement, the evidence is protected, but a proper search warrant is required to examine it.
The federal government can request secret authority to gather information about an individual or organization with permission from the Foreign Intelligence Surveillance Court (FISC). Although there are records of proceedings, these records are not made public. FISC warrants are good for a year and authorize the collection of bulk information related to foreign targets, which can include communications between a foreign target and a U.S. citizen.

U.S. customs agents can conduct a search without a warrant and without suspicion that a crime has been committed. For example, a U.S. customs agent can take your laptop when you cross the border and send it to a computer forensics investigator for examination even if you are not a suspect in a criminal proceeding.

Besides law enforcement, some information on your computing device may be accessed by programs and apps running on the device, including the operating system. You probably gave permission to collect the information when you agreed to the terms of use: the legal document that few people read when installing a program or app, right before clicking "I agree."
Protecting Your Computing Device

You can attempt to protect your computing device by using the lockout feature, which requires authentication to unlock the device. Randomly generated passwords are a good option for preventing someone from unlocking the device; however, they are also difficult to remember. Biometric readers are a better alternative, as is encrypting the data on your computer. The Advanced Encryption Standard (AES) is currently the strongest widely used encryption method; the older triple data encryption standard (3DES) is now considered weak and is being phased out.

A locked computing device and encrypted data may not prevent law enforcement from gaining access to the device and data. Law enforcement agents can obtain a court order requiring you to unlock the device and decipher the data. Efforts are currently being made to require computing device manufacturers and developers of encryption software to provide law enforcement with access to computing devices and encrypted data. This is currently being decided in the legal system.

Destroying data on a drive isn't as easy as deleting a file. Deleting a file simply tells the operating system that the space held by the file is available and can be overwritten. You never really know whether all parts of the file are overwritten the next time another file is saved to the drive. As you'll learn in this chapter, a computer forensics investigator has tools that can try to piece together parts of deleted files. One of the better ways to ensure that files are no longer readable is to fully reformat the drive. A full reformat sets the storage areas on the drive to zero, making all existing data unreadable. Doing so removes everything from the drive and can take time to finish.
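The difference between deleting a file and destroying its contents can be illustrated with a short sketch. The function below (illustrative only; the file name and function name are made up) writes zeros over every byte of a file before deleting it, a single-pass version of what secure-erase utilities do. On solid-state drives, wear-leveling means even this cannot guarantee the old bytes are gone, which is one reason forensic recovery often succeeds.

```python
import os

def zero_and_delete(path):
    """Overwrite a file's contents with zeros, then delete it.

    A single-pass sketch: real secure-erase tools make multiple
    passes and still cannot guarantee erasure on SSDs, where
    wear-leveling may leave old copies of the data behind.
    """
    length = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.write(b"\x00" * length)   # replace every byte in place
        f.flush()
        os.fsync(f.fileno())        # push the zeros to the device
    os.remove(path)

# Demonstration with a throwaway file
with open("secret.txt", "wb") as f:
    f.write(b"confidential data")
zero_and_delete("secret.txt")
```

Simply calling `os.remove` without the overwrite would leave the original bytes on disk until the operating system happens to reuse that space.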
The Legal Environment

Legal action can be taken if there is a violation of law. There are two general classifications of law: criminal law and civil law. Criminal law defines rules of society created by the government. Violating a criminal law means that the government can take action against the individual, which might result in a fine or incarceration. Civil law also defines rules of society created by the government, but private parties can bring action against an individual, although the government may also bring civil action. The result of a civil action is monetary; there is no incarceration.

In a criminal action, a prosecutor employed by the government files a criminal complaint in court against an individual, referred to as the defendant. The criminal complaint accuses the defendant of violating a specific criminal law. The prosecutor must present evidence that proves beyond a reasonable doubt that the defendant violated the criminal law. The defendant is usually represented by another attorney, called a defense attorney, who presents evidence that the defendant did not commit the crime.
In a civil action, there isn't a prosecutor. The plaintiff is the party bringing the legal action against the defendant. The plaintiff presents evidence supporting the claim that the defendant violated the law and that the violation caused the plaintiff an injury. The injury can be physical, psychological, or monetary. The plaintiff is successful if the preponderance of the evidence reasonably proves that the defendant violated the law.

There are two important distinctions between a civil action and a criminal action. In a civil action, the plaintiff must have been injured by the violation. Also, reasonable proof, not proof beyond a reasonable doubt, is necessary.

Another type of civil action asks the court to take action against a defendant, such as stopping the defendant from doing something that will cause the plaintiff injury. This is referred to as an injunction. In actions asking the court to intervene, the individual making the request is called a petitioner, the request is called a petition, and the defendant is called the respondent. Both parties present evidence and the court makes a ruling.
Criminal Trial

A criminal trial is the legal proceeding where both the prosecutor and the defense attorney plead their case. The defendant can choose to have a jury trial or a trial heard by a judge. In a jury trial, impartial members of the community are assembled to hear the evidence. The jury is the judge of fact: it considers the evidence and determines whether the evidence is factual and whether the sum of all the evidence proves beyond a reasonable doubt that the defendant committed the crime. The judge in a jury trial is the judge of law and decides whether the proceedings, evidence, and everything else related to the case adhere to the law. The judge, for example, can rule that a piece of evidence does not conform to the rules of evidence, leading to that evidence not being presented at trial. In a trial heard by a judge, there is no jury. The judge is both the judge of law and the judge of fact, deciding whether the evidence is factual and whether the sum of the evidence proves beyond a reasonable doubt that the defendant committed the crime.

Each state and the federal government have their own laws and rules for legal proceedings. Generally, there are three categories of criminal violation. The most serious is commonly referred to as a felony. These are very serious crimes that usually, but not always, result in a minimum term of incarceration. The less serious category of crime is commonly called a misdemeanor, which can result in a fine and/or incarceration. An infraction, sometimes called a disorderly persons offense, is the least serious type of criminal offense and usually results in a fine, though it could result in less than a year of incarceration. Infractions are usually adjudicated in municipal court, with the judge taking on the role of the jury and, at times, in a general sense, that of the defense attorney. Crimes higher than an infraction are referred to as indictable offenses because of the seriousness of the charges and the consequences if the defendant is found
guilty. An indictment is a formal accusation by a grand jury. A grand jury is an impartial assembly of members of the community who hear the prosecutor's evidence. The grand jury decides whether a crime was committed and whether there is a likelihood that the defendant committed it. If the answer to both is yes, the grand jury returns an indictment accusing the defendant of committing the crime, and the case proceeds to trial. No indictment occurs if the prosecutor's evidence does not show that a crime was committed and that the defendant is likely to have committed it. It is critical to understand that only a minimal amount of evidence, not proof beyond a reasonable doubt, is necessary to establish the likelihood that the defendant committed the crime.
Civil Trial

A civil trial is similar to a criminal trial. The plaintiff's attorney is like the prosecutor, providing evidence that proves the accusation. The defense attorney disproves the accusation by challenging evidence presented by the plaintiff or by presenting additional evidence that refutes the plaintiff's accusation. Civil action is usually based on a violation of an agreement between the plaintiff and the defendant. The agreement is called a contract. Some contracts are verbal and others are written. Contracts can be the basis for business transactions following the rules contained in the Uniform Commercial Code, which defines rules that states have adopted as law.

Trials are costly. Both plaintiffs and defendants may pay more in litigation costs than the monetary value awarded at trial. Therefore, other legal alternatives are taken to avoid trial.
–– Settlement: A settlement is an agreement to resolve the issue amicably, brought about by the plaintiff's attorney and the defendant's attorney.
–– Fact-finding: Fact-finding is where an independent third party determines the facts of the case by reviewing the evidence. This provides a basis for a settlement.
–– Mediation: Mediation is where an independent third party called a mediator attempts to bring both sides together to achieve a settlement.
–– Arbitration: Arbitration is a process where the plaintiff and the defendant present evidence to an arbitrator, who then defines the terms of a settlement. Arbitration is usually binding, and the findings must be adhered to by both sides; there is no option to bring the case to court.
–– Summary judgment: A summary judgment is a process whereby the plaintiff and the defendant both present evidence to a judge, and the judge rules based on the evidence without a trial. In many cases, evidence is submitted in writing to the judge without going to court.
Decisions and Appeals

Decisions in a jury trial are made by the jury. In a criminal trial, all jurors must agree on the verdict. In a civil trial, a majority or a supermajority of jurors must agree. A majority is one more than half, such as seven jurors in a jury of twelve. In some states, a supermajority of jurors is necessary to reach a decision; a supermajority is typically two-thirds of the whole, such as eight jurors in a jury of twelve.

Here are types of decisions that can be made in a trial.
–– Compromise verdict: A compromise verdict is the decision made by the members of the jury after listening to each juror's opinion. Collectively, jurors agree on the conclusion.
–– Directed verdict: A directed verdict is when the judge orders the jury to return a specific verdict (guilty or not guilty) because no reasonable jury could reach a contrary decision. The jury must follow the order of the court.
–– Deadlock: A deadlocked jury, sometimes referred to as a hung jury, occurs when the jury is unable to reach a verdict after extended deliberation. In such cases, the judge may order a mistrial, leaving the prosecutor or plaintiff to decide whether to request a retrial.
–– Mistrial: A mistrial occurs when a material error in the proceedings jeopardizes the integrity of the trial. The trial stops, and the prosecutor or plaintiff may request a retrial.
–– Acquittal: An acquittal is a verdict stating that the evidence presented does not prove the charges. It does not mean that the party is innocent; it means there wasn't sufficient evidence to convict. In a criminal case, the same charges cannot be brought again because the U.S. Constitution prohibits double jeopardy.
–– Dismissal: The judge may dismiss charges with or without prejudice. "With prejudice" means that the same charges cannot be refiled. "Without prejudice" means that the same charges can be refiled.

A judicial decision can be appealed if a party feels that the decision is flawed by errors in the proceedings.
A trial is a forum for a legal contest that has many rules. There are rules governing how the trial is conducted—rules of evidence, and rules based on precedents (past rulings). Sometimes, the judge must interpret laws, and those interpretations might be flawed. Any time an attorney feels the other attorney, a witness, or even the judge violates a rule, the attorney voices an objection and the judge decides if the rule was broken or not. After the trial is over, the attorney can challenge the verdict in an appeal to the appellate court. The attorney states the objection and then provides evidence to support the claim along with how the faulty ruling negatively affected the outcome of the trial. The appellate court reviews the argument and evidence and decides to let
the original verdict stand or overrule it, sending the case back to the judge who oversaw the trial for further review or for a new trial. If the attorney disagrees with the appellate court's decision, the attorney can appeal to the next highest court. Each state has an equivalent of a supreme court for state cases. In federal cases, trials are held in U.S. District Courts, appeals are heard by the U.S. Courts of Appeals, and a final appeal can be taken to the U.S. Supreme Court. Judges on the highest court hear arguments as to why the appellate court's decision is incorrect, and a final decision is then made in the case.
Evidence

Evidence is fact that supports or contradicts a belief, such as whether or not an individual has violated the law. The prosecutor or the plaintiff has the burden of proof to provide evidence that supports the claims made in the case. Evidence must be authenticated to ensure that it is genuine and not a forgery. For example, before an eyewitness to an event can testify, evidence must be provided showing that the witness was actually at the event. Likewise, a copy of a document will not be admissible as evidence if the original is available. If the original is unavailable, then evidence must be provided that the copy is an accurate representation of the original.

The chain of custody is a technique used to ensure that the evidence collected is not tampered with throughout the legal proceedings. The evidence is identified and placed in a sealed container; the investigator collecting the evidence initials the sealed container and records the evidence in a log. Each time the evidence is touched, an entry is made in the log. This paper trail ensures that no one has tampered with the evidence.

There are different types of evidence. These are:
–– Direct evidence: Direct evidence is evidence provided by a witness who has direct knowledge of the fact, such as a witness who saw the defendant fire the gun.
–– Indirect evidence: Also referred to as circumstantial evidence, indirect evidence is evidence from which a conclusion can be inferred. For example: the gun was found on the floor; the defendant was in the room; the defendant's fingerprints were on the gun; and gunshot residue was on the defendant's hand. This leads a reasonable person to believe that the defendant fired the gun.
–– Hearsay evidence: Hearsay evidence is presented when a witness testifies as to what another person said to the witness. The witness does not have any firsthand knowledge that the statement is true.
Hearsay evidence, with few exceptions, is not permitted in a legal proceeding.
–– Testimonial evidence: Testimonial evidence is an assertion made by a witness under oath and under penalty of perjury.
–– Physical evidence: Physical evidence is a material object, such as a spent bullet from a gun.
–– Scientific evidence: Scientific evidence is fact determined from nature or from experiments in a controlled environment. The burden of proof for scientific evidence lies with the presenter of the evidence. For example, a technician who performed a ballistic test on a firearm must clearly present the scientific basis for the test, prove that the testing methodology followed scientific principles, and show that the findings were consistent with the underlying science.
–– Expert evidence: Expert evidence is testimony given by an individual who, by training or experience, is competent to draw a technical conclusion based on scientific evidence.
A Computer Forensics Investigation

Today, advances in technology and the ubiquitous use of technology, especially cell phones, have become key ingredients in the pursuit of the truth in criminal proceedings. Emails, phone calls, text messages, internet activity, files on the computers used by all parties (including the victim), geolocation, public and private security systems, cameras and public surveillance systems, and evidence found on computers and other devices have changed the nature of the evidence used to solve crimes and civil disputes, to the point where attorneys today are expected to understand and utilize the basics discussed in the remainder of this chapter.

The objective of computer forensics is to identify digital evidence for a legal case. Digital evidence can be at rest on a computing device, such as a hard disk or USB stick, or in motion, such as data transmitted over a computer network. The computer forensics investigator must collect and preserve digital data to ensure its authenticity. Furthermore, the investigator analyzes the digital evidence and then provides expert testimony about it. Attorneys and the courts determine whether the evidence is relevant to the case.

For example, a suspect might be defrauding an online merchant. A computer forensics investigator may be called in to conduct a digital forensics investigation. The investigation begins by examining the vendor's website log. The log contains the IP addresses of computers used to visit the website. Most IP addresses identify an internet service provider, not the suspect's computer. However, the internet service provider's activity log contains the date, start time, end time, and the MAC address of the customer's computer that had access to that IP address. The MAC (media access control) address uniquely identifies a network device: the customer's computer.
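The log-matching step can be sketched with a made-up ISP activity log. Everything below (the log format, addresses, and times) is hypothetical; real carrier logs differ, but the principle is the same: match the IP address and the time window, then read off the customer's MAC address.

```python
# Hypothetical ISP activity log: date, start time, end time,
# assigned IP address, customer MAC address (same-day entries only)
isp_log = """\
2018-06-01 09:00 11:30 203.0.113.24 3c:52:82:1a:9f:01
2018-06-01 11:45 13:10 203.0.113.24 00:1b:63:84:45:e6
"""

def mac_for(ip, time_of_visit, log):
    """Return the MAC address of the customer who held `ip`
    at `time_of_visit` (zero-padded HH:MM, so string comparison
    matches chronological order)."""
    for line in log.splitlines():
        date, start, end, addr, mac = line.split()
        if addr == ip and start <= time_of_visit <= end:
            return mac
    return None  # no customer held that IP at that time

# The vendor's web log shows a visit from 203.0.113.24 at 12:00,
# so the second customer's computer is the digital link
suspect_mac = mac_for("203.0.113.24", "12:00", isp_log)
```

Note that the same IP address points at two different customers depending on the time of the visit, which is why the start and end times in the ISP log matter as much as the address itself.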
This digital evidence is sufficient for law enforcement to obtain a search warrant to seize the suspect's computer. The computer forensics investigator then examines the seized computer's MAC address; if it matches the one in the ISP log, there is a digital link between the suspect's computer and the vendor. Authorities still have to prove that
the suspect used the computer to defraud the vendor, a claim that must be supported by other evidence to prove the case.

Records are retained for various lengths of time. ISP IP logs are kept for approximately six months. Records of calls and cell tower usage are held for a year. Text message content is retained for upwards of five days. Website visits are held for about ninety days. Retention may vary by carrier.

The focus of a digital forensics investigation is frequently a hard drive. The initial step is for the computer forensics investigator to write-protect the evidence drive before acquiring data from it. The evidence drive is the drive that contains evidence related to the legal case. The forensic investigation only ever uses a copy of the evidence drive to look for evidence, never the original. The original drive is preserved, enabling other investigators to conduct further study that confirms or challenges the evidence.

The next step is to analyze the data. There are a number of ways to do this; the most common is to perform a keyword search looking for words and phrases related to the subject of the legal case.

The last step is to identify evidence and present it, explaining how you found it, what you found, and how it relates to the legal case. Findings are presented in a written report. Many computer forensics tools automatically generate a basic report that is transformed into the official evidence report. The official evidence report contains information specifically related to the legal case; the basic report from the computer forensics tool contains information specifically related to the data gathered from the evidence drive. Once the final report is written and reviewed, the report, the copy of the evidence drive, and the evidence drive itself are submitted as evidence, usually to the legal team that requested the computer forensics investigation or directly to the court.
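A minimal version of the keyword-search step might look like the sketch below. It scans a raw image file in chunks, keeping a small overlap so a keyword that straddles a chunk boundary is still found. Production suites do far more (deleted-file recovery, text encodings, indexing); this only shows the core idea of searching raw bytes for a term and reporting where it occurs.

```python
def find_keyword(path, keyword, chunk_size=4096):
    """Report every byte offset of `keyword` in a raw image file.

    The image is read in chunks, carrying len(keyword) - 1 bytes of
    overlap forward so a match straddling a chunk boundary is not missed.
    """
    offsets = []
    overlap = len(keyword) - 1
    with open(path, "rb") as f:
        base = 0          # file offset where the current chunk begins
        tail = b""        # trailing bytes carried over from the last chunk
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            data = tail + chunk
            pos = data.find(keyword)
            while pos != -1:
                offsets.append(base - len(tail) + pos)
                pos = data.find(keyword, pos + 1)
            tail = data[-overlap:] if overlap else b""
            base += len(chunk)
    return offsets

# Build a small fake "evidence image" containing the keyword twice
with open("image.bin", "wb") as f:
    f.write(b"\x00" * 100 + b"invoice" + b"\x00" * 50 + b"invoice")

hits = find_keyword("image.bin", b"invoice", chunk_size=64)
# hits -> [100, 157]
```

Reporting byte offsets rather than line numbers matters here: a disk image has no lines, and the offset tells the investigator exactly where on the drive, perhaps inside a deleted file's former sectors, the term was found.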
Types of Computer Forensics Investigations

There are two types of computer forensics investigations: public investigations and private investigations. A public investigation is typically a criminal investigation conducted by law enforcement to prove that the target of the investigation has committed a crime. Sometimes the prosecutor hires an independent computer forensics investigator to conduct the computer forensics examination of computing devices related to the case. This happens when the prosecutor's office and related law enforcement agencies lack the capability to perform a computer forensics investigation. Defense attorneys also use independent computer forensics investigators to gather evidence that counters the prosecutor's findings.

Many private investigations involve civil litigation, where the plaintiff attempts to prove injury caused by the defendant. The result is usually a monetary penalty. It
is rare that civil litigants appear in court. Most civil litigation is settled out of court, directly by attorneys or through mediation.

Other private investigations are internal investigations within an organization. A corporate attorney may suspect an employee of an impropriety and request a computer forensics investigation to provide evidence supporting the suspicion. The results may clear the employee or lead to the employee's termination. In some cases, results may be turned over to law enforcement for criminal action. In most situations, however, the goal is to stop the offending practice, not to bring expensive litigation.
Tools of Computer Forensics

Computer forensics tools include both software and hardware. Computer forensics software tools are grouped together in a software suite that is used to acquire evidence data, process it, search it, and automatically produce a report describing these tasks. A popular computer forensics software suite is EnCase. There are times when additional forensics software tools are required because the suite lacks them. For example, not all computer forensics software suites contain a steganography tool, which is used to identify text hidden in a picture file.

The computer forensics workstation is a computing device with the computing power to acquire and analyze computer forensics evidence. It must have large amounts of memory and disk space and a powerful processor to store and analyze large data files. Disk space is used to temporarily store an exact copy of the evidence disk drive; memory is used to analyze the evidence data.

In addition to computer forensics software, special hardware is required to conduct a computer forensics investigation. The hardware enhances the workstation's capabilities. Two common enhancements are extra bays and ports used to connect the evidence drive to the workstation to acquire evidence data. Another common enhancement is a write blocker, which prevents the operating system from writing to the evidence drive. Operating systems normally write data to a disk drive and overwrite portions of the drive that contain deleted data. However, a computer forensics investigation requires that the contents of a disk remain intact and preserved throughout the legal proceedings. A write blocker ensures that the operating system does not accidentally overwrite data on the drive that the forensics team might recover, even if erased.
Legal Consequences of Computer Forensics

It is critical that a computer forensics investigation be conducted with the utmost care. Any deviation from acceptable evidence-gathering practice may result in exclusion
of the evidence from the proceedings, and without the computer forensics evidence, the case itself may be dismissed. Therefore, the computer forensics investigator must prove beyond a reasonable doubt that the computer forensics evidence was gathered legally and appropriately.

The computer forensics investigation must be authorized. The Fourth Amendment of the U.S. Constitution prohibits unreasonable search and seizure. Before the computer forensics investigator touches a computing device that is the target of the investigation, the investigator must be authorized to gather computer forensics evidence from the device. Attorneys usually address the legality of a computer forensics investigation before the investigator is brought in. However, the investigator must confirm that there is legal authorization for the examination of the computing device.

The owner of the computing device can give written permission to conduct the examination. This is frequently the situation in private investigations, where the organization owns the computing device used by employees. However, if the owner does not give permission, then the legal team needs to obtain a warrant from the court, which is common in criminal cases and in civil cases where the petitioner doesn't own the computing device. The legal team must provide evidence to the court that the computing device is likely to contain evidence critical to the legal proceedings. Either side in the legal action can request the court's authorization to access and examine the computing device. The warrant, if granted, specifies the conditions for seizing and analyzing the device.

The plain view doctrine gives law enforcement the right to seize a computing device without a warrant if the device is in plain sight and a law enforcement officer sees it being used to violate the law.
For example, law enforcement might be inside a suspect's home responding to a disturbance and see child pornography on the screen of a computing device. The device is in plain view, and the officers have a right to seize it without a warrant. Once the device is in custody, officers can request a search warrant to search it.

Once the computer forensics investigator has proper authorization to examine the computing device, proper methods must be used to ensure that the computer forensics evidence is accurately reproduced, and proper procedures must be followed to ensure the verifiability of the evidence that is analyzed. Any doubt that the proper procedures were used might cause the evidence to become inadmissible in the legal proceeding.

Proper procedures require that the chain of custody be documented. The chain of custody identifies who has custody of a piece of evidence from the time the evidence is gathered from the original source to the time it is legally destroyed or returned to its owner. For example, a police investigator takes possession of a computing device from its owner. Acquiring the evidence is documented in detail. The police investigator turns over the computing device to
298
Chapter 11: Forensic Computing and How People Can Be Tracked
the police officer who is responsible for the evidence room. This too is documented by both officers. The evidence room is a secured location within the police facility. A police investigator may retrieve the computing device from the evidence room to turn it over to the computer forensics investigator. Both the removal of the computing device from the evidence room and its transfer to the computer forensics investigator are documented.

Documentation of the chain of custody is usually kept in an evidence log and on the evidence itself. The evidence (computing device) is usually in a sealed envelope. Each person taking custody of the evidence documents that the envelope was sealed and untampered. If the person breaks the seal, then that person reseals it, documenting why the seal was broken and what was done to the evidence. Let's say that the computer forensics investigator broke the seal; removed the computing device; made a copy of evidence data on the computing device; then replaced the computing device in the packet and resealed it. The computer forensics investigator provides detailed documentation on what was done to the computing device and by whom while the computing device was out of the packet. This maintains the integrity of the chain of custody.

Furthermore, the computer forensics investigator must verify that the copy of the evidence data is an accurate copy. This is accomplished by using a technique called hashing. Hashing is a mathematical calculation performed on data that produces a fixed-length value called a hash value. Hashing is performed on the evidence data and on the replication of the evidence data. If both have the same hash value, then it is safe to say that the copy is the same as the evidence data. The hashing can be performed by anyone using the same tool and the same data, and the result will be the same hash value.
It is critical that the computer forensics investigator avoid unintended consequences that might cast doubt on the results of the investigation. The computer forensics workstation must be maintained. Storage space for evidence data must be cleaned of data from previous investigations. Preferably, a new storage device is used for every evidence data acquisition. The computer forensics workstation must be compatible with the target computing device, operating system, and computer applications.

Computer forensics workstations can be a field kit or a laboratory workstation. A field kit is a portable computer forensics workstation that can be brought to a remote location to gather evidence. The laboratory computer forensics workstation is designed for analysis of the evidence data.

It is critical that computing devices be packaged in anti-static evidence bags to prevent accidental static discharges that might affect the computing device. Any disruption of the evidence data makes the evidence data invalid, and it cannot be used in the case.
Conducting a Computer Forensics Investigation

There are commercial forensics software suites such as EnCase, Forensics Toolkit, and ProDiscover that are purchased from vendors who specialize in developing forensics software to meet the needs of professional computer forensics investigators. Open-source forensics software suites such as Autopsy and Digital Forensics Framework are available at no cost from the internet. In addition to computer forensics software suites, there are computer forensics utilities that focus on a single aspect of a computer forensics investigation. These include FTK Imager, dcfldd, and dd, which enable the computer forensics investigator to obtain an image of an evidence drive. An image is an exact copy of the drive.
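Imaging tools such as dd work by copying the drive block by block while hashing the stream so the copy can later be verified. The following Python sketch illustrates the idea; the function and parameter names are invented for illustration and are not taken from any forensics suite:

```python
import hashlib

def image_drive(src_path, dst_path, block_size=512):
    """Copy a drive (or image file) block by block, hashing as we go,
    similar in spirit to dd piped through a hashing tool."""
    h = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)   # read one sector-sized block
            if not block:
                break                      # end of the source drive
            dst.write(block)
            h.update(block)
    return h.hexdigest()                   # hash of everything copied
```

Running the same hash over the original drive and comparing digests verifies that the image is exact, much as dcfldd hashes on the fly while it copies.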
Preserving Data Using Write Blockers

It is critical that data be preserved during the forensic investigation. Any indication that the data is different from the evidence data can lead to the entire body of evidence data being excluded from the case. Even if it only appears that the data might have been changed, without it actually changing, the courts may doubt the accuracy of the evidence and therefore discount the evidence data.

For example, the computer forensics investigator creates an image of the evidence data on the computer forensics workstation. The accuracy is verified by the hash value. The computer forensics investigator uses a hex editor to read the copy of the evidence data on the investigator's device. This is accomplished by the hex editor copying the copy of the evidence data into computer memory. The computer forensics investigator can now change hexadecimal values in memory using the hex editor without jeopardizing the integrity of the copy of the evidence data. However, this changes if the computer forensics investigator inadvertently saves the changed copy in memory over the copy of the evidence data on the computer forensics workstation. The copy of the evidence data is now considered corrupted and can no longer be used for the investigation. The computer forensics investigator must document the error and then acquire another copy of the original evidence data, assuming it is available. If it is unavailable, then the computer forensics investigation into the computing device cannot continue.

A write blocker is used to prevent overwriting evidence data or a copy of the evidence data. There are two types of write blockers: hardware and software. Hardware write blockers are computing devices placed between the evidence drive that contains the evidence data and the forensics workstation. A software write blocker is a software component of a forensics suite that disables the capability to save anything to the evidence drive.
Software built into the computer forensics suite automatically documents the chain of custody with the evidence drive by recording who accessed it, when, and why
it was accessed. Each computer forensics investigator is assigned a logon to access the computer forensics suite. The computer forensics suite is used to access the evidence drive. Each access—and attempted access—is time-stamped and stored in a log. Before granting access to the evidence drive, the computer forensics suite requires the investigator to provide a reason for the access. This too is stored in the log. Any attempt to save to the evidence drive is recorded with sufficient information to help identify the person who made the attempt.

There are various types of hardware write blockers. One of the most common setups is to connect the evidence drive to the hardware write blocker using a USB port and then connect the write blocker to the computer forensics workstation. The hardware write blocker, however, must be able to access different types of interfaces to connect to different types of hard drives. An alternative to the USB connection is to use a hard drive docking station. A hard drive docking station enables you to place the evidence drive into a docking station that is itself a hardware write blocker. The hard drive docking station is connected to the computer forensics workstation. Professional-grade hardware write blockers contain switches used to activate or deactivate the write blocking. This enables you to set up a hard drive for read-only or for read/write access.

Some hardware write blockers are also hard drive duplicators. A hard drive duplicator is a hardware device that copies one hard drive to another hard drive by using a mirroring process, copying sector by sector and ensuring that every evidence data location is replicated. Hard drive duplicators are fast.
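The automatic chain-of-custody logging described above can be pictured as a simple append-only log of who accessed the evidence, when, and why. This is a toy sketch (real forensics suites use tamper-evident storage, and all names here are invented for illustration):

```python
from datetime import datetime, timezone

access_log = []  # in a real suite this would be tamper-evident storage

def log_access(investigator, evidence_id, reason, write_attempt=False):
    """Record who touched an evidence drive, when, why, and whether
    a (blocked) write was attempted."""
    entry = {
        "who": investigator,
        "evidence": evidence_id,
        "reason": reason,
        "write_attempt": write_attempt,
        "when": datetime.now(timezone.utc).isoformat(),
    }
    access_log.append(entry)
    return entry
```

Every call appends a time-stamped entry, so attempted writes are preserved alongside legitimate reads.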
Hashing

Hashing is a mathematical technique used to ensure that a copy of a file is exactly the same as the original file. A hashing program uses a hashing algorithm on the original file to arrive at a hash value. A copy is made of the file, and the same hashing program uses the hashing algorithm on the copy of the file to create another hash value. If both hash values are the same, then it is said that the copy is exactly the same as the original.

It is critical that the device used to copy the hard drive also performs hashing to ensure that files are copied without any changes. The image copy of the evidence drive must have the same hash value as the evidence drive. Any difference indicates that the image copy is not a true copy of the evidence drive.

Two hash algorithms are commonly used in computer forensics: MD5 and the secure hash algorithm (SHA). The MD5 algorithm has vulnerabilities that are not found in SHA. MD5 carries a remote possibility of a collision. A collision happens when two different files produce the same hash value when using the same hash algorithm. SHA uses a larger hash value, making a collision far less likely.
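Python's standard hashlib module can demonstrate the comparison. This sketch hashes the same data with both MD5 and SHA-256 (one member of the SHA family); a single changed byte produces completely different hash values:

```python
import hashlib

def file_hashes(data: bytes):
    """Hash the same data with both MD5 and SHA-256."""
    return hashlib.md5(data).hexdigest(), hashlib.sha256(data).hexdigest()

original = b"evidence data"
copy = b"evidence data"
tampered = b"evidence dat4"        # one byte changed

assert file_hashes(original) == file_hashes(copy)       # a true copy
assert file_hashes(original) != file_hashes(tampered)   # an altered copy
```

Anyone running the same algorithm over the same data gets the same hash value, which is what makes the verification repeatable in court.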
Hexadecimal Level of Investigation

The computer forensics investigator needs to examine the contents of the evidence drive. The physical content of the evidence drive is in the form of magnetic settings in the medium. Each setting is the logical equivalent of a binary value. The computer forensics investigator must be able to translate these settings into meaningful information. For example, the computer forensics investigator uses a tool such as a hex editor to convert the data to readable text.

As you learned from Chapter 2 and Chapter 3, data is stored as a binary value, a series of 0s and 1s. Binary values are numbers in the binary number system. You use the decimal number system, which has ten digits, 0 through 9. When you add 1 to 9, you carry over the value one place to the left. This appears as 10. The binary number system has two digits. When you add 1 to binary 1, you carry over the value one place to the left. This appears as 10. It looks like ten, but the equivalent decimal value is 2 because there are only two digits in the binary number system.

Computer forensics investigators rarely work at the binary level of evidence data. However, they do work at the hexadecimal level of evidence data. Hexadecimal is also a numbering system, similar to the binary number system and the decimal number system. In hexadecimal there are 16 digits. The first ten digits are the same as the decimal number system (0–9). The letters A, B, C, D, E, and F are used to represent the last six digits of the hexadecimal number system. It is important to keep in mind that a mathematical value can be represented in any number system without changing the value itself. It simply looks different. The same mathematical operations (addition, subtraction, multiplication, division) can be performed in any number system without changing the results of the operation.
Hexadecimal is used by assembly language programmers for reasons having to do with how neatly 16 digits map onto bits and bytes, but mainly because binary numbers quickly become unwieldy: too many digits are needed to represent a number in binary. It is much easier for humans to read and manipulate hexadecimal, though it makes no difference to the computer.

Data can be clandestinely manipulated at the binary level. However, there are simply too many digits for the computer forensics investigator to work with, so a tool called a hex editor (such as Hex Workshop) converts the binary values to hexadecimal values and then translates the hexadecimal values into their equivalent ASCII characters (see Chapter 2). As you learned in the first three chapters, keyboard characters and characters not found on the keyboard are represented by a number, which is stored in the evidence data. The hex editor makes it easy for the computer forensics investigator to visualize the evidence data.

There are commercially available hex editors and open-source hex editors. Hex Workshop is a commercial hex editor. The hex editor selected for a computer forensics investigation must be able to open very large evidence data files without crashing.
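A hex editor's display (offset, hexadecimal values, and ASCII translation side by side) can be approximated in a few lines of Python. This is a simplified sketch of the display format, not how any particular hex editor is implemented:

```python
def hex_dump(data: bytes, width=16) -> str:
    """Render bytes the way a hex editor does: offset, hex values, ASCII."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as dots, as hex editors do.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{i:08X}  {hex_part:<{width * 3}} {ascii_part}")
    return "\n".join(lines)

print(hex_dump(b"Hello, forensics!"))
```

The offset in the left column is printed in hexadecimal, matching how locations are noted in forensics reports.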
The hex editor must enable the user to search by sectors. Figure 11.1 shows the Hex Editor Neo.
Figure 11.1: Hex Editor Neo is a hex editor that can be used to examine the content of files.
Offset: Locating Data

There can be what seems to be an endless amount of evidence data to be examined by the computer forensics investigator. Typically, only a small amount of that data is suspicious. The computer forensics investigator must locate the suspicious data and quickly identify its location in reports and in presentations to the legal team and the court, while keeping the evidence data intact. The computer forensics investigator cannot simply present suspicious evidence data. Instead, the computer forensics investigator must show how to locate the evidence data on the evidence drive. The best way to accomplish this is by using an offset.

The offset is a concept that is critical to understanding evidence data. Imagine a screen filled with evidence data represented in hexadecimal numbers. You found suspicious data at a particular location. The challenge is to identify the location so anyone reading your computer forensics report can find that data. A data location is identified using an offset. An offset is a measure of the "distance" from one point in a file or disk drive to another point. This is much like telling someone to drive 5.5 miles down the road to reach a house. The offset specified in a file or disk drive is in bytes, not miles. A byte
is eight binary digits. So the computer forensics report will say that the suspicious data is located at, perhaps, 1,512 bytes from the beginning of the file. Since bytes are relatively difficult to count—just too many 0s and 1s—the offset is usually noted in hexadecimal values, and the hex editor is used to find the location. Once the location is found, the hex editor is used to examine the evidence data at that location. Either the actual hexadecimal value or the ASCII character associated with it can be reviewed; both appear in the hex editor.

Keep in mind that the computer forensics investigator must be prepared to prove that the hex editor accurately located the suspicious data using the offset. That is, the hex editor counted the offset from the specified starting point the same way anyone else could have counted it. The hex editor simply found the suspicious data a lot faster than the computer forensics investigator could by counting hexadecimal values until the offset is reached.
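Jumping to an offset is a single seek operation. This hypothetical snippet (names invented for illustration) reads the bytes at an offset noted in a report, using the 1,512-byte example above, which is 0x5E8 in hexadecimal:

```python
def read_at_offset(path, offset, length):
    """Read `length` bytes starting `offset` bytes from the start of a file,
    the way a hex editor jumps to an offset noted in a report."""
    with open(path, "rb") as f:
        f.seek(offset)            # jump straight to the byte offset
        return f.read(length)

# The offset is usually written in hexadecimal: 0x5E8 == 1512 bytes.
# suspicious = read_at_offset("evidence.img", 0x5E8, 16)
```

Because seek counts bytes from a fixed starting point, anyone repeating the operation on the same data lands on the same bytes.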
Mounting: Hiding Data

The forensic specialist needs to make sure that they locate all the data on the computer. Some of it may be hidden, even unknowingly, by the user. The data on a computer is stored on the hard drive (disk drives). Drives are represented by a letter or some other identifier on the screen that is used to access the content of the drive. However, the drive identifier may refer to a logical drive rather than a physical drive. The logical drive must be made "visible" using a process called mounting. The drive may seem not to exist if the drive is not mounted—but it does exist and contains data. Let's take a closer look.

A hard disk is logically divided into sections. Each section is called a partition and is listed in a partition table that the operating system reads before accessing a partition. Think of a partition as a "logical disk," each being treated as a hard disk. Although the partitioning process creates partitions on a hard disk, partitions are not visible to the operating system and to the user until the partitions are "mounted." A partition can be made invisible to the operating system and to the user by unmounting the partition. Data on the partition remains intact, but effectively hidden, until the partition is mounted. A similar process occurs when a USB drive is inserted into a computing device. The operating system recognizes the USB drive and mounts it, making the USB drive available to the operating system and user.

Each partition has a unique name. On Linux systems, running the fdisk -l command at a terminal displays all storage devices, including partitions. (Windows users can list disks and partitions with the diskpart utility at the Windows command prompt.) Entering the umount command at the terminal, followed by the name of the partition,
causes the partition to unmount and become invisible. Similarly, the USB drive has a partition name and can be made invisible to the operating system by using umount. The mount command, in turn, is used to make the partition (or USB drive) visible again. A computer forensics investigator needs to control partitions to ensure the integrity of the evidence data on an evidence drive.
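On Linux, the kernel lists every partition, mounted or not, in /proc/partitions, which is one way an investigator can spot partitions that exist but are hidden from the file browser. A sketch of parsing that listing (the sample text below is invented to mimic the file's format):

```python
def parse_partitions(text: str):
    """Extract device names from /proc/partitions-style output (Linux).
    Unmounted partitions still appear here even though no drive letter
    or mount point exposes them to the user."""
    devices = []
    for line in text.splitlines()[2:]:     # skip the header and blank line
        fields = line.split()
        if len(fields) == 4:
            devices.append(fields[3])      # the last field is the device name
    return devices

sample = """major minor  #blocks  name

   8        0  488386584 sda
   8        1     524288 sda1
   8        2  487861248 sda2
"""
assert parse_partitions(sample) == ["sda", "sda1", "sda2"]
```

Comparing this list against what is actually mounted reveals any partition that has been deliberately unmounted.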
Bit Shifting

Data is encoded in a set of binary digits, each referred to as a bit. Bits are logically grouped into bytes (8 bits) or larger groupings. Collectively, the group's value represents data. The ASCII value is a widely used example of how groups of bits form meaningful information. Programs that read a file start with the first bit, logically break the bits into groups, read each group's value, and then do something such as displaying a character on the screen. The presumption is that data begins with the first bit of the file.

A suspect hiding data may have devised a program that begins storing data several bits from the beginning of the file. This is referred to as bit shifting. For example, data begins with the third bit from the beginning of the file rather than the first bit. Reading from the first bit is misleading and probably produces unreadable data. The computer forensics investigator can use tools in the computer forensics workstation to designate the bit position at which to begin reading data. By shifting the bit location, the computer forensics investigator is able to locate the hidden data on the evidence drive. Figure 11.2 shows how shifting a bit one position changes the character from a percent sign to a capital J.
Figure 11.2: Shifting reading a bit to the right changes the value from a % to a J.
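The percent-to-J change in Figure 11.2 falls out of the arithmetic: '%' is 0x25 (00100101), and interpreting the same bits one position over yields 0x4A (01001010), which is 'J'. A minimal Python check of the single-byte case:

```python
def shift_left(byte: int, positions: int = 1) -> int:
    """Shift a byte's bits left, discarding any overflow past 8 bits."""
    return (byte << positions) & 0xFF

percent = ord("%")                 # 0x25 -> 00100101
shifted = shift_left(percent)      # 0x4A -> 01001010
assert chr(shifted) == "J"
```

A real bit-shifting tool applies the same idea across an entire stream of bytes, carrying bits across byte boundaries.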
Bit Flipping

Another technique that may be used to mislead a computer forensics investigation is to change the bit setting. A bit is set to either 0 or 1. A program interprets the bit setting of each bit in the logical group as data, such as a character on the keyboard. The presumption is that the settings actually represent data. However, reversing the setting of each bit—changing 0 to 1 and 1 to 0—hides the data. Here's how this works. A program such as Word writes data to a file. The file
is then changed by another program that reverses the bit settings. Strange characters appear on the screen when Word reads the file. It appears that this is not a Word file, or that the file is corrupted and unreadable. The same program that reversed the bit settings is used to reverse the bit settings again, restoring the file to its original bit values. Word can then read the file and display the data as intended. The computer forensics investigator must consider that the bit flipping technique was used if data that should be readable is unreadable. Figure 11.3 shows how flipping the bits of the capital letter J results in a strange symbol when the flipped bits are displayed.
Figure 11.3: A program can change 0s to 1s and 1s to 0s; in either case, bit flipping causes incorrect data to be displayed on the screen.
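Flipping every bit is equivalent to XOR-ing each byte with 0xFF, which also explains why the hiding program can restore the original by simply flipping a second time. A small sketch:

```python
def flip_bits(data: bytes) -> bytes:
    """Reverse every bit: 0 becomes 1 and 1 becomes 0 (XOR with 0xFF)."""
    return bytes(b ^ 0xFF for b in data)

original = b"J"                        # 01001010
hidden = flip_bits(original)           # 10110101 displays as a strange symbol
assert flip_bits(hidden) == original   # flipping twice restores the data
```

The round-trip property is what an investigator exploits: applying the flip to a suspected file either produces readable data or rules the technique out.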
Live Data Acquisition

Data stored on an evidence drive is permanent. However, data can be placed in volatile storage such as random access memory, where data is lost once power is removed from the computing device. Therefore, the computer forensics investigator must perform a live acquisition of evidence data without causing the data to be lost.

Live data acquisition is also used to minimize the effect encrypted evidence data has on the computer forensics investigation. Encryption is a technique used to transform data into an unreadable form. A key is used to decipher encrypted data back into a readable form. Deciphered data is typically stored in memory immediately before a program reads the data. Therefore, the readable data is usually the data that is stored in memory. Accessing live data in memory circumvents the issue of encrypted data.

The FTK Imager tool can be used to acquire live data from memory. It is critical that the evidence computing device remain under power and connected to the forensics workstation that is running FTK Imager. FTK Imager copies the contents of the evidence computing device's memory and stores the contents in a file on the forensics workstation. This process is referred to as a memory dump. The memory dump file can be opened on the forensics workstation using a variety of programs, including the hex editor. The computer forensics investigator can scroll through the contents or use a search feature of the program to locate a specific character pattern.

Also critical to making a live acquisition is identifying all processes that were running when the live data was acquired. Forensics workstation tools are available to copy the list of processes that are running. There are programs running on a computer that run in the background without you knowing. These are referred to as processes. Some processes are run by the operating system or other applications. Other processes are run clandestinely. The list of processes that are running gives the computer forensics investigator a glimpse of what is going on while the computing device is working. You can test this out yourself on a PC. Here's how to do it.
1. Press the Ctrl-Alt-Delete keys simultaneously.
2. Select Start Task Manager.
3. Select Processes to see a list of programs running on the computer.
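On Linux, each running process appears as a numbered directory under /proc, and tools that snapshot the process list during a live acquisition can read it from there. A portable sketch (the directory argument lets it run against any /proc-style tree):

```python
import os

def list_pids(proc_dir="/proc"):
    """List process IDs by scanning a /proc-style directory (Linux).
    Each running process appears as a directory named after its PID."""
    return sorted(int(name) for name in os.listdir(proc_dir)
                  if name.isdigit())
```

Saving this list alongside the memory dump records exactly what was running at the moment of acquisition.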
Remote Acquisition

Remote acquisition is the technique of capturing evidence data without physically being in possession of the evidence computing device. There are a number of techniques that can be used for this purpose. However, each requires the evidence computing device and the computer forensics workstation to be on the same network.

One remote acquisition technique is to access the evidence computing device using the device's Internet Protocol (IP) address. The computer forensics investigator might be able to use the IP address to remotely log on to the evidence computing device. This is a common technique during investigations where the computing device is owned by the organization that authorized the investigation. The organization has the legal right to access its computing devices without notifying the user of the device. However, this technique is challenging to use if the target of the investigation owns the computing device, because the target must authorize access except when the courts grant access.

Another remote acquisition technique is to intercept data en route to the evidence computing device. Data is transmitted in the form of electronic envelopes called packets. Each packet has the IP addresses of the sending and destination computing devices. Packets travel from the sending computing device over a data network through a series of computing devices, called routers or switches, that direct each packet to the destination computing device. Forensics software redirects packets addressed to the target computing device to the forensics workstation and then sends them on to the target computing device.

Remote acquisition of evidence data requires reliable network access. Any disruption in data acquisition might call into question the authenticity of all data acquired during that session. Remember that the computer forensics investigator must prove that acquired evidence data was not modified or corrupted in any way.
Questionable evidence data is likely to be considered invalid and not admissible in the legal action.
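Each intercepted packet carries the sending and destination IP addresses at fixed positions in its header. This sketch pulls the source and destination out of a raw IPv4 header with Python's standard struct and socket modules; it is a simplified illustration, not a full packet decoder:

```python
import socket
import struct

def parse_ipv4_header(raw: bytes):
    """Pull a few fields out of a raw IPv4 header (first 20 bytes)."""
    version_ihl, _tos, total_len = struct.unpack("!BBH", raw[:4])
    return {
        "version": version_ihl >> 4,             # high nibble: 4 for IPv4
        "total_length": total_len,
        "src": socket.inet_ntoa(raw[12:16]),     # sender's IP address
        "dst": socket.inet_ntoa(raw[16:20]),     # destination IP address
    }
```

It is these fixed-offset address fields that let forensics software recognize and redirect packets meant for the target device.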
Deleted Data

Data deleted from a computing device may not actually be removed from the device. On many computing devices, the user can place a file in the trash and then go into the trash and recover the file intact. The user can also empty the trash, presumably deleting the file from the computing device and making it unlikely that the user can retrieve the file.

Deleting a file from a computing device actually just makes the file's space available to the operating system for saving new data. The file remains relatively intact until a new file is saved, at which time a piece of the file or the whole file might be overwritten. Recovery becomes truly impossible only when all of the space that contains pieces of the file has been replaced with new data.

A file is stored in pieces, with each piece stored in a portion of the disk called a sector. At the end of each sector is data that tells where to find the sector that contains the next piece of the file. The last sector contains data that indicates it is the end of the file. The operating system retrieves each piece of the file, recreating the entire file. Not all sectors are overwritten when a new file is saved to the device. Some sectors remain and contain pieces of the deleted file. Computer forensics workstations have tools that enable the computer forensics investigator to locate sectors containing a deleted file and then reassemble the file from the available sectors.
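Recovery tools find deleted files by scanning raw sectors for known file signatures (magic bytes) rather than relying on the file system, a technique known as file carving. A toy version of that scan, using JPEG's start-of-image marker as the example signature:

```python
JPEG_MAGIC = b"\xff\xd8\xff"   # start-of-image marker of a JPEG file

def find_signatures(image: bytes, signature: bytes):
    """Scan a raw disk image for every occurrence of a file signature,
    the first step in carving deleted files out of unallocated sectors."""
    offsets, start = [], 0
    while (pos := image.find(signature, start)) != -1:
        offsets.append(pos)
        start = pos + 1
    return offsets

# Two JPEG headers buried in otherwise blank sectors:
image = b"\x00" * 10 + JPEG_MAGIC + b"\x00" * 5 + JPEG_MAGIC
assert find_signatures(image, JPEG_MAGIC) == [10, 18]
```

A real carving tool would then read forward from each offset, following sector links or end-of-file markers, to reassemble the file.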
Anti-Forensics Tools

Anti-forensics tools are programs that make it difficult for the computer forensics investigator to retrieve and analyze evidence data. Anti-forensics tools enable users to store data in unsuspected areas of the device, leading the computer forensics investigator to presume no evidential data exists on the device.

A common technique is to change information in file headers. A file header is a piece of a file that tells the computing device the kind of file that is attached to the file header. Anti-forensics tools enable the user to change information in the file header in an attempt to mislead the computer forensics workstation into believing that the file does not contain evidence data. Let's say that the evidence drive is being searched for compromising videos. The computer forensics workstation searches the evidence drive for video files. The computer forensics workstation ignores the file extension, since file extensions can easily be modified. Instead, file headers are examined. The file header contains data that identifies the file as a video file. However, an anti-forensics tool can be used to change that data, indicating that the file contains something other than a video.

Another technique used by anti-forensics tools is to store the evidence data in unlikely spaces on the evidence drive. A file is divided into pieces, with each piece stored in a sector. A sector is a fixed size. Typically, the last piece of a file doesn't completely fill the last sector. There is space available, referred to as
slack space. An anti-forensics tool can use slack space to store pieces of the evidence file, making it extremely challenging to recover the evidence file without knowing the locations of the slack spaces.

Although a computer forensics investigation tends to focus on evidence data files, the investigation can also involve searching for an executable file. An executable file contains instructions for the operating system to perform tasks. An executable file is commonly referred to as a program or an application. Anti-forensics tools can hide an executable file from both the operating system and computer forensics workstations by either packing the executable file into an existing executable file using a tool called a packer, or combining the executable file with an existing executable file using a tool called a binder. In both cases, only the existing executable file is recognized by the computer forensics workstation and the operating system.

Changing the metadata that is associated with a file is another technique used to counter a computer forensics investigation. Metadata is data that describes data. The file name, file extension, date/time the file was changed, size of the file, and access rights to the file are metadata. Access rights designate whether the file is read-only or can be changed. An anti-forensics tool can be used to change the date/time when the file was changed, for example, which can misdirect a search or mislead an analysis of the data contained in the file. The computer forensics investigator may be looking for a file that was changed on a particular date/time. Changing the metadata timestamp on the file may keep it from raising suspicion.

Still other anti-forensics tools monitor activity on the evidence drive for signs of a computer forensics investigation. Once an investigation is detected, files are automatically deleted and the evidence drive is reformatted, making it nearly impossible to recover any files on the evidence drive.
Even if files can be retrieved, the computer forensics investigator must prove the integrity of the evidence data, knowing that the anti-forensics tool attempted to make the data unreadable. Data encryption is the best way to hide data from a computer forensics investigation. The bits that comprise the data are scrambled using an encryption key following an encryption algorithm. Computer forensics investigators will find it challenging to decipher the encrypted file without knowing the decryption key.
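The file-header examination described in this section amounts to comparing a file's magic bytes with what its extension claims. A simplified sketch, with a small signature table that is illustrative rather than exhaustive:

```python
MAGIC_NUMBERS = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF": "pdf",
    b"\xff\xd8\xff": "jpg",
}

def type_from_header(data: bytes) -> str:
    """Identify a file by its header bytes, ignoring the easily faked extension."""
    for magic, ftype in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return ftype
    return "unknown"

def extension_mismatch(filename: str, data: bytes) -> bool:
    """Flag files whose extension disagrees with their header."""
    claimed = filename.rsplit(".", 1)[-1].lower()
    actual = type_from_header(data)
    return actual != "unknown" and actual != claimed
```

A mismatch (say, a file named notes.txt that begins with a PDF header) is exactly the kind of inconsistency a forensics workstation flags, and the kind an anti-forensics tool tries to remove by rewriting the header itself.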
Cell Phones

A cell phone is a computing device that stores information similarly to other computing devices. Today, cell phones contain a variety of information that might be of interest to a computer forensics investigation, including text messages, images, contact lists, location history, emails, and purchasing and banking information.
Computer forensics workstations are capable of extracting information from a cell phone, including information that the user deleted and tried to remove by resetting the cell phone to factory settings. GPS location data is usually of particular interest to computer forensics investigators because the data records the exact location of the cell phone at a particular date and time.

Data gathered from a cell phone is usually supported by data provided by the cell phone service provider. A record of every cell phone activity is recorded by the cell phone service provider in an electronic log. Each entry in the log uniquely identifies the cell phone and the cell tower that processed the transmission. A cell tower is a radio transceiver that has a specific reception area. The signals from cell phones are received by the cell tower and then forwarded over a landline to the cell phone service provider's technical center to complete the transmission.

Law enforcement officials, with permission from the courts, can track a cell phone live by monitoring the activities of a specific cell phone tower. Even without knowing the cell phone's GPS position, transmissions can be monitored by cell phone towers and used to find the relative position of the cell phone.
Appendix A: Information Technology Auditing

An application is more than a tool used to conduct business. An application is also an enforcer that makes sure the rules of how to conduct business are strictly followed by everyone within the organization. You see this after logging onto the application, when your activities are restricted to your role in the organization. Let's say you're a sales representative. You'll be able to see information about only your accounts. You'll be able to enter an order but may not be able to change the price of items purchased. Setting prices may be the job of the marketing manager. Limiting an employee's access to only the portions of an application they need to conduct business is referred to as a control—it controls what the employee can and cannot do. The need for a control and the design of a control are identified during the design of the application and are based on good business practices. Programmers then incorporate controls into the program code.

IT auditors are employees who verify that controls are in place and working as designed. IT auditors perform a critical role in every organization because they verify that applications comply with the organization's policies and procedures—rules that tell staff how to go about achieving the organization's objectives. Think for a moment. Executives report business results based on information generated by an application. An application tells executives the inventory status, account balances, and profits and losses. The overriding question is: How do executives know these figures are correct? IT auditors examine applications—and other technology used by the organization—to determine if the information generated by each application truly represents the financial status of the organization.

Some think of IT auditors as teachers who review your homework assignment, making sure you followed all the rules to complete the assignment.
Others see IT auditors as representatives of the board of directors who confirm that business activities conducted by applications and related technology are working according to plan and can be trusted. There's a trust-but-verify philosophy in business: the board of directors and executives trust that applications and technology appropriately implement business practices, and the IT auditor verifies that this is true.
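The role-based restriction described earlier (a sales representative who can enter orders but cannot set prices) can be sketched as a simple application control. This is a minimal illustration, not taken from any particular system; the role names and permission labels are assumptions:

```python
# Illustrative application control: each role maps to the set of actions it
# is permitted to perform. Role and action names are assumptions.
ROLE_PERMISSIONS = {
    "sales_rep": {"view_own_accounts", "enter_order"},
    "marketing_manager": {"view_own_accounts", "enter_order", "set_price"},
}

def is_permitted(role: str, action: str) -> bool:
    """Return True only if the role's permission set includes the action."""
    return action in ROLE_PERMISSIONS.get(role, set())

# A sales representative can enter an order but cannot change prices.
print(is_permitted("sales_rep", "enter_order"))        # True
print(is_permitted("marketing_manager", "set_price"))  # True
print(is_permitted("sales_rep", "set_price"))          # False
```

An IT auditor verifying such a control would confirm that the permission table matches approved policy and that every restricted action in the application actually consults it.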
The Information Technology Audit Process

The information technology audit follows a structured process that begins with a charter or engagement letter. A charter and an engagement letter are similar in that both are written authorizations to conduct the audit. They differ in that a charter is an
authorization for internal auditors to conduct the audit, while an engagement letter authorizes an outside auditor to conduct the audit. An internal auditor is an employee of the organization. An outside auditor is an auditing firm, which is a vendor to the organization. Besides specifying the terms of the audit, the engagement letter also specifies compensation. A team of auditors is assembled once the charter/engagement letter is executed. The information technology auditing team is composed of a team leader and various team members, many of whom specialize in an area of technology. The team leader and a core complement of auditors remain throughout the audit; other team members join the team to audit particular technology and then move on to other auditing projects. The initial step is for the team to pre-plan the audit by identifying the scope of the audit, the preliminary tasks to conduct the audit, and the resources necessary to conduct the audit. The objective is to organize the team. Next, the team performs a risk assessment. A risk assessment is a preliminary, higher-level overview of the effort required to conduct the audit within the time frame set forth in the charter/engagement letter. It is here that processes, applications, and related technologies are reviewed to determine potential exposure for the organization. The risk assessment provides sufficient information to determine the feasibility of conducting the audit. Not all audits can be conducted within the terms of the charter/engagement letter. For example, the organization may have operations located throughout the world and want the audit to be completed within a month. Logistics make it impractical to meet these terms. The audit is impossible to conduct without revising the charter/engagement letter. Once it is feasible to conduct the audit, the team leader develops an auditing plan.
An auditing plan is much like a project plan in that both contain tasks and subtasks listed in priority order with related dependencies. Each task has an estimated duration so that the audit can be completed by the deadline. Resources are assigned to tasks and the audit begins. Each team member focuses on a process and related applications. The auditor gathers information called "evidence" on how the process is actually performed by staff. Both manual and automated elements of the processes are examined. Each step is verified for compliance with policies and operational procedures. Evidence proves compliance or non-compliance of each step. Auditors also perform audit tests to validate that evidence represents actual performance. For example, an auditor may create a set of orders that represents all possible types of orders. Under the auditor's observation, staff will process the order set. Results are then gathered as evidence. Once all observations, testing, and evidence gathering are completed, the auditing team analyzes the results, determines which elements are in compliance or non-compliance, and develops recommendations. The audit concludes with an audit report that describes the scope of the audit, the audit plan, the execution of the audit plan, how evidence was gathered, testing
that was conducted, and a summary of the audit. The most important section of the audit report is the opinion. It is here that the audit team identifies elements that were non-compliant and suggests ways to bring those elements into compliance. The audit team then moves on to another audit, regardless of whether its recommendations are accepted.
Auditing Technology

The information technology auditor assesses the status of computing devices and applications within an organization to determine if they adhere to standards, policies, and procedures. Auditors begin by reviewing the maturity of each process, each receiving a score based on the generic maturity model. Many processes are either partially or fully computerized. The generic maturity model defines a process based on six grades. These are:

Grade 0: This grade is given to a process that completely lacks any recognizable characteristics. It is a process because an activity produces results; however, the activity isn't defined. The activity is usually performed by the same person and is referred to as a non-existent process.

Grade 1: This grade is given to a process that lacks structure, but the staff sees the need to provide structure to the process. This process is usually performed by the same person and is referred to as an initial process.

Grade 2: This grade is given to a process that is repeatable by different people; however, there is no formal definition of the process and no formal instructions on how to perform it. For example, updating a spreadsheet with data is such a process.

Grade 3: This grade is given to a process that meets grade 2 and has a well-defined and documented standard set of activities. Staff undergo training on how to perform the process.

Grade 4: This grade is given to a process that meets grade 3 and is managed. The process is monitored and measured for compliance. An alert is sent when the process fails.

Grade 5: This grade is given to a process that meets grade 4 requirements, adheres to best practices, and is continually improved.
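The grades are cumulative: a process earns a grade only if it also satisfies every grade below it. As a rough sketch (the attribute names are my own shorthand, not ISACA terminology), that logic can be expressed in code:

```python
# Illustrative sketch of the generic maturity grades. A process is scored by
# walking up the levels and stopping at the first requirement it fails.
def maturity_grade(process: dict) -> int:
    """Map observed characteristics of a process to a 0-5 maturity grade."""
    grade = 0
    for level, requirement in [
        (1, "structure_recognized"),    # staff sees the need for structure
        (2, "repeatable"),              # different people, same result
        (3, "documented_and_trained"),  # standard activities, staff training
        (4, "monitored"),               # measured for compliance, alerts
        (5, "continuously_improved"),   # best practices, ongoing improvement
    ]:
        if process.get(requirement):
            grade = level
        else:
            break
    return grade

# A repeatable but undocumented process (the spreadsheet example) scores 2.
print(maturity_grade({"structure_recognized": True, "repeatable": True}))  # 2
```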
Processes below grade 3 are considered risky for the organization because they expose the organization to unnecessary liability. For example, financial data for official reports are manually entered into a spreadsheet, and then the spreadsheet is used in presentations and in financial records. Although the original financial data was generated by the organization's computerized accounting system, the actual data used to report the status of the organization is entered manually into the spreadsheet. That data can easily be inadvertently altered so that it no longer matches the data controlled by the accounting system.
Controls

The best way to enforce standards, policies, and procedures is to embed controls in a process. A control is a way of monitoring and measuring steps in a process to ensure that rules are being followed. For example, an employee's ID badge must be valid in order to open the door to the data center. This prevents (to some extent) unauthorized personnel from entering the data center. The information technology auditor assesses all processes to determine if controls are embedded in the process. Although many controls within the organization exist in applications, controls are also found in other aspects of the organization. These include:

Infrastructure control: This controls access to the network cabling, electrical devices (backup generators, utility closets), and other areas critical to the operation of the organization.

Environmental control: This includes temperature and humidity throughout the facility; operation of generators and uninterruptible power supplies; smoke detectors; the fire suppression system; and protection from food and water seepage.

Physical access control: This includes access to locations, issuance of identification badges, operation of biometric access devices, and visitor control.

Application controls: These include authorized access to features of an application; validating the accuracy of information generated by the application; embedded controls to ensure rules are enforced by the application; and segregation of duties, where certain sensitive activities require multiple staff members to perform each step of the activity independently of other staff members.

General controls: These include controls on developing applications; controls on how changes to applications are implemented; and controls over the use and maintenance of computing devices.
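Segregation of duties, mentioned under application controls, is often embedded directly in program code. A minimal sketch (the payment scenario, function name, and usernames are assumptions for illustration):

```python
# Embedded segregation-of-duties control: the person who submits a payment
# may not also approve it. Scenario and names are illustrative.
def approve_payment(submitted_by: str, approved_by: str) -> bool:
    """Approve a payment only if submitter and approver are different people."""
    if submitted_by == approved_by:
        raise PermissionError("segregation of duties: submitter cannot approve")
    return True

print(approve_payment("a.lopez", "b.chen"))  # True
```

An auditor testing this control would attempt the prohibited case (the same person submitting and approving) and gather the rejection as evidence that the control works.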
COBIT

The Information Systems Audit and Control Association (ISACA) is an international professional association that defines a good-practice framework for information technology management and information technology governance. Governance is the process of setting rules for information technology. ISACA's framework is called Control Objectives for Information and Related Technology (COBIT). COBIT process controls are the minimum requirements for effective control of each information technology process. Each requirement has a control objective that provides reasonable assurance that business objectives will be achieved and undesired events will be detected, prevented, or corrected. COBIT divides control objectives into six general categories. Each category is then further divided into sections that specify recommended controls. The six general categories are:

PC1 Process owner: Each process must have an owner.

PC2 Repeatability: Each process must be well defined so the process can be repeated, producing the same result each time.

PC3 Goals and objectives: Each process must have clearly defined goals and objectives.

PC4 Roles and responsibilities: Each process must have unambiguous roles, activities, and responsibilities defined.

PC5 Process performance: The performance of each process must be measurable against the defined goals and objectives.

PC6 Policy, plans, and procedures: Policies, plans, or procedures that drive the process must be documented, reviewed, kept current, signed off on, and communicated to all stakeholders involved in the process.
The Audit Charter

The audit charter is authorization from the directors of the organization to perform the information technology audit. Think of the audit charter as a contract between the auditing team and the directors. The audit cannot be conducted without the audit charter. The authorization effectively overrules the rights of managers to restrict the inquiry. For example, a manager cannot deny an auditor access to a process, staff, equipment, or area of the organization during the audit as long as access is required to
comply with the audit charter. It is as if the directors of the organization requested access. The audit charter clearly states:

Scope and objectives: The scope and objectives section of the audit defines the extent and purpose of the investigation. For example, the audit may set out to verify that the order entry system is generating accurate information. The scope of the audit is limited to the order entry system and related activities. The purpose is to validate data produced by the system.

Limitations: The limitations section states the restrictions within which the audit is conducted. For example, a limitation might be that auditors cannot interrupt normal business operations, which means that test orders can only be entered during off-business hours. Limitations typically include resources, expenses, and the timeframe.

Authority: The authority section specifies rights granted by the directors to the auditors in order to perform the audit within the scope and objectives of the audit. This section also includes protocols to follow when accessing a process and staff, and procedures to resolve any conflicts that may arise during the audit. For example, the auditors may be required to give a manager reasonable notice before visiting the business unit. If the manager objects, then the leader of the audit team will try to accommodate the manager; however, the directors become involved if the manager doesn't agree to reasonable accommodations.

Methodology: The methodology section broadly defines auditing techniques that will be used in the audit.

Timeline: The timeline section specifies the timeframe within which the audit is conducted. It will indicate a start date and the date when the audit report is submitted to the directors of the organization.

Audit report: The audit report section defines information sought by the directors that is in addition to typical auditing results.
The Audit Committee

The audit charter is formally issued by the organization's audit committee. The audit committee is a subset of the board of directors, usually consisting of board members who are not employees of the organization, referred to as outside directors. The audit committee is responsible for ensuring that the financial records of the organization
reflect the status of the organization and that management is adhering to the organization’s policies and procedures. The audit committee oversees the internal auditing staff, which is led by the organization’s internal auditor. In addition to internal auditing operations, the audit committee engages an organization to audit the records and operations of the organization, referred to as the outside auditor. Both the internal and outside auditors apply standard auditing practices to verify that the organization’s records are accurate and that the organization’s policies and procedures adhere to standards of practice. Standards are developed by standards bodies. For example, the Financial Accounting Standards Board (FASB) developed the generally accepted accounting principles (GAAP) as its standard for accounting. Auditors compare the organization’s policies and procedures to GAAP and compare policies and procedures to practices within the organization. There are many standards used within the organization—one for each discipline. Auditors focus on verifying the organization’s financial records. In doing so, auditors also examine all the processes involved in the organization that influence the financial records. Many of those processes are directly or indirectly influenced by technology, which is why the audit committee requires that auditors perform an information technology audit.
Preplanning the Audit

The preplanning process is when the audit team prepares for the audit. The audit team leader gathers information about the organization and the industry, such as industry regulations and operating standards unique to the industry. The goal is to prepare the foundation for planning the actual audit. Here are factors considered when preplanning the audit:

Impact on operations: There is a natural flow to business operations, referred to as the business cycle. Understanding the business cycle can minimize the impact the audit has on operations. For example, auditors avoid conducting an audit on a retail organization during holiday seasons because staff is focused on the holiday business. The audit would interfere with business operations if conducted during that period.

Reporting cycle: A reporting cycle is the natural flow of when the organization reports operational results, which typically reflects the business cycle. For example, an organization may define a fiscal year reporting cycle rather than a calendar year. A fiscal year reporting cycle defines the months in which the organization begins and ends recording operational performance, which reflects the business cycle.
Critical business process: A critical business process is a process that is necessary to the survival of the organization, sometimes known as a mission-critical system. Identifying critical business processes during the preplanning stage of the audit helps to prioritize auditing activities. Critical business processes are audited first.

Results of prior audits: An audit report describes the scope of the audit, methodologies used to conduct the audit, limitations of the audit, and, most important, the opinion. The opinion attests to compliance with accepted business practices. Any deviations from accepted business practices are usually stated in the audit results. This is referred to as a qualified result, which identifies weaknesses and suggests how to strengthen them. Previous audit reports provide a guide for the current audit. More important, previous audit reports highlight weaknesses. The current audit determines if the organization has strengthened weaknesses that were reported in previous audit reports.

Logistics: Auditors trust but verify that processes are implemented as planned and work according to design. This might require that auditors visit the organization's locations. Preplanning identifies the locations and what within each location will be audited.

Interviews: Auditors will interview staff at all ranks to help piece together steps in a process and determine if steps are executed according to approved policies and procedures. Preplanning identifies staff by title and/or name based on their role in the process and assesses the availability of the staff to be interviewed.

Objectives and strategies: The auditing team examines whether the organization's objectives and strategies, defined at a high level, flow through to tactical objectives and strategies at the operational level. Tactical objectives and strategies define how the staff carries out higher-level objectives and strategies.
Auditors will compare the information technology objectives and strategies to those of the organization. Preplanning identifies the organization's objectives and strategies and those of each operating unit.
Audit Restrictions

An audit is generally authorized to verify that one or more policies and procedures of an organization are being followed at all levels of the organization. However, the audit committee can set limitations that restrict the auditing process. It is critical that restrictions be identified and clarified during the preplanning process. Restrictions are reflected in the audit plan and may actually prevent the audit itself.
Restrictions might limit access to evidence. For example, a secure area of the data center may be deemed off-limits to auditors. In that case, auditors are unable to verify that policies and procedures are being followed within the secured area. Another restriction is the available resources, both on the audit team and among staff at all ranks. A relatively small audit team, for example, is unlikely to perform a thorough audit if the scope of the audit is broad, such as traveling worldwide to assess global operations within the audit deadline. Likewise, operational staff may become unavailable, by schedule or by design, to avoid speaking with auditors. And the deadline for the audit may be impractical. Restrictions may be too severe and result in an ineffective audit, which could lead the audit team leader to refuse to conduct the audit. Alternatively, the audit may proceed, but restrictions are reflected in the auditor's opinion. The auditor's opinion can be unqualified (complies with standards); qualified (complies with most standards, with some exceptions); adverse (does not comply with standards); or a disclaimer (no opinion). Too many restrictions that impede but don't prevent the audit are reported with no opinion. (See "The Audit Report and Stakeholders" later in this chapter.) If the audit team leader determines that restrictions imposed by the audit committee make it impossible to conduct the audit, then these concerns are presented to the audit committee. The audit does not proceed. An audit is an independent assessment. If it can't be performed according to auditing standards, then there is no audit.
The Audit Planning

The audit plan is a project plan. A project plan contains the details of tasks and subtasks that need to be performed to complete the audit. The team leader develops the plan based on preplanning and the audit charter/engagement letter, and begins planning using the work breakdown structure. The work breakdown structure is a planning technique that focuses on the desired outcome and then divides the outcome into subcomponents that eventually lead to a work package. A work package is a segment of the outcome that becomes the focus of the work—a piece of the audit. For example, an external auditing team is engaged by the organization to determine if the organization's books and records reflect standards, policies, and procedures. The audit team leader starts planning using the outcome—the audit report—and then breaks down the work necessary to produce the audit report. One work package is to conduct the information technology audit. The information technology audit is a component of the full audit, which also includes financial audits.
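The breakdown just described can be pictured as a small nested structure: the outcome (the audit report) divides into work packages, and each work package divides into tasks. The package and task names below are invented for illustration:

```python
# Illustrative work breakdown structure: outcome -> work packages -> tasks.
wbs = {
    "audit_report": {
        "financial_audit": ["verify ledger entries", "confirm account balances"],
        "information_technology_audit": ["verify password controls",
                                         "assess data center security"],
    }
}

def work_packages(outcome: str):
    """List the work packages that make up an outcome."""
    return sorted(wbs[outcome])

print(work_packages("audit_report"))
# ['financial_audit', 'information_technology_audit']
```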
Tasks and Subtasks

An activity within a work package is referred to as a task. A task has a beginning and an end and delivers a result. The work package is complete when all tasks of the work package are completed. A task itself may have activities. These are referred to as subtasks. A subtask also has a beginning and an end and delivers a result. A task that has subtasks is completed when all its subtasks are completed. The name of the task/subtask must reflect the deliverable. For example, verifying compliance with password requirements might be a task. The result of the task is that passwords are compliant or non-compliant. You can probably imagine that there are many subtasks associated with this task. Subtasks verify the process of creating a password; changing passwords; tracking password usage; and identifying and intervening in password violations. Assessing each aspect of standards, policies, and procedures related to passwords becomes a subtask. Furthermore, the audit team needs to assess all types of passwords in all areas of the organization in order to complete the task. The audit team leader identifies all tasks and subtasks required to complete the work package (i.e., the information technology audit). At times it can be challenging to identify tasks and subtasks because the activity must meet the definition—it must have a beginning and an end and deliver a result. Activities that don't meet the definition are not a task/subtask and do not become part of the audit plan. For example, an auditor may be required to meet with the organization's chief information officer (CIO). This seems like a task, but it isn't. Although the meeting has a beginning and an end, there is no result. In contrast, meeting with the CIO to identify staff who will assist with the audit is a task because the result of the activity is to deliver a list of staff. Without specifying the result (the list of staff), the activity is not a task.
Duration

The audit team leader sets a duration for each task/subtask. Duration is the length of time necessary to complete the task/subtask. Let's say that the task is to assess security at the data center. Duration includes travel time to the site; the assessment of the site; and preparing and delivering the result of the assessment—not only the time spent assessing the site. Duration is an estimate based on experience (how long it took in previous audits) and factors unique to the current audit, such as location, weather, and available transportation. It is common for the audit team leader to average the duration of similar tasks and adjust the average to reflect factors unique to the current audit. The result is the team leader's best estimate.
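The averaging approach described above might look like this in code (the durations and the 20 percent adjustment are invented numbers, used only to illustrate the calculation):

```python
# Sketch of duration estimating: average durations of similar past tasks,
# then scale by a factor for conditions unique to the current audit.
def estimate_duration(past_durations, adjustment=1.0):
    """Average past durations (workdays) and apply an adjustment factor."""
    return round(sum(past_durations) / len(past_durations) * adjustment, 1)

# Three prior site assessments took 4, 5, and 6 workdays; a remote location
# is assumed to add 20 percent.
print(estimate_duration([4, 5, 6], adjustment=1.2))  # 6.0
```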
Setting realistic durations for each task/subtask is critical to the success of the audit because durations at the task/subtask level are summed to create the duration of the audit itself. The duration of tasks/subtasks provides the foundation to justify the duration of the audit. The team leader refers to tasks/subtasks to explain to the audit committee the time that is necessary to complete the audit.
Dependencies

In an ideal world, all tasks/subtasks would be performed simultaneously, but this is not reality. While some tasks/subtasks can be performed at the same time, many tasks can't begin until one or more other tasks are completed. For example, the task to identify standards, policies, and procedures related to passwords must be completed before the auditor can assess whether the operating units are compliant. The audit team leader identifies dependencies among tasks/subtasks, which are usually depicted in a Gantt chart. A Gantt chart is a graphic illustration of tasks/subtasks along a timeline, where a rectangle representing the beginning and end of each task is placed in the appropriate position on the timeline. The team leader and members of the audit team use it to track the audit. Dependencies are represented as lines that connect dependent tasks/subtasks on the Gantt chart, showing the estimated completion of one task/subtask and when the dependent task/subtask can begin. Any change in the duration of a task/subtask affects the duration of dependent tasks/subtasks. For example, an increase of two weeks in the duration of gathering standards, policies, and procedures related to passwords will delay the start of the task/subtask to assess passwords.
Figure A.1: A Gantt Chart
The Gantt chart illustrates tasks and dependencies related to activities of the audit.
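The cascading effect that a Gantt chart makes visible can also be computed directly. In this sketch (task names and durations are invented), each task's earliest start is the latest finish of the tasks it depends on:

```python
# Sketch: compute each task's earliest start day from its dependencies.
def earliest_starts(durations, deps):
    """Earliest start of each task = max finish time of its prerequisites."""
    starts = {}
    def start(task):
        if task not in starts:
            starts[task] = max(
                (start(d) + durations[d] for d in deps.get(task, [])),
                default=0,
            )
        return starts[task]
    for task in durations:
        start(task)
    return starts

durations = {"gather_password_policies": 10, "assess_passwords": 5}
deps = {"assess_passwords": ["gather_password_policies"]}
print(earliest_starts(durations, deps)["assess_passwords"])  # 10

# Two extra weeks (14 workdays) on the first task delay the dependent task.
durations["gather_password_policies"] += 14
print(earliest_starts(durations, deps)["assess_passwords"])  # 24
```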
Critical Path

Dependencies help to develop the critical path for the audit. The critical path consists of the tasks whose durations determine the duration of the audit. For example, the task that identifies standards, policies, and procedures related to passwords is probably on the critical path because its completion determines when assessing passwords can begin. Therefore, identifying those standards, policies, and procedures directly influences the duration of the audit. The audit team leader plans and monitors the execution of tasks on the critical path carefully to ensure that the audit stays on course. Increasing the duration of a task on the critical path will increase the duration of the audit. Decreasing the duration of a task on the critical path may not decrease the duration of the audit; it may simply remove that task from the critical path as another task, not dependent on it, is placed on the critical path. During the audit, it is common for tasks to move on and off the critical path as duration estimates change based on audit activities. The critical path is used by the audit team leader to manage the audit. Decisions are made based on whether tasks are positioned on or off the critical path. For example, an auditor may request time off. Tasks assigned to the auditor are reviewed to determine if any are on the critical path. If so, then the request may be denied because the auditor's absence would delay the completion of the task.
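A critical path is simply the longest chain of dependent tasks. As a sketch (task names and durations invented for the example), it can be found by walking the dependency graph:

```python
# Sketch: the critical path is the dependency chain with the longest total
# duration, which fixes the minimum possible duration of the audit.
def critical_path(durations, deps):
    """Return (total_duration, task_list) of the longest dependency chain."""
    best = {}
    def longest(task):
        if task not in best:
            prereqs = deps.get(task, [])
            if prereqs:
                dur, path = max(longest(p) for p in prereqs)
            else:
                dur, path = 0, []
            best[task] = (dur + durations[task], path + [task])
        return best[task]
    return max(longest(t) for t in durations)

durations = {"gather_policies": 10, "assess_passwords": 5, "site_visit": 8}
deps = {"assess_passwords": ["gather_policies"]}
print(critical_path(durations, deps))
# (15, ['gather_policies', 'assess_passwords'])
```

Here the site visit, though it runs in parallel, is off the critical path: shortening it would not shorten the audit, which matches the point made above.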
Resources

A resource is a person or thing that is necessary to perform the audit. Human resources are identified by title and job description. The job description specifies job skills. Non-human resources are identified by a name and description. For example, a laptop of a specific configuration is a name and description. The audit team leader identifies the number and type of resources required to perform each task in the audit. Initially, the team leader focuses on each task without regard to other tasks. Let's say there are three tasks, and each task requires an auditor. The preliminary requirement is to open three auditor positions. A position is filled by an actual person. Each position has a corresponding title and job description; the three auditor positions have the same title and job description: auditor. Non-human resources are identified similarly. Once resources are identified for all tasks, the audit team leader determines the number of positions to open based on when each task is performed. If the tasks are performed sequentially, then one auditor position is opened, since that auditor can perform each task. However, if the three tasks are performed simultaneously, then three auditor positions are opened in order to keep the audit on schedule. A similar approach is taken for non-human resources. Let's say each auditor requires a laptop. Initially, the three tasks each require a laptop. If tasks are performed
sequentially, then there is one auditor and one laptop. Otherwise, three auditors and three laptops are necessary. Once identified by the audit team leader, each actual resource is listed on the resource list. The resource list is a roster of the audit team and items needed to complete the audit. The audit team leader assigns resources from the resource list to tasks. If the resource is not on the resource list, then it isn’t available to be assigned to a task.
Resource Cost

There is a cost associated with each resource. A resource can have a fixed cost, which is a one-time expense, such as the purchase of a laptop. A resource can have a variable cost, which is an expense based on usage, such as a daily charge for an auditor. The cost of the resource is entered into the resource list. The audit team leader considers the resource cost when assigning the resource to a task. Ideally, the lowest-cost resource is assigned to a task. Don't confuse the cost of the resource to the organization with the cost of the resource to the audit. For example, an internal auditor is a full-time employee who is paid an annual salary. When the internal auditor is assigned to the audit, all expenses associated with the internal auditor are allocated to the audit. This includes salary, benefits, and employer taxes—all expenses. Annual expenses may be prorated per workday, creating a variable cost. If the internal auditor works three days on the audit, then the audit is charged three days of the cost of the internal auditor. Non-human resources are usually a fixed cost to the organization, such as the acquisition of a laptop. However, there are usually annual ancillary costs, such as maintenance, the help desk, software licenses, and similar resources needed to keep the laptop (the non-human resource) operational. The fixed cost and the ancillary costs are prorated and summed into a variable cost (i.e., monthly) that is charged to the audit.
Cost of the Audit

As the audit team leader assigns resources to a task/subtask, the cost of those resources is also allocated to the task/subtask. The cost of the task reflects the duration of the task. Let's say an auditor has a $500 daily cost and the duration of the task is ten workdays; then the cost of the auditor to perform that task is $5,000. Keep in mind that the auditor receives his or her regular pay, not the $500 per day charge. The charge reflects benefits and other expenses the organization incurs to employ the auditor. The costs for all tasks are tallied to arrive at the cost of the audit. If the cost is too high after the preliminary audit plan is created, then the audit team leader reviews
each task/subtask to determine if adjustments can be made to the audit plan. Adjustments are made to the task/subtask, not the cost. Let’s say the audit costs $100,000 and the organization is willing to pay $75,000 for the audit. The audit team leader doesn’t simply reduce the cost to $75,000. Instead, each task/subtask is examined to see if activities can be cut or modified. Adjustments are made at the task level, which leads to a reduction in resources, which in turn lowers the cost of performing the task. For example, the audit team leader may find a way to reduce the duration of the task from ten workdays to five workdays. The result is that the auditor is not needed for five days for that task and that the cost of the task—at least for the auditor—is reduced from $5,000 to $2,500. In this way, the audit team leader knows exactly where cost reductions will occur.
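The arithmetic in this section is easy to mechanize. A minimal sketch, using the $500 daily charge from the example above:

```python
# Task cost = the resource's daily charge times the task's duration.
# Cost reductions come from shortening the task, not discounting the rate.
def task_cost(daily_charge: float, duration_workdays: int) -> float:
    return daily_charge * duration_workdays

print(task_cost(500, 10))  # 5000 for the original ten-workday task
print(task_cost(500, 5))   # 2500 after the duration is cut to five workdays
```

Because every adjustment is made at the task level, summing `task_cost` over all tasks always shows exactly where a reduction in the audit's total cost comes from.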
Responsibilities of the Auditor and Auditee There are two partners in an audit: the auditee and the auditor. The auditee is the person who undergoes the audit and the auditor is the person who conducts the audit. Each has responsibilities during the audit. Although it is said that the auditee is the person being audited, in reality the auditee is an area of the organization that is the subject of the audit. Mention the word “audit” to a manager, or to anyone in the organization, and it conjures images of the government reviewing your tax filings and the fear of repercussions if there are errors. The purpose of an audit is not to catch mistakes that result in punishment. The audit is designed to verify that operations adhere to standards, policies, and procedures. A deviation is not looked upon as an error so much as an opportunity for the organization to adjust operations to conform to requirements. The auditee responsibilities during an audit include: Identifying critical success factors (CSF): A critical success factor is an activity that is measured to determine if the operation is successful. For example, the response time for answering calls to the help desk is a critical success factor. The lower the response time, the more successful the performance of the help desk. The auditee identifies critical success factors for its area of the organization. The auditor then uses those critical success factors during the audit. Identifying personnel roles and responsibilities: The auditor does not know who the staff members are in the area of operations being audited. The auditee (usually the manager of the area) provides the auditor with a list of staff with their roles and responsibilities and a briefing on their backgrounds. The auditor should be introduced to the staff, and the staff should be asked to cooperate with the auditor’s requests.
Providing access: The auditee remains responsible for the area being audited. The manager of the area grants the auditor access to staff, locations, systems, and all information relevant to the audit. The auditor reports to the audit team leader if the manager fails to grant access. Notification is then made to the audit committee, which is likely to direct executives to speak with the manager. Cooperating with evidence gathering: An auditor examines processes by gathering information and monitoring operations. This is referred to as evidence. Although the term “evidence” brings to mind images of a criminal investigation, an audit is not a criminal investigation. Evidence is a fact, such as a user being seen looking under the computer keyboard and then entering the password into the computer application. Notice that no conclusion was made. The auditor did not say that the user’s password was under the keyboard. This might be true, but the auditor didn’t see it. The auditee must help the auditor collect evidence. If the auditor asks to look under the keyboard, the user must allow the auditor to do so. Accessing prior audits: Everything in an audit is documented and becomes part of the official record of the audit. This includes preplanning, planning, work papers of auditors, minutes of meetings, drafts of the audit report, and the official audit report. All of these are saved and are in the control of the audit committee, internal auditors, or external auditors. The auditee (audit committee) is responsible for giving the audit team access to that information because information about previous audits provides the foundation for the current audit. Organization chart: The organization chart shows how the organization is structured. It contains operational groups, responsibilities, staff structure, and reporting lines. The organization chart provides a roadmap, assisting the auditor with navigating the organization.
The organization’s leaders are responsible for providing auditors with the organization chart. Identifying controls: A control is an element in a process that ensures that standards, policies, and procedures of the organization are being followed. For example, the automatic process requiring users to change passwords every quarter is a control that is built into the computer’s operating system. The auditee is responsible for identifying each control and describing the effectiveness of the control. For example, requiring users to change their passwords is a strong control. A policy telling users not to store the password under the keyboard is a weak control: users are warned not to follow this practice, but nothing prevents them from doing so. The auditor’s responsibilities complement those of the auditee. These include:
Identifying audit objectives: Based on the charter or engagement letter, the audit team leader clearly identifies the objectives of the audit. For example, some audit committees request a full audit to assess if the information technology effort adheres to standards, policies, and procedures. Other times, requests are narrower, such as to audit cybersecurity procedures. Developing an audit plan: The audit plan specifically defines how the audit team will achieve the audit goal. The audit plan has timelines, scheduled events, and resources that will be used for the audit. The audit plan also reflects the scope of the audit and any restrictions imposed by the audit committee. Identifying audit procedures: An audit procedure is the methodology used by the audit team to conduct the audit. For example, auditors might visit all of the organization’s locations and review all evidence of an operation. Alternatively, auditors may decide to use an acceptable sampling method that reduces the effort but delivers an acceptable result. Identifying the risk-based audit strategy: Risk-based auditing is a strategy used to focus on the riskier areas of the organization. For example, the auditing team might focus on elements of operational areas that can negatively affect the sustainability of the organization if they should fail. These include processes that authorize payments, inventory control, and information technology security. The audit team leader develops a strategy to audit the risky areas of the organization. Audit cost estimating: Once the audit plan is developed, the audit team leader estimates the cost of performing the audit. The audit committee uses the estimate to decide if the audit should proceed or if the scope of the audit needs to change in order to remain financially feasible. Audit documenting: The auditing team is responsible for documenting every element of the audit and linking the auditing effort to each of the audit objectives.
The final audit report must identify each audit objective and state if it is compliant with standards, policies, and procedures in the audit report opinion. Deviations are noted and auditors should recommend remediation.
Audit Risk Assessment An audit risk assessment looks at the likelihood that auditors can acquire sufficient evidence during the audit to reach an opinion on whether or not the element of the operation complies with standards, policies, and procedures. Think of an auditor’s job as determining what is in a jar, but the auditor can’t see in the jar. The label on
the jar describes what should be in the jar (standards, policies, and procedures). The auditor now has to conduct the audit to determine if the contents correspond to the description on the label. To do this, the auditor sticks a hand into the jar, grabs one item (evidence), and then describes the item in notes (a work sheet). These steps are repeated until the auditor has enough to conclude that the sample is representative of the entire contents of the jar. The auditor then compares the notes to the description on the label and renders an opinion that the contents of the jar do or don’t match the description on the label. The auditor is interested in material evidence that is relevant to the objective of the audit. Evidence that is material to the audit is relevant to the audit’s outcome. It helps to prove or disprove that the operation conforms to standards, policies, and procedures. Evidence that isn’t material is not part of the audit. The auditor is not looking to gather all conceivable evidence. Instead, the auditor is looking to gather sufficient evidence to reach a conclusion. The risk assessment determines if there is sufficient evidence available to the auditor.
Types of Risks The auditor identifies and categorizes risks that might impede the audit and then takes steps to avoid or mitigate the risks if possible. Types of risks are: Inherent risk: An inherent risk is a natural risk that always exists. This includes weather, floods, and even the risk of a car accident when the auditor drives to the site of the audit. Detection risk: A detection risk is the risk that the audit fails to detect a problem that exists. Two of the more common detection risks are sampling risk and non-sampling risk. A sampling risk occurs when a sample of evidence leads to a false conclusion. That is, the auditor applied appropriate sampling techniques but the conclusion was wrong. A non-sampling risk occurs when the auditor doesn’t use the appropriate auditing procedures, or when the procedures that are used are inconsistent with the objective of the audit; both lead to an erroneous finding. Control risk: A control risk occurs when the auditor loses control over the appropriate auditing procedures, causing errors to be introduced into the audit. These errors are either not detected or not corrected in a timely manner. Business risk: A business risk is a risk that is inherent to the operations of the organization. These include regulatory requirements, contractual obligations, and financial limitations. For example, a critical component of the organization’s operation may be handled by a vendor; however, the vendor will not permit the auditor to audit the vendor’s operation.
Technological risk: A technological risk occurs when technology fails during the audit. For example, the database used to authorize access to the organization’s systems might be unavailable during the audit, or the auditor’s laptop crashes. Operational risk: An operational risk is the chance that a process is not performed correctly by the staff or by an application. Residual risk: A residual risk is a risk that exists even after all mitigation efforts are performed. No matter what the auditor does to avoid it, some element of risk remains. Audit risk: An audit risk is the combination of all other risks.
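One widely used textbook way to express audit risk as a combination of other risks is to treat inherent, control, and detection risk as probabilities and multiply them. This model is an assumption here, not spelled out in the text; the probability values below are hypothetical.

```python
# A common textbook audit-risk model (an assumption, not stated in this
# chapter): overall audit risk is the product of inherent, control, and
# detection risk, each expressed as a probability in [0, 1].

def audit_risk(inherent, control, detection):
    """Combine the component risks into an overall audit risk."""
    return inherent * control * detection


# Hypothetical example: a fairly risky area, moderate controls, careful sampling.
print(audit_risk(inherent=0.8, control=0.5, detection=0.25))  # 0.1
```

Read this way, the auditor lowers overall audit risk by doing more detection work (more evidence, larger samples) in areas where inherent and control risk are high.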
Audit Quality Control An information technology audit requires assessment of all technology that the organization uses to operate the business, including all computing devices, networking, application development, and cybersecurity software and processes. The audit team leader, while knowledgeable about technology, rarely has the breadth and depth of knowledge to perform a full information technology audit. The audit team leader should have the skills to identify areas of technology that need to be audited, quality control methodologies to ensure the accuracy of the audit, and the technology experts who are needed to perform the audit. The audit team leader asks the questions and the technology expert answers the questions related to specific technology within the organization. For example, the audit team leader asks if the source code (the instructions written by the programmer) for the order entry application is available. The organization uses an executable version of the order entry application. The source code for the order entry application is needed to modify and maintain the order entry application. The audit team leader needs to know if the source code exists and if the copy that exists is the same as the executable version of the order entry application. Think of this as having the Word document behind the PDF file that is distributed throughout the organization. The Word document is needed to make changes to the PDF file. An expert in application development, particularly one who can read the source code, is needed to determine if the source code is the same as the executable application. It isn’t feasible for the audit team leader to assemble an audit team that has all the necessary expertise to conduct the information technology audit. Typically, the audit team consists of a core group who have general knowledge of technology and auditing. Expertise in specific areas of technology is brought onto the team as needed.
Some experts are borrowed from other departments within the organization or are consultants from an outside firm. They remain on the team long enough to complete the audit for a specific area of technology.
Techniques to Ensure a Quality Audit In addition to employing appropriate technology experts, the audit team leader ensures that quality control is employed in the audit by incorporating good auditing practices. These are: Audit methodology: An audit methodology is a standard procedure used to plan and execute the audit. Using a standard approach to an audit ensures continuity. Think of standards as using the same ruler to measure the same technology in every audit. Understanding needs and expectations: An information technology audit is conducted for many reasons besides a general audit of the organization’s operations. Audits are conducted to verify compliance with regulatory requirements; to meet requirements of vendors and customers; and to provide support for financial obligations (loans). Knowing the purpose of the audit helps the audit team leader deliver results that meet the organization’s expectations. Respect the business cycle: A quality audit requires cooperation from staff. During busy periods in the business cycle, staff members are focused on fulfilling business obligations and have little time to help with the audit. The audit team leader recognizes competing demands and should avoid conducting the audit during peak periods, if possible. Hold workshops and interviews: Audits make staff edgy at times, causing them to have a defensive mindset rather than one of cooperation. There is a tendency to answer an auditor’s questions with “yes,” “no,” or “I don’t know the answer.” The auditor only receives information that the auditor requests. Auditors strive to make the staff part of the audit team so they provide information that the auditor needs, even if the auditor doesn’t request the information. The audit team leader holds workshops and interviews with staff, explaining the auditing process and emphasizing that the goal is to improve operations and not spy on the staff. 
Customer satisfaction survey: Customer satisfaction surveys are conducted at the end of every audit and serve as a guide to improve auditing techniques for future audits. Sometimes surveys are taken following each area that is audited rather than waiting for the conclusion of the full audit. In this way, adjustments can be made to the auditing process when auditing other areas of the organization. Agree on reference terms: Each area of technology has its own set of terms and jargon. Likewise, businesses also have their own acronyms and terms. Before beginning the audit, the audit team leader works with leaders throughout the organization to develop a reference list of terms and definitions commonly used in the
organization and in the industry. This ensures that auditors and the staff “speak” the same language. Performance metrics: The audit team leader needs to identify the metrics used by leaders within the organization to measure performance. Although each industry and technology uses standard metrics, the organization may or may not adopt those metrics, or may have its own metrics to measure performance. This ensures that auditors are using the same “ruler” to measure the organization. Monitor execution of the audit plan: The audit plan reflects the scope and limitations of the audit as set forth by the audit committee. It also clearly defines tasks, dependencies, and resources needed to execute the audit. It is critical that the audit team leader and the team itself compare actual performance to the audit plan to ensure that the audit is carried out as planned. Any deviation from the audit plan should be reflected in a revision of the audit plan. Respond to complaints: There is an array of issues that might come up during the audit. The audit team leader must have a process in place for the audit team and staff to raise issues during the audit. Issues can stem from the audit interfering with business operations; conflicts between an auditor and staff; or staff refusing to cooperate; practically anything can interfere with the audit process. Issues must be addressed promptly, within hours if feasible. Delaying a resolution impedes the audit and leads to rumors throughout the organization that the audit team is intrusive and impeding operations.
Data Collection Once the audit is underway, the initial focus is to collect information about the area being audited. Information is evidence of how a process is conducted by the staff. The objective of data collection is simply to gather information that is later reviewed for relevancy. The auditor is similar to a detective at a potential crime scene, where the detective gathers information and then sorts through the information in an attempt to tell what happened at the scene; if a crime occurred; and who potentially committed the crime. The detective is simply telling a story based on the information. The auditor does the same, except the mindset is that there is no crime. The auditor gathers information that is relevant to the process that is being audited and then attempts to tell the story of how the process actually works. The story is compared to standards, policies, and procedures later in the audit to determine if operations comply with the regulatory standards and policies. Auditors try to “walk in the staff’s shoes” when gathering data. They get to know the staff and observe each step in the process that is being audited. Auditors are objective and non-judgmental. They are fact finders regardless of what the facts reveal or do not reveal. Auditors observe and document the process. Questions are asked to clarify steps in the process. Many auditors avoid reviewing standards, policies, and procedures until after the data collection process is completed, to remain objective. Auditors collect documentation that is used in the process. Documentation includes documents used by the staff to begin the process; documents used during the process; and documents produced by the process. For example, the auditor retains a copy of a vendor’s invoice; any intermediate reports used by the staff to process the invoice; and documents that contain the account balance, showing that the invoice was paid. Other collected documentation includes training materials; notes taken by staff to guide them through the process; correspondence related to the particular item (i.e., the invoice) being processed; and any other written information that may be used in the process. Not all documentation collected by the auditor is necessarily used in the process. For example, the staff may never review the training material; it isn’t used in the process but is collected anyway, because there is a possibility that the staff might use it. Staff who are associated with the process being audited are interviewed by the auditor, either individually or as a group, to help the auditor understand steps in the process. Sometimes both methods are used. Meeting staff as a group helps the auditor develop a broad overview of a process and enables the auditor to tap into what all the staff remember. Individual interviews provide insight into the fine points of the process, since not all staff members know exactly how each step works. Interviews usually begin with the manager of the area being audited. The manager’s role is to implement standards, policies, and procedures, including interpreting vague areas.
The auditor is careful to note the manager’s understanding of the process. The auditor then compares what the manager believes is happening with the staff’s actual performance. Any discrepancies are noted. For example, a step may not work as designed by the manager. The staff uses a workaround rather than raising the issue with the manager. The manager may be unaware of the workaround. It is crucial that the auditor not correct either the staff or the manager if a discrepancy is noted. The auditor is an observer gathering information, not a substitute manager redirecting the staff. Data collection ends when the auditor can tell the story of how the process actually works. Each observation and interview provides the auditor with an element of the story. Sometimes there are gaps in the story, requiring the auditor to re-interview the staff and gather additional documentation to complete the story. It is critical that all gaps are filled and that the auditor doesn’t assume anything. The auditor trusts but verifies every aspect of the process that is being audited.
Review Existing Controls A control is an element of a process that enforces standards, policies, and procedures. For example, a control in the operating system forces each user to change their password quarterly and blocks the use of passwords that violate standards and policies. Each user would be responsible for adhering to the rules if controls were not incorporated into the system. A control falls into one of three categories. These are: Preventative: A preventative control prevents violations of standards, policies, and procedures from happening, such as requiring users to develop a strong password. Detective: A detective control alerts the staff when a standard, policy, or procedure is violated. For example, a message might be sent to the manager of information technology if an employee attempts to access a customer account without permission. Corrective: A corrective control attempts to rectify a problem once the problem is detected. For example, a backup copy of the database is used to restore the database if the database server crashes. The auditor also determines how each control is implemented. There are three common methods used to implement a control. These are: Administrative: The organization can issue standards, policies, and procedures to control how processes work. For example, a policy might be issued that requires the staff to back up the database regularly. It is the staff’s responsibility to comply with the policy. Technology: Standards, policies, and procedures are enforced automatically through the use of technology. For example, an application automatically backs up the database. There is no human intervention. Physical: The organization can physically ensure that standards, policies, and procedures are implemented. For example, security staff document the identification of each visitor before letting the person enter the building.
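The password rules mentioned above are a natural example of a preventative control implemented in technology. The sketch below is illustrative, not the book's; the specific strength rules (length, character classes) are assumptions.

```python
# A hedged sketch of a preventative control implemented in technology:
# a password-strength check that blocks weak passwords before they are set.
# The specific rules below are illustrative assumptions, not the book's.

import re


def password_allowed(password):
    """Return True only if the password meets the (assumed) standard:
    at least 8 characters, with an upper-case letter, a lower-case
    letter, and a digit."""
    return (len(password) >= 8
            and re.search(r"[A-Z]", password) is not None
            and re.search(r"[a-z]", password) is not None
            and re.search(r"[0-9]", password) is not None)


print(password_allowed("secret"))      # False: too short, no upper case or digit
print(password_allowed("Quarter4ly"))  # True: meets all of the assumed rules
```

A detective control would instead log or alert on the violation after the fact, and a corrective control would act on it, for example by forcing a password reset.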
Evidence Life Cycle The evidence life cycle is the entire process of how evidence is handled throughout the audit and after the audit. The initial step for the auditor is to identify the evidence that needs to be collected in order to understand a process. The auditor then collects the evidence from the area being audited. The collection process requires common sense.
The audit is not a criminal investigation, so the auditor needs to be sensitive to the nature of the business operation. Collecting evidence must not disrupt the business and must preserve confidentiality for both the business and its customers. All evidence should remain with the client during an audit by outside auditors. All evidence must be preserved in its original state to maintain the chain of custody and ensure that there is no tampering with the evidence. Each piece of evidence is assessed by an expert who is familiar with the area being audited and with the nature of the evidence. The expert must test the evidence and perform qualitative and quantitative measurements appropriate for the evidence. Every aspect of the evidence must be well documented. The evidence is preserved until after the audit is completed and accepted by the organization. Evidence supports the audit findings and is used to defend them should the findings be challenged by the audit committee or by areas of the organization that were the target of the audit.
Identifying Evidence Evidence is information that represents an element in a process. For example, notes from an interview where staff describe how they execute a process are considered evidence. Notes taken by the auditor after observing staff executing a process are also considered evidence. Reports, manuals, training material, training classes, and results from previous audits are all considered evidence. Evidence is like a dot in a connect-the-dots picture: the auditor links evidence together (connects the dots) to reach an opinion (draw a picture). Evidence is used to either prove or disprove a conclusion. There are two types of evidence gathered during an audit: Direct evidence: Direct evidence is something that by itself is a fact, such as an eyewitness statement or a report generated by a process. Indirect evidence: Indirect evidence is a hypothesis supported by direct evidence. For example, an order is entered into an order entry system and an invoice is generated. The inference is that the process used the order entry system to generate the invoice. An inference draws a logical and reasonable proposition from another proposition that is assumed to be true. The auditor doesn’t know this for a fact, but it is a reasonable presumption. In law, this is referred to as circumstantial evidence.
Grading Evidence Not all evidence is alike. Some evidence is more relevant to the audit than other evidence. Auditors determine the value by grading evidence.
Material Relevance Materially relevant evidence is evidence that influences the outcome of the audit. Irrelevant evidence is evidence that is not related to the audit and has no direct influence on its outcome.
Competency of the Evidence Provider Evidence can be provided by staff or generated by a process. The auditor evaluates the source of the evidence to determine if the evidence comes from a reliable source. For example, a staff member who executes the process describes how the process works. The staff member is highly competent to provide the evidence. In contrast, the manager who oversees but doesn’t execute the process may not be competent to provide evidence on how the process works. The manager provides secondhand information. This may have some value, but it is not as valuable as the information provided by staff who execute the process. An expert is another common provider of evidence. An expert is a person who possesses special skills, knowledge, or experience related to a process. For example, an expert can tell the auditor what is standard practice in the industry. An expert can also give an opinion as to whether the process complies with standard practice. The expert provides evidence that the auditor is incapable of deducing alone.
Evidence Independence The best evidence is gathered through firsthand observation by the auditor; however, that is not possible during most audits. Evidence is typically gathered from secondary sources such as staff or generated by an application. Although secondary evidence is valuable, it is important that the evidence is independent from influences. For example, the manager may ensure that a process is reviewed regularly for compliance. However, the manager may be biased, since irregularities in the process may be considered a reflection on the manager’s ability to manage. The staff member providing the evidence should not have anything to gain or lose by providing the evidence. The auditor must qualify the staff member who provides the evidence by determining
whether the staff member is in a position to provide the evidence. A manager may lack the day-to-day knowledge of how the process is actually performed because the manager doesn’t get involved in these details. The staff member who performs the process daily is more qualified than the manager to explain how the process actually works. The ideal provider of evidence is someone who is knowledgeable about the specific area being audited.
Recording Evidence Each item in evidence is documented in an evidence log. The evidence log records the chain of custody beginning with the auditor who collected the evidence and continues with each person who views or touches the evidence. The goal is to track the evidence every moment, from the time the evidence is gathered to the time the evidence is destroyed or returned to the organization. Each item is labeled with a unique number, name, description, date, and time when the evidence was gathered; a description of how the evidence was gathered; and the reason for collecting the evidence. Each piece of evidence is gathered to support an inquiry into an area being audited. The name and contact information of the auditor who entered the evidence in the log is also recorded in the evidence log. All movement of the evidence is recorded in the log, both when evidence is removed from the evidence room and when it is returned to the evidence room. In special circumstances, evidence is photographed at the site where the evidence is collected. Photographs are used to document the condition and location of the evidence. At times, the photograph is used in place of physically taking the evidence. For example, the auditor may want to document the room that contains servers. A photograph is taken and the room is left intact.
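The fields the evidence log records can be sketched as a simple data structure. This is an illustrative sketch: the field names, the record-transfer helper, and all sample values are mine, not the book's.

```python
# Sketch of an evidence-log record capturing the fields the text lists:
# unique number, name, description, when and how the item was gathered,
# the reason for collecting it, the collecting auditor, and a
# chain-of-custody trail. Field names and sample data are hypothetical.

from dataclasses import dataclass, field


@dataclass
class EvidenceLogEntry:
    item_number: str          # unique number labeling the item
    name: str
    description: str
    gathered_at: str          # date and time the evidence was gathered
    gathering_method: str     # how the evidence was gathered
    reason: str               # why the evidence was collected
    collected_by: str         # name and contact info of the auditor
    custody_trail: list = field(default_factory=list)

    def record_transfer(self, person, action, timestamp):
        """Log every removal from and return to the evidence room."""
        self.custody_trail.append((timestamp, person, action))


entry = EvidenceLogEntry(
    "EV-017", "Vendor invoice", "Sample invoice copy",
    "2019-03-04 10:15", "photocopy", "Accounts-payable process audit",
    "J. Smith, ext. 4121")
entry.record_transfer("A. Jones", "removed from evidence room", "2019-03-05 09:00")
entry.record_transfer("A. Jones", "returned to evidence room", "2019-03-05 11:30")
print(len(entry.custody_trail))  # 2 movements recorded
```

The point of the structure is that every touch of the evidence, from collection to destruction or return, leaves a dated entry that can be audited itself.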
Analysis of the Evidence Each piece of evidence is analyzed. Evidence is trusted only after it has been verified. Each piece of evidence is recorded along with the date, time, and methodology used to gather the evidence. For example, the auditor takes a sample of orders and the results of processing the sample orders. This includes fulfillment and invoicing. The auditor then duplicates the effort by taking other samples of orders using the same process. Each group of samples is compared to evaluate whether the samples are representative of how the process works.
Preparing Audit Documentation Every step in an audit is fully documented. The result of the audit is assigned an audit coefficient by the audit team leader. The audit coefficient is the level of confidence that the audit team has in the audit results. The audit coefficient is also referred to as the reliability factor of the audit. An audit should have an audit coefficient of at least 95%. The objective is to provide enough information that the audit can be repeated by another auditor who will identify the same evidence and draw the same conclusion. The sets of audit documentation are:
– The charter
– Scope of the audit
– Audit plans
– Policies
– Specific procedures used during the audit
– Record handling and test procedures
– Auditor’s working notes
– Evidence necessary to perform the audit
– Copies of any reports that were issued
Collectively, the audit documentation describes the audit, which includes:
– The purpose and scope of the audit
– The audit plan
– The audit team and stakeholders involved in the audit
– Date, time, and location of each event that occurred during the audit
– The methodology used to collect and test evidence
Audit Samples In an ideal world, the auditor observes a process every time it runs and collects evidence each time. Strict sampling methods are used for compliance testing, where the organization must prove to a regulatory body that a process is in compliance with regulations. Many times, an audit is built into the process itself, comparing each transaction with rules and then rejecting or flagging non-compliant transactions. This is referred to as a control. However, examining every occurrence is impractical for a substantive audit, so the auditor takes a sample. A sample is one occurrence of the process and its related evidence. A substantive audit determines if a process materially adheres to standards, policies, and procedures. Most transactions are compliant, although no one examines all transactions. There are two basic types of samples: statistical samples and non-statistical samples.
Samples may not represent all transactions, which collectively are referred to as the population. The auditor chooses the most appropriate sampling method to ensure that the result is not misleading. Sampling carries an inherent risk of inaccuracy, known as the sampling error, which the auditor calculates.
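The sampling error mentioned above can be estimated with the standard margin-of-error formula for a proportion. This is a minimal sketch, not the author's method; the 12-exceptions-in-400-transactions sample is an invented example:

```python
import math

def sampling_error(sample_proportion, sample_size, z=1.96):
    """Margin of error for an observed proportion at ~95% confidence (z=1.96)."""
    return z * math.sqrt(sample_proportion * (1 - sample_proportion) / sample_size)

# Invented sample: 12 exceptions observed in 400 sampled transactions.
p = 12 / 400
print(round(sampling_error(p, 400), 4))  # → 0.0167, i.e. roughly ±1.7%
```

A larger sample shrinks the error, which is why the statistical methods below specify both a confidence level and a plus/minus range.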
Statistical Sampling Statistical sampling uses mathematical techniques to determine that the sample is statistically representative of the population within an error factor of plus or minus a value. Statistical samples are presented as percentages of certainty, such as a sample representing the population at 95% confidence, plus or minus 2%. Two common methods used for sampling are:
–– Random sampling: transactions are randomly taken from the population of all transactions.
–– Cell sampling: transactions are selected at predefined intervals.
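The two methods can be sketched in Python. The transaction IDs, sample size, and interval below are invented for illustration:

```python
import random

def random_sample(transactions, n, seed=42):
    """Random sampling: n transactions drawn at random from the population."""
    rng = random.Random(seed)  # fixed seed so the audit is repeatable
    return rng.sample(transactions, n)

def cell_sample(transactions, interval, start=0):
    """Cell sampling: one transaction taken at every predefined interval."""
    return transactions[start::interval]

population = list(range(1, 101))           # invented transaction IDs 1..100
print(len(random_sample(population, 10)))  # → 10
print(cell_sample(population, 25))         # → [1, 26, 51, 76]
```

Fixing the random seed is a deliberate choice here: it lets another auditor regenerate exactly the same sample, which supports the repeatability objective stated earlier.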
Non-statistical Sampling Non-statistical sampling is referred to as judgmental sampling because the auditor determines the sample size and the method of generating the sample. Non-statistical sampling is risky because the selection is based on the subjectivity of the auditor. Evidence gathered using a non-statistical sampling method may be valuable for qualitative purposes, but not for quantitative analysis. For example, looking at five randomly selected orders gives the auditor an idea of what an order looks like (qualitative), but it gives no indication that those orders are representative of all the orders that are processed. Although the five orders were randomly selected, the sample is likely too small to be statistically significant or to support a calculated sampling error.
Compliance Testing Compliance testing is performed to determine if a process conforms to regulations. There are four types of compliance testing:
Attribute sampling: An attribute is an element of a process that indicates compliance. Attribute sampling determines if the attribute is present or absent in the sample.
Appendix A: Information Technology Auditing
Results are reported as rates of occurrence, such as 10 in 100 transactions missing the attribute.
Stop-and-go sampling: Stop-and-go sampling minimizes sampling and is used when few errors are expected. For example, all orders require authorization before processing. Since controls in the process require this, the audit can use stop-and-go sampling: after a reasonable number of samples all contain authorization, there is no need to continue, and sampling stops.
Discovery sampling: Discovery sampling is used to detect fraud. It is used when the likelihood of finding evidence is low because there is little or no fraud.
Precision sampling: Precision sampling is based on the belief that a statistical sample is representative of the population and requires an error rate.
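As a rough illustration of attribute sampling, the sketch below reports the rate of occurrence of a missing authorization attribute. The orders and the 10-in-100 failure rate are fabricated to mirror the example above:

```python
def occurrence_rate(sample, has_attribute):
    """Attribute sampling: count how often the attribute is absent in the sample."""
    missing = sum(1 for t in sample if not has_attribute(t))
    return missing, len(sample), missing / len(sample)

# Invented orders; the attribute tested is the presence of an authorization flag.
# Every tenth order is deliberately left unauthorized.
orders = [{"id": i, "authorized": i % 10 != 0} for i in range(1, 101)]

missing, size, rate = occurrence_rate(orders, lambda o: o["authorized"])
print(f"{missing} in {size} transactions are missing the attribute ({rate:.0%})")
# → 10 in 100 transactions are missing the attribute (10%)
```

Stop-and-go sampling would simply halt this loop once enough consecutive orders carried the attribute.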
Substantive Testing Substantive testing verifies balances of groups with related characteristics and tests physical inventory to verify that the recorded inventory status is valid. There are five types of substantive testing:
Variable sampling: Variable sampling measures a sample indirectly. For example, instead of counting a box filled with $100 bills, a single $100 bill is weighed and then the contents of the box are weighed. Dividing the content weight by the weight of a single $100 bill produces the number of $100 bills in the box.
Unstratified mean estimate: An unstratified mean estimate projects a total for the population based on estimation rather than counting the population.
Stratified mean estimate: A stratified mean estimate divides the population into groups of similar demographics and then estimates each group.
Difference estimation: Difference estimation compares audited and unaudited transactions.
Tolerable error rate: The tolerable error rate is the maximum number of errors that is acceptable.
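The weighing example for variable sampling is simple arithmetic. The sketch below assumes a bill weighs about one gram (an approximation) and invents a 2,500 g box:

```python
def bills_by_weight(total_weight_g, single_bill_weight_g=1.0):
    """Variable sampling: infer the count of $100 bills from weight alone.

    A US banknote weighs roughly 1 gram; treat that figure as an assumption
    calibrated by weighing a single bill, as described above.
    """
    return round(total_weight_g / single_bill_weight_g)

# Invented example: box contents weigh 2,500 g.
count = bills_by_weight(2500.0)
print(count, count * 100)  # → 2500 250000, i.e. about $250,000 in the box
```

The same indirect-measurement idea underlies the mean-estimate methods: measure a small calibrated unit, then scale up rather than counting the population.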
Reporting Audit Findings The audit report presents the results of the audit and the findings of the audit team. It details every aspect of the audit in such a way that anyone reading the report can follow the auditing process and understand the basis for the findings. The audit report follows a uniform format, making each audit comparable to other audits. Here is the audit report format:
Audit scope: The audit scope defines the limitations imposed by the audit committee on the audit team.
Audit objectives: Audit objectives identify the goals of the audit, specifying the deliverables of the audit team.
Methods and criteria used: This section details how the audit team went about conducting the audit. It describes the methodologies and criteria used to identify, collect, and evaluate evidence.
Nature of findings: This section describes what the audit team found when they examined the evidence.
Extent of work performed: This section describes the effort of the audit team, such as sites visited, the depth of the audit at each site, and anything that helps the reader understand the breadth of the audit effort.
Applicable dates of coverage: This section specifies the dates and times when auditors gathered evidence. This is important because the result of the audit may not reflect transactions and activities that occurred outside the audited period.
Restrictions: This section describes limitations placed on the auditors when conducting the audit. For example, auditors may have been prohibited from entering the room that contains servers; activities taking place in that room are therefore not reflected in the audit.
Reservations: This section contains concerns the audit team has about the audit, such as being unable to access key facilities.
Qualifications: This section contains concerns auditors have about the findings themselves. For example, the auditors found no evidence of a security breach; however, the cybersecurity defenses did not reflect the latest measures.
Final opinion: The final opinion is the most critical element of the audit report. In this section the auditing team states its opinion on whether or not the area of the organization that was audited is in compliance with standards, policies, and procedures. The auditing team typically issues one of four types of opinion:
1. Certification that the area of the organization is in compliance. This is known as an unqualified opinion.
2. Certification that the area of the organization is in compliance with exceptions as stated. This is known as a qualified opinion and usually contains recommendations that will bring the area into compliance.
3. The area of the organization is non-compliant.
4. No opinion. No opinion means that the auditing team did not have sufficient evidence to certify that the area of the organization that was audited was compliant or non-compliant. This is known as a disclaimer of opinion and usually occurs when such severe limitations are placed on the audit team that it is nearly impossible to conduct a viable audit.
The Audit Report and Stakeholders Before the audit report is written, the leader of the audit team meets with key stakeholders to report the preliminary findings. The objective of the meeting is to share the findings; the audit team doesn't want to surprise any stakeholder, especially if there are concerns with the stakeholder's operation. The meeting has a cooperative tone without any blame. The leader of the audit team presents the preliminary findings and how the audit team arrived at its conclusions. Stakeholders are asked whether they feel the findings are representative of their operation or whether auditors overlooked or misread evidence that would lead to a different conclusion. It is important that the leader of the audit team act like a colleague trying to fix a problem rather than a prosecutor trying to prove the stakeholder guilty of a crime. In particular, stakeholders are presented with a preliminary report so they can give input on its wording. The leader of the audit team wants to be sure that the wording is neither so strong as to mislead the reader nor so weak as to minimize the importance of any deficiency. The wording of the report should fairly describe the actual situation, and the stakeholders' insight can be valuable in fine-tuning it. Meeting with stakeholders also gives them a heads-up to prepare a response to the findings. The audit committee is bound to ask stakeholders to explain why their process is lacking. Stakeholders who already know, and probably agree with, the wording of the findings can have a well-developed response describing how they intend to strengthen the weakness.
When presented with the preliminary findings, stakeholders may immediately rectify the problem. If this occurs and is verified by the audit team, then the concern no longer exists and the item is removed from the preliminary audit report. Keep in mind that the objective is to rectify the problem, not to punish stakeholders. Working as a team, stakeholders and auditors should be able to address concerns before they appear in the final audit report. The audit report should contain only material deficiencies; minor concerns are not mentioned in the audit report. The audit team lists critical issues that need to be addressed immediately in order to bring the area that was audited into compliance. Each issue has a recommendation on how stakeholders can bring the issue into compliance. This forms the starting point for the next audit, which verifies that recommendations were followed. All recommendations should be developed with stakeholders who are knowledgeable about the concern. Auditors should not be seen as the authority on how the process should work; auditors are facilitators who enlist experts and the stakeholders to address the concern. The meeting with stakeholders should conclude with an agreement. The stakeholders and the leader of the audit team agree on the issues that concern the audit team, on the wording of those issues, and on the recommendations, or they agree to disagree. In that case, stakeholders can present their interpretation of the evidence to the audit committee.
Detecting Irregularities An irregularity is evidence that leads the auditor to believe that an illegal act might have occurred. Here are illegal acts that might be found during an audit:
Fraud: Fraud is any act of deception, including misrepresentation, used to gain an advantage.
Theft: Theft is acquiring resources that are not rightfully yours, commonly referred to as conversion.
Suppression: Suppression is willfully omitting data or records that affect a business transaction.
Racketeering: Racketeering is the repeated commission of a crime. It exposes the stakeholder to violation of the Racketeer Influenced and Corrupt Organizations (RICO) Act, which is used by the federal government to combat organized crime.
Evidence that leads the auditor to suspect that a crime might have occurred includes:
–– Questionable payments
–– Unsatisfactory record control
–– Unsatisfactory explanations
–– Questionable circumstances
–– Unusual or unexpected relationships that might lead to material misstatements or misrepresentations
When an auditor discovers evidence that arouses suspicion that a crime might have occurred, the auditor stops auditing the area and reports concerns to the leader of the audit team, who presents the concerns and evidence to the organization's attorney. The organization's attorney will likely stop the audit and conduct an investigation, during which evidence is identified and gathered using techniques that preserve it for potential criminal proceedings. Evidence already gathered by the auditor will likely be set aside and re-gathered by investigators. The leader of the audit team never contacts law enforcement; that is done by the organization's attorney, who follows criminal investigation procedures to verify that a crime might have been committed.
Index A Abandonment rate (AR) 213 Abstraction process 269 Acceptance process 23–24 Accounting system 99, 269, 314 Administrators 204, 207, 217–18, 222 Adoption process 234 Advanced Encryption Standard (AES) 183, 186 AES (Advanced Encryption Standard) 183, 186 Agile programming 25 Agile project management 25 Agreement 211, 214, 219, 224–26, 231–37, 239–40, 249, 255, 291 –– service level of 213–14 –– service-level 243, 256 Algorithm 5, 185, 282 American National Standards Institute (ANSI) 37, 150 Analytics 257, 268, 271, 273, 283–85 Android 83, 127 ANN (Artificial neural networks) 273, 278 ANSI (American National Standards Institute) 37, 150 API (application programming interface) 258, 264 Application architecture 100–101, 103 Application development 328 Application layer 43–44, 128, 166 Application ownership 254–55 Application program 187–88 Application programming interface. See API Application server 59, 62, 101–2, 170, 199, 201, 245, 265 Application set 43 Applications 43–44, 99–105, 116–19, 126–31, 199–203, 224–26, 245–50, 253–65, 311–14 –– cloud-based 245, 252 –– email 51 –– new 129–31, 255, 261 –– organization’s 190, 253–54, 256, 264–65 –– pilot 262 –– vendor’s 224, 227–28 Applications and databases 127, 151, 211, 253, 255, 264 Apps 56, 61, 110, 126–27, 175, 194, 288 Arbitration 232, 291 Arbitrator 232–33, 238, 240–41, 291 DOI 10.1515/9781547400812- 013
Architecture –– client/server 247 –– multi-tier 103–5 –– three-tier 101–2, 247 Artificial intelligence 63, 267, 278 Artificial neural networks (ANN) 273, 278 Assessment 10, 58, 209, 214, 220, 226, 320, 328 Assets 197–98, 205, 208, 235, 265 Assumptions 8, 15, 19, 21, 206, 272 Attachment 53–54, 94 Attack 176–77, 191, 198, 263 Attorneys 231, 238, 287, 289, 292–94, 296–97 –– organization’s 342 Attributes 32–33, 144, 147–48, 337–38 Audio 2, 42, 48, 50, 57, 138–39 Audit 200, 263, 311–13, 315–30, 332–34, 336, 338–42 –– current 318, 320, 325 –– cybersecurity 168–69 –– substantive 336 Audit charter 315–16 Audit coefficient 336 Audit committee 316–19, 321, 325–26, 330, 333, 339–41 Audit cost 324, 326 Audit documentation 336 Audit findings 333, 339 Audit methodology 329 Audit objectives 326, 339 Audit plan 312, 318–20, 324, 326, 330, 336 Audit planning 319 Audit Quality Control 328–29 Audit report 200, 312–13, 316, 318–19, 325, 339–41 Audit report and stakeholders 319, 340 Audit report details 339 Audit risk 328 Audit risk assessment 326–27 Audit samples 336–37 Audit scope 339 Audit team 313, 316–17, 319–21, 323, 326, 328–30, 336, 339–42 Audit team leader 317, 319–26, 328–30, 336 Auditee 324–25 Auditing 312, 328–29 Auditing plan 312
Auditing procedures 327 Auditing process 318, 329, 339 Auditing team 312, 315, 318, 326, 340 Auditing technology 313 Auditor positions 322 Auditors 168, 200–202, 311–13, 316–37, 339–42 –– cybersecurity 176, 178 –– internal 263, 312, 317, 323, 325 Auditor’s opinion 319 Authenticate 110–11, 165, 170–71 Authorities, regulatory 169, 195, 225–26, 238 Authority 223, 316, 341 Authorization 7, 223, 256, 265, 297, 312, 315, 338 Average Value 159 B Backbone networks 48 Backup 191, 202, 217 Bandwidth 37, 57, 254–55 Bank 82, 126–27, 169, 208 Base station 56 Baseline 40–42 Basic input/ output system. See BIOS Behaviors 32–33, 139, 144, 189, 267 Benefits 13, 150, 206–7, 245, 269, 323 –– financial 7, 243 –– BI (business intelligence) 271 Big data 246–47, 267–68, 270, 272, 274, 276, 278, 280, 284–85 Binary digits 39, 42, 44, 63, 65–66, 75–76, 80, 182, 303–4 BIOS (basic input/ output system) 77–78, 82 Bit value 81 Bitcoin 258 Bitlines 81 Bits 42, 63, 65–66, 72–74, 76–78, 80, 94–96, 182, 304–5 Blockchain 258 Blockers 296, 299–300 Blocks 5, 44, 54, 69–70, 79, 181, 191, 194, 332 Bluetooth 55, 186, 194 Bluetooth devices 55, 186–87 Box of switches 61, 63–64, 83 Breakpoint 226, 231 Broadcasts 87, 191 Browser 48–50, 54, 100, 102, 121–24, 126–29, 144, 179–80, 185 Budgets 8–9, 14, 19, 21, 204, 216
Building 58, 64–65, 126–27, 195–97, 203, 248, 256, 259–60, 267 Building blocks 64–67, 69, 71, 73, 75 Bus –– external 65–68, 73 –– internal 66–67 Business 1–3, 6–10, 47, 196–98, 200–201, 209, 224–25, 235–36, 273–74 Business analytics 271 Business case 6–8 Business continuity 205 Business cycle 317, 329 Business environment 267 Business impact analysis 197–99 Business intelligence (BI) 271 Business intelligence system 271 Business operations 3, 37, 198, 200, 203–4, 273, 317, 330, 333 Business processes 103–4, 133, 198, 216 –– critical 318 Business risk 21, 327 Bytes 44, 47, 63, 74, 76, 79, 93, 185–86, 301–4 C Cables 35, 37, 46, 48, 55, 57, 59, 64, 200–201 –– optic 37, 40, 42 Cache 53, 70, 73, 77, 82 Cache memory 77–78, 82 Calculating values 158–59 Calculations 3–4, 114, 158–59 Cameras 172, 174–75, 294 Candidates 17, 221 Capacitors 80–81, 97 Capacity 78, 93, 221, 237, 247 Carriers 46, 295 Cascading Style Sheets. See CSS Cases 32, 62, 151, 268, 290–93, 296, 308, 338, 341 CD (Compact Disc) 2, 71, 83, 91, 94, 97 Cell 44, 56, 58, 135, 188 Cell phone 35, 37, 40, 56, 61, 83, 144, 194, 308–9 Cell phone service provider 309 Cell tower 295, 309 Central processing unit (CPU) 65, 67, 189 Change control system 8 Change costs 255 Change management 20–21, 260, 270 Change management committee 21, 28 Change management process 20–21, 28
Index Characters 37–40, 88–89, 134–35, 138–40, 165, 171, 175, 301, 304–5 –– non-printable 134 Charges 72, 81, 90, 97, 227, 229, 290, 292, 323 Charter 7–10, 204, 311, 326, 336 Checksum 185–86 Chief information officer (CIO) 320 Chip 65, 74, 76, 78–79, 81–82 CIO (chief information officer) 320 Circuit board 65, 70, 77 Circuitry 68, 70, 72, 78–79, 81 Circuits 40, 46–47, 67, 81, 90 –– optoelectronic 94–95 CISG (Contracts for the International Sales of Goods) 239 Civil action 289–91 Classes 32–33, 47, 183, 196 Classifications, general 196, 289 Clause 152–63, 233 Client side 100–101 Clients 102, 136, 189, 221, 247–50, 254–55, 333 –– internal 249 Clock, internal 66–67, 73–74 Cloud 62, 245–47, 249–56, 259–60, 262–65, 267 Cloud architecture 257, 259, 261 Cloud computing 245–46, 248, 250, 252, 254, 256, 258, 260, 262 Cloud computing services 245–46 Cloud life cycle process 256 Cloud provider 245–48, 251, 253–56, 260, 262–65, 288 Cloud services 245, 247, 253, 255 Cloud vendor 245–46, 249–53, 255–57, 259–60, 262, 264 Cloud vendor’s data center 245–46 Cluster 25, 176 CMOS (complementary metal-oxide-semiconductor) 82 Coaxial 37, 42 Coaxial cables 37, 42, 46–47 COBIT 315 Colors 32–33, 74, 79, 96, 124 Column functions 158–60 Column names 143, 152, 154–55, 158, 162 Columns 81, 95, 135, 143, 145, 147–49, 151–55, 157–62, 282 Command prompt 49, 51, 84, 303 Commands 49–52, 68, 84, 151 Communication closets 59, 201
Communication network technology 35 Communication networks 35–38, 40, 42–46, 48, 50, 52, 54, 56, 58 Communications 43, 54, 56–58, 60, 71, 129, 150, 166, 215 Communications network 35–36, 42–44, 46 Compact Disc. See CD Companies 3, 211, 223, 258, 267 Compensation, vendor’s 242–43 Compiler 31, 109–10, 112–13, 117–18, 200 Complementary metal-oxide-semiconductor (CMOS) 82 Complex Project Management 25, 27 Complexity 109, 222, 224, 267, 269, 281 Compliance 261, 263, 312–13, 318, 329, 334, 336–37, 340–41 Compliant 226, 263, 320–21, 326, 336, 340 Components 14, 23, 33, 63, 65–66, 73, 99–102, 248, 278–79 Computer 31, 48–49, 51–52, 61–92, 98–100, 107–8, 118–19, 182–84, 187–93 –– customer’s 294 –– desk 51 –– local 182, 245 –– one-processor 69 –– personal 2, 61, 68, 85 –– remote 51, 58, 133, 188, 245 –– suspect’s 294 –– user’s 100–101, 119 Computer applications 99–100, 102, 104, 106, 108, 110, 112, 114, 116 Computer category 61–62 Computer device 288, 306 Computer forensics 294, 296, 299–300 Computer forensics evidence 295–97 Computer forensics investigation 294–97, 299, 301, 304–5, 308 Computer forensics investigator 287–89, 294–95, 297–309 Computer forensics report 302–3 Computer forensics software 296 Computer forensics tools 295–96 Computer forensics workstation 296, 298–300, 304, 306–9 Computer investigator 297 Computer memory 73, 76, 95, 299 Computer network 35, 62, 294 Computer powers 81 Computer programs 31, 66, 188 Computer screen 90, 170
Computer systems 268, 273 Computer technology 1 Computer virus 187–89 Computer viruses 187–89 Computing 210, 213, 247, 252, 257, 259, 281, 285, 287 –– serverless 259 Computing devices 48, 165–67, 176–80, 184–91, 198–202, 210–13, 287–89, 294–99, 305–8 –– destination 306 –– internal 179 –– intruder’s 178 –– local 53, 252 –– maintenance of 220, 314 –– remote 57, 184, 187 –– target 298, 306 –– target’s 178 –– unsecured 169, 180 –– wireless 191–92 Computing devices to work 59, 169 Computing power 246, 259, 267, 296 Computing resources 245, 247, 249–53, 255, 259–60 –– organization’s 252, 265 Conduct 131, 277, 288, 294–97, 311–12, 318–19, 326–28, 340, 342 Conduct business 235, 311 Conflict resolution process 9, 232 Conflicts 9, 11, 13, 147, 210, 232, 239, 243, 279 Connect 47, 49, 55, 57–58, 62, 72, 178–79, 295–96, 300 Connections 43, 46, 81, 166, 178, 253 Constraints 8, 20, 25, 152, 219, 222 Consumers 2–3, 267 Contact person 146 Contract 20, 203, 211, 213, 219, 223–24, 227–33, 235–43, 291 –– oral 236 –– written 236, 238 Contract disputes 233, 238 Contract manager 243 Contractor, general 59, 221, 229 Contractors 219 –– integrated 221 Controller, programmable logic 166–67 Cookie 133, 136, 193, 283 Copy 50, 186, 193, 293, 295, 298–300, 305, 328, 331
Cost 18–19, 211–12, 219–20, 223–24, 227–28, 247–50, 254–55, 273, 323–24 –– fixed 323 –– variable 323 Cost performance index (CPI) 19 Cost variance 19 Counter 191, 225–26, 236, 295, 308 Court 193, 233, 238, 287, 289–92, 294–97, 299, 302, 309 CPI (cost performance index) 19 CPU (central processing unit) 65, 67, 189 CRC (cyclic redundancy check) 45, 185–86 Credit card processing 103, 258 Crime 174, 237, 288–91, 294–95, 330, 340–42 Critical success factors (CSF) 285, 324 CSF (critical success factors) 285, 324 CSS (Cascading Style Sheets) 123–25, 127 CSS instructions 124 Cursor 87–89 Custody, chain of 293, 297–99, 333 Customer information 141, 144–46, 148, 162 Customer invoice table 159–60 Customer name 137, 148, 154, 156, 162 Customer number 137, 139–40, 143, 145, 148, 152, 154–56, 158–60, 162 Customer table 145–46, 148, 151, 153–56, 162 Customer first name 155–57, 161–63 Customer invoice TB 159, 161 Customer last name 155–57, 161–63 Customer number 152, 154–55, 158, 160–62 Customer number column 152, 162 Customer number Integer 152–53 Customers 11, 135–38, 142–48, 152–54, 156–62, 197, 215–17, 257–58, 261 –– services to 213, 256–57 Customer TB 151–52, 154–57, 161–62 Cyberattack 167, 267 Cybersecurity 165–68, 170, 172, 174, 176, 178, 180, 182, 184 Cybersecurity department 165, 168 Cycle 55, 317 –– evidence life 332 Cyclic redundancy check. See CRC Cyclic redundancy check 45, 185–86 D Dashboard 271, 273, 283 Data analysis 267–68, 270, 272, 274, 276, 278, 280, 282, 284
Index Data and databases 133–34, 138, 140, 142, 144, 146, 148, 150, 152 Data collection 330–31 Data flow diagram 29–30 Data layer 43–44, 101, 128, 166 Data mining 267, 271, 273, 280–83 Data store 29-30 Data warehouse 271, 280–81 –– real-time 281 Database 105, 126–28, 133–44, 146–54, 160–63, 168–69, 174, 201–2, 264–65 –– central 170, 172 –– organization’s 37, 280 –– relational 145–47, 284 –– secured 169 Database access points 168–69 Database administrators 136, 141, 149, 151–53 Database and tables 135, 149, 151 Database applications 110, 133, 136–37, 139, 141, 146, 149–50, 168–69 Database design 136, 140, 147, 284 –– logical 141–42, 147, 149 –– physical 142–43, 149 Database designer 136–43, 147–49, 158, 163 Database failure 202 Database Management System. See DBMS Database management systems 141, 165, 167, 169, 199, 201, 210, 265, 268 Database models 144–45 –– hierarchical 144 –– object-oriented 144 –– relational 144 Database schema 136, 147 Database server 62, 102, 141, 168, 199, 212, 245, 265 Database table 151 Date 20, 125, 137–40, 237, 242, 309, 316, 335–36, 339 Date/time 308 DBMS (Database Management System) 62, 102, 141, 149–62, 165, 167, 169, 201, 255 DBMS manufacturers 150 DC (direct current) 65 DCL (data control language) 151 DDL (data definition language) 151 Deadline 8, 21, 222, 242–43, 277, 312, 319 Decimal 38–39, 64, 107, 138–40 Decimal number system 301 Decimal value 38–39, 48, 114, 301 Decipher 264, 289, 305, 308
Decision maker 9, 190, 267–69, 271–77, 280–82 Decision modeling 270–71 Decision points 113, 277 Decision process 29, 268–70 Decision Support Systems 267–68, 270, 272–76, 278, 280, 282, 284 Decision trees 277 Decisions 20–22, 28–29, 113–15, 232–34, 267–69, 271–74, 276–77, 280–81, 292 –– real-time 281 Decode 182, 184–85 Defendant 289–91, 293, 295 Defense attorneys 289–90, 295 Degrees 40, 136, 263, 279 Deliverables 10, 14, 21, 27, 232, 243, 339 Demilitarized Zone. See DMZ Dependencies 10, 15–17, 20, 212, 253, 312, 321–22, 330 Deployment process 260–61 Description 5, 105, 120, 140, 222, 322, 327, 335 Design 11, 14, 32–35, 77, 79, 123, 258–59, 311, 318–19 Design process 5, 29, 34 Designers 5, 11, 56, 104–5 Desktop 61, 85–87, 89, 98, 107 Desktop computers 61–62, 93 Destination network device 36, 43, 45–46 Developers 11, 127, 256, 258–60, 262, 289 Development 8, 11, 19–20, 24, 26, 181, 219–20, 257–58, 260–61 Development team 11, 258, 261 Device driver 71, 87–88 Devices 34–35, 40, 44–46, 70–72, 186–88, 194, 287, 289, 306–7 –– electrical 212, 314 DevOps 260–62 DevOps maturity model 262 DevOps process 260–62 Difference 19, 66, 94, 96–97, 131, 133, 135, 170, 281–82 Digital camera 72, 172–73 Digital certificates 184–85 Digital data 287, 294 Digital recorders 2 Digital waves 41–42 Digits 38–39, 63, 76, 107, 112, 139–40, 301 Dimensions 32–33, 173 Direct current (DC) 65 Direct evidence 293, 333
Directors 30, 315–16 Disagreement 232–34 Disaster 195–98, 201–12, 214–17, 251 –– natural 196, 212, 240 –– potential 199, 204, 206, 211 Disaster drills 216–17 Disaster recovery 195–96, 198, 200, 202–6, 208, 210, 212, 214, 216 Disaster recovery operations 214–15, 217 Disaster recovery plan 195, 202, 204–7, 209–10, 215–17 Disaster recovery team 204, 207, 215–16 Disaster-based risk assessment 197–98 Discount 152–53, 278, 299 Disk 77, 88, 91–95, 136, 138–39, 183, 296, 303, 307 Disk drive 61, 70, 86, 91–95, 136, 193, 284, 296, 302 Display 31, 50, 86, 89, 97–98, 106, 109, 120–24, 145 –– instructions to 120, 129 Disputes 232–35 Distinct clause 159–60 Distributed ledger technology 258 DMZ (Demilitarized Zone) 180 DNS (domain name server) 49, 52 Doctype 50, 122, 128–29 Document 62, 69–71, 99, 122, 134–36, 282–83, 299, 331, 335 Documentation 298, 331 Domain name server (DNS) 49, 52 Domain names 52, 181, 191 Downtime 211 Downtime computer/printer 25–26 Downtime procedures 215–16 DRAM. See Dynamic random access memory Drive 5, 29, 93, 121, 206–7, 289, 295–96, 299, 302–3 –– hard 91, 295, 300, 303 Duplicates 160, 335 Duration 10, 15–21, 25, 172, 210, 320–24 Dynamic random access memory (DRAM) 78 E EEPROM (Electrically erasable programmable read-only memory) 79 EICS (emergency incident command system) 214 Electrically erasable programmable read-only memory (EEPROM) 79
Electronic envelope 35–36, 44–45 Electronic Serial Number (ESN) 57 Electrons 40, 80, 97 Email client 51–54 Email server 51, 53–54 Emergency 196, 204, 209, 212, 214–15 Emergency incident command system (EICS) 214 Emergency incident commander 214–15 Emergency operations center 204, 215 Emergency Operations Center (EOC) 215 Employee number 76, 143, 148 Employees 170, 176–78, 194–95, 197, 203, 208–10, 212, 216–19, 280 –– recovery 212 Encrypted data 289, 305 Encryption 182–83, 185, 249, 254, 256, 264–65, 305 Encryption algorithms 182, 264, 308 Engine 23, 32–33, 72, 74, 276 Engineers 4–6, 11, 29, 42–43, 56, 58, 75, 78, 150 Entity 74, 136–37, 142, 144–45, 147, 249 Entity attributes 137, 142 Environments, organization’s computing 256, 259 EOC (Emergency Operations Center) 215 EPROM (Erasable programmable read-only memory) 79 Erasable programmable read-only memory. See EPROM Errors 45, 81–82, 111, 113, 116–17, 153, 324, 327, 338 ESN (Electronic Serial Number) 57 Estimate 9–10, 15–16, 19, 224, 227, 274, 276, 320–21, 338 –– top-down 19 Estimate cost 19 Estimate technique 19 Ethereum 258 Event 22–23, 28, 120–21, 195–97, 200, 207–8, 240, 257–58, 293 Event handler 257–58 Evidence 13, 206, 287–99, 312, 319, 325–27, 330, 332–37, 339–42 –– digital 287, 294 –– expert 294 –– grading 334 –– piece of 290, 297, 333, 335 –– present 289–91
–– prosecutor’s 291 –– scientific 294 Evidence computing device 305–6 Evidence data 296, 298–99, 301–8 Evidence drive 295–96, 299–302, 304–5, 307–8 Evidence file 308 Evidence log 298, 335 Evidence room 298, 335 Example 106–9, 111, 113, 115, 122–26, 128–29, 140, 150–63, 248 Excel 3–4, 99, 121, 133, 169, 189, 273, 280 Excel workbook 135, 151 Executable file 108, 117–18, 188, 308 Executes 73–74, 86–87, 89, 108–10, 115–16, 118, 120–21, 329–30, 333–34 Executives 169, 216, 263, 281, 311 Expectations 8, 10–11, 23–24, 42, 129, 200, 213, 222, 238–39 Expenses 207, 209, 220–21, 227, 235, 240, 253, 255, 323 Experience 11–13, 51, 121, 131, 221, 255, 268, 278, 280–81 Expert systems 60, 99, 273, 278–79 Experts 15, 99, 219, 268–69, 279, 328, 333–34, 341 –– subject matter 6, 12, 15, 208, 214 F Facility 45–46, 195–97, 200–201, 205–7, 209–10, 212, 215, 227, 229 –– vendor’s 197, 264 Fact-finding 232–33, 291 Factors 73–74, 215, 217, 224, 226, 232–33, 275, 279, 320 –– critical success 324 –– negotiable 226 –– required 226 –– time service 213 Failure 22, 24, 196, 198, 200–202, 213, 242–43, 253, 261 –– memory chip 82 –– point of 23, 200–202 Feedback 29, 111, 114, 190, 259, 261 Fiber 37, 40, 42, 46 File 48–50, 92–94, 134–36, 138–39, 178, 289, 300, 302–5, 307–8 –– deleted 93, 289, 307 –– index.html 50, 54, 122 –– list of 92
–– new 93, 307 –– object 117 –– original 300 –– sorted 85 –– zip 94 File clerk 141, 149 File extensions 108, 134, 188, 307–8 File headers 307 File name 92–93, 308 File server 63, 178 File services 59, 63 File system 92–94 File Transfer Protocol. See FTP File/folder encryption 183 Fingerprints 171, 173–74 Firewall 54, 180–82, 193 Flash memory 79, 95 Floods 195–96, 210, 212, 327 Focus board 25–27 Forensic Computing 287–88, 290, 292, 294, 296, 298, 300, 302, 304 Forensics workstation 299, 305–6 FPM DRAM 78 Frequencies 20, 40, 42, 55–56, 70, 172, 282 FTP (File Transfer Protocol) 48–49, 178 FTP service 178 Function 4, 32, 34, 110–20, 125–26, 129, 196–97, 258, 260 –– pause 115 Function definition 112, 117 Function keys 89 Functionality 32, 40, 110, 125, 130, 136, 144, 150, 215 G GAAP (generally accepted accounting principles) 317 Gantt chart 16, 20, 321 Gate 75–76 Google 48, 121, 126–27, 129, 175, 183, 267, 282 Government 190, 235, 246, 289, 324 Grading 113–14 Graphical user interface. See GUI GUI (graphical user interface) 1, 34, 84, 87, 119, 121 GUI Program 119, 121 –– real 121 H Hackers 51, 87, 167, 182, 186–88, 190–94, 264
Hacking 187, 189, 193 Hard disk 70, 72, 82, 92, 136, 294, 303 Hardware 53, 62, 201, 212, 217, 248, 256, 259, 299–300 Hash value 185, 298–300 Hashing 298, 300–301 Hazards 197–98 Help make decisions 274–75, 277, 279 Hex editor 107, 299, 301–3, 305 Hexadecimals 107, 301 High-definition (HD) 72, 74 High-level programming language 109–11, 116, 118, 125 Home page 50, 121, 128, 179 Human resources 17, 322 Hybrid clouds 250 I IANA (Internet Assigned Numbers Authority) 47 Icons 1, 51, 61, 85–86, 116 ID 86, 125–26, 176–77, 264 ID and password 51, 165, 170, 175, 177, 187 IDE (integrated development environment) 117–18, 121 Idea to reality 4–5, 19, 105 Identifying Evidence 333, 335 ID/password 110–11, 176–77, 181, 183 Image 79, 86–87, 90, 95–98, 138, 165, 170, 173, 299 IMAP (Internet Mail Access Protocol) 53 IMAP server 53 Importance of cloud computing 245–46, 248, 250, 252, 254, 256, 258, 260, 262 INDEX Customer Last Name First Name INX 154–55 Indirect evidence 293, 333 Information –– biometric 174 –– change 307 –– customer’s 145, 148 –– encoded 42, 182, 184 –– encrypt 182 –– encrypted 184 –– order 141, 145 –– organization’s 168, 170 –– parent 146 –– personal 174–75, 178, 192 –– piece of 45, 134, 136 –– probative 287 –– protected 183
–– requests for 104, 136, 141, 149, 222 –– sensitive 185, 187–88 –– transfers 72 –– transmit 41–42, 44 Information Systems Audit and Control Association (ISACA) 315 Information technology 200, 252, 315, 332 Information technology audit 311, 315, 317, 319–20, 328–29 Information technology auditing 311–12, 314, 316, 318, 320, 322, 324, 326, 328 Information technology auditor 313–14 Input number 185 Input/output ports 64–65, 67, 70–72, 82, 87, 91–92 Instruction cycle 68 Instruction set 68, 74, 84, 109 –– processor’s 68, 74, 108 Instructions 4, 66–75, 83–84, 86–88, 100–101, 106–18, 120–29, 149, 202–3 –– first 77, 112, 122 –– interrupt 86 –– last 112–14 Integer 112–15, 138–40 Integer programming 270 Integer values 112, 114 Integrated development environment. See IDE Interference 42, 55–57 International organization 37, 43, 150 Internet 44, 46–49, 54, 56–58, 178–81, 184–85, 187–88, 190–93, 245 Internet Assigned Numbers Authority (IANA) 47 Internet Mail Access Protocol (IMAP) 53 Internet Protocol. See IP Interpreter 31, 109, 118 Interrupt 86–87, 89, 188, 190, 316 Interrupt service routine 86–87 Interrupt vector table 86–87 Intranet 35, 54, 62, 72, 85, 166, 180, 191, 193 –– organization’s 54, 62, 141, 179–80 Intruders 176–78, 187 Investigation 168, 294–95, 297–99, 301, 306, 308, 316, 342 iOS 127 IP (Internet Protocol) 36, 44, 47, 178–81, 191, 193, 264, 294, 306 IP address 36–37, 47–49, 51–52, 54, 179, 181, 191, 287, 306 –– dynamic 47–48 iPhone 5–7, 14, 127
ISACA (Information Systems Audit and Control Association) 315 ISP (internet service provider) 47–49, 179, 287–88, 294 Items 25–27, 219–20, 225, 231–34, 241–42, 323, 327, 331, 335 J Java 31, 110, 118, 127, 253 JavaScript 110, 125–27 Job description 17, 322 K Kernel 84–86, 178, 188–89 Key 32, 88–89, 182–85, 190, 203, 205, 254, 279, 282 Key performance indicators (KPIs) 274 Keyboard 37–39, 43, 82–83, 85, 87–89, 138–39, 175–76, 188–90, 325 Keyboard buffer 88–89 Keystrokes 51, 74, 88–89, 189–90 Knowledge management 270, 280 KPIs (Key performance indicators) 274 L LAN. See Local area network Language 6, 31, 38, 70, 107, 109, 118, 141, 149–50 –– assembly 107–9 –– machine 31, 107, 117–18 Laptop 61, 85, 87, 165, 169, 187, 288, 322–23 Law 193, 233, 236–37, 239, 289–93, 297, 333 Law enforcement 58, 287–89, 294–97 Layers 43–44, 90, 94, 101, 127, 166–67 –– application logic 101 –– presentation 43–44, 101, 127, 166 –– presentation logic 101 –– transparent electrode 90–91 LCD (liquid crystal display) 91, 95 LCD monitor 97–98 Leadership 6–10, 19, 21, 215 Legacy systems 199–200 Legal action 235, 287, 289–90, 297, 306 Letters 37–39, 115–16, 151, 157, 163, 239, 241–42, 301, 303 –– charter/engagement 312 –– engagement 311–12, 326 Liabilities 224, 235, 256, 314 Library 62, 116–19 Light 17, 39, 42, 63, 81, 89, 97, 173, 193
–– near-infrared 172–73 Limitations 8, 222, 254–56, 259, 316, 318, 330, 339–40 Linear programming 270 Link 50, 53–54, 118, 142, 144, 148–49, 161–62, 178, 180 Linux 83–85 Liquid crystal display (LCD) 91, 95 Local area network (LAN) 45, 200, 228, 247 Location 60, 62, 87–92, 100, 104, 302–3, 318, 320, 335–36 –– organization’s 227, 318, 326 –– remote 57–58, 195, 263, 298 Log 51, 53, 165, 168, 190, 264, 293–94, 300, 335 Login information 60, 126, 190 Loop 29, 31–32, 106, 115–16 M MAC (Media Access Control) 47, 186 MAC address 47, 186, 294 Mac OS 83–85 Machine code 107–9, 113, 118 Macro 4, 121, 189–90 Macro virus 189–90 Mainframe computers 62, 85, 144 Maintenance 213, 219–20, 248, 252, 257–58, 276, 323 Malware 54, 60, 87, 177–78, 182, 187, 190, 194 Management 219–20, 222, 224, 226, 228, 230, 232, 234, 236 Management information systems (MIS) 110, 182, 273 Management processes 256 Managers 99, 105, 199–200, 219, 269, 315–16, 324–25, 331–32, 334–35 Managing risk 21, 195 Manufacturers 33, 47, 97, 150, 167, 240, 243 Margins 278–79 Master file table 92–93 Mean time between failures (MTBF) 213, 243 Measurements 14, 73, 81, 95, 170–74, 205 Measures, disaster recovery control 206 Media 42, 183–84 Media Access Control. See MAC Mediator 232–34, 291 Meeting 20, 27–28, 42, 222–23, 238, 241, 320, 325, 340–41 Members 1, 7, 11–12, 27, 35, 177, 211, 214–16, 226
Memorandum 241 Memorandum of understanding 241 Memory 63–71, 76–83, 86, 107–8, 113–14, 136, 296, 299, 305 –– gigabyte of 63, 78 –– nonvolatile 77 Memory address 69–71, 76, 86, 113–16, 188 Memory blocks 70 Memory cell 80–82 Memory chips 80–82 Memory locations 68, 114 Methodologies 25, 214, 316, 318, 326, 335–36, 339 Metrics 213, 262, 330 Microchip 89–90 Microprocessor 166–67 Microservices 257–59, 262, 264–65 Microsoft 1, 110, 118, 141, 175, 182, 192, 246 Milestones 14, 21, 228 MIS (management information systems) 110, 182, 273 MIS department 177, 180, 182, 187, 199, 213 Mobile communication devices 56–57 Mobile communications device 56 Mobile computing device 56, 194, 252 Mobile device 55–56, 126–28 Mobile Telephone Switching Office. See MTSO Model building 269 Models 29, 32–33, 43, 144, 269–70, 272, 274–77, 279 –– deterministic 270 –– probabilistic 270 –– simulation 275, 277 Monitoring 59–60, 196, 314 Monitors 54, 56, 60, 67, 70, 95–98, 183, 192, 223 Motherboard 64–65, 67, 69–72, 76, 82, 96 Mouse 1, 51, 61, 64, 67, 70, 82–85, 87, 89–90 MTBF. See mean time between failures MTSO (Mobile Telephone Switching Office) 56–57 Multiple applications 43, 103, 166, 258 Multiprocessing 75 Multitasking 75, 85 N Name, customer's 135, 145–46 –– first 133, 135, 148, 154, 156, 187, 282 –– last 133, 135, 138, 143, 148, 154, 156–57, 161 NAP (network access point) 48, 60
Natural language processing 282 Negative voltage 41–42 Negotiating 219, 226, 229, 231 Negotiation process 224–26, 230 Negotiation strategy 226, 230 Negotiations 222, 224–27, 229–32, 236, 238, 241 Net amount 153, 159, 161 Network 35–36, 43–49, 51, 55–60, 62, 165–66, 168–69, 178, 193 –– home 48 –– mobile 57 –– organization’s 57, 186 –– public 57, 245, 252 –– remote 48, 58 Network access point (NAP) 48, 60 Network access points 48, 60 Network administrator 49, 59–60 Network cables 37, 46, 58–59, 199, 201 –– central 46 Network connections 45, 55, 85, 245 Network database model 144 Network designers 58–59 Network devices 35–37, 42–49, 54–55, 58–60, 72, 166, 193, 294 Network engineers 49, 193 Network interface 44 Network interface card (NIC) 47 Network layer 43–44, 166, 183 Network operating system 59 Network packets 186, 193 Network protocols 45, 47, 55, 59 –– physical 43, 166 Network routers 36, 199 Network segments 46, 60 Network server 62 Network sniffers 193 Network technologies 45 Network traffic 37, 256 Network type 43, 166 New Technology File System (NTFS) 92 NIC (network interface card) 47 NJ 155, 158 Non-compliant 313, 320, 340 Non-human resources 322–23 Nonlinear programming 270 Normalization 147–48 NoSQL database 133 NTFS (New Technology File System) 92 NULL 139, 152–53
Number 36–40, 63, 65–68, 71–77, 93–96, 114, 137–40, 146–48, 175–76 –– binary 39–40, 63, 301 –– decimal 39, 63, 76 –– order 145, 162 –– partial 114, 139 Number system 14, 301 –– binary 39, 301 Numbering systems 38–39, 107, 139, 301 Numeric values, binary 39 O Object oriented design 32–34 Objectives 315–16, 318, 326 –– organization's 311, 318 Objects 32–33, 40, 87, 119–21, 144, 265 Occurrence 22, 198, 240, 336, 338 Office 4, 35, 59, 61–62, 121, 203, 227 Office communications network 35–37 Offset 302–3 Operating system 59–60, 82–85, 87–90, 127, 167, 178, 188–90, 303–4, 306–8 Operating system programs 59 Operations 82, 196–201, 203–4, 214–15, 260–61, 314, 317, 324, 326–27 –– normal 81, 187, 195–96 –– organization's 62, 200, 211, 237, 252, 257, 327, 329 –– vendor's 211, 327 Operations process 261 Operations teams 261–62 Operator 156–57, 192 Options, best 269–70, 272–73 Order entry application 197, 199, 201–2, 219–21, 223, 230, 234, 236–40, 328 Order entry process 198, 220, 333 Order table 145–48, 162 Orders 43–44, 145–47, 155, 157–58, 161–62, 190–91, 200–202, 311–12, 337–38 Organization 6–13, 176–80, 194–217, 219–32, 235–43, 245–57, 262–65, 311–20, 323–30 –– private 174 Organization chart 325 Organization contracts 211, 227 Organization name 175–76 Organization's data 246, 254, 263–64, 280 Organization's data center 210, 245–46 Organization's policies 54, 200, 261, 311, 317 OSI model 43–44, 166–67, 183
Outcome 5–7, 214, 219–20, 222, 229–30, 272–73, 275–76, 319, 334 Outlook 43, 51, 53, 70, 166 Output 44, 75–76, 85, 111, 125, 131, 268, 276, 278–79 Outsource 18, 22–23, 249 P Package 36, 208 Packet number 45 Packets 36–37, 43–48, 55, 58, 60, 180–82, 185–86, 193, 306 –– routes 46 Packet-switching networks 47 PAN (personal-area network) 55 Partitions 92, 303–4 Partners 144, 235, 324 Password 31, 106, 165, 170–72, 175–78, 187, 320–22, 325, 332 Password policy 175–76 Path, critical 16, 18, 20, 322 Pattern matching 157 Patterns 89, 133, 173, 190, 275, 280–82 Payments, processes credit card 258 PayPal 258 PDF files 169, 273, 282, 328 Performance 18, 130, 237–38, 242–43, 268, 271, 274, 312, 315 Personal-area network (PAN) 55 Phases 9, 14, 261, 272–73, 275 –– intelligence 272–73 Photograph 96, 335 PHP 110, 128–29 PHP program 128–29 Physical access 170–71, 173, 175, 177, 201 Physical evidence 294 Picture 35, 48–49, 74, 126, 135, 137–39, 170, 172–74, 187 Pixels 74, 79, 95–97, 116 Placeholders 112, 114 Plaintiff 290–93, 295 Plan 28–29, 59, 98–99, 195, 209–10, 214, 216, 311, 315 –– risk management 21–22 Planning 9–10, 20, 26–29, 205, 215, 217, 317, 319, 325 –– rapid 28–29 PLC (programmable logic controller) 166–67 PMBOK (Project Management Body of Knowledge) 11
PMP (Project Management Professional) 11 Policies 13, 168, 280, 312–15, 319–22, 324–27, 330–32, 336, 340 Policies and procedures 200–202, 317–19 POP3 server 51–53 Popular high-level programming languages 110 Port 49–53, 70, 72, 94, 181, 296 Port numbers 49 Post office 36, 46, 208–9 Postal code 36, 135, 137, 157 Power 46, 61, 65–66, 77, 196–97, 205–6, 209–10, 212, 305 –– backup 212, 216–17 PowerPoint 95, 119, 138 Precision 139–40 Preplanning 317–19, 325 Preplanning process 317–18 Presentations 35, 43, 138, 166, 169, 267, 302, 314 Presumption 169, 178, 205, 239, 253, 287, 304 Price 147, 222–25, 227, 229–31, 275, 311 Primary data center 210–11 Primary key 143–45, 148, 152–53, 161–62 Print 69, 71, 99, 109 Print queue 71–72 Printers 35–36, 62, 64, 67, 70–72, 75, 83–84, 86, 227 Printing 69, 71, 74, 83, 88 Private cloud 249–51, 259 Private network, virtual 57–58, 192, 245 Probability 21–22, 167–68, 172, 175, 197–98, 206–7, 270, 276, 279 Problem statement 272 Proceedings 288, 290, 292, 297 –– legal 290, 293, 296–97 Process control block 69–70 Process identification number 69 Processing 66–67, 69, 74, 78, 86–88, 100–102, 104–5, 111, 247–48 Processing instructions 69, 75 Processing time 69, 148 Processor 63–77, 81–90, 92–93, 99–100, 107–10, 112–16, 118, 188, 190 –– 32-bit 63 –– embedded 166 –– multiple 69, 73, 75, 284 –– word 88, 134–35 Processor executes 74, 107, 116, 118 Processor executes instructions 84, 114 Processors and memory 63, 82
Procurement 219–21, 223 Procurement process 220 Production 7, 23–24, 147, 214, 260, 268 Production environment 130–31, 246 Products 8–11, 19, 81, 147, 150, 228, 239–40, 264–65, 282–83 –– office 121, 134, 169 Product/service 223–24, 227, 230, 235, 239 Program 31–32, 68–71, 83–91, 97–100, 104–22, 127–29, 165–67, 187–88, 304–8 –– anti-malware 177 –– assembly language 108–9 –– cleaning 93 –– compiled 118 –– cracking 176 –– executable 107, 109, 200 –– group of 59, 83 –– hashing 300 –– kind-of 121–22 –– legitimate 188 –– malicious 167 –– real 112, 115, 190 –– sort 85 –– transaction 105 –– virus detection 188–89 Program executes 98, 112 Program instructions 89, 98 Programmable logic controller (PLC) 166–67 Programmable read-only memory. See PROM Programmers 31–34, 68, 84–85, 100, 104–5, 107–10, 113–26, 129–31, 188–89 Programming 29, 32–34, 68 –– linear 270 Programming languages 31, 68, 107, 110–11, 118, 125, 127–28 –– interpreted 118 Progress 11–12, 14, 25–26, 261 –– significant 14 Progress payments 220, 228 Project 7–22, 24–28, 60, 134, 219–34, 240–43, 245, 270 –– completed 10 –– new 7, 10, 24 Project charter 7–10, 20, 23–24 Project management 9, 25 –– agile 25–26, 28 –– enterprise 6 Project Management Body of Knowledge (PMBOK) 11 Project Management Institute 11
Project Management Professional (PMP) 11 Project management software 16 Project manager 8–22, 24–25, 27–28, 220, 222–24, 229, 234 –– clinical 22 Project manager time 234–35 Project owner 27 Project plan 8, 10, 13–17, 20–21, 312, 319 –– revised 21 Project sponsor 7–10, 12–13, 21, 23–24, 130 Project sponsor and stakeholders 23–24 Project team 9–14, 17, 20, 23–28, 219 Project team members 11–12, 20 Project template 10 Projected capacitive touchscreen 90–91 PROM (Programmable read-only memory) 78–79 Proposal 3, 7, 220, 222–24, 231 –– request for 222–24 –– vendor's 224, 226 Prosecutor 289–93, 295, 340 Protocols 44, 46, 55, 58, 181, 186, 316 Prototype screens 34 Proxy server 54, 60, 179–80, 191 –– organization's 179–80 Pseudocode 6, 31–32, 34, 99, 105–7, 109, 111, 113, 115 Public cloud 249–50, 259 Public key 184–85 Pulses 65–67 Q Quality audit 329 Queries 150–52, 160, 162–63, 169, 255, 257, 281 R Racketeer Influenced and Corrupt Organizations (RICO) 341 RAM (random access memory) 76–79, 305 Rambus dynamic random access memory (RDRAM) 78 Random access memory. See RAM –– dynamic 78 Range 4, 55, 97, 157, 173, 181, 270, 276, 279 RDRAM (Rambus dynamic random access memory) 78 Read-only memory. See ROM Real-time Operating System (RTOS) 85 Real-time planning 28
Recommendations 200, 239, 312–13, 340–41 Recording 2, 4, 168, 170, 172, 299, 317 Recording industry 2–3 Records 2, 135, 144, 288, 293, 295, 309, 317, 319 Recover 93, 201, 204–6, 216, 261, 307–8 Recovery 202, 204, 209, 212–13, 243, 251 Recovery point 199, 201 Recovery point objective (RPO) 201 Recovery time objective. See RTO Regions 48, 206, 210–11, 264 Registers 66–67, 69, 73, 107–9 Relationship capital 12–13 Remote servers 126, 128, 246–47, 263 Repeater 46, 55 Report 20, 70, 74, 169, 269, 271, 273, 295–96, 339–40 –– official evidence 295 Request 48–50, 52–54, 140–41, 179–80, 188, 190–91, 221–22, 287–88, 290 –– print 71 Request information 149, 221 Requirements 11, 21, 26–27, 205, 223, 313, 315, 324, 329 –– organization’s 255–56 –– regulatory 200, 214, 220, 238, 250, 263, 327, 329 Resource allocation 18, 20 Resource cost 323 Resource list 18, 323 Resources 11, 15–19, 21–22, 25, 247–49, 251, 254–55, 259–60, 322–24 Response options 210–11 Response time 36, 95, 105, 130, 144, 255, 324 Responsibilities 207, 213–14, 221, 223, 229, 253, 256, 315, 324–25 Restrictions 54, 180, 316, 318–19, 326, 339 Retrieve 52, 76–77, 87–88, 93, 102, 135–36, 138, 155–56, 307 Return rows 156–57 Revenue 22, 203, 208–9, 235, 274 Revenue stream 3, 208–9 RICO (Racketeer Influenced and Corrupt Organizations) 341 Risk 8, 21–22, 195, 197–98, 200–202, 206–7, 223, 269, 276, 327–28 –– detection 21, 327 –– direct 197 –– indirect 197 –– obvious 178, 197
–– operational 22, 250, 328 –– residual 22, 328 –– technological 22, 328 Risk assessment 22, 197–203, 205, 207, 209, 211, 213, 312, 327 –– asset-based 197–98 Risk Management and Disaster Recovery 195–96, 198, 200, 202, 204, 206, 208, 210, 212 Risk tolerance 8, 22, 170, 206–7, 276 ROM (read-only memory) 76–79, 81 Router 35–37, 45–46, 48, 59, 72, 193, 201, 306 Rows 79, 81, 95, 97, 135, 143–48, 152–59, 161, 282 RPO (Recovery Point Objective) 201 RTO (Recovery Time Objective) 201 RTOS (Real-time Operating System) 85 Rules 43, 45, 145–46, 180–81, 239, 278–79, 290–92, 311, 314–15 –– filtering 180–82 –– validation 140 S Sales 105, 177, 191, 223, 239, 274, 311 Sales information 216 Sales order application 216 Scale, economies of 247–50 Schedule 14–16, 18, 20–21, 189, 227, 243, 319, 322 Scope 19, 204, 259, 312, 316, 318–19, 326, 330, 336 Screen 32–36, 49–53, 61, 70–71, 79–80, 82–93, 95–98, 119–22, 302–5 Screen size 96, 121 Script 49, 121, 125–26 Scrum 26–27 SDKs (software development kit) 127 SDRAM (Synchronous dynamic random access memory) 78 Search 134–35, 148, 154, 156–57, 179, 181, 246, 277, 287–88 –– partial 277 Search criteria 134, 148, 154, 157, 282 Search warrant 288, 294, 297 Sectors 91–94, 300, 302, 307 Secure hash algorithm (SHA) 300 Secure Sockets Layer. See SSL Security 60, 186, 194, 201–2, 211, 249–50, 253–56, 260, 263–65 Security group 177
–– representative 177 Segment 44, 228, 274, 319 Select Customer Number 156–57, 161–63 Select statement 155–56, 158–59, 162 Sentiment analysis 282–83 Server side 100–101, 128–29 Servers 48, 51, 62, 85, 100–101, 127, 181, 201, 247–48 –– print 62 Service-level agreement. See SLA Services 18–19, 178, 192, 211–14, 219, 243, 247–50, 253, 255–58 –– computing 245 –– denial of 191, 194, 263–64 –– minimum 213, 243 –– sharing 178–79 Settlement 226, 240, 291 SGRAM (Synchronous graphics random access memory) 78 SHA (secure hash algorithm) 300 Sharing Parts of Programs 116–17 SID 56–57 Signal 46, 55, 81, 85–86, 89, 191, 309 Simon's decision-making process 272–73 Simple Mail Transfer Protocol. See SMTP Simple Project 25, 27 Simulation 270–72, 277 SLA (service-level agreement) 213, 243, 254, 256 Smartphones 5, 56–57, 61, 72, 126, 133, 169, 175, 186–87 SMTP (Simple Mail Transfer Protocol) 51–52, 181 SMTP server 51–52, 181 Social security number 143, 174 Software 44, 49, 51–53, 245, 248, 264, 267, 296, 299 Software application 4, 224 Software development kit (SDK) 127 Software programs 49 Source code 100, 107, 109–10, 117–18, 200, 328 Specifications 11–12, 23–24, 59, 105, 213, 222–24, 227, 229–31, 239 Spreadsheet 189, 269, 282, 313–14 Sprint 26–27 SQL (Structured Query Language) 110, 141, 149–53, 155, 157–58, 163 SQL statements 149–50 SRAM (Static random access memory) 77–78 SSL (Secure Sockets Layer) 183–85
Staff members 262, 314, 324, 329, 331, 334–35 Stakeholders 5–6, 8, 10–13, 20, 23–28, 30, 315, 319, 340–41 Standardization 37, 43, 150 Standards 150, 237–38, 313–14, 317, 319–22, 324–27, 329–32, 336, 340 –– industry 200, 232, 237–38 Standards body 37, 150, 317 Static random access memory. See SRAM Stockholders 235–36 Storage 72, 245–46, 248–50, 254, 256 Strategies 35, 225–26, 276, 284–85, 318, 326 Strength 182, 226, 229–30 Stress 130, 217 Structure 6, 8, 25, 109–10, 118, 144, 150, 163, 313 –– work breakdown 14, 319 Structured data 137–38, 282 Structured Query Language. See SQL Subcomponents 234, 319 Subcontractors 219, 221, 229 Subnet 36–37, 46–48 Subnet masks 48 Subroutines 120–21 Subtasks 10, 14–17, 25, 27, 85, 111, 312, 319–21, 323 Supervised learning process 278 Support 10, 13, 29, 200, 217, 220, 292, 333, 335 Sustainability 28, 195, 197–98, 206, 210–11, 219, 253, 255, 326 Switches 46, 61, 63–64, 66, 69–70, 76, 79, 83, 97 Symbols 29, 37–38, 108–9, 122, 139, 305 Synchronous dynamic random access memory (SDRAM) 78 Synchronous graphics random access memory (SGRAM) 78 Systems 23–24, 99, 130, 170, 172, 176–77, 198–201, 203, 267–68 –– authentication 170 –– automated decision 278 –– backup 200, 216, 240 –– business performance management 274 –– collaborative 280 –– layered cybersecurity 170 –– new 24 –– order entry 177, 201, 219, 223, 230, 237, 316, 333 –– organization's 280, 328 –– real-time operating 85
–– tier 196 Systems analyst 6, 11, 17, 31, 105–6, 113, 149, 167 Systems programs 59 T Table Customer TB 152–53 Table name 152, 158, 162–63 Table statement 151–52 Tables 25–26, 38–39, 52–53, 138, 140–45, 147–49, 151–56, 158–59, 161–63 –– interrupt 188, 190 Tablets 56, 61–62, 126, 169 Tasks 4, 9–11, 14–21, 25, 28, 116–17, 219, 278–79, 320–24 –– common 103–4 –– general 10 –– single 85 Tasks and subtasks 14–16, 27, 312, 319–21, 323 Tasks/subtasks 321 –– dependent 321 Task/subtask 320–21, 323–24 Team 3, 6, 12, 24–28, 204, 261, 312, 328, 330 –– emergency response 214–15, 217 –– legal 295, 297, 302 –– small project 25 Team leader 27, 204, 312, 319–22 Team members 12, 20, 25–27, 204, 312 Technology 1–6, 34–35, 150, 256, 258, 267, 311–12, 328–29, 332 –– new 2–3, 150, 264 Telecommunications carrier 56, 58 Telnet 51, 178 Telnet service 178 Termination 225, 242, 254–56 Test 23, 114, 116, 126–27, 129–31, 216–17, 247, 260, 277–78 –– installation 23–24 –– stress 23, 130 Testimonial evidence 293 Testing 129–31, 204–5, 211, 216–17, 220, 225, 260–61, 312, 336–37 Testing process 129 Text analytics 282 Text editor 108, 134 Text file 107–8, 122, 134 Text mining 282–83 Third parties 127, 232, 246 –– unbiased 232 Time evidence 297
Time service factor (TSF) 213 Timeframe 9, 316 Timeline 9, 222–23, 316, 321, 326 Timestamp 92, 138, 140 TLS (Transport Layer Security) 184–85 Toolbox 117–20 Tools 29–30, 256, 261–63, 296, 298, 301, 304, 307–8, 311 –– anti-forensics 307–8 Touchscreens 90–91 –– capacitive 90–91 –– infrared 90–91 –– resistive 90–91 Track 13, 69, 71, 91–92, 242, 261, 309, 321, 335 Tracked 287–88, 290, 292, 294, 296, 298, 300, 302, 304 Traffic 37, 46, 65, 252 Transactions 100, 104–5, 146, 202, 267, 336–39 Transceivers 40, 186, 191 Transfer 207, 213, 236, 255, 265 Transistor 80–81, 97 Transmission 33, 43–46, 48, 55–58, 72, 181, 185–86, 193–94, 309 Transmit 42, 46, 55–56, 65, 72, 185, 194 Transmitter 40, 55 Transport 43–44, 166 Transport Layer Security. See TLS Trial 116, 290–93 –– criminal 290–92 TSF (Time service factor) 213 Turnaround time 213 Twisted pair 37, 42 U Unauthorized access 175, 177, 187, 207, 265 Unexpected events 28 Uniform Commercial Code (UCC) 239, 291 Uniform Resource Locator. See URLs Unique name 49, 110, 145, 303 Unit 23–24, 129–30, 215, 260 –– central processing 65, 67, 189 United States 1, 58, 127, 150, 192, 239, 251, 264 Universal serial bus (USB) 72, 304 Unix 59, 83–85, 257 Unlock 175–76, 254, 289 Unstructured data 137–38, 282–83 Upgrades 187, 229, 253, 258–59 Uptime 213 URLs (Uniform Resource Locator) 49, 283
USB (Universal serial bus) 72, 304 USB port 70, 72, 87, 300 User acceptance test 23–24 User interface 49, 59, 100–102, 127, 271 –– graphical 1, 84, 119 Users 52–53, 119–20, 127, 175–78, 281, 302–3, 306–7, 325, 332 –– super 163 V Value 75, 112–17, 139–40, 147–48, 154–61, 227–28, 230, 235–36, 274–75 –– binary 39, 107, 139, 301 –– default 139 –– financial 7 –– group’s 304 –– hexadecimal 107, 301, 303 –– market 227 –– partial 157 –– perceptions of 227 –– return 88, 112, 114, 116, 160 –– second 140 –– shared 12 –– strategic 7 –– valid range of 140 Variable result 112–14 Variables 34, 112–14, 268–70, 274–76 –– dependent 274–75 –– independent 274–75 –– value of 112, 116 VBA (Visual Basic for Applications) 4, 121 VBA program 121 Vendor 195–97, 199–201, 203–4, 211, 213–16, 219–32, 234–43, 249–50, 256–59 –– qualified 220–21 –– specialty 229 Vendor negotiations and management 219–20, 222, 224, 226, 228, 230, 232, 234, 236 Video 48–50, 57, 61, 74, 76, 98, 138–39, 180, 256 Video files 138, 307 Video memory 78–80, 86, 89, 96 Virtual Private Network. See VPN Virtual project teams 12 Virus 181, 187–90, 192 Visual Basic for Applications. See VBA VPN (Virtual Private Network) 57–58, 192, 245 VPN network client 58 Vulnerabilities 167, 194, 198, 200, 264, 300
W WAN (Wide area network) 45 WAP (wireless access point) 191–93 WAP, fraudulent 192 Warranty 239–40 Waves 39–42, 46, 55, 73 –– analog 41–42 –– radio 56, 186, 193 Web applications 121, 123, 125 Web applications, developing 110 Web server 48–51, 54, 59, 62, 100, 102, 128, 184, 191 Webpages 50, 54, 59, 100, 102, 122–26, 128–29, 283 –– application's 102 Website 49–50, 110, 121–22, 124, 136, 179, 191–93, 287, 294–95 Website name 49, 179 Wide area network (WAN) 45 Wi-Fi 35, 37, 55, 64, 186 Wi-Fi connection 186, 201 Wi-Fi hotspots 55, 58
Wi-Fi network 55, 186, 193 Windows 49, 83–85, 87, 89, 92, 98, 118–20, 134–36, 258 Windows searches 134 Wireless access point. See WAP Word documents 3, 122, 134–36, 146, 282, 328 Work computer 54, 165, 182 Work group 11 Work package 14, 29, 319–20 Workaround 138, 331 Workbooks 134–35 Workflows 13, 29–30, 99, 105, 136, 139, 142, 147, 149 Worksheet 135, 151 –– project prioritization 7 Workstations 61, 74, 76, 296 X, Y, Z XML 144–45 XML database model 144 XML Tags 144–45