IBM PRESS NEWSLETTER Sign up for the monthly IBM PRESS NEWSLETTER at ibmpressbooks.com/newsletters
LEARN • NEW PODCASTS from your favorite authors • ARTICLES & INTERVIEWS with authors • SPECIAL OFFERS from IBM Press and partners • NOTICES & REMINDERS about author appearances and conferences
WIN Sign up for the IBM PRESS NEWSLETTER and you will be automatically entered into a QUARTERLY GIVE-AWAY for 3 months' access to Safari Books Online – online access to more than 5000 books. A $150 VALUE!
Sign up at ibmpressbooks.com/newsletters
REGISTER YOUR BOOK ibmpressbooks.com/ibmregister REGISTRATION ENTITLES YOU TO: • Supplemental materials that may be available • Advance notice of forthcoming editions • A coupon that can be used on your next purchase from ibmpressbooks.com
Visit ibmpressbooks.com for all product information
Related Books of Interest
A Practical Guide to Trusted Computing by David Challener, Kent Yoder, Ryan Catherman, David Safford, and Leendert Van Doorn ISBN: 0-13-239842-7
Every year, computer security threats become more severe. Software alone can no longer adequately defend against them: what’s needed is secure hardware. The Trusted Platform Module (TPM) makes that possible by providing a complete, open industry standard for implementing trusted computing hardware subsystems in PCs. Already available from virtually every leading PC manufacturer, TPM gives software professionals powerful new ways to protect their customers. Now, there’s a start-to-finish guide for every software professional and security specialist who wants to utilize this breakthrough security technology. Authored by innovators who helped create TPM and implement its leading-edge products, this practical book covers all facets of TPM technology: what it can achieve, how it works, and how to write applications for it. The authors offer deep, real-world insights into both TPM and the Trusted Computing Group (TCG) Software Stack. Then, to demonstrate how TPM can solve many of today’s most challenging security problems, they present four start-to-finish case studies, each with extensive C-based code examples.
Understanding DB2 Learning Visually with Examples, Second Edition by Raul F. Chong, Xiaomei Wang, Michael Dang, and Dwaine R. Snow ISBN: 0-13-158018-3
IBM® DB2® 9 and DB2 9.5 provide breakthrough capabilities for providing Information on Demand, implementing Web services and Service Oriented Architecture, and streamlining information management. Understanding DB2: Learning Visually with Examples, Second Edition, is the easiest way to master the latest versions of DB2 and apply their full power to your business challenges. Written by four IBM DB2 experts, this book introduces key concepts with dozens of examples drawn from the authors’ experience working with DB2 in enterprise environments. Thoroughly updated for DB2 9.5, it covers new innovations ranging from manageability to performance and XML support to API integration. Each concept is presented with easy-to-understand screenshots, diagrams, charts, and tables. This book is for everyone who works with DB2: database administrators, system administrators, developers, and consultants. With hundreds of well-designed review questions and answers, it will also help professionals prepare for the IBM DB2 Certification Exams 730, 731, or 736. Listen to the author’s podcast at: ibmpressbooks.com/podcasts
Sign up for the monthly IBM Press newsletter at ibmpressbooks.com/newsletters
Related Books of Interest
Implementing ITIL Configuration Management
by Larry Klosterboer
ISBN: 0-13-242593-9

The IT Infrastructure Library (ITIL®) helps you make better technology choices, manage IT more effectively, and drive greater business value from all your IT investments. The core of ITIL is configuration management: the discipline of identifying, tracking, and controlling your IT environment's diverse components to gain accurate and timely information for better decision-making. Now, there's a practical, start-to-finish guide to ITIL configuration management for every IT leader, manager, and practitioner. ITIL-certified architect and solutions provider Larry Klosterboer helps you establish a clear roadmap for success, customize standard processes to your unique needs, and avoid the pitfalls that stand in your way. You'll learn how to plan your implementation, deploy tools and processes, administer ongoing configuration management tasks, refine ITIL information, and leverage it for competitive advantage. Throughout, Klosterboer demystifies ITIL's jargon and illuminates each technique with real-world advice and examples.

RFID Sourcebook
by Sandip Lahiri
ISBN: 0-13-185137-3

Approaching crucial decisions about Radio Frequency Identification (RFID) technology? This book will help you make choices that maximize the business value of RFID technology and minimize its risks. IBM's Sandip Lahiri, an experienced RFID solution architect, presents up-to-the-minute insight for evaluating RFID; defining optimal strategies, blueprints, and timetables; and deploying systems that deliver what they promise. Drawing on his experience, Lahiri offers candid assessments of RFID's potential advantages, its technical capabilities and limitations, and its business process implications. He identifies pitfalls that have tripped up early adopters, and shows how to overcome or work around them. This must-have resource can also act as a reference guide to any nontechnical person who wants to know about the technology. From building business cases to testing tags, this book shares powerful insights into virtually every issue you're likely to face. Whatever your role in RFID strategy, planning, or execution, have Sandip Lahiri's experience and knowledge on your side: You'll dramatically improve your odds of success.
Listen to the author’s podcast at: ibmpressbooks.com/podcasts
Visit ibmpressbooks.com for all product information
Related Books of Interest

Mainframe Basics for Security Professionals: Getting Started with RACF
by Ori Pomerantz, Barbara Vander Weele, Mark Nelson, and Tim Hahn
ISBN: 0-13-173856-9

For over 40 years, the IBM mainframe has been the backbone of the world's largest enterprises. If you're coming to the IBM System z® mainframe platform from UNIX®, Linux®, or Windows®, you need practical guidance on leveraging its unique security capabilities. Now, IBM experts have written the first authoritative book on mainframe security specifically designed to build on your experience in other environments. The authors illuminate the mainframe's security model and call special attention to z/OS® security techniques that differ from UNIX, Linux, and Windows. They thoroughly introduce IBM's powerful Resource Access Control Facility (RACF®) security subsystem and demonstrate how mainframe security integrates into your enterprise-wide IT security infrastructure. If you're an experienced system administrator or security professional, there's no faster way to extend your expertise into "big iron" environments.

Lotus Notes Developer's Toolbox
Elliott
ISBN: 0-13-221448-2

IBM Rational Unified Process Reference and Certification Guide
Shuja, Krebs
ISBN: 0-13-156292-4

WebSphere Business Integration Primer
Iyengar, Jessani, Chilanti
ISBN: 0-13-224831-X
Understanding DB2 9 Security Bond, See, Wong, Chan ISBN: 0-13-134590-7
Mining the Talk Spangler, Kreulen ISBN: 0-13-233953-6
Service-Oriented Architecture (SOA) Compass Bieberstein, Bose, Fiammante, Jones, Shah ISBN: 0-13-187002-5
Persistence in the Enterprise Barcia, Hambrick, Brown, Peterson, Bhogal ISBN: 0-13-158756-0
Sign up for the monthly IBM Press newsletter at ibmpressbooks.com/newsletters
Policy Technologies for Self-Managing Systems
Dakshi Agrawal
Seraphin Calo
Kang-Won Lee
Jorge Lobo
Dinesh Verma

IBM Press
Pearson plc
Upper Saddle River, NJ • Boston • Indianapolis • San Francisco • New York • Toronto • Montreal • London • Munich • Paris • Madrid • Cape Town • Sydney • Tokyo • Singapore • Mexico City
ibmpressbooks.com
The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

© Copyright 2009 by International Business Machines Corporation. All rights reserved.

Note to U.S. Government Users: Documentation related to restricted rights. Use, duplication, or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corporation.

IBM Press Program Managers: Tara Woodman, Ellice Uffer
Cover design: IBM Corporation
Associate Publisher: Mark Taub
Marketing Manager: Kourtnaye Sturgeon
Publicist: Heather Fox
Acquisitions Editor: Bernard Goodwin
Managing Editor: Patrick Kanouse
Designer: Alan Clements
Senior Project Editor: Tonya Simpson
Copy Editor: Mike Henry
Indexer: Tim Wright
Compositor: TnT Design, Inc.
Proofreader: Williams Woods Publishing Services
Manufacturing Buyer: Dan Uhrig

Published by Pearson plc
Publishing as IBM Press

IBM Press offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact:

U.S. Corporate and Government Sales
1-800-382-3419
[email protected]

For sales outside the U.S., please contact:

International Sales
[email protected]

The following terms are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both: IBM, the IBM logo, IBM Press, AIX, OS/2, Tivoli, and WebSphere. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.
Library of Congress Cataloging-in-Publication Data

Policy technologies for self managing systems / Dakshi Agrawal ... [et al.].
p. cm.
ISBN 0-13-221307-9 (hardback : alk. paper)
1. Systems engineering. I. Agrawal, Dakshi.
TA168.P58 2008
658.4'03—dc22
2008034941

All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to:

Pearson Education, Inc.
Rights and Contracts Department
501 Boylston Street, Suite 900
Boston, MA 02116
Fax (617) 671-3447

ISBN-13: 978-0-13-221307-3
ISBN-10: 0-13-221307-9

Text printed in the United States on recycled paper at R.R. Donnelley in Crawfordsville, Indiana.
First printing September 2008
Dedicated to our colleagues at IBM who supported and helped our work on policy technologies; and to our families, for their support during the preparation of this book.
Contents
Foreword  xi

Preface  xiii
Chapter 1. Policy Definition and Usage Scenarios  1
1.1. Formal Definition of Policy  2
1.1.1. Types, Nature, and Usage of Policies  6
1.2. Policy-Based Self-Configuration  10
1.3. Policy-Based Self-Protection in Computer Networks  13
1.4. Policy-Based Self-Optimization in Computer Systems  15
1.5. Policy-Based Self-Healing  16
1.6. Building a Policy-Based Management System  17
1.7. Summary  20
Chapter 2. Policy Lifecycle—Creation, Distribution, and Enforcement  21
2.1. A Holistic View of the Policy Lifecycle  22
2.2. Instances of Policy-Based Systems  25
2.2.1. Network QoS Control  25
2.2.2. Privacy Policy Publication  27
2.2.3. Policy-Based Management of Enterprise Network Access  28
2.3. Policy Creation  30
2.4. Policy Distribution  31
2.5. Policy Distribution Using Repositories  35
2.5.1. Grouping of Policies by System Components Role  36
2.5.2. Grouping of Policy Components  37
2.6. Policy Creation and Distribution for Multiple Administrative Domains  38
2.7. Policy Enforcement  41
2.7.1. Policy Evaluation Trigger  42
2.7.2. Policy Enforcement Context  44
2.7.3. Data Gathering  45
2.7.4. Policy Evaluation  46
2.7.5. Decision Execution  49
2.8. Summary  50

Chapter 3. Policy Information Model  51
3.1. How Is an Information Model Described?  52
3.2. Policy Information Models  54
3.2.1. Why Use Information Models  55
3.2.2. Condition-Action Information Model  56
3.2.3. Event-Condition-Action Information Model  59
3.2.4. Mode-Subject-Action-Target Information Model  59
3.2.5. Grouping, Scope, and Priorities  60
3.3. A Standardized Policy Model  62
3.3.1. The Common Information Model (CIM)  62
3.3.2. The CIM Policy Model  63
3.4. Summary  69

Chapter 4. Policy Languages  71
4.1. Declarative Nature of Policy Languages  72
4.2. Survey of Policy Languages  73
4.2.1. PDL  73
4.2.2. Ponder  76
4.2.3. CQL  79
4.2.4. XACML  81
4.2.5. ACPL  81
4.3. CIM-SPL  82
4.3.1. CIM-SPL Policy Rules  82
4.3.2. Policy Groups  87
4.3.3. An Example of CIM-SPL Policy  89
4.4. Summary  91
Chapter 5. Policy Transformation and Analysis  93
5.1. Policy Transformation  94
5.2. Design-Time Techniques for Policy Transformation  95
5.2.1. Transformation Using Analytical Models  96
5.2.2. Transformation Using Static Rules  96
5.2.3. Transformation by Policy Table Lookup  97
5.2.4. Transformation Using Case-Based Reasoning  99
5.3. Real-Time Policy Transformation  104
5.4. Policy Analysis  106
5.4.1. Conflict Checking  106
5.4.2. Conflict Resolution  109
5.4.3. Coverage Checking  111
5.4.4. What-If Analysis  112
5.5. Related Work  113
5.6. Summary  114

Chapter 6. Policy-Based Configuration Management  115
6.1. Configuration Management Overview  116
6.2. Policy-Based Configuration Management  118
6.2.1. Policy-Based Simplification of Configuration Management  118
6.2.2. Policy-Based Tuning of System Configuration  119
6.2.3. Policy-Based Checking of System Configuration  120
6.3. Example in Storage Area Networks  121
6.3.1. Configuration Checking of Storage Area Networks  122
6.3.2. Policy Modeling and Representation  125
6.3.3. Architecture of a Policy-Based SAN Configuration Checker  128
6.4. Example in Hosted Server Environment  131
6.4.1. Architecture for Self-Configuration  133
6.4.2. Variations on the Architecture  136
6.5. Summary  137

Chapter 7. Policy-Based Fault Management  139
7.1. Fault Management Overview  139
7.1.1. Fault Management in Networks  141
7.1.2. Fault Management in Web-Based Applications  144
7.2. Policy-Based Fault Management  145
7.2.1. Policy-Based Acquisition of Fault Information  146
7.2.2. Policy-Based Format Conversion  147
7.2.3. Policy-Based Event Volume Reduction  149
7.2.4. Policy-Based Root Cause Analysis  150
7.2.5. Policy-Based Remedial Action  151
7.3. Architecture of a Policy-Based Fault Management System  153
7.4. Summary  156

Chapter 8. Policy-Based Security Management  157
8.1. Overview of Security Management  158
8.2. Policy Applications in Security  159
8.2.1. Policy-Driven Access Control  160
8.2.2. Higher-Level Access Policies  163
8.2.3. Policy-Based Self-Protection  164
8.2.4. Policy-Based Communication Assurance  168
8.3. Policy-Based Security Assurance for IPsec Protocol  168
8.3.1. Business Needs Satisfied by the Security Assurance Tool  169
8.3.2. Communication Control Policies for IPsec Protocol  170
8.3.3. Generating the Communication Control Policies  172
8.4. Summary  173

Chapter 9. Related Topics  175
9.1. Production Rules  175
9.2. Business Rules and Processes  177
9.3. IT Processes  179
9.4. Event Correlation and Notification Systems  180
9.5. Service Level Agreements  183
9.6. Regulatory Compliance  185
9.7. Proliferation of Policy-Based Technologies  186
References  189

Index  195
Foreword
It is a great pleasure to write the foreword to this IBM Press book on Policy Technologies for Self-Managing Systems. Self-management of IT systems and processes is a stated goal of the IBM Autonomic Computing initiative, which I have led for the past six years. Simply put, autonomic systems should shift more of the burden of operational decision making from people to the technology, allowing people more time for truly creative activities. In this model, IT systems sense and respond to changing conditions resulting from component failures, workload changes, environmental changes, and so on. Obviously, such characteristics offer the ability to reduce labor costs, improve reliability, utilize equipment more effectively, and realize many other benefits. However, implementing such a model raises the following question: How does the technology know what we want it to do?

This challenge, a fundamental issue in the grand challenge of autonomic computing, is exactly what policy technologies are all about: Policies that allow humans to specify their objectives for self-management—bridging the gap between people and technology—are at the heart of achieving autonomic computing. In its Autonomic Computing initiative, IBM has developed many advances in policy technology that will assist in the self-management of systems. This book by the IBM research team provides a good survey of the state of the art of policy technology and discusses the different ways in which it can be exploited to build self-managing systems.
From an adoption perspective, policy technologies in systems management are at a relatively early stage. As a result, many IT administrators are not aware of the benefits that can be attained using policy-based management and the different ways to apply policies within their environment. This book does an excellent job of providing an overview of the technology area and showing how policy technologies can be applied for tasks such as configuration management, fault management, and access control. The authors are among the best and most successful policy researchers, who have contributed to key standards, developed theoretical frameworks, and, more importantly, applied policy technology to build pragmatic self-management solutions in networking, storage systems, and security systems. This is an important topic described by experts in the field, and I am excited that this book is coming to the marketplace through IBM Press.

Alan Ganek
Chief Technology Officer, Tivoli Software, and Vice President, Autonomic Computing
Preface
Enterprise computer systems have become increasingly complex over the course of the last couple of decades, and a significant percentage of the Information Technology budget of each enterprise is spent on the management and operations of these computer systems. As technology evolves and changes, an increased level of expertise is required in the management and support of the systems. Not only do the individual configurations of computer systems vary widely across organizations, but the different applications, devices, and users in the enterprise also interact with each other in complex and subtle ways. These interactions are highly unique to each enterprise environment, and it takes a significant effort to customize management systems for each one.

In an ideal world, each enterprise computer system would have management software that would analyze the different interactions and require the intervention of a human administrator only on an occasional basis. Development of such a system is the goal of research into autonomic computing underway at different research laboratories and universities. This book describes how to build such a system based on policy technology, where a policy is a mechanism to put constraints on the behavior of enterprise systems. The primary aim of this book is to show how self-management can be enabled using policies. The book does this by presenting a general framework for self-management using policies, followed by examples of the application of this framework to the areas of storage systems, computer networks based on the Internet Protocol, and security.
Who Will Benefit from This Book?

This book is intended for operators and architects of enterprise computing systems who want to understand how policy technology can be used to simplify the operations and management of their networks. If you are a management software developer working on a policy-enabled product, this book will help you understand the different algorithms and techniques that can be used to efficiently implement the different components of a policy-based solution. Algorithms that are useful for policy management software, as well as algorithms needed at routers and servers implementing policy support, are included in this book. This book will also be beneficial to management consultants whose clients include operators of enterprise IT systems. If you are a management consultant and would like to understand the technical issues associated with the use of policies in the enterprise, this is the right book for you. If you are a technical professional involved in areas of network or systems management in an enterprise and want to understand how network policies can simplify your task, you will find this book to be very useful.
Who Is This Book Not For?

If you are looking for a book describing government or legal policies related to enterprise networks, this book is not for you. This book describes the technical applications of policies regarding configuration and operation of enterprise computer systems, and does not address any legal or business issues. If you are looking for an introduction to systems or network management, this book is not intended for you. This book only addresses those aspects of systems management which relate to policies. Finally, if you are looking for specific details on how to deploy policies in an enterprise using a specific vendor product, this book is not for you. The intention of the book is to describe the general techniques in this field and it does not describe any specific product.
The Organization of This Book

This book has been organized into nine chapters, which can be logically grouped into two parts. Chapters 1 through 5 provide an overview of policy technologies, describing the generic algorithms and techniques that can be used for defining, managing, and using policies. Chapters 6 through 8 demonstrate how policy-based technologies can be used in specific areas of systems management.

Chapter 1 introduces the concept of policies and provides a formal definition of the term. It presents an architecture for building policy-based systems, and discusses the use of policies to achieve the goal of self-management in computer systems—such as the use of policies for developing systems that are self-configuring, self-optimizing, self-healing, and self-protecting.

Chapter 2 provides a review of the lifecycle through which policies are used to create a policy-based self-managing system, discussing the approaches that can be used to define policies and distribute them among the different components of a self-managing system.

Chapter 3 describes the information model associated with policies—that is, the logical structure of a policy and what it contains. It explores different alternative ways to represent the information model and provides an overview of the Common Information Model representation of policies, a standard specification from the Distributed Management Task Force.

Chapter 4 describes how the information models described in Chapter 3 can be represented in a concrete format using a policy definition language. A policy language is used to define policies in a manner that can be exchanged and communicated among different components of a computer system. The chapter looks at the details of various policy languages used in different policy management systems.

Although policies allow the development of self-managing systems, one needs to validate that the policies used for self-management are not inconsistent or inappropriate.
A self-managing system with inconsistent policies can be disastrous. Chapter 5 describes how the different policies specified by an administrator can be validated and checked for inconsistencies, mistakes, or impossible targets. It also describes how policies entered by a human administrator can be transformed into policies that are precise and enforceable by a computer element.
Chapters 6, 7, and 8 discuss how to apply policy-based technologies to the different areas of computer systems management, and how to develop systems that can automatically react to different issues arising in those areas of management.

Chapter 6 describes how policies can be used to simplify the task of systems configuration management. In addition to discussing the different applications of policies in simplifying systems configuration and determining policy-based inconsistencies in system configuration, it provides a detailed example of using policies to validate the configuration of a storage area network.

Chapter 7 discusses the applications of policy technologies to the task of computer systems fault management. It describes how policies can be used to build a system that can automatically react to the faults and error conditions that can arise in a computing environment.

Chapter 8 describes the different applications of policies in managing the security of computer systems. It discusses architectures for policy-based self-protecting systems, and provides an example of using policies to support business-level secure communications requirements using network communications technology such as the IP security protocol.

Chapter 9 discusses some advanced topics related to the concept of policies, including a discussion of production rules, service level agreements, IT processes, and business process management.
Acknowledgments
This book was partially sponsored by the US Army Research Laboratory and the UK Ministry of Defence under Agreement Number W911NF-06-3-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the US Army Research Laboratory, the US Government, the UK Ministry of Defence, or the UK Government. The US and UK Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
About the Authors
Dakshi Agrawal, IBM T. J. Watson Research Center, 19 Skyline Drive, Hawthorne, New York 10532 (electronic mail: [email protected]). Dr. Agrawal is a research staff member at IBM T. J. Watson Research Center in the Policy & Networking Department. He has been a core team member in developing the Policy Management Toolkit for IBM Autonomic Computing, and received an IBM Research Division Award and an Invention Achievement Award for his contributions to the project. Dr. Agrawal received a Ph.D. in electrical engineering in 1999 from the University of Illinois at Urbana-Champaign (UIUC). He worked as a Visiting Assistant Professor at UIUC during 1999–2000 before joining the IBM T. J. Watson Research Center, Hawthorne, NY as a Research Staff Member. Dr. Agrawal has more than 30 publications in international conferences and journals in the areas of digital communication theory, distributed computing systems, and digital security and privacy. He has been granted or has applied for more than ten patents with the US Patent Office.

Seraphin Calo, IBM T. J. Watson Research Center, 19 Skyline Drive, Hawthorne, New York 10532 (electronic mail: [email protected]). Dr. Calo is a Research Staff Member at IBM Research and currently manages the Policy Technologies group within that organization. He received the M.S., M.A., and Ph.D. degrees in electrical engineering from Princeton University, Princeton, New Jersey. He has worked, published, and managed research projects in a number of technical areas, including queuing theory, data communications networks, multiaccess protocols, expert systems, and complex systems management. He has been very
active in international conferences, particularly in the systems management and policy areas. His recent involvements include serving on the Organizing Committee of Policy 2004 (IEEE 5th International Workshop on Policies for Distributed Systems and Networks) and serving as the General Chair of IM 2005 (the Ninth IFIP/IEEE International Symposium on Integrated Network Management). Dr. Calo has authored more than 50 technical papers and has several United States patents (three issued and four pending). He has received two IBM Research Division awards and two IBM Invention Achievement awards. His current research interests include distributed applications, services management, and policy-based computing.

Kang-Won Lee, IBM T. J. Watson Research Center, 19 Skyline Drive, Hawthorne, New York 10532 (electronic mail: [email protected]). Dr. Kang-Won Lee is a research staff member at IBM T. J. Watson Research Center in the Policy & Networking Department. He has been a core team member in developing the Policy Management Toolkit for IBM Autonomic Computing, and received an IBM Research Division Award and an Invention Achievement Award for his contributions to the project. He is currently working on policy-based storage area network planning and verification. Dr. Lee received his Ph.D. in computer science from the University of Illinois at Urbana-Champaign, specializing in computer networks. Dr. Lee has published more than 40 technical articles in premier IEEE and ACM journals and conferences.

Jorge Lobo, IBM T. J. Watson Research Center, 19 Skyline Drive, Hawthorne, New York 10532 (electronic mail: [email protected]). Dr. Lobo joined the IBM T. J. Watson Research Center in 2004. Before going to IBM, he was principal architect at Teltier Technologies, a start-up company in the wireless telecommunication space acquired by Dynamicsoft and now part of Cisco Systems.
Before Teltier, he was a tenured associate professor of CS at the University of Illinois at Chicago and a member of the Network Computing Research Department at Bell Labs. At Teltier, he developed a policy server for the availability management of presence servers. The servers were successfully tested inside two GSM networks in Europe. He also designed and co-developed PDL, one of the first generic policy languages for network management. A policy server based on PDL was deployed for the management and monitoring of Lucent's first generation of softswitch networks. Jorge Lobo has more than 50 publications in international journals and conferences in the areas of networks, databases, and AI. He is co-author of an MIT Press book on logic programming and is co-founder and member of the steering committee for the IEEE International Workshop on Policies for Distributed Systems and
Networks. He has a Ph.D. in CS from the University of Maryland at College Park, and an M.S. and a B.E. from Simon Bolivar University, Venezuela.

Dinesh Verma, IBM T. J. Watson Research Center, 19 Skyline Drive, Hawthorne, New York 10532 (electronic mail: [email protected]). Dinesh C. Verma manages the Policy & Networking technologies area at IBM T. J. Watson Research Center, Hawthorne, New York. He received his doctorate in Computer Networking from the University of California at Berkeley in 1992, his bachelor's in Computer Science from the Indian Institute of Technology, Kanpur, India in 1987, and a master's in Management of Technology from Polytechnic University, Brooklyn, NY in 1998. He holds 14 patents related to computer networks, and has authored more than 50 papers in the area. His research interests include topics in policy-based computing systems, Quality of Service in computer communication systems, distributed computing, and autonomic self-managing software systems. Dr. Verma has authored four books, two with Pearson or its imprints, and two with John Wiley & Sons. Books published by Pearson include Policy-Based Networking (ISBN 1578702267) and Supporting Service Level Agreements on IP Networks (ISBN 1578701465).
Chapter 1
Policy Definition and Usage Scenarios
Managing IT (information technology) infrastructure is hard. From Fortune 500 enterprises to small businesses, and from nationwide data centers to personal computers in homes, an inordinate amount of time and effort is spent managing IT. Management and operational expenses are taking an increasingly larger share of the IT budget in many organizations, with a major part attributed to the complexity of the systems that need to be managed. IT management is a labor-intensive task, and skilled administrators need to intervene frequently to keep the IT infrastructure running. The exponential increase in the size of IT infrastructures, coupled with increasing technical complexity, has led to a situation where, despite automation, remote management, and off-shoring, the fundamental problem—there are not enough skilled people to ensure seamless operation of IT systems—remains untamed. This has driven research and industry to look for management frameworks that go beyond the direct human manipulation of network devices and systems [AUTO]. One approach toward this aim is to build policy-based management systems (PBMS). Policy-based management refers to a software paradigm developed around the concept of building autonomous systems, or systems that manage themselves with minimum input from human administrators. This paradigm provides system administrators and decision makers with interfaces that let them set general guiding principles and policies to govern the behavior and interactions of the managed systems. Although large portions of the IT
management chores are still carried out manually and in an ad hoc manner, policy-based management systems are maturing and can be found in areas such as data center management, privacy, security and access management, and the management of quality of service and service level agreements in networks. The main objective of this book is to provide the reader with a firm understanding of what policy-based management systems are, how they can be used to reduce the cost of IT administration, and the state of the art in policy-based management in real life.
1.1. Formal Definition of Policy

The word "policy" has its origins in government and regulation; it entered the language through Middle English and Middle French. If we open a dictionary and look up the word "policy," we may find the following definitions [MERR]:
1. A definite course or method of action selected from among alternatives and in light of given conditions to guide and determine present and future decisions.

2. A high-level overall plan embracing the general goals and acceptable procedures especially of a governmental body.

The reader may notice that even though both definitions convey a very similar idea (that a policy is a plan or course of action), the distinction between the two comes from the specificity of the plan. In the first case the plan is definite and concrete, whereas the second definition refers to a high-level plan. On many occasions the term "policy" is used interchangeably with "regulation," although regulations place more emphasis on enforcement, usually describing authoritative rules dealing with details or procedures [MERR]. In general, the word "policy" is used in a broad spectrum of situations in common English.

The use of the word "policy" in computer science, networking, and information technology has undergone a similar evolution. It has been used to describe, among other things, regulations, general goals for systems management, and prescriptive plans of action. A few examples of where the term has been applied are access control policies, load balancing policies, security policies, back-up policies, firewall policies, and so on. We also find references to policy in high-level programming languages and systems generally referred to as business rules systems [CJDA].
In many cases policies are equated with system configuration. Take, for example, policies in Microsoft® Exchange 2000 servers. In Exchange, a policy is "a collection of configuration settings that are applied to one or more Exchange configuration objects.… You can define a policy that controls the configuration of some or all settings across a server or other objects in an Exchange organization.…" Given the variety of usage of the word "policy," we first need to define precisely what we mean by policy. Heuristically, a policy is a set of considerations designed to guide decisions on courses of action. Policies usually start as natural language statements, and many details need to be sorted out before these descriptions can be implemented. Consider the following statement, usually implemented as a default policy in Apache Web servers:

Do not allow the execution of CGI scripts.
The policy is activated by setting the value of an appropriate variable in a configuration file. During initialization the Web server reads the configuration file and adjusts its behavior so that, when interpreting and serving documents to Web clients, it throws an exception if it encounters a CGI script as the source of a document to be rendered in the Web client. Compare this policy to the following example from banking regulations:

A currency transaction report (CTR) must be filed with the federal government for any deposit of $10,000 or more into a bank account.
This statement, extracted from the Money Laundering Suppression Act enacted by the U.S. Congress in 1994, is a typical policy regulation that banks must implement. In modern bank systems, the implementation will probably be done using database triggers. The implementation of these two policies has little in common. However, there is significant commonality in the specification. First, both policies identify a target system: the computer where the Web server is running and the bank information system. Second, both policies express constraints over the behavior of the target system. From the point of view of high-level policy specification, what the system is or how the system is implemented is not relevant.1 The policy merely indicates how to regulate the behavior of a system by indicating the states that the system can or cannot take. In the Web server example, if we can take a snapshot of the state of the server at any moment in time, the policy indicates that we
should never find a process associated with a CGI script that was started by the Web server. In the banking example, if we take a snapshot of the system and find a transaction containing a transfer of $10,000 or more, the snapshot must also contain the generation of a CTR. Accordingly, for us to specify a policy we need first to identify three things:
1. The target of the policy, which we will call the target system. A target system may be a single device such as a notebook computer or workstation, or it can be a complex system such as a data center or a bank information system consisting of multiple servers and storage systems.

2. A set of attributes associated with the target system. The value of an attribute can be a simple number or text string, or it can be as complex as a structured object containing other attributes. At this moment we do not need to define a data model for attributes; we need to know only that these attributes are identifiable and accessible and that they take values from a predefined set of types.

3. The states that the target system can take at any given time, which are defined by an assignment of values to the system attributes.

In practice, there are many alternatives for the definition and identification of target systems. For example, the computer system where the Web server is running could be identified by an IP address; or we can group subsystems and identify the group with a unique logical name, for example, all the computers on the second floor of an office building. There are also many ways to define and obtain the values of system attributes. For example, an attribute of a computer system could be a set of objects representing the processes running in the computer system at a given time. These could be complex objects with testable properties that identify whether the object represents a process started by the Web server, and whether it is a CGI script.2

However, the behavior of a system is not completely characterized by the set of states it is in. A definition of "behavior" needs to take into consideration how the system moves through these states. Given that policies constrain behavior, it is not surprising to find policies that constrain these state transitions.
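Before turning to transition constraints, note that a single-state policy of this kind can be checked directly against a snapshot. A minimal Python sketch of the banking example (the attribute and field names are our own, purely illustrative):

```python
def ctr_invariant_holds(state):
    """Check one snapshot of the bank system: every deposit of
    $10,000 or more must have a matching currency transaction
    report (CTR) in the same snapshot."""
    reported = {ctr["transaction_id"] for ctr in state["ctrs"]}
    return all(t["id"] in reported
               for t in state["transactions"]
               if t["amount"] >= 10_000)

# A compliant snapshot: the large deposit has a CTR on file.
compliant = ctr_invariant_holds({
    "transactions": [{"id": 1, "amount": 12_000},
                     {"id": 2, "amount": 500}],
    "ctrs": [{"transaction_id": 1}],
})

# A violating snapshot: a large deposit with no CTR.
violating = ctr_invariant_holds({
    "transactions": [{"id": 3, "amount": 15_000}],
    "ctrs": [],
})
```

A real bank system would enforce this with database triggers, as noted earlier; the point here is only that the policy itself is a constraint on states.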
Consider the following example: If a credit card authorized for a single person has been used within 1 hour in different cities that are at least 500 miles apart, reject the charge and suspend the credit card immediately.
This policy is also a constraint. But in contrast to our previous policies, the constraint is not imposed on a single state of the system, but on at least three states: the state of the system at the time the credit card is first used; the state at the time when a second use of the credit card is detected and the transaction needs to be rejected; and any state in the future in which credit card transactions must be rejected. Thus, we will define the behavior of a system to be a continuous ordered set of states, where the order is imposed by time. Consider a system S that may behave in many ways, and let B(S) be the set of all possible behaviors the system S can exhibit (that is, any possible continuous ordered set of states).

Definition: A policy is a set of constraints on the possible behaviors B(S) of a target system S; that is, it defines a subset of B(S) of acceptable behaviors for S.

We note that this is a very generic definition, and it does not say how policies can be implemented or enforced. Implementations will require systems to provide operations that can affect their behavior; if there is no way to affect the behavior of the system, we will not be able to implement policies. These operations are special attributes of the system that policies can use. We will generically refer to these operations as actions. Note also that even though the system states can change continuously, implementations will be able to observe only discrete changes. In many real-life systems, the state of the system may not be completely defined or known. Note, however, that determining the full state of a system is not necessary to use a policy-based approach: policies can be defined using only a small number of attributes of the system state and do not require the determination of the complete state a priori. Let us return to our Web server example.
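Before returning to the Web server example, the definition just given can be made concrete in a few lines. In this illustrative Python sketch (all names are ours), a state is a dictionary of attribute values, a behavior is a time-ordered list of states, and a policy is a predicate that selects the acceptable subset of B(S):

```python
def policy_no_cgi(behavior):
    """Accept only behaviors in which no state ever contains a
    process for a CGI script started by the Web server."""
    return all(not any(p["is_cgi"] and p["parent"] == "httpd"
                       for p in state["processes"])
               for state in behavior)

def acceptable(policy, behaviors):
    """A policy defines the subset of B(S) it permits."""
    return [b for b in behaviors if policy(b)]

# Two one-state behaviors: one acceptable, one not.
good = [{"processes": [{"parent": "init", "is_cgi": False}]}]
bad = [{"processes": [{"parent": "httpd", "is_cgi": True}]}]
allowed = acceptable(policy_no_cgi, [good, bad])
```

Implementations, of course, observe only discrete samples of the state, so a behavior here is a finite list of sampled snapshots rather than a truly continuous set.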
We have noticed that activating the policy to restrict the execution of all CGI scripts is straightforward—we set the appropriate variable in the configuration file of the Apache server, restart the server, and it takes care of the rest by itself. Now let's take a more interesting policy. We can create policies that allow different sets of users to execute different sets of CGI scripts. The implementation of a policy like this in a standard Apache server is not that obvious. One could try to implement it by creating a directory structure that reflects the different sets of scripts, with links to the scripts from the appropriate directories, and creating access control files for each directory with the different sets of users that have
access to the scripts. The policy would then be enforced by giving user names and passwords to the users and forcing them to authenticate themselves before executing any of the scripts. A severe inconvenience with this implementation is that changes in either the set of users or the set of scripts may require reshuffling the directories and changing several access control files. The difficulty arises because there is no obvious connection between what the policy wants to enforce and how it is enforced. In the simple CGI policy, how the policy is implemented is hidden inside the implementation of the Web server, and the implementer merely needs to set the policy on or off. In the second case, having only the option of setting the CGI script execution policy on or off is too restrictive, because an essential component of what the policy wants to constrain is conveyed by the different sets of users and scripts. A policy-based management system aims to provide an environment in which policy authors and implementers can concentrate their efforts on describing what the policy restricts, thereby alleviating the burden of having to describe how the policy will be enforced. This separation of what from how varies widely among different systems and applications, and in practice most policy authors are still required to have at least a partial understanding of the policy implementation.
1.1.1. Types, Nature, and Usage of Policies

As defined earlier, policies are constraints on the behavior of a system, and system behavior is a sequence of system states. In turn, each state of a system can be characterized by the values that a collection of system attributes takes. In this section, we enumerate some of the common types of constraints specified on system behavior and discuss how they result in different types of policies.

The attributes of a state can be divided into three groups: a set of fixed attributes, a set of directly modifiable attributes, and other attributes that are observable but not directly modifiable. The fixed attributes of a system cannot be modified directly or indirectly. As an example, a server in a data center has a state characterized by attributes such as the maximum number of processes, size of virtual memory, size of physical memory, amount of buffer space for network communication, processor utilization, disk space utilization, memory utilization, time taken to respond to a user command, and so on. Among these, the size of physical memory is a fixed attribute for the purposes of systems management—it
cannot be changed until the hardware of the server itself is modified. Some of these attributes, such as the maximum number of processes, the size of virtual memory, or the amount of buffer space, can be modified directly by changing some values in a configuration file. Other attributes, such as processor or memory utilization, cannot be modified directly. They can be manipulated only by modifying the directly modifiable parameters or by taking some other action—for example, by killing a running process. We define the set of directly modifiable attributes of a system as its configuration attributes. Furthermore, the set of attributes that are not directly modifiable, but can be observed or computed from observation of system attributes, are defined as system metrics. The configuration of a system, using these conventions, is the collection of the configuration attributes and the assignment of values to them. The simplest policy type specifies an explicit constraint on the attributes of the state that the system can take, thereby limiting system behavior:

Configuration Constraint Policy: This type of policy specifies constraints that must be satisfied by the configuration of the system in all possible states. These may include allowable values for an individual configuration attribute, minimum and maximum bounds on the value of an individual configuration attribute, relationships that must be satisfied among different configuration attributes, or allowable values for a function defined over the configuration attributes. Some examples of configuration constraint policies are as follows:

• Do not set the maximum threads attribute on an application server over 50.

• The size of virtual memory in the system should be less than two times the size of physical memory.

• Only users in the administration group have access to the system configuration files.
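The first two example constraints above can be evaluated mechanically against a configuration. A minimal Python sketch, with illustrative attribute names:

```python
def check_config(config):
    """Evaluate two of the example configuration constraint
    policies; return the list of violations (empty if compliant)."""
    violations = []
    if config["max_threads"] > 50:
        violations.append("max_threads must not exceed 50")
    if config["virtual_memory_mb"] >= 2 * config["physical_memory_mb"]:
        violations.append("virtual memory must be less than "
                          "twice physical memory")
    return violations

ok = check_config({"max_threads": 40,
                   "virtual_memory_mb": 6144,
                   "physical_memory_mb": 4096})
bad = check_config({"max_threads": 80,
                    "virtual_memory_mb": 16384,
                    "physical_memory_mb": 4096})
```

A check like this can run before a configuration change is committed, which is one way such policies protect a system from operator errors.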
Configuration constraint policies are often used to ensure the correct configuration of a system, to self-protect the system from operator errors, and to prevent the system from entering operational modes that are known to be harmful.

Metric Constraint Policy: This type of policy specifies constraints that must be satisfied by the system metrics at all times. Unlike configuration attributes, the metrics of a system cannot be manipulated directly. The system needs to determine in an automated manner how to manipulate the configuration of
the system, or to take appropriate actions, such that the constraints on the metrics are satisfied. The constraints on the metrics may include bounds on any observable metric, or relationships that must be satisfied among a set of system attributes including at least one metric attribute. Metric constraint policies that specify an upper or lower bound on a metric are also known as goal policies, because they provide a goal for that metric which the system should strive to achieve. Examples of metric constraint policies include the following:

• Keep the CPU utilization of the system below 50%.

• All directory lookups on the name of a person should be completed in less than a second.

• The end-to-end network latency should be kept below 100 milliseconds.
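A sketch of how these goal policies might be monitored. The metric names and bounds mirror the three examples above; the dictionary representation is ours:

```python
# Upper bounds from the example goal policies.  The system cannot
# set these metrics directly; it can only observe them and react.
GOALS = {
    "cpu_utilization": 0.50,       # keep below 50%
    "lookup_latency_s": 1.0,       # directory lookups under 1 second
    "network_latency_ms": 100.0,   # end-to-end latency under 100 ms
}

def unmet_goals(metrics):
    """Return the observed metrics currently violating a goal."""
    return {name: value for name, value in metrics.items()
            if name in GOALS and value >= GOALS[name]}

violated = unmet_goals({"cpu_utilization": 0.65,
                        "lookup_latency_s": 0.2,
                        "network_latency_ms": 180.0})
```

The result tells the system which goals are unmet; deciding what corrective action to take is a separate problem, taken up next.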
Metric constraint policies are often used to enable the self-configuration of systems in order to meet specific performance requirements or objectives.

Action Policy: The two types of policies described in the preceding examples specify constraints on a single system state. In many cases, a policy may require explicit actions to be taken when the state of a target system satisfies some constraints. These types of policies are called action policies because they require the system to take a specific set of actions. Action policies constrain a sequence of states: when a particular state is observed, certain actions must be taken at a later point so that the target system will be in some other state. In most cases, an action policy modifies the configuration of the system in response to some condition becoming true. Action policies essentially provide a plan according to which the system should operate when it encounters the condition specified in the policy. Examples of action policies include the following:

• If the CPU utilization of a server in a data center exceeds 70%, allocate a new server to balance the workload.

• If the temperature of the system exceeds 95 degrees Celsius, then shut down the system.

• If the number of bytes used by a hosted site exceeds 1 Gbyte in a month, then shut down access to the site.

• If an inbound packet has a code-point for expedited forwarding (EF) per-hop behavior (PHB) in the packet header, then put it in the high-priority queue.
In these examples, action policies are used to manage the performance of computer servers and networks, to manage the effects of environmental conditions, to limit resource utilization, and to provide different qualities of service in communication networks.
Not all action policies specify an action that can be directly executed on a system. One important type of action policy is the alert policy, which is commonly used to flag any conditions that may require operator intervention.

Alert Policy: An alert policy is an action policy in which the action consists of a notification sent to another entity. A notification is an action that does not modify the configuration of the system itself. Instead, it can take one or more of the following forms: sending an email or an SMS message, making a phone call, logging a message in a file, or displaying an alert visually on a display. Some examples of alert policies are as follows:

• Notify by email all users who have not accessed their accounts for three months to warn them of possible account deletion.

• If a system has not installed the latest version of the anti-virus software, send an email to the employee and his or her manager.

• If a system has gone down, send a message to the administrator's pager.
Although real-life policies, as shown here, are specified in a variety of styles, all of them can be restructured using a common pattern or model. Formally, this model is called the policy information model. One of the most widely used policy information models describes a policy using a condition-action rule, which means that if the condition is true, then the action is performed. A more specific version of the condition-action rule is the event-condition-action (ECA) rule, which means that upon occurrence of the event, if the condition is true, then the action is performed. It is not difficult to see that the preceding policy rules can be transformed into some version of ECA rules. For example, the metric constraint "The end-to-end network latency should be kept below 100 milliseconds" can be rewritten as "Upon completion of measurement, if the end-to-end network latency is above 100 milliseconds, then record the violation in the system log file." The policy information model is a useful framework to describe, compare, and analyze different policy rules. In Chapter 3, "Policy Information Model," we review some of the policy information models in wide use.

Having defined policies as constraints on the operation of a system, let us examine how the specification of such constraints can help in the management of IT systems. The specification of constraints on the state of the system can be used for several purposes, such as
• When the demand or workload on a system changes, requiring a reconfiguration of the system, the constraints can be used to determine a desirable new configuration.

• When there is contention for resources in the system, the constraints can be used to determine the manner in which to resolve that contention.

• When an external entity tries to access the resources in the system, the constraints can be used to determine whether that access ought to be permitted.

• When a system violates certain constraints, it can determine and execute a set of actions that will allow it to remove that violation.

Policies can be used to build systems that are autonomic—that is, systems that exhibit the properties of self-configuration, self-protection, self-optimization, and self-healing. A self-configuring system would configure itself according to its intended function. A self-protecting system would identify threats to itself and take corrective actions. A self-optimizing system would modify its configuration according to the current workload to maximize its performance. A self-healing system would automatically repair any damage done to its components. The manner in which policy technology can be used to enable the development of such systems is described in the next few subsections.
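The ECA pattern described earlier can be sketched as a toy rule engine. In this illustrative Python fragment (all names are ours), the latency goal policy appears as the ECA rule obtained from the rewriting shown earlier:

```python
log = []

# Each rule: upon the event, if the condition is true, perform the action.
eca_rules = [{
    "event": "measurement_complete",
    "condition": lambda s: s["network_latency_ms"] > 100,
    "action": lambda s: log.append(
        "violation: latency %.0f ms" % s["network_latency_ms"]),
}]

def dispatch(event, state):
    """Deliver an event to every matching ECA rule."""
    for rule in eca_rules:
        if rule["event"] == event and rule["condition"](state):
            rule["action"](state)

dispatch("measurement_complete", {"network_latency_ms": 150.0})
dispatch("measurement_complete", {"network_latency_ms": 80.0})
```

Only the first dispatch satisfies the condition, so exactly one violation is logged; the second measurement passes silently.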
1.2. Policy-Based Self-Configuration

One of the most time-consuming operations in the management of any system is the initial configuration of a new installation, or the reconfiguration that needs to be performed when new requirements arrive. The basic approach in self-configuration is to offer a simplified set of abstractions that the administrator manipulates, while the detailed configuration of a myriad of parameters in the system is hidden to a large extent. Although there are several contexts in which policy-based self-configuration mechanisms can be used, we will use the example of a hosting service provider for illustration. For the sake of simplicity, let us assume that all the customers of this service provider run their Web sites only within the premises of the service provider, and each Web site is supported by one or more instances of a Web
server such as Apache or IIS. The service provider has a pool of standby servers that can be deployed for any customer after a proper installation of applications. Some customers whose Web sites draw heavy traffic may need multiple servers with a load-balancer in front of them, whereas customers whose sites are not as popular may share a single server with other customers. The service provider can set up a hosting site with a set of routers, virtual LAN switches, load-balancers, and server blades that enable this service. All of these devices make up the target system of the policies.

Because most hosting service providers have system administrators who can write scripts to automate common processes, we further assume that they have developed a series of scripts to automatically allocate a server from a shared pool to a specific customer and, conversely, return a server to the pool once the busy period is over or the contract with the corresponding customer expires. A similar script can be developed to automate the addition or removal of a virtual server for smaller customers. A more detailed discussion of how such a system can be developed can be found in [APPL].

When a new customer is added or an existing customer is removed, the configuration of the site needs to be changed according to the change in the set of customers being supported. If mechanisms are available for servers to be assigned in an automated manner to different customers from a shared pool, then the number of servers or processors assigned to a specific customer may change depending on the intensity of traffic to that site. Sometimes the hosting site may want to enforce limits on how much bandwidth a customer's site can use in a month, and may want to reconfigure the site to restrict the throughput available to a hosted site if its traffic exceeds predetermined thresholds.
Let us assume that the service provider characterizes its customers into two groups: large and small. It may instantiate policies for the self-configuration of its site that look like the following:

If a large customer has 75% or more utilization of all its servers, and has less than its maximum allowed number of servers, then allocate an additional server from the free pool to it.

If a large customer has 30% utilization or less on all its servers, and has more than its minimum allowed number of servers, then remove a server from it to the free pool.

If a small customer has reached 125% of the monthly bandwidth allowed to its site, then disallow further access to the site for the rest of the month.
If the addition of a new small customer causes the number of small customers at a server to exceed a threshold, allocate a new server from the free pool and migrate half of the existing small customers to that new server.
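The first two of these rules can be sketched as condition-action pairs evaluated over a customer's current attributes. Everything below is an illustrative toy; a real system would invoke the provisioning scripts mentioned above:

```python
def rebalance(customer, free_pool):
    """Apply the two large-customer allocation policies; return
    the action taken, if any."""
    if (customer["utilization"] >= 0.75
            and customer["servers"] < customer["max_servers"]
            and free_pool):
        customer["servers"] += 1
        free_pool.pop()
        return "allocated"
    if (customer["utilization"] <= 0.30
            and customer["servers"] > customer["min_servers"]):
        customer["servers"] -= 1
        free_pool.append("returned-server")
        return "released"
    return None

pool = ["s1", "s2"]
busy = {"utilization": 0.80, "servers": 2,
        "min_servers": 1, "max_servers": 4}
action = rebalance(busy, pool)
```

Here the busy customer gains a server from the pool; a lightly loaded customer would instead release one back, and a customer between the two thresholds is left alone.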
In the preceding example, we can identify several attributes: utilization rate, number of servers, number of free servers, monthly bandwidth, and so on. Each of the policies provides a constraint on the new configuration of the system. The behavior of the system (the allocation of servers between customers and the free pool) is constrained to conform to the guidelines set earlier. The guidelines may change based on the experience of the service provider—instead of using server utilization, it may opt to use the bandwidth consumed as a trigger for the reallocation of servers, or it may use a combination of both. It may also choose not to block small customers that exceed their throughput limits, opting instead to charge them an additional amount of money. Looking back at the discussion of policy types, we can recognize all of these policies as instances of action policies.

These policies allow the administrator to manage the customers using attributes (utilization, number of servers, bandwidth rate, and so on) that are decoupled from the details of the actual server configuration (IP addresses, commands to control bandwidth, operating system versions, and so forth). The goal is to let administrators view system management in terms of abstracted attributes that allow them to specify what needs to be done, leaving the details of how it is done to the underlying mechanisms that support the policy-based management system. The policies do not describe the mechanisms for reallocating servers, migrating customers to a new server, or disabling access to a site. However, assuming that appropriate scripts for these tasks exist, the ability to specify the policies and invoke the right scripts for the required actions enables the system to configure itself in accordance with the wishes of the service provider. Building a policy-enabled management system for this scenario involves three steps:
1. Determining a way to specify the policies. 2. Enabling support within the system to interpret and enforce the policies. 3. Invoking a mechanism to distribute policies from the entity specifying them to the entities interpreting and enforcing them.
To specify policies, a language that can capture the semantics of policies needs to be selected, and a tool to specify the policies needs to be developed. Later we will see that having an information model is also an important aspect of the specification process, in addition to selecting a policy language. The system management software that allocates and reallocates servers needs to understand policies specified in this language so that it can enforce them by transferring servers under the various operating conditions. Finally, it is important that the policy specification from a system administrator be distributed to a system that can enforce the policies. In this particular case, if there is only one instance of the system management software, the third problem of distribution is trivial because there is no need to synchronize multiple copies of the policy.
1.3. Policy-Based Self-Protection in Computer Networks

Policy-based management in network administration has been practiced for more than a decade and has been used successfully in many application areas. A common usage of policies in network management is to regulate the traffic in the network, especially with respect to the security, access control, and quality of service of different traffic streams. Some examples of network traffic security policies include the following:

• Allow telnet connections out of the local network.

• Block telnet connections into the sites on the local network.

• Allow only secure HTTP traffic, and block any other traffic into the local network.

• If a UDP packet on an illegal port is received from an external computer, disallow any communication to that computer.
The enforcement of these policies within the network, which is the target system in this case, needs to be managed in the context of the configuration of a specific network. If we consider a simple model of network access protection, in which access to the local network is secured and protected by one or more perimeter firewalls, then the policies can readily be seen as configuration constraint policies as described in the previous section. The enforcement of these policies requires that the firewall be configured in a manner consistent with them. In this case the attributes are the source and destination IP addresses and ports.
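If we model the perimeter firewall's configuration as an ordered rule list over these attributes, enforcement becomes a mechanical lookup. A sketch (the rule format is ours, purely illustrative) covering the telnet and secure-HTTP policies above:

```python
# First matching rule wins; a port of None matches any port.
FIREWALL_RULES = [
    {"direction": "out", "dst_port": 23,   "action": "allow"},  # telnet out
    {"direction": "in",  "dst_port": 23,   "action": "deny"},   # telnet in
    {"direction": "in",  "dst_port": 443,  "action": "allow"},  # secure HTTP
    {"direction": "in",  "dst_port": None, "action": "deny"},   # all else in
]

def decide(packet):
    """Return the firewall's decision for a packet."""
    for rule in FIREWALL_RULES:
        if rule["direction"] != packet["direction"]:
            continue
        if rule["dst_port"] in (None, packet["dst_port"]):
            return rule["action"]
    return "deny"

https_in = decide({"direction": "in", "dst_port": 443})
telnet_in = decide({"direction": "in", "dst_port": 23})
telnet_out = decide({"direction": "out", "dst_port": 23})
```

Checking that a rule list like this is consistent with the stated policies, across every firewall in the network, is exactly the management problem discussed next.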
As in the previous example, we need means to specify the policies, to interpret and enforce them, and to distribute them from the entity specifying them to the entities interpreting and enforcing them. For the purpose of defining policies, a machine-readable language needs to be adopted for their specification. For access control and security policies, the language could be a standard access control policy language such as the eXtensible Access Control Markup Language (XACML) [OASI] from the Organization for the Advancement of Structured Information Standards (OASIS), or some other policy language with similar capabilities.

The firewalls in the system need to implement and enforce the access control policies. If they are able to interpret the language selected for specifying policies, they can take the policy as specified and enforce it directly. Otherwise, the policies need to be translated into a format that the firewall can interpret. In some cases, the policies might not translate easily into a firewall configuration. An example would be a policy requiring that a notification be sent to the administrator whenever an access attempt to a forbidden site is made. Because the firewall is not capable of sending notifications—it merely passively records the sites that users are trying to access—an additional mechanism is needed. In this case, an independent software package would be required to periodically collect from the firewall the records of which sites were accessed by each user, and then process them to check whether a notification needs to be generated.

If there is more than one firewall in the system, we need a mechanism to keep the policies specified in the different firewalls consistent. Although this can be done manually, it is more convenient to have an automated mechanism to dispense and distribute the policies.
Various alternative approaches to the distribution of policies can be developed; they are discussed in detail in Chapter 5, “Policy Transformation and Analysis.” In another variation of self-protection, policies may be defined that indicate how the set of applicable policies ought to be changed. As an example, if a system detects that it is under attack by a newly identified infected host, the applicable policy rules may be augmented to prevent traffic from that infected host from reaching the rest of the network. There are several other uses of policies in managing computer networks. The application of policy-based fault management in computer networks is discussed in detail in Chapter 7, “Policy-Based Fault Management.”
1.4. Policy-Based Self-Optimization in Computer Systems

A computer system can be called self-optimizing if it can reconfigure itself so as to best satisfy the needs and requirements imposed upon it. In the case of enterprise systems, the optimization requirements imposed on a computer system are usually driven by the needs of a business. The business that owns the computer system may have contractual obligations it has signed up for. For example, a hosting services provider may have signed a service level agreement (SLA) promising a target system response time to a customer, and it would like to provision, configure, and operate its systems so that it is in the best position to meet its obligations.

Self-optimization in computer systems may be specified by means of metric constraint policies or goal policies. These policies place a bound on a metric that the system may or may not directly control. When metric constraint policies are specified, the system is expected to do its best to meet them. To meet these policies, the system must translate them into a set of action policies—that is, a set of actions that can be invoked when certain conditions about the system behavior are satisfied. The translation of metric constraint policies into action policies is the process of policy transformation, which is described in Chapter 5. An example of this type of policy is the support of service level agreements. An SLA might read as follows: The service provider must be able to process at least 1000 transactions per second under the condition that the system is operating under normal load 70% of the time. Otherwise the provider is not obliged to fulfill the requirement.
The policy must also define normal and overload conditions (not shown here for simplicity). An implementation will provide a client of the service provider with access to the system attributes so that the client, or sometimes a third party, can verify that the agreement has been fulfilled. If policies are violated, the policies themselves may include penalties or compensations as actions to encourage the system to conform its behavior to the contract. To implement a self-optimizing policy, the system can try to predict when it is likely to fail the requirements and take corrective actions before any constraint is violated.
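As an illustration of how a metric constraint of this kind might be verified from monitoring data, consider the following sketch. The sample format and the reading of the 70% clause (the throughput target must be met in at least 70% of normal-load intervals) are assumptions made for illustration.

```python
# A sketch of how the SLA above might be checked from monitoring data.
# Each sample records the load condition and the measured throughput for one
# interval; the thresholds and the 70% interpretation are illustrative
# assumptions, not the book's prescribed semantics.

TARGET_TPS = 1000
REQUIRED_FRACTION = 0.70

def sla_met(samples):
    """samples: list of (load, tps) pairs, where load is 'normal' or 'overload'.
    The SLA is met if the target throughput is achieved in at least 70%
    of the normal-load intervals; overload intervals are excluded."""
    normal = [tps for load, tps in samples if load == "normal"]
    if not normal:
        return True  # nothing to verify under this reading of the SLA
    ok = sum(1 for tps in normal if tps >= TARGET_TPS)
    return ok / len(normal) >= REQUIRED_FRACTION

samples = [("normal", 1200), ("normal", 1100), ("normal", 800),
           ("normal", 1050), ("overload", 400)]
result = sla_met(samples)
```

A self-optimizing system would evaluate such a check continuously and trigger its action policies as the measured fraction approaches the bound, rather than after the SLA has already been missed.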
1.5. Policy-Based Self-Healing

Sometimes the primary role of policies is to make sure that the operational state of the system satisfies the policies that are defined within the system. If the system is not satisfying those constraints, it should take corrective actions or create an alert. Thus, both action and alert policies can be used to implement self-healing systems, although one may take the position that an alert policy simply allows the system to call for assistance when it sees an issue rather than healing itself. An example is the following policy: The temperature in the blade center must be maintained at less than 65 degrees.
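One way such a policy could drive an action-to-alert escalation is sketched below. The threshold, the corrective action (putting blades to sleep), and the escalation to an administrator alert are assumptions chosen to illustrate the interplay of action and alert policies.

```python
# A minimal event-condition-action sketch of the blade-center temperature
# policy: first try a corrective action (put some blades to sleep), and
# escalate to an administrator alert if the action is not sufficient.
# The threshold, actions, and sensor interface are illustrative assumptions.

TEMP_LIMIT = 65

def handle_temperature(readings):
    """readings: successive temperature readings, one after each response
    step. Returns the list of responses taken, in order."""
    responses = []
    for temp in readings:
        if temp < TEMP_LIMIT:
            break  # policy satisfied; nothing more to do
        if "sleep_blades" not in responses:
            responses.append("sleep_blades")   # action policy fires first
        else:
            responses.append("alert_admin")    # alert policy as escalation
            break
    return responses

# A fan failure: 70 degrees, still 68 after sleeping blades, so escalate.
taken = handle_temperature([70, 68])
```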
For example, if one of the fans in the blade center breaks, this policy can trigger an action to put some of the blades to sleep in order to reduce the temperature. If the action does not sufficiently reduce the temperature, an alert policy can be triggered to request the attention of the system administrator.

Another good example can be found in storage area network (SAN) configuration management. SAN configuration is a complex problem because of the interaction between many different devices and software systems. Configurations need to ensure that proper device drivers are installed, incompatible devices are not connected or are not configured in the same zone3, redundancy requirements are fulfilled, and so on. To cope with this complexity, experts have come up with various sets of policies that represent best practices for interoperability and reliability. An example of a policy in this set would be the following: The same host bus adapter (HBA) cannot be used to access both tape and disk devices.
This policy can be verified automatically if an appropriate software module is installed on the different computers and devices in the storage area network. Data about the system configuration is collected, and policies are evaluated against the data. If the configuration is not compliant, the policy management service reports the violations and may isolate parts of the system to avoid errors or failures. In some cases, it may even be able to modify the system automatically for compliance, thereby achieving the goal of self-healing.
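A compliance check for the HBA policy just stated might be sketched as follows; the representation of the collected configuration data is an assumption for illustration.

```python
# Sketch of automated verification of the HBA best-practice policy above:
# collect configuration data about which device types each host bus adapter
# reaches, and report a violation when one HBA accesses both tape and disk.
# The data layout is an illustrative assumption.

def check_hba_policy(hba_device_types):
    """hba_device_types: mapping of HBA id -> set of device types it
    accesses. Returns the HBAs violating the tape/disk separation policy."""
    return [hba for hba, types in hba_device_types.items()
            if {"tape", "disk"} <= types]

config = {
    "hba0": {"disk"},
    "hba1": {"tape", "disk"},   # violates the policy
    "hba2": {"tape"},
}
violations = check_hba_policy(config)
```

In a real SAN management service, the `config` mapping would be populated from discovery data, and each violation would be reported or used to isolate the offending path.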
As previously mentioned, a machine-readable language needs to be developed for the purpose of defining policies, and an information model is an important aspect of the specification process. The software module that validates compliance needs to be a component of the system's management software, which collects the state information and checks it for policy violations.
1.6. Building a Policy-Based Management System

The reader will have noticed from the description of the various aspects of policy-based management for different scenarios that there are many similarities in the design of the underlying capabilities. In this section, we present the architecture of a generic policy-based management system and describe the functions needed to build it. As a reference point, we start with what is usually called the IETF/DMTF Policy Architecture, which appears in many Request for Comments (RFCs) published regarding the use of policies in computer communication networks. (See the sidebar titled “IETF and DMTF” for more details.) Despite being cited as the IETF Policy Architecture or the DMTF Policy Architecture, it is worth pointing out that neither of the two organizations has actually standardized a policy architecture. Thus, the architecture is not a standard formally defined by either the IETF or the DMTF. It is more akin to a folk architecture that is usually ascribed to the IETF and/or the DMTF.

IETF and DMTF

The IETF (http://www.ietf.org) and DMTF (http://www.dmtf.org) are two standards organizations that have been at the forefront of policy standardization. IETF is an acronym for the Internet Engineering Task Force. It is the organization that defines the standards governing the protocols, management, and other issues related to the operation of the Internet. IETF standards are published as RFCs that are publicly available on the Internet. Several policy-related efforts were carried out within the IETF as part of the initiatives needed to ensure Quality of Service and security within the Internet. In particular, RFC 2748 defined the COPS (Common Open Policy Service) protocol, RFC 2749 defined COPS usage for RSVP (Resource ReSerVation Protocol), and RFC 2753 defined a framework for policy-based admission control. All three RFCs were published in early 2000.
A working group within the IETF defined a policy common information model, which was published as RFC 3060 in 2001. Work on defining policy standards subsequently moved to the DMTF (Distributed Management Task Force), an industry organization that develops standards required for managing different types of IT systems. The DMTF's key technical contribution has been the development of a common information model—that is, the definition of a standard set of information that all IT systems need to provide in order to be managed in a uniform manner. After policy standardization moved to the DMTF, further enhancements to the policy information model were published as RFC 3460 in 2003. Although RFCs 3060 and 3460 take a big step toward standardization of policy efforts, they are still in the realm of information models and fall short of concrete architectural prescriptions. Other standards organizations have also defined standards related to policies; they are discussed in more detail in Chapter 9, “Related Topics,” of this book. However, the work of the DMTF and IETF on policies forms the basis from which many of those standards have been derived.
The reason the folk architecture is cited so frequently in academic circles is that it captures the driving principles behind the derivation and definition of many of the standard RFCs and drafts submitted to the IETF that deal with policies. Thus, even though this architecture is not put out in any of the official documents from the two organizations, it represents the guiding principles behind much of the standards work, and it is appropriate to refer to it as the IETF/DMTF architecture. This policy architecture consists of four components as shown in Figure 1.1: a policy management tool, a policy repository, a policy decision point (PDP), and a policy enforcement point (PEP). The policy management tool provides a user interface for the creation and definition of policies. The policy repository provides mechanisms for storing policies and retrieving them as needed by the decision points. The policy decision points are modules in the system responsible for making policy decisions—that is, they examine the policies stored in the repository that would be applicable under a given circumstance and determine what needs to be done to comply with those policies. Policy enforcement points are elements that are responsible for enforcing the outcome of those policy decisions.
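The interplay of the four components can be sketched in a few lines of Python. The rule representation and the default-deny behavior are illustrative assumptions, not part of the IETF/DMTF description.

```python
# A minimal sketch of the four IETF/DMTF components just described. The
# repository stores condition -> decision rules, the PDP evaluates them for
# a given request, and the PEP enforces whatever the PDP decides. All rule
# and request fields are illustrative assumptions.

class PolicyRepository:
    def __init__(self):
        self.policies = []           # filled by a policy management tool
    def add(self, condition, decision):
        self.policies.append((condition, decision))

class PolicyDecisionPoint:
    def __init__(self, repository):
        self.repository = repository
    def decide(self, request):
        # Examine applicable policies and determine what must be done.
        for condition, decision in self.repository.policies:
            if condition(request):
                return decision
        return "deny"                # assumed default when nothing applies

class PolicyEnforcementPoint:
    def __init__(self, pdp):
        self.pdp = pdp
    def handle(self, request):
        return self.pdp.decide(request)  # a real PEP would act on this

repo = PolicyRepository()
repo.add(lambda req: req["role"] == "admin", "permit")
pep = PolicyEnforcementPoint(PolicyDecisionPoint(repo))
decision = pep.handle({"role": "admin"})
```

The separation matters in practice because the PDP and PEP are often different machines: a router (PEP) may outsource decisions to a dedicated policy server (PDP) over a protocol such as COPS.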
Figure 1.1 The IETF/DMTF Policy Architecture. Four components: a Policy Management Tool and a Policy Repository feeding a Policy Decision Point (PDP), which directs a Policy Enforcement Point (PEP).
Although the formal definition of policies, their types and usage, and the bare-bones IETF/DMTF policy architecture just described capture the basic concepts used in all policy-based management systems for self-management, they hardly convey the complexity faced in implementing a PBMS. When building a PBMS, an architect needs to make several decisions regarding the policy lifecycle: when and where policies are created, modified, and transformed from higher levels of abstraction to lower levels; how they are stored, distributed, and enforced in a possibly geographically dispersed system; and finally, when a policy becomes obsolete, how it is retracted throughout the system without disrupting its operation. Another aspect of this complexity comes from possibly conflicting policies in a system that need to be resolved. Conflicts arise often in a PBMS because multiple policy authors are typically responsible for managing different operating behaviors of the system (self-healing policy versus self-protection policy, for example). Conflicts can arise at definition time or at run-time, and they need to be resolved to make a decision. In the following chapters, we will cover these implementation issues of a PBMS in more detail.
1.7. Summary

The term “policy” can mean many different things depending on the context, background, and application domain. This chapter provided a formal definition of “policy” in the system management context and discussed different types of practical policies and their usages. In particular, it singled out configuration constraint policies, metric constraint policies, action policies, and alert policies. It then briefly showed that these policies can be represented in a policy information model in the form of condition-action rules or event-condition-action rules. We will review the policy information model in more detail in Chapter 3. The chapter also presented high-level scenarios for using policies for self-management of a system. The characteristics of self-management were studied in the context of self-configuration, self-optimization, self-healing, and self-protection. Finally, the chapter described a generic architecture for policy-based management systems as commonly ascribed to the IETF and DMTF. More specific examples of policy-based management systems will be presented in later chapters.
Endnotes

1. The system details will be relevant during policy implementation.

2. For instance, if the computer is a machine running a Unix-like operating system, all this information can be easily obtained by executing the “ps” command.

3. In storage area networks, a zone is a logical grouping of ports that belong to servers, switches, and storage devices; only the ports in the same zone can exchange data.
Chapter 2
Policy Lifecycle—Creation, Distribution, and Enforcement

The lifecycle of a policy goes through multiple stages. First, a policy must be created by a human administrator using a policy-authoring tool and specified in a concrete policy language. It can then be translated into different forms of policy rules to bridge the gap between the representation that is familiar to the administrator and the format that is easier to process in low-level systems. Policies should be stored for persistence and later retrieval. They can also be distributed for execution to various parts of the IT system, which may be geographically dispersed. In some cases, more than one policy rule may be merged to create a new policy. Over time, policies may be altered to adapt to changing IT system environments or to capture changes in the high-level policy set by the organization. Finally, expired policies must be disabled in the system. In this chapter, we present a holistic view of the policy lifecycle. This view can be quite useful for fully appreciating the challenges in designing a policy-based management system (PBMS). It will also help the reader establish a baseline understanding for designing a generic PBMS. The individual aspects of the policy lifecycle presented in this chapter will be covered in greater detail, and with specific examples, in later chapters.
22
Chapter 2
•
Policy Lifecycle—Creation, Distribution, and Enforcement
2.1. A Holistic View of the Policy Lifecycle

Figure 2.1 shows a holistic view of the policy lifecycle. At the highest level, policies are specified in natural languages. The policy authors at this level may not have a technical background—they write policies as users of an IT infrastructure, using a vocabulary that may be ambiguous and therefore requires further interpretation. For example, a high-level IT policy taken from the Information Technology Security Standard of a large corporation states that “Each user’s identity must be verified when the user attempts to log on to the system or middleware. … Reusable passwords used in identity verification challenges must be changed at least once every 90 days.” At this layer, a policy author will create policy documents in the form of memos, spreadsheets, presentations, and so on. We call this layer the user policy specification layer. Note that the policy author may further clarify the meaning of various terms used in the policy documents at this layer—for example, the aforementioned Information Technology Security Standard further clarifies what “user,” “system,” and “middleware” mean. However, at this layer, terms used in policy specification are still sufficiently ambiguous that they require further interpretation before they can be implemented and executed. We refer to the policies authored at this layer as high-level user-specified policies.

A more precise and unambiguous interpretation of the policies authored at the user policy specification layer is done with the help of modeling approaches that create abstract models of the IT infrastructure and processes. As opposed to the previous layer, this layer requires an IT specialist capable of translating high-level policies using a formal specification such as the Unified Modeling Language (UML) [UML], or an equivalent language, to create such abstract models.
High-level policies are then mapped to the abstract level as constraints on the behavior of abstract systems and processes. The language of choice for specifying constraints at the abstract level could be the Object Constraint Language (OCL) [OCL] or a domain-specific language such as eXtensible Access Control Markup Language (XACML) [OASI]. These languages have a precisely defined syntax (and, in some cases, precise semantics as well), which makes them amenable to analysis and interpretation in an automated fashion without human involvement. We call this layer the abstract policy layer.
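As an illustration, the 90-day password rule quoted earlier might be captured at the abstract layer as an invariant over a model of users, in the spirit of an OCL constraint. The Python rendering and the model attributes are illustrative assumptions.

```python
# At the abstract layer, the 90-day password rule from the standard quoted
# earlier could be expressed as a constraint over a model of users, in the
# spirit of an OCL invariant. This Python rendering is an illustrative
# sketch; the model attributes are assumptions.

MAX_PASSWORD_AGE_DAYS = 90

def password_constraint_holds(user):
    """Invariant over the abstract User model, roughly:
    context User inv: self.passwordAgeDays <= 90"""
    return user["password_age_days"] <= MAX_PASSWORD_AGE_DAYS

def violating_users(users):
    """Policy analysis pass: report model elements violating the invariant."""
    return [u["name"] for u in users if not password_constraint_holds(u)]

users = [{"name": "alice", "password_age_days": 30},
         {"name": "bob", "password_age_days": 120}]
stale = violating_users(users)
```

The value of stating the constraint at this layer is that it is precise enough for automated analysis yet still independent of any particular operating system or authentication service.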
Figure 2.1 The Layered Policy Architecture. Four layers, linked by policy transformations, from top to bottom:

• User Policy Specification Layer: high-level policies and policy documents. Nontechnical users interact at this level.

• Abstract Policy Layer: technology-independent system-level policies and policy analysis. Models (or templates) define the practical relationships between various policy, IT resource, and process semantics.

• Concrete Policy Layer: technology-specific component-level policies. Policies are implementation-dependent and not portable.

• Executable Policy Layer: component configurations. Low-level mechanisms exposed by components are used to enforce high-level policies.
Because the mechanisms that can govern and constrain the behavior of a distributed system are provided by the underlying system components, it is necessary to take policies specified for the whole system and refine them into policies for individual components that must be upheld to meet the overall systemwide policies. In the password example just given, one implication of the security policy is that all servers that follow that policy should expire user passwords for Unix® shell access after 90 days. Another implication could concern access to the corporate intranet site that serves password-protected customized content to employees: to implement the password policy successfully, the authentication service used by the corporate intranet site needs to expire user passwords after 90 days. We refer to this layer of policy specification as the concrete policy or implementation policy layer. Policies at this layer will be specified in different technology-specific languages for different components and will include details of how an abstract
system is implemented in a given IT infrastructure. At this layer, policies are no longer abstract; they are tied to a specific and concrete implementation of a system. Should the system or its components change, policies at this layer may need to be rewritten to take the implementation changes into account.

Finally, the component-level concrete policies are turned into configuration parameters, rules, constraints, database entries, application deployment descriptors, and so on that influence the behavior of system components at runtime. We refer to this level as the executable or deployment policy layer because the policies at this layer can be directly consumed by the individual system components. To take the password example further, an organization may have servers that use AIX®, Microsoft Windows® 2000, and Linux® as their operating system. Each of these operating systems provides technology-specific means to specify the constraint that user passwords should expire after 90 days. Thus the concrete policy would need to be translated into executable or deployment policies that can be understood by these servers and enforced.

In the layered view presented earlier, policies thus start their lifecycle as user-specified policies that reflect legal, business, and ethical imperatives from the perspective of an IT infrastructure; subsequently, they undergo several levels of transformation and refinement and end up as code or configuration specifications that can be directly deployed on the components of an IT infrastructure. It is worth pointing out that very few policy-based management systems currently deployed take this holistic view. Most policy-based management systems that exist today are designed for individual components—for example, servers, storage devices, and networking routers1. The policy-based management system for these components starts with the policies authored at the technology-specific concrete layer.
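As a sketch of this last translation step, the concrete 90-day expiry policy from the example above might be mapped to platform-specific commands roughly as follows. The command templates are plausible examples rather than verified settings for each platform; treat them as assumptions for illustration.

```python
# Sketch of translating the concrete 90-day password-expiry policy into
# executable, platform-specific configuration. The command templates are
# plausible examples (e.g., Linux chage, AIX chuser with maxage in weeks,
# Windows net accounts), not verified settings; treat them as assumptions.

EXPIRY_COMMANDS = {
    "linux":   "chage --maxdays {days} {user}",
    "aix":     "chuser maxage={weeks} {user}",
    "windows": "net accounts /maxpwage:{days}",
}

def executable_policy(os_name, days, user="alice"):
    """Render the executable-layer form of the expiry policy for one OS.
    Unused placeholders in a template are simply ignored by str.format."""
    template = EXPIRY_COMMANDS[os_name]
    return template.format(days=days, weeks=max(1, days // 7), user=user)

cmd = executable_policy("linux", 90)
```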
These policies are then refined and translated into configuration constraints for the component. The emergence of IT standards such as the Common Information Model (CIM) [CIM], Web Services Distributed Management (WSDM) [WSDM], and XACML [OASI] is encouraging the specification of component-level policies to move one layer up, where they can be written just once for a single type of component and transformed into technology-specific policies for different instances of a particular component type. Even though current practices in IT infrastructure management rarely match the holistic view of the policy lifecycle, we expect that in the near future the situation will change substantially as greater scrutiny is paid to the IT cost structure. The drive to reduce IT operation expenses is resulting in IT processes
and management being standardized and outsourced. A holistic view of policies is attractive from the perspective of the suppliers as well as the customers of outsourced IT processes. For the suppliers of IT services, automated configuration of the shared IT infrastructure that can satisfy a set of high-level customer requirements reduces the cost of, and the time required for, onboarding a new customer. For the customers of outsourced IT services, a set of high-level, relatively unambiguous policies for various IT processes allows them to compare offerings from various suppliers. We note that reducing IT expenses is not the only driver behind the push to take a holistic view of policies; there are many other reasons—for example, compliance with government regulations—driving a holistic view of policies in large enterprises.

At this point, it is useful to briefly overview a few specific instances of how policies are used for management in current IT systems. This will give the reader a flavor of how practical systems map into the vision presented earlier, and of the differences between that ideal vision and what has been implemented in current systems. In the first example, we present how network QoS (Quality of Service) control can be achieved by the servers in an enterprise data center and the routers throughout the enterprise network cooperating to enforce QoS policies for different applications. In the second example, we present privacy policy publication, in which two devices conduct a policy-based negotiation of their configurations to derive mutually acceptable parameters for communications. In the third example, network access points in an enterprise network use policies to grant an appropriate level of network access to end-point devices.
2.2. Instances of Policy-Based Systems

2.2.1. Network QoS Control

In an enterprise data center, there are hundreds or even thousands of servers that need to be properly configured to provide a specific end-to-end QoS to different applications hosted in the data center—for example, company Web sites, voice over IP, instant messaging, and streaming video. The quality of service is provided by mapping different traffic flows to different types of scheduling queues within the routers in the network. The routers identify which packets ought to be sent to which queues by looking at predetermined fields in the headers of the network packets. The servers in the data center are responsible for
creating the IP headers in a manner such that the routers within the network can service them properly. Controlling QoS in this manner follows the Differentiated Services [DIFF] specifications for the Internet. In addition to controlling traffic flow performance, the same scheme can also be used for monitoring the level of different types of traffic flows.

To maintain a consistent mapping of traffic information across all the servers and routers, one would need a policy creation tool in which the policies for traffic differentiation for different applications can be specified. The policies may be specified at a high level. For example, voice over IP traffic is a delay-sensitive application, hence it has the highest priority among all applications; streaming video is more delay-tolerant than voice over IP and has the next highest priority; and so on. To implement these policies, we would need servers capable of marking the appropriate fields in the packet headers and routers that support different service levels for different queues. Most modern servers and network routers support the functionality required for this type of QoS control. However, different servers (for example, Linux versus Windows servers) and routers (for example, Cisco routers versus Juniper routers) may require different policies specified in different languages or as different configuration scripts. Thus, one of the components in the policy-based management system must be responsible for bridging the gap between what has been specified in the policy creation tool and what can be consumed by the routers and switches. This component would require algorithms that transform policies from one form to another. Finally, we need a method to distribute the transformed policies from the policy creation tool to all the servers and routers.
In addition, the tool used for authoring policies must have the capability to detect any policies that may be problematic (for example, malformed policies, mutually inconsistent sets of policies, redundant policies, and so on), in order to provide useful feedback to the author of the policies and to catch authoring mistakes before the policies are deployed. Similarly, when policies are received at the servers and routers, these devices must be able to check the consistency of the group of policies received from multiple sources, reorder the policies so that more important ones are executed first, and restructure them so that they are executed in the most efficient way possible. Thus, policy analysis algorithms that can be used to ratify a given set of policies are needed by both the authoring tool and the enforcement point. These algorithms for policy analysis and ratification will be discussed in Chapter 5, “Policy Transformation and Analysis.”
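The transformation and ratification steps just described can be sketched as follows; the priority-to-DSCP mapping and the rule format are illustrative assumptions.

```python
# Sketch of the transformation and analysis steps described above:
# high-level application priorities are mapped to Differentiated Services
# code points (DSCP) that routers can consume, and a simple ratification
# pass flags mutually inconsistent rules. The chosen DSCP values and rule
# format are illustrative assumptions.

PRIORITY_TO_DSCP = {"high": 46, "medium": 26, "low": 0}  # assumed mapping

def transform(policies):
    """policies: list of (application, priority) pairs at the high level.
    Returns router-consumable (application, dscp) marking rules."""
    return [(app, PRIORITY_TO_DSCP[prio]) for app, prio in policies]

def find_conflicts(policies):
    """Ratification: the same application assigned two different priorities
    is an inconsistency to report back to the policy author."""
    seen, conflicts = {}, []
    for app, prio in policies:
        if app in seen and seen[app] != prio:
            conflicts.append(app)
        seen[app] = prio
    return conflicts

policies = [("voip", "high"), ("video", "medium"), ("voip", "low")]
conflicts = find_conflicts(policies)
rules = transform([("voip", "high"), ("video", "medium")])
```

Here the authoring tool would run `find_conflicts` before deployment, while `transform` (with a device-specific mapping) would run in the component that bridges the tool and the routers.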
In addition to supporting network QoS, the same general principles can be used to manage the configuration and operation of other types of systems to provide differentiated performance to different types of workloads. Although the exact policy information model representing their policies would be quite different, the nature of the policy transformation, analysis, and ratification algorithms needed for such systems will be similar. In addition, this example highlights how a policy-based management system can start with policies specified at a high level by users and be translated into configuration scripts used to configure servers and routers.
2.2.2. Privacy Policy Publication

One of the common usage scenarios for policies involves the declaration of privacy policies used by a public server to protect private user data. Many Web sites publish their privacy policies so that clients can determine whether the privacy policies offered by a particular Web site are acceptable to them. In a more generalized version of this, policies can be used to negotiate the parameters that govern communications between two entities.

There are two parties involved in such a communication: a client and the Web site. Each party needs two components: a policy creation tool to define the privacy policies that should guide their interactions, and a policy negotiation tool that can conduct the actual negotiations, whose behavior can vary depending on the role each party plays in the communication. In a client/server type of exchange where policies are simply offered and examined without subsequent modifications, a client's negotiation module would check the policies offered by the server and determine whether they are acceptable, whereas a server's negotiation module would determine what policies to offer to the client depending on the identity of the client. In a more active style of negotiation, clients and servers may offer modifications of policies back and forth in subsequent exchanges to determine how they will communicate with each other. Once again, the algorithms needed for this scenario would require the transformation of policies specified in potentially different languages into a single format for comparison. In addition, they would require negotiation algorithms to determine which policies are mutually acceptable and which are not, and to find the greatest common denominator of the client and server policies.
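For the simple offer-and-examine case, the negotiation module's core computation can be sketched as an intersection of policy options; the option names are illustrative assumptions.

```python
# Sketch of the simple (offer-and-examine) negotiation described above: the
# server offers a set of privacy policy options, the client holds a set it
# finds acceptable, and the negotiation module computes the options common
# to both. The policy option names are illustrative assumptions; a real
# negotiation would first normalize both sides into one policy format.

def negotiate(server_offers, client_acceptable):
    """Return the mutually acceptable policy options, or None if the
    parties have no option in common and communication cannot proceed."""
    common = set(server_offers) & set(client_acceptable)
    return common or None

server = {"retain-30-days", "retain-90-days", "share-with-partners"}
client = {"retain-30-days", "no-sharing"}
agreed = negotiate(server, client)
```

The more active back-and-forth style of negotiation would iterate this step, with each side modifying its offered set until a nonempty intersection is found or the parties give up.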
2.2.3. Policy-Based Management of Enterprise Network Access

The first two examples highlighted the need for policy creation, distribution, and transformation tools. This final example highlights the mechanisms required for policy enforcement. Consider an enterprise, ABC Inc., that controls the network access of end-point devices such as servers, desktops, laptops, and PDAs to enhance the security and usability of its network infrastructure. To that end, the enterprise provides the following four levels of access to end-point devices requesting connectivity to its network infrastructure:

• Level 1 provides full access to the private intranet and the public Internet.

• Level 2 provides limited access to a remedial quarantined network. The end-device is given access to a Virtual Local Area Network (VLAN), and the device is told the reason why Level 1 access was denied and what remedial actions it needs to take to restore its Level 1 connectivity.

• Level 3 provides full access to the public Internet and access to an internal Web page where the end-point device can update its credentials to get an access level that is commensurate with its credentials according to the network access policies.

• Level 4 declines connectivity altogether.

As indicated in the definition of Level 3 access, in this network, access points use policies to determine the level of access provided to an end-device. The level of access provided to an end-device may be periodically re-evaluated and may change due to changed circumstances. The following is an example of the high-level network access policies that the enterprise may have:
1. Provide Level 4 connectivity to known rogue end-point devices.

2. Provide Level 3 connectivity to unknown end-point devices.

3. Provide Level 2 connectivity to known end-point devices that are in violation of the latest enterprise security policies.
4. Provide Level 1 connectivity to known end-point devices that are compliant with the latest enterprise security policies.
5. Allow specific exceptions to network access policies for specific end-point devices.

When an end-point device first requests network connectivity from an access point, the access point uses standard protocols to determine an identifier (for example, its MAC address or an identity certificate) of the end-device. The access point then sends the collected identity information to a policy decision point and asks for policy guidance regarding the level of network access that should be granted to the end-point device. Upon receiving a request for policy guidance, the policy decision point contacts a central database that stores the characteristics of all known end-point devices in the enterprise. These characteristics may be gathered by mandatory agents running on all known end-point devices. The agents may periodically (perhaps once a day) send critical device information such as the OS and patch levels, antivirus security levels, firewall status, and compliance with secure configuration settings to the central database. After obtaining the relevant characteristics of the end-point device from the central database, the policy decision point evaluates the network access policies that govern the access level of the end-device. These policies will be a low-level version of the high-level network access policies described earlier. For example, a low-level network access policy may include details of what it means to be security compliant in terms of OS and patch level:

If (power-on-password = set) &&
   (screen-saver-inactivity-period < 30 minutes) &&
   (antivirus-version = 9.03.1000) &&
   (firewall-version = 3.5.175.087) &&
   (admin-account-password-length > 7)
then access-level = 1
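A minimal sketch of how a policy decision point might evaluate this low-level rule, assuming the device characteristics arrive as a dictionary from the central database (the attribute names mirror the rule above and are otherwise illustrative):

```python
# Sketch of evaluating the low-level compliance policy above. The dictionary
# keys are invented stand-ins for fields supplied by the device agents.

def access_level(device):
    """Return Level 1 if the device meets all compliance checks; otherwise
    fall back to the remedial quarantined network (Level 2)."""
    compliant = (
        device["power_on_password"] == "set"
        and device["screen_saver_inactivity_minutes"] < 30
        and device["antivirus_version"] == "9.03.1000"
        and device["firewall_version"] == "3.5.175.087"
        and device["admin_account_password_length"] > 7
    )
    return 1 if compliant else 2

device = {"power_on_password": "set",
          "screen_saver_inactivity_minutes": 15,
          "antivirus_version": "9.03.1000",
          "firewall_version": "3.5.175.087",
          "admin_account_password_length": 10}
```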
After evaluating the access policies, the policy decision point may find that more than one policy is applicable. For example, upon evaluation, two policies may apply: according to the fourth policy, the end-point device should be provided Level 1 access because it complies with all requirements on secure configuration; however, according to the second policy, the device must be assigned Level 3 connectivity because it is a vendor machine. In cases of such conflicts, the policy decision point would invoke a runtime conflict resolver to execute the decision recommended by one of the policies. Using the priority among policies is one way to resolve such conflicts.
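Priority-based runtime conflict resolution can be sketched in a few lines; here we assume, purely for illustration, that the numbered order of the policies above doubles as their priority (lower number wins):

```python
# Sketch of a runtime conflict resolver that picks the decision of the
# highest-priority applicable policy. Assumes lower number = higher priority.

def resolve_conflict(applicable):
    """Return the access level recommended by the winning policy."""
    return min(applicable, key=lambda p: p["priority"])["level"]

# The conflict from the text: policy 4 grants Level 1, policy 2 grants Level 3.
applicable = [{"priority": 4, "level": 1},   # compliant device -> Level 1
              {"priority": 2, "level": 3}]   # vendor (unknown) -> Level 3
```

With this priority scheme the vendor-machine rule wins and the device receives Level 3 connectivity.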
The policy decision point keeps a tally of the devices connected to the network and periodically reviews their access status. At the time of review, device characteristics are re-evaluated and an appropriate level of network access is enforced. The policy decision point may also get notifications from a traffic-monitoring system installed in the network routers. If a device is receiving or sending undesirable traffic (for example, traffic generated by known worm exploits), then the monitoring service sends a notification containing the device location, device address, and traffic characteristics to the policy decision point. The policy decision point re-evaluates the network access of the device and enforces appropriate access by sending fresh directives to the network access point. This example illustrates the complexity of policy enforcement. It requires gathering data about the devices being managed, appropriate mechanisms to trigger the evaluation of policies, a centralized policy decision point, communication mechanisms to send policy decisions to policy enforcement points, and finally policy enforcement points that are capable of enforcing the policy decisions. The rest of this chapter will provide an overview of common mechanisms used for policy creation, distribution, and enforcement.
2.3. Policy Creation

As discussed earlier, policies usually start as natural language descriptions. From these descriptions, many details need to be sorted out before the policies can be implemented. First of all, to write policies that have an unambiguous meaning, a glossary of terms that precisely describes the various components, their functionality, and the concepts of system management needs to be agreed upon. For example, the CIM standards provide one such standard glossary of terms for systems management. For Web services-enabled applications and systems, their Web Services Distributed Management (WSDM) documents may provide such a glossary. Second, the creation module needs to include a user interface to define policies. Because policies are usually components of a larger system, the user interface for policy creation is likely to be a panel embedded within a larger system—for example, a system for managing networks or computer applications. Common guiding principles of good Human Computer Interface (HCI) design should be applied to the design of these panels—for example, the interface should be simple, intuitive, flexible, and consistent, as is typical to consider in the
course of normal user interface design. It is likely that the user interface will not allow natural language descriptions of policies unless the system is designed to handle a limited vocabulary and syntax for a specific application domain, so that high-level policy descriptions can be mapped to low-level ones without ambiguity. In practice, the user interface will support writing policies in a policy language, present templates that can be filled in to define policies, or otherwise constrain the specification of policies so that, at a minimum, policies are syntactically correct. Typically, the user interface would have constraints in place so that only a few standard terms from the agreed-upon glossary can be placed in each blank position in a template. As an example, let us consider the specification of access control policies for an application. In general, these policies are specified as a tuple containing three items: a group of users (or a role a user is performing), a group of resources that are accessed, and the type of operations the group of users is allowed to perform on the group of resources. Specifying all such access control policies for each user and role is a tedious process. However, a creation tool may define a template that provides a set of pre-existing roles (for example, Administrator, Privileged Users, Normal Users) and defines default access control policies for them over a generic set of resources (Restricted Resources, Support Devices). In addition to the HCI aspects, there are several policy-specific capabilities that are very useful to provide while defining policies. These include schemes to check whether the specified policies are syntactically correct, schemes to detect conflicts among policies, schemes to determine whether a set of defined policies provides sufficient coverage, and schemes to assess the impact of defining alternative policies.
In addition, there are schemes for easing the process of defining policies, and schemes for detecting inadvertent changes in the policies caused by another instance of a creation module. After policies are created, they need to be distributed to the locations where they will be used. The approaches for policy distribution that can be used are described in the next section.
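The access control template described earlier can be sketched as a list of (role, resource group, operations) tuples; the roles and resource names are the illustrative placeholders from the text, and the operation sets are invented defaults, not a real product's:

```python
# Sketch of default access-control tuples produced by a creation-tool
# template. Roles and resource groups are the placeholder names from the
# text; the operation sets are hypothetical defaults.

DEFAULT_ACL = [
    ("Administrator",    "Restricted Resources", {"read", "write", "configure"}),
    ("Privileged Users", "Restricted Resources", {"read"}),
    ("Normal Users",     "Support Devices",      {"read"}),
]

def allowed(role, resource, operation, acl=DEFAULT_ACL):
    """Check whether a (role, resource, operation) request matches a tuple."""
    return any(r == role and res == resource and operation in ops
               for r, res, ops in acl)
```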
2.4. Policy Distribution

The distribution process deals with the challenge of moving policies from the policy creation tools to the location where they will be evaluated and enforced. How to efficiently, securely, and reliably disseminate data in a geographically
dispersed system in general is a well-studied topic. However, the distribution of policies has two unique issues that need to be addressed separately. First, different sets of nodes need to receive different types of policies, and the distribution mechanism needs to take that into account. Second, the policies distributed from an editor to enforcement points may need some transformations for compatibility purposes. These unique requirements are addressed by three different approaches for policy distribution that are used when building a policy-based management system. Each of the distribution schemes has its own advantages and disadvantages. The first scheme for distribution is the use of configuration scripts, typically written in a scripting language such as shell or Perl. This distribution scheme can be used by the policy editor module after it has determined the set of enforcement points to which the defined policies are relevant. The editor software has a set of configuration scripts, one per type of enforcement point where policies need to be configured. The scripts are invoked to transfer the policies over to the appropriate enforcement point. The configuration script may use a protocol that is specific to the type of enforcement point. As an example, if the enforcement point is on a server, then the configuration script may use a secure shell to remotely log on to the server, invoke commands that put the desired policies into effect by modifying suitable configuration files, validate that the right policies have been installed successfully, and then log out of the server. If the enforcement point is on a router, then the configuration script may be an interactive script that telnets into the router and issues the right commands to set the policies into effect.
If the organization has a management system installed that provides support for configuration management such as HP OpenView or Tivoli® Netcool network management systems, then an interface to that system can be used for distribution of policies as well. The use of configuration scripts has the advantage that it can be used with almost any type of enforcement point, and it does not place any requirement on the enforcement point. On the other hand, it puts the entire burden of distribution on the editor. The editor needs to support all different types of enforcement points, which can grow into a major activity as new types of devices are added into the system as enforcement points. Due to these drawbacks, the configuration script method is advisable only when policy-based management needs to support legacy enforcement points that lack support for any mechanism to retrieve policies.
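A sketch of the editor side of the script-based scheme, assuming one command template per enforcement-point type; the `policyctl` commands and template strings are entirely hypothetical, and a real editor would execute the rendered command over ssh or an interactive telnet session rather than just building the string:

```python
# Sketch of an editor selecting a per-device-type configuration script.
# The command templates below are invented for illustration only.

SCRIPT_TEMPLATES = {
    "server": "ssh {host} 'policyctl install {policy_file} && policyctl verify'",
    "router": "telnet-expect {host} 'set policy {policy_file}'",
}

def render_command(enforcement_point, policy_file):
    """Pick the script template for this enforcement-point type and fill it in.
    This is the step that grows as new device types are added to the system."""
    template = SCRIPT_TEMPLATES[enforcement_point["type"]]
    return template.format(host=enforcement_point["host"],
                           policy_file=policy_file)

cmd = render_command({"type": "server", "host": "web01"}, "qos.pol")
```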
The second scheme to distribute policies is through the use of a repository in the system. The editor puts policies into a well-known repository, which may be a directory or a database accessible by all the enforcement points and the editors. The repository would be accessed by means of an access protocol that provides authentication, access control, and security. Examples of such access protocols include LDAP [HOSM], SOAP [MILA], and RPC [SRIN]. The enforcement points retrieve the policies from the repository, translate them if required into a configuration format they can apply, and configure themselves accordingly. A more detailed discussion of the various aspects of distribution using repositories is provided in the next section. The repository-based distribution scheme can support multiple editors, and can provide a single location from which the set of all active policies within the system can be retrieved and examined. Such an examination can be used for a variety of purposes—for example, satisfying audits and checking the consistency of the policies. Additional components can be developed on top of the repository to reconcile differences among hierarchical specifications of policies, and to transform policies from one format to another. In many enterprise environments, there is a need to maintain information about different versions of policies and to track the changes made to operational policies. The maintenance of versions, including the capability to roll back to a previously used set of policies, is recommended in many typical best practices. A repository-based distribution approach provides an easy way to maintain and manage different versions of policies. Furthermore, repositories can be made highly available and scalable to support a variable amount of load on the policy infrastructure.
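As an illustration of the repository scheme, the sketch below stands an in-memory SQLite database in for the repository and keeps every policy version so that rollback remains possible; the table layout and function names are invented for the example:

```python
import sqlite3

# Minimal repository sketch: editors insert policies tagged with a role and a
# version; each enforcement point retrieves the newest set for its role.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policies (role TEXT, version INTEGER, body TEXT)")

def publish(role, version, body):
    """Editor side: add a new policy version without deleting older ones,
    preserving the rollback capability the text recommends."""
    conn.execute("INSERT INTO policies VALUES (?, ?, ?)", (role, version, body))

def fetch_latest(role):
    """Enforcement-point side: retrieve the newest policy bodies for a role."""
    cur = conn.execute(
        "SELECT body FROM policies WHERE role = ? AND version = "
        "(SELECT MAX(version) FROM policies WHERE role = ?)", (role, role))
    return [row[0] for row in cur.fetchall()]

publish("web-server", 1, "deny *")
publish("web-server", 2, "allow port 443")
```

A production repository would instead be an LDAP directory, database server, or Web service front-end, accessed over an authenticated protocol.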
Unlike the script-based approach, the repository approach can work only with enforcement points that support a method to access the repository. Thus, it will not work for all environments, especially for legacy systems that have not been developed with policy-based management in mind. Furthermore, if the networked environment in which policy is being applied is highly dynamic, such as an ad hoc wireless network, the notion of a static repository of policies may be hard to implement. A third possible approach to the distribution of policies is to use a messaging system, known in the IT world as a pub-sub (publish-subscribe) system. A typical messaging system provides the abstraction of multiple message queues that are supported in a distributed manner. Each of the queues is given a
name; such a name is usually referred to as the subject or topic name of the queue. The subject or topic name is the mechanism that links publishers and subscribers of information. Publishers produce messages on a particular subject or topic name, and subscribers register their interest in a specific set of subjects, which may be updated from time to time. After an application registers interest in a topic, it receives the messages created by the publishers of that topic. Information is pushed to subscribing applications as it is generated. Publishers and subscribers can join and leave at any time. The middleware is responsible for routing messages between the publishers and the subscribers, as well as for providing a resilient infrastructure that can withstand failures of individual nodes. A typical implementation of the messaging architecture would be through one or more message brokers. End-clients or agents register their interest in a topic with a message broker in order to receive messages, and send any messages they publish, tagged with the topic name, to the message broker. The brokers manage information about the topology of the different publishers and subscribers and route messages between them. Commercial messaging systems provide additional mechanisms to filter, analyze, and perform various kinds of access control on the messages, which are required for robust operation in an enterprise context. Some commonly known messaging systems include the IBM WebSphere® MQ series, TIBCO Rendezvous, and OSMQ, an open source Java-based messaging system. Using a messaging system approach, enforcement points subscribe to a topic, which gets them the policies they are interested in, and editors publish policies on the same topics.
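The pub-sub interaction between editors and enforcement points can be sketched with a toy in-process broker; real messaging systems such as WebSphere MQ or TIBCO Rendezvous add brokered routing across nodes, persistence, filtering, and access control on top of this basic pattern (topic names below are illustrative):

```python
from collections import defaultdict

# Toy in-process broker: subscribers register callbacks per topic, and
# publishers push messages to every subscriber of that topic.

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("policies/firewall", received.append)  # an enforcement point
broker.publish("policies/firewall", "block tcp/23")     # the policy editor
```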
Transformations are allowed in the messaging infrastructure as additional functions on the messaging fabric, although care needs to be taken in selection and assignment of topic names so that the messages from editors are received by enforcement points only after passing through the transformation stage. Messaging systems have the same flexibility advantages as the repository-based approach in distribution of policy. In addition, they provide a nice abstraction between the policy editor and the policy end-points, and are more resilient and scalable. On the other hand, messaging infrastructures can be costly to implement. One can also implement mixed distribution mechanisms, using messaging systems for notifying changes in policies, and using the repository to maintain a persistent record of version changes.
For many policy systems in enterprise environments, a repository-based scheme seems to provide the right trade-off between functionality and cost. In the next section, we describe how the repository-based distribution scheme operates in more detail.
2.5. Policy Distribution Using Repositories

The policy repository stores different types of policies as units of data that are created, retrieved, modified, and deleted by different entities in the system. A policy repository can be implemented using a variety of technologies, each of which has its own set of advantages and disadvantages. One of the primary factors guiding the design of a policy repository is the need to export a protocol that allows remote access to policies by system components. For some application domains, the preferred choice for the repository has been a directory server that can be accessed by a remote access protocol such as LDAP. Several security products store access control policies in LDAP directories, and this was the preferred method for implementing a policy repository in the context of industry initiatives such as Directory Enabled Networks (DEN). Another common option, typically used in storage applications, is the use of a database as a policy repository. Many common programming languages provide standard packages to remotely access a database. One example of such a package is the JDBC (Java Database Connectivity) library, which, together with the Java RMI (Remote Method Invocation) libraries, enables remote access to databases in programs written in the Java programming language. JDBC provides an interface to access different types of databases, whereas RMI enables invocation from a remote machine. With the recent gain in prominence of service-oriented architectures (SOA) that use Web services as the common method for interfacing across distributed components, another option is to expose a policy repository as a Web service. Underneath the interfaces provided by the Web service, the policy repository may be implemented as an LDAP directory, a database server, or any other implementation that provides remote access to a set of data.
Regardless of how the policy repository is implemented, it must be highly available because policies are critical in a policy-based management environment. Fortunately, all the options discussed earlier can be readily implemented in a highly available configuration.
Regardless of the underlying technology used in its implementation, a policy repository can exploit properties of the policies to increase its efficiency. Some of these properties include the following:

• Policies are usually applicable to system components as a group instead of being applicable individually.
• Policies are modified relatively infrequently in comparison to how often they are accessed.
• Policies often need to be retrieved and stored from remote distributed locations.
• Policies often have common components that are reused and shared across several policies.

The effectiveness of such optimizations depends on how many policies are stored within the repository. If the repository is used to store only a small number of policies, an optimization such as reusing components may be unwarranted. However, when a policy repository is used to store a large number of policies, efficiency optimizations can prove quite useful when making wholesale changes to policies and distributing them. In the following sections, two techniques that exploit these properties are discussed. These techniques are often used by policy repositories to increase their efficiency.
2.5.1. Grouping of Policies by System Component Role

A distributed system consists of multiple components, where each component is realized as a software program, a piece of hardware, or both. Each component has a role to play in that system, and many components in the system may have the same role. As an example, a traditional high-volume Web site system consists of a front-end tier of several caching Web proxies, a middle tier of several Web servers, and a back-end tier of several database servers. Although a high-volume Web site system may consist of tens of servers (for example, it may have 15 caching proxies, 9 Web servers, and 6 database servers), it will likely have only three sets of policies defined: one appropriate to all the Web servers, another appropriate to all the Web proxies, and a third appropriate to all the database servers. A more detailed map of the Web site may include network devices (for example, routers, firewalls, and load-balancers), and each of the various types of network devices may have a different set of applicable policies.
Thus, in general, policies are defined, distributed, and evaluated as a group, with each group of policies being applicable to a role that is played by more than one component in a distributed system. Assuming that system components query a policy repository for the policies applicable to their roles in the system, it is more efficient for the repository to store all the policies defined for a specific role as a unit that can be returned in aggregate in response to such queries. The drawback of grouping policies by the role of system components is that updating an existing policy can be somewhat slower if the system is not carefully designed. A policy may be applicable to more than one role; thus, each policy would need to maintain pointers to the different policy groups in which it participates. When the policy is updated, these pointers would need to be followed and all copies of that policy would need to be updated. Although this is more expensive than maintaining a single copy, it is still beneficial if devices query policies corresponding to their roles much more often than the policies are modified.
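The alternative single-copy design mentioned above, in which role groups hold references to shared policy objects rather than copies, can be sketched as follows (all names are illustrative):

```python
# Sketch: role groups reference shared policy objects by ID, so one update
# to a policy is immediately visible to every role that includes it.

policies = {"p1": {"text": "log all access"}}
role_groups = {
    "web-server": ["p1"],
    "database-server": ["p1"],   # same policy shared by a second role
}

def policies_for(role):
    """What a component with this role retrieves from the repository."""
    return [policies[pid]["text"] for pid in role_groups[role]]

policies["p1"]["text"] = "log failed access only"   # a single update...
```

With copies instead of references, the same update would require following pointers from the policy to every group that contains it.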
2.5.2. Grouping of Policy Components

As mentioned in Chapter 1, “Policy Definition and Usage Scenarios,” policies have several components. For example, an event-condition-action policy has three components: the events that trigger the policy, the condition part, and the action part. These policy components can be further broken down into subcomponents—for example, the condition part of a policy may be a composite condition that consists of two subconditions: one specifying the time period during which it is valid and the other specifying the system state that requires the action specified by the policy. It is not unusual for many policies to share identical components or subcomponents. When policy repositories are designed, one can take the approach that each policy is an indivisible unit that needs to be stored and handled independently. However, this approach loses several advantages that can be gained by realizing that policies consist of components that are often reused across multiple policies. For example, if components are stored as units of data, then a modification to a single component effectively modifies all the policies in which that component is reused. Another advantage of storing policy components as units of data is consistency. The specification of policy components is often complex, and by storing well-debugged policy components and reusing them, the chance of making a mistake in the policy specification can be reduced.
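Storing condition components as shared units of data can be sketched as follows, so that one edit to a well-debugged component updates every policy that reuses it (the component names and policy fields are invented for the example):

```python
# Sketch: two event-condition-action policies share one named condition
# component stored as its own unit of data.

components = {
    "business-hours": "09:00 <= time <= 17:00",
}
policies = [
    {"event": "login",  "condition": "business-hours", "action": "grant"},
    {"event": "backup", "condition": "business-hours", "action": "defer"},
]

def expanded(policy):
    """Resolve the component reference into the full condition text."""
    return {**policy, "condition": components[policy["condition"]]}

# A single edit to the shared component updates both policies at once.
components["business-hours"] = "08:00 <= time <= 18:00"
```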
To take advantage of the common components, the policy repository must be aware of the underlying policy information model that drives the policy representation. The topic of policy information models will be discussed in detail in the next chapter.
2.6. Policy Creation and Distribution for Multiple Administrative Domains

The discussion of policy creation and distribution thus far has implicitly assumed that policies are defined for systems under a single administrative control. However, there are many cases in which, for business or legal purposes, a distributed system is partitioned into several administrative domains. In such situations, each domain defines its own policies to govern the systems under its own control. However, there are several situations in which these administrative domains cannot work independently and need to collaborate on their system policies to work effectively. To give a flavor of such situations, consider a global enterprise arranged in a hierarchical structure. At the global level, the enterprise defines IT policies that are consistent with the enterprise values, legal imperatives, and best practices. Examples of such policies include “An authorization from the office of the Chief Privacy Officer is required before an employee’s record of email communications is released outside the line management” or “All email communications should be preserved for at least three months.” At the country level, the various subsidiaries of the enterprise may define their own policies that satisfy country-specific regulations. Examples of such policies include “Multi-factor authorization is required before someone can access private employee records” and “All documents marked as US GOVT CONFIDENTIAL must be encrypted using 256-bit encryption keys on employee laptops.” Further down, at the state and local levels, there may be specific exceptions to the global and countrywide policies. Clearly, to support a hierarchical federation of policies, we need an algorithm that can merge two sets of policies. Before the two sets of policies can be merged, they need to be normalized so that both sets use the same structure and terms for representing policies.
If the two sets of policies are defined using different terms for referring to the same entities in the system, then the first step is the normalization of terminology.
Assuming that the terminology has been normalized for the sets of policies being merged, the next step is to identify and resolve conflicts between the two sets of policies. The algorithm for conflict detection mentioned previously can be used for this purpose. The conflict detection needs to compare all policies in the first set with all policies in the second set, but it suffices to describe the merge algorithm in terms of comparing a single policy (the first policy) from the first set to a single policy (the second policy) from the second set. As a result of running the conflict detection process between these two policies, there are three possible outcomes:

• There is no overlap in the conditions of the first policy and the second policy.
• There is an overlap in the conditions of the first policy and the second policy, but the actions specified by the two policies can be undertaken concurrently.
• There is an overlap in the conditions of the first policy and the second policy, and the actions specified by the two policies cannot be taken concurrently.

In the first two cases, the merger of the policies poses no problem. In the third case, there is a conflict in the merge, and a decision needs to be taken as to how the conflict should be resolved. One way to resolve the conflict would be to assign priorities to the different sets of policies—for example, the policy of a parent organization always takes precedence over the policy of a subunit of that organization, or the policy specified for a specific entity takes precedence over a general policy. Another way to resolve the conflict would be to use meta-policies—that is, policies for managing policies. In this context, meta-policies are themselves policies that determine which policy takes precedence over another policy.
A meta-policy would define the conditions under which the conflict ought to be resolved in favor of the first policy, and the conditions under which it ought to be resolved in favor of the second policy. After determining the merged set of policies, one may want to apply the coverage algorithms to determine whether the policies defined for a target system cover all possible cases of system conditions and whether there are any policies in the merged set that will never be used. If there are, those redundant policies ought to be removed so that they do not consume space or computation time unnecessarily. The result is the merged set that can be used for policy enforcement.
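The merge step can be sketched as follows, under the simplifying assumptions that each policy is a set of condition terms plus one action, that conflict detection reduces to condition overlap plus a table of mutually incompatible actions, and that the parent set always wins (the priority rule described above):

```python
# Sketch of merging a parent policy set with a child set after terminology
# normalization. Policy structure and example policies are illustrative.

def conflicts(p, q, incompatible):
    """Third outcome from the text: overlapping conditions AND actions that
    cannot be taken concurrently."""
    overlap = bool(p["conditions"] & q["conditions"])
    return overlap and frozenset((p["action"], q["action"])) in incompatible

def merge(parent, child, incompatible):
    """Keep all parent policies; add child policies that do not conflict."""
    merged = list(parent)
    for q in child:
        if not any(conflicts(p, q, incompatible) for p in parent):
            merged.append(q)   # no overlap, or concurrently executable
    return merged              # parent wins on conflict

parent = [{"conditions": {"email"}, "action": "retain-3-months"}]
child  = [{"conditions": {"email"}, "action": "delete-weekly"},
          {"conditions": {"laptop"}, "action": "encrypt-256"}]
incompatible = {frozenset({"retain-3-months", "delete-weekly"})}

merged = merge(parent, child, incompatible)
```

A meta-policy-based resolver would replace the fixed "parent wins" rule with a lookup deciding, per conflict, which side prevails.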
Apart from the hierarchical scenario discussed earlier, another interesting situation arises when two systems in two different administrative domains interact in a peer-to-peer manner. In such cases, each system wants to maintain its own policies while conforming to the other’s policies. If their policies conflict with each other, then clearly the systems cannot interact and still be in compliance with their policies. In this case, for the systems to interact, either some portion of the policies will have to be changed or a policy violation has to be tolerated. An interesting situation arises when there are multiple choices of behavior that one system can take in its interaction with the other system and still conform to its own policies. For example, in many cases, policies need to be defined for an exchange or communication that happens among two or more entities within a distributed system. As a simple example, when two computers need to communicate securely, they need to negotiate the algorithms and parameters for encryption. It is not unusual for systems to define policies on the maximum amount of time any specific encryption key ought to be used—for example, a shared key should not be used for more than a few minutes of communication, after which another shared key is negotiated. As part of the negotiations for communication, the policies ought to be negotiated as well. In many cases, the negotiation can be reduced to the selection of a few simple parameters—for example, the policy for periodically switching keys can be reduced to a negotiation of a common time period for renegotiating the keys. Negotiation of policies in this case can be implemented in the following manner. Each of the negotiating parties has a range of parameter values that it can accept.
The acceptable values may be explicitly enumerated—for example, only DES (Data Encryption Standard), triple-DES, or 40-bit DES (three different encryption algorithms) are the allowed values for the encryption algorithm parameter. Or the acceptable values may be specified as a range (the time period for renegotiating keys should be no less than 30 seconds and no more than 10 minutes). In a two-party negotiation, one of the parties can send its acceptable range of policies to the other party, and the other party can select a parameter value that is within its own acceptable range and is among those offered by the first party. In a multiparty negotiation, one of the parties can act as the leader for the negotiation, obtaining the acceptable ranges from each of the other parties and selecting a value within the acceptable range for everyone. The preceding describes some of the simplest schemes for the negotiation of policies and parameters. It is also possible to envision more complex negotiations
in which different parties do not want to reveal information about their acceptable ranges of parameters, or want to strive to choose parameters that have specific benefit to them—for example, a system may try to maximize the value of one of the negotiated parameters. Although some very interesting negotiation algorithms can be defined for those cases, we have not encountered the need for such sophisticated negotiations in the domain of computer systems, and therefore consider them outside the scope of this book.
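The simple two-party scheme from the previous section can be sketched as follows; the offer structure and function names are illustrative, and both the enumerated and range-based forms of acceptability are covered:

```python
# Sketch of two-party parameter negotiation: one party publishes its
# acceptable values, the other picks a value acceptable to both.

def offer(enumerated=None, low=None, high=None):
    """An acceptable set: either an explicit enumeration or a numeric range."""
    return {"enumerated": enumerated, "low": low, "high": high}

def select(my_offer, their_offer):
    """Pick a parameter value acceptable to both parties, or None."""
    if my_offer["enumerated"] is not None:
        common = set(my_offer["enumerated"]) & set(their_offer["enumerated"])
        return sorted(common)[0] if common else None
    low = max(my_offer["low"], their_offer["low"])
    high = min(my_offer["high"], their_offer["high"])
    return low if low <= high else None

# Encryption algorithm: enumerated values (as in the DES example above).
algo = select(offer(enumerated=["3DES", "DES"]), offer(enumerated=["3DES"]))
# Key renegotiation period in seconds: overlapping numeric ranges.
rekey = select(offer(low=30, high=600), offer(low=120, high=900))
```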
2.7. Policy Enforcement

The network access scenario described at the beginning of this chapter gives an idea of the typical steps involved in the process of policy enforcement. As shown in Figure 2.2, the policy enforcement process is initiated at the policy decision point upon receiving an evaluation trigger. In a PBMS, there are three common types of evaluation triggers: event-based, schedule-based, and an explicit request from a managed system component. The type and the contents of the evaluation trigger are used by the policy decision point to determine an enforcement context, which describes the situation in which a policy decision must be made. For example, the enforcement context could be an access request, the start of an application, and so on. The enforcement context is used by the policy decision point to select the policies relevant to the current round of policy enforcement. We refer to the selected policies as the relevant policies. After determining the relevant policies, the policy decision point initiates the data-gathering phase of policy enforcement. In this phase, all the data required to evaluate the relevant policies is gathered. After the data-gathering phase is over, the policy decision point evaluates the relevant policies and finds the applicable policies that should be executed in the current round of policy enforcement. It is possible that the policies selected for execution specify conflicting guidance. Note that certain policy conflicts cannot be resolved at the time of policy creation and become apparent only at runtime (we will discuss different types of policy conflicts in Chapter 5). Thus, before the applicable policies are executed, any potential conflicts among them are resolved. The policy decision point subsequently sends the resolved actions to the policy enforcement point for execution.
The final step of the policy enforcement process is the logging of policy enforcement actions for debugging, compliance checking, and auditing activities in the future.
Chapter 2 • Policy Lifecycle—Creation, Distribution, and Enforcement
Figure 2.2 The Policy Evaluation Process (flowchart: evaluation triggers arrive at the PDP, which starts policy enforcement, generates the enforcement context, selects the relevant policies, gathers system data, evaluates the relevant policies, resolves runtime conflicts among the applicable policies, and executes the resulting decisions)
The generic policy enforcement steps just described can be eliminated or simplified in many implementations based on the requirements of a particular system. For example, an application may use policies in only one enforcement context, and therefore, it does not need to determine the enforcement context. In the rest of this section, we will take the enforcement process outlined in Figure 2.2 as a baseline and give more details about each of the policy enforcement steps identified in the figure.
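As a concrete illustration, the baseline pipeline of Figure 2.2 can be sketched in a few lines of Python. All names, data shapes, and the toy conflict-resolution rule below are hypothetical, not part of any PBMS standard:

```python
# Illustrative sketch of the generic enforcement pipeline of Figure 2.2.
# Policies are plain dicts; the conflict-resolution rule is a toy stand-in.

def resolve_conflicts(applicable):
    """Toy conflict resolution: among applicable policies that set the same
    action key, keep only the one with the highest priority."""
    best = {}
    for p in sorted(applicable, key=lambda p: p["priority"], reverse=True):
        best.setdefault(p["action"][0], p)
    return list(best.values())

def enforce(trigger, repository, gather_data, enforcement_point):
    # 1. Derive the enforcement context from the evaluation trigger.
    context = {"kind": trigger["kind"], "target": trigger.get("target")}
    # 2. Select the policies relevant to this context.
    relevant = [p for p in repository if p["context"] == context["kind"]]
    # 3. Data-gathering phase: collect what the relevant policies reference.
    data = gather_data(context)
    # 4. Evaluate the relevant policies to find the applicable ones.
    applicable = [p for p in relevant if p["condition"](data)]
    # 5. Resolve runtime conflicts among the applicable policies.
    resolved = resolve_conflicts(applicable)
    # 6. Execute the resolved decisions at the enforcement point.
    return [enforcement_point(p["action"]) for p in resolved]
```

A system with two conflicting throttling policies would, under this sketch, execute only the higher-priority action.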
2.7.1. Policy Evaluation Trigger

Most PBMSs initiate a round of policy evaluation upon receiving an evaluation trigger. There are three common types of evaluation triggers: event-based triggers, schedule-based triggers, and triggers based on an explicit request from a system component.
Policy Enforcement
An event-based trigger for policy evaluation may be generated by a monitoring system after observing a predetermined pattern of events in the managed system. A special case in which this type of trigger is used is in policy systems that employ event-condition-action policies, where the policy definition explicitly identifies the events that should trigger its evaluation. In such systems, the policy decision point may compile a list of all the events that occur in its policies and notify the event-monitoring system of its interest in those events. Many PBMSs do not have a separate event-monitoring infrastructure, and the policy decision point itself may look for the events of interest.

Schedule-based evaluation triggers cause enforcement of policies at predetermined time instances. One example of such a trigger is a timer event in an email server that generates a trigger every morning at 3:00 AM to purge email documents no longer required to be archived.

Finally, policy enforcement may be triggered by an explicit request from a managed system component for policy guidance. In such cases the managed element reaches a decision point during the course of its operation and, to determine its further course of action, asks a policy decision point for policy guidance. For example, when a user logs in to a server, the login manager may ask a policy decision point for policy guidance to assign appropriate privileges to the user; or when a system component first starts, it may contact the policy decision point to obtain the necessary configuration in accordance with the current policies in the system.

Event-based and schedule-based evaluation triggers correspond to the unsolicited mode of policy guidance because in such cases system components do not explicitly ask for a policy decision. On the other hand, evaluation triggers generated by an explicit request from a system component correspond to the solicited mode of policy guidance.
Typically, when a system component requests solicited guidance, it expects a synchronous reply back from the policy decision point; in other words, the system component would wait until it receives guidance from the policy decision point before proceeding further. Note that a system component can implicitly ask for policy guidance by emitting an event that warrants policy guidance. In such cases, the managed element would receive a response from the policy decision point asynchronously, that is, after emitting the event, the managed element would perform other operations while waiting for the policy decision point to return policy guidance in response to the emitted event.
The synchronous and asynchronous request-reply interactions of the solicited and unsolicited modes of requesting policy guidance, respectively, have several secondary implications. In synchronous interaction, the system component establishes a connection with the policy decision point, and the policy decision point can take advantage of the established connection to return the policy guidance. In the unsolicited mode, the policy decision point has to discover the relevant system components that need policy guidance and send the result of policy evaluation to them. Furthermore, the processing of asynchronous replies at the system component is typically more complicated than the processing of synchronous replies because the system must put in effort to match each response to one of the requests sent earlier.
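The matching problem for asynchronous replies is commonly solved with correlation identifiers attached to each outstanding request. A minimal illustrative sketch (all names hypothetical):

```python
# Sketch of matching asynchronous policy-guidance replies to earlier
# requests using correlation IDs. All names are illustrative.
import itertools

class AsyncGuidanceClient:
    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}            # correlation id -> original request

    def emit_event(self, event):
        """Implicitly ask for guidance by emitting an event; remember the
        request so the later reply can be matched back to it."""
        cid = next(self._ids)
        self._pending[cid] = event
        return cid                    # sent to the PDP along with the event

    def on_reply(self, cid, guidance):
        """Called asynchronously when the PDP returns guidance."""
        request = self._pending.pop(cid, None)
        if request is None:
            raise KeyError("reply does not match any outstanding request")
        return request, guidance
```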
2.7.2. Policy Enforcement Context

One of the functions of an evaluation trigger is to provide a context for further policy enforcement. A policy enforcement context is established by any additional information used to determine the condition and the specific target system to which a policy is applicable. For example, an event-based evaluation trigger generated due to a server overload may provide information about which server is overloaded and what the current CPU utilization is. A schedule-based trigger may provide the names of specific policies that need to be evaluated at a given time, and a list of system components for which each of these policies should be evaluated. Finally, when the evaluation trigger is generated by an explicit request from a system component, it may state why the component needs the policy guidance (for access control, for its initial configuration, and so on). Thus, evaluation triggers help establish the context in which policy enforcement should take place.

The enforcement context serves several important functions. First, it is used to determine the target of policy enforcement, which may range from a single system component to all system components of a particular type. Second, it is used to determine which group of policies needs to be enforced. Additionally, the enforcement context may carry initial bootstrapping data that lets the policy decision point establish contact with the target system component(s).

In a solicited policy request, the system component making the request knows the precise context of its request, and therefore it can send significant detail regarding the context of the policy request in its evaluation trigger. However, an evaluation trigger generated by a monitoring system after observing a pattern
of events may contain only the GUID (globally unique identifier) of the device that generated the event pattern and a primary cause of event generation (for example, startup or shutdown of a device). The context generator would then have to gather additional contextual information (for example, resolve the GUID to a particular customer to help find the policies relevant for that customer) and construct a context for enforcement. For a schedule-based evaluation trigger, the context may be specified by the user when the schedule is written: for example, enforce all backup and archiving policies for Database Servers 1 and 3 every morning at 3:00 AM, and for Database Servers 2 and 4 at 5:00 AM.
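A context generator of this kind can be sketched as a small dispatch over the trigger type. The directory lookup and all field names below are assumptions for illustration only:

```python
# Sketch of a context generator that enriches a sparse event-based trigger.
# The GUID-to-customer mapping and the field names are hypothetical.

GUID_TO_CUSTOMER = {"dev-4711": "acme-corp"}   # assumed directory lookup

def build_context(trigger):
    if trigger["type"] == "event":
        # The raw trigger carries only a device GUID and a primary cause;
        # resolve the GUID so customer-specific policies can be selected.
        return {
            "target": trigger["guid"],
            "customer": GUID_TO_CUSTOMER.get(trigger["guid"]),
            "cause": trigger["cause"],
        }
    if trigger["type"] == "schedule":
        # Schedule-based triggers name their policies and targets up front.
        return {"policies": trigger["policies"], "targets": trigger["targets"]}
    # Solicited requests describe their own context in full.
    return dict(trigger["context"])
```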
2.7.3. Data Gathering

After the enforcement context is known, the policy decision point can deduce the targeted system components for policy enforcement and the set of policies that needs to be enforced on each of them. The enforcement process then enters the data-gathering phase, during which the policy decision point collects all data required for the evaluation of the relevant policies.

The mechanism for gathering data varies widely from system to system. The data may be obtained by invoking a local system call or a Remote Procedure Call (RPC) on the system components, by pulling data from a database (for example, an LDAP database to find the credentials of an entity), or by invoking third-party services (for example, by polling a performance monitor). There are a host of engineering issues related to data gathering, such as the discovery of data sources, naming, security (credentials and permissions for gathering data), and the encoding used for data transfer. In many systems, components may expose standard interfaces, such as the CIM interface or the Web Services Distributed Management (WSDM) interface, that can be used to resolve these engineering issues. However, distributed systems often include a large number of legacy components, and these require customized data-layer adapters to gather data.

One significant issue in gathering data revolves around the time it takes for a centralized policy decision point to gather data in a large distributed system. In many systems, the policy decision point will need to establish connections with hundreds of system components, send specific queries to these components, and wait for responses. Many system components give low priority to management requests for gathering data. For example, in network routers the operating system gives the highest priority to processes that
are forwarding data packets; queries that gather configuration data have to wait until the router processor has free cycles. This induces large delays in gathering configuration data from routers when the traffic load is heavy.

It follows that, given the delay in gathering data from a single system component, we may not be able to get a full snapshot of the whole system, but only measurements taken from different components at significantly different points in time. Using such data to evaluate policies may be meaningless if the policies try to correlate data from a single snapshot of the system. Thus, in this case, the policy author should write policies only against system attributes that change on a longer time-scale than the time-scale required for data gathering. For enforcing constraints that require a much smaller reaction time, a centralized policy decision point may not be the answer. In such cases, each component may need to host a small policy decision point, and the high-level policies that govern the behavior of the overall system can be refined to generate policies that are enforced at individual components. The embedded policy decision point then executes the refined policies and keeps the whole system compliant with the high-level policies. Of course, for many systems such refinement is not possible; in those cases, the system administrator needs to find some other mechanism to manage them.

On the flip side, the data-gathering phase can be avoided entirely in many situations. For example, in a solicited call, a system component asking for policy guidance may pass all the data required for policy evaluation to the policy decision point along with the solicited request, eliminating the need for a separate data-gathering phase.
In that case, however, we assume either that the system components know about all the attributes against which policies are written and therefore can selectively pass data required for policy evaluation, or that the total number of attributes is small and therefore all attributes can be passed along with the guidance request without incurring too much overhead.
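When the requesting component already holds all the attributes the policies reference, the solicited request can simply carry them, and the decision point evaluates directly against the shipped values. A small illustrative sketch (all names and attributes hypothetical):

```python
# Sketch: a solicited request that ships all attribute values with it,
# letting the PDP skip the data-gathering phase. Names are illustrative.

def decide(policies, request):
    """Evaluate policies directly against attributes carried in the request."""
    attrs = request["attributes"]
    return [p["action"] for p in policies if p["condition"](attrs)]

# A hypothetical login manager ships the few attributes its policies use.
login_policies = [
    {"condition": lambda a: a["role"] == "admin", "action": "grant_all"},
    {"condition": lambda a: a["failed_logins"] > 3, "action": "lock_account"},
]
```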
2.7.4. Policy Evaluation

In the policy evaluation phase, the decision point evaluates all relevant policies to find the policies that are applicable in the system. As discussed in Chapter 1, policies can be specified in many forms: simply as a configuration or a metric constraint, in the form "if condition then action," and so on. In general, algorithms to efficiently evaluate policies need to strike a trade-off among different performance metrics, such as the time taken for policy evaluation, memory footprint, scalability in terms of
number of concurrent evaluations, frequency of policy updates, and so on. Different domains and applications need to strike different trade-offs between these quantities, and correspondingly, many different policy evaluation algorithms have been developed. To give a flavor of these algorithms, we briefly describe three of them: simple table lookup, a generic expression evaluator, and fast tree-based search algorithms.

2.7.4.1. Table-Based Algorithms
A table-based algorithm is suitable for policies whose conditions are conjunctions of comparative conditions on individual variables. The table in this case has one column for each variable and one column for each action, and each row in the table expresses a policy. A straightforward evaluation algorithm, without any fancy optimization, would simply traverse the table row by row and evaluate the columns left to right, either until a comparative condition specified by a column is not satisfied (the policy is not applicable) or until all conditions are satisfied (the policy is applicable). This algorithm has a relatively straightforward implementation. It works well when the number of attributes that occur in policies and the total number of policies are small, and system resources (CPU and memory) are not at a premium.

2.7.4.2. Generic Boolean Expression Evaluator
This is another straightforward implementation of policy evaluation. In many policy models, a policy is applicable if a Boolean expression derived from the policy evaluates to true; for example, the Boolean expression could be the condition part of an "if condition then action" policy. In such cases, the evaluation engine can extract from each policy a corresponding Boolean expression and evaluate it using a generic Boolean expression evaluator to see whether the policy is applicable. The evaluation engine evaluates each policy one by one and does not optimize policy evaluation by considering multiple policies at a time.

2.7.4.3. Complex Evaluation Algorithms
In many cases, the policy structure lends itself to evaluation optimizations. For example, if a large number of policies share a condition component that occurs in conjunction with other condition components, then it may make sense to first evaluate the condition component shared by a large number of policies. If this component evaluates to false, then all policies sharing this component are not applicable.
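The shared-component optimization just described can be sketched by grouping policies under their shared conjunct and evaluating that conjunct once per group. Data shapes below are illustrative:

```python
# Sketch of the shared-condition optimization: a conjunct shared by many
# policies is evaluated once; if it is false, every policy sharing it is
# skipped without evaluating its remaining conjuncts. Names illustrative.

def evaluate_grouped(groups, data):
    """groups maps a shared condition (a predicate over the gathered data)
    to the policies containing it; each policy holds the rest of its
    conjunction as a list of predicates."""
    applicable = []
    for shared_cond, policies in groups.items():
        if not shared_cond(data):     # one evaluation rules them all out
            continue
        for p in policies:
            if all(c(data) for c in p["rest"]):
                applicable.append(p["name"])
    return applicable
```

When the shared conjunct fails, the cost is one comparison for the whole group instead of one per policy.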
Policies may have more sophisticated interdependencies that can be exploited for faster policy evaluation. For example, consider a situation where the system has four policies:

P1: If (X < 5)  && (Y < 4)  then action=1;
P2: If (X >= 5) && (Y < 4)  then action=4;
P3: If (X < 5)  && (Y >= 4) then action=2;
P4: If (X >= 5) && (Y >= 4) then action=3;
In a generic expression evaluator, each policy evaluation requires two comparison operations. Assuming that we need to find all applicable policies, we need to evaluate all policies, resulting in eight comparison operations; the complexity of the generic expression evaluator is linear in the number of policies. However, these policies have interdependent condition components. Clearly, only one policy can be true at a time, and the true policy can be found using just two comparison operations, that is, by comparing the values of variables X and Y to 5 and 4, respectively, as shown in Figure 2.3. A more general tree-based algorithm for policies is described in [COST]. The complexity of the tree algorithm is typically logarithmic in the number of policies.

Figure 2.3 (decision tree: the test X < 5 followed by the test Y < 4 selects exactly one of P1 through P4 with two comparisons)

normal_mode, call_made triggers restricted_mode
    if Count(time_out) > t * Count(call_made).
restricted_mode causes restrict_calls.
restricted_mode, ^(call_made | time_out) triggers normal_mode
    if Count(time_out) < t' * Count(call_made).
normal_mode causes accept_all_calls.
In this sample code, normal_mode and restricted_mode are policy-defined events, call_made and time_out are primitive events, and restrict_calls and accept_all_calls are actions that can be taken. Here ^ represents the Boolean NOT operator and | represents the Boolean OR operator. As mentioned earlier, PDL has the limitation that rules cannot be grouped in any form, and all policies have a flat structure.
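Operationally, the PDL example behaves like a mode switch driven by event counts. The following Python sketch mimics that behavior; the thresholds and the per-event re-check are illustrative simplifications (the actual PDL rule for returning to normal_mode fires on an epoch with neither call_made nor time_out events, which this toy does not model):

```python
# Toy operational rendering of the PDL example: switch a call server into
# restricted mode when time-outs exceed a fraction t of calls made, and
# back to normal mode when they fall below t'. Thresholds illustrative.

class CallServer:
    def __init__(self, t=0.5, t_prime=0.1):
        self.t, self.t_prime = t, t_prime
        self.calls = self.timeouts = 0
        self.mode = "normal"

    def observe(self, event):
        if event == "call_made":
            self.calls += 1
        elif event == "time_out":
            self.timeouts += 1
        # normal_mode, call_made triggers restricted_mode
        #     if Count(time_out) > t * Count(call_made)
        if self.mode == "normal" and self.timeouts > self.t * self.calls:
            self.mode = "restricted"      # causes restrict_calls
        # restricted_mode triggers normal_mode
        #     if Count(time_out) < t' * Count(call_made)
        elif self.mode == "restricted" and self.timeouts < self.t_prime * self.calls:
            self.mode = "normal"          # causes accept_all_calls
        return self.mode
```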
Chapter 4 • Policy Languages
4.2.2. Ponder

Ponder [POND] is a general policy language that can define access control rules like those of XACML (called authorization rules), general management rules a la PDL (called obligations), and policies related to certain roles or positions. For example, one can write an authorization policy rule with subject, target, action, and arguments to specify that a subject is authorized (or not permitted) to perform a certain action on target objects. The syntax of the authorization policy rule is as follows:

inst ( auth+ | auth- ) policyName "{"
    subject [<type>] domain-Scope-Expression ;
    target [<type>] domain-Scope-Expression ;
    action action-list ;
    [ when constraint-Expression ; ]
"}"
The bold words are keywords, the elements within round brackets ( ) separated by "|" are choices, optional elements are specified in square brackets [ ], and repetition is specified with braces { }. Here auth+ represents a positive authorization and auth- a negative authorization. The following examples show positive and negative authorization policy rules.

inst auth+ switchPolicyOps {
    subject /NetworkAdmin ;
    target  /Nregion/switches ;
    action  load(), remove(), enable(), disable() ;
}
The preceding Ponder statement expresses the following policy rule: members of the NetworkAdmin domain are authorized to load, remove, enable, and disable objects of type PolicyT in the Nregion/switches domain. This illustrates the use of an authorization policy to control access to stored policies.

inst auth- /negativeAuth/testRouters {
    subject /testEngineers/trainee ;
    action  performance_test() ;
    target  /routers ;
}
The preceding policy specifies that trainee test engineers are forbidden to perform performance tests on routers of type routerT. This policy is stored within the /negativeAuth domain.
Survey of Policy Languages
These examples show the direct declaration of policy instances using the keyword inst. The language provides reuse by supporting the definition of policy types, to which any policy element can be passed as a formal parameter. Multiple instances can then be created and tailored for a specific environment by passing actual parameters. The following shows an example:

type auth+ PolicyOpsT (subject s, target t) {
    action load(), remove(), enable(), disable() ;
}
inst auth+ switchPolicyOps  = PolicyOpsT(/NetworkAdmins, /Nregion/switches);
inst auth+ routersPolicyOps = PolicyOpsT(/QoSAdmins, /Nregion/routers);
The two instances allow members of /NetworkAdmins and /QoSAdmins to execute the actions on policies within the /Nregion/switches and /Nregion/routers domains, respectively. In addition to authorization policies, Ponder can express delegation rules (in which a subject temporarily grants access rights to others), information-filtering rules (similar to the condition part of a PDL policy), refrain rules (indicating that a subject should refrain from performing certain actions), and obligation rules. In Ponder, obligation rules are always triggered by a particular event; Ponder supports simple composition of events, which can also be used to trigger an obligation rule.

Ponder also has a concept of meta-policies: policies about the policies within a composite policy or some other scope. Meta-policies are specified for groups of policies to express constraints that limit the permitted policies in the system. A meta-policy is specified as a sequence of OCL (Object Constraint Language) expressions [OCL]. The raises clause is followed by an action that is executed if the last OCL expression evaluates to true.

inst meta metaPolName raises exception [ "(" parameters ")" ]
    "{" {OCL-expression} boolean-OCL-expression "}"
For example, the following meta-policy captures a conflict of duty in which the same person cannot both submit and approve a budget:

inst meta budgetDutyConflict raises conflictInBudget(z) {
    [z] = self.policies -> select (pa, pb |
        pa.subject -> intersection (pb.subject) -> notEmpty and
        pa.action -> exists (act | act.name = "submit") and
        pb.action -> exists (act | act.name = "approve") and
        pb.target -> intersection (pa.target) -> oclIsKindOf (budget))
    z -> notEmpty ;
}
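The conflict-of-duty constraint that this meta-policy expresses in OCL can be mirrored imperatively. The following Python sketch (data shapes are illustrative, not Ponder syntax) flags pairs of policies where some shared subject may both submit and approve the same budget target:

```python
# Imperative sketch of the budgetDutyConflict meta-policy: flag pairs of
# policies in which a shared subject can both submit and approve the same
# budget target. Dict/set shapes are illustrative, not Ponder syntax.

def budget_duty_conflicts(policies):
    conflicts = []
    for pa in policies:
        for pb in policies:
            if (pa is not pb
                    and pa["subjects"] & pb["subjects"]      # shared subject
                    and "submit" in pa["actions"]
                    and "approve" in pb["actions"]
                    and pa["targets"] & pb["targets"]):      # shared budget
                conflicts.append((pa["name"], pb["name"]))
    return conflicts
```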
Like XACML policies, Ponder policies can be organized into groups. Policy groups provide structure that reflects an organizational structure or the natural way system administrators operate, and they provide reusability of common definitions. The following example shows policies in a login group, which authorize staff to access computers in the research domain, record login attempts, load the user's environment on the computer, and deal with login failures:

inst group loginGroup {
    inst auth+ staffLoginAuth {
        subject /dept/users/staff ;
        target  /dept/computers/research ;
        action  login ;
    }
    inst oblig loginactions {
        subject s = /dept/computers/loginAgent ;
        on loginevent (userid, computerid) ;
        target t = computerid ^ {/dept/computers/}
        do s.log (userid, computerid) -> t.loadenvironment (userid) ;
    }
    inst oblig loginFailure { ... }
}
Finally, Ponder supports writing a role policy, used to define a role so that a semantic group of policies with a common subject can be created easily. The following provides an example of a role policy:

type role ServiceEngineer (CallsDB callsDb) {
    inst oblig serviceComplaint {
        on customerComplaint(mobileNo) ;
        do t.checkSubscriberInfo(mobileNo, userid) ->
           t.checkPhoneCallList(mobileNo) ->
           investigate_complaint(userId) ;
        target t = callsDb ; // calls register
    }
    inst oblig deactivateAccount { ... }
    inst auth+ serviceActionsAuth { ... }
    // other policies
}
This policy reads as follows. The role type ServiceEngineer models a service engineer role in a mobile communication service; a service engineer is responsible for responding to customer complaints and service requests. The role type is parameterized with the calls database, a database of the subscribers in the system and their calls. The obligation policy serviceComplaint is triggered by a customerComplaint event with the mobile number of the customer given as an event attribute. On this event, the subject of the role must execute a sequence of actions on the calls database: check the information of the subscriber whose mobile number was passed in through the complaint event, check the phone call list, and then investigate the complaint. Note that the obligation policy does not specify a subject because all policies within the role have the same implicit subject.
4.2.3. CQL

In this chapter, we review two policy languages designed to work with the CIM information model: CIM-SPL, which we will study in the next subsection, and the CIM Query Language, or CQL [CQL]. CQL is designed primarily for querying the CIM information model, and thus is not a true policy language. CQL is a DMTF preliminary standard that facilitates writing queries for extracting data from a CIM data management infrastructure. CQL closely mimics the SQL select-from-where query syntax. The key difference is that a CQL query is applied to CIM properties (in SELECT clauses) and CIM classes (in FROM clauses), whereas an SQL query is written against a database. CQL also defines the conditions and data for indications, which model the handling of events specified by CIM_IndicationFilter.

The CQL preliminary standard document suggests a mechanism for using CQL to specify CIM policy rules. To specify a CIM-style policy rule (as defined by the CIM Policy Model), CQL queries are used to specify both the condition part and the action part of the rule. For example, in a policy that allocates more space to a disk low on free space, the condition query first selects the disks with low space, and the query action then invokes a method to increase the space of those selected disks. This makes the definition of actions very awkward because the policy author needs to generate a relational table that encodes the action call, and then process the CQL query action with semantics completely different from the semantics defined for the condition. For example, the following CQL statements specify a policy to identify a StoragePool that is low on space and allocate more space. The first query is used
in a QueryCondition; let us assume that the name of the results of this query is set to "NeedySPPath". This query selects a StoragePool that is low on space.

SELECT OBJECTPATH(IM.SourceInstance) AS NeedySPPath
FROM CIM_InstModification AS IM,
     CIM_PolicyRule AS PR,
     CIM_PolicySetAppliesToElement AS PSATE
WHERE IM.SourceInstance ISA CIM_StoragePool
  AND PR.Name = 'AllocateMoreSpace'
  AND OBJECTPATH(PR) = PSATE.PolicySet
  AND OBJECTPATH(IM.SourceInstance) = PSATE.ManagedElement
  AND 100 * (IM.SourceInstance.CIM_StoragePool::RemainingManagedSpace /
             IM.SourceInstance.CIM_StoragePool::TotalManagedSpace) < 10
  AND IM.SourceInstance.CIM_StoragePool::RemainingManagedSpace <
      IM.PreviousInstance.CIM_StoragePool::RemainingManagedSpace
And the next query is used in a MethodAction to invoke the CreateOrModifyStoragePool method. It uses the PR_Needy instances generated by the previous QueryCondition.

SELECT OBJECTPATH(SCS) || '.CreateOrModifyStoragePool' AS MethodName,
       QCR.NeedySPPath AS Pool,
       QCR.NeedySPPath.Size + (QCR.TotalManagedSpace / 10) AS Size,
       OBJECTPATH(SP) AS InPools
FROM PR_Needy AS QCR,
     CIM_ServiceAffectsElement AS SAE,
     CIM_StorageConfigurationService AS SCS,
     CIM_StoragePool AS SP,
     CIM_AllocatedFromStoragePool AS AFSP
WHERE QCR.NeedySPPath = SAE.AffectedElement
  AND OBJECTPATH(SCS) = SAE.AffectingElement
  AND SP.ElementName = 'FreePool'
  AND QCR.NeedySPPath = AFSP.Antecedent
  AND OBJECTPATH(SP) = AFSP.Dependent
Even for the specification of a query condition, CQL imposes a complicated syntax because CIM is based on an object model and CQL assumes a relational model. Thus even for a simple comparison of the values of two object properties, CQL requires a query that interprets the objects as relations.
4.2.4. XACML

The OASIS eXtensible Access Control Markup Language (XACML) [OASI] provides a means for writing access control policies. In addition to the syntax and semantics for access control policies, XACML also prescribes a standard format for access control requests and responses. An XACML rule has a condition-effect structure, which is similar to that of PDL, except that the effect can take only one of two values: "Permit" or "Deny." An XACML rule also has a target component that defines the set of resources, subjects, actions, and environments to which the rule applies. A significant difference from PDL is that policy authors can specify intricate orders of rule evaluation; in PDL the order of the rules is irrelevant. In addition, XACML policies can be organized in hierarchical policy groups.

If we compare the syntax of XACML to that of PDL, we find several differences, and the two languages look very different. However, if we compare the information models of the two languages, the similarities as well as the differences become obvious. Unlike PDL, XACML has no events; the reason is that the event is implicit in the access control domain: the policies are invoked when an attempt is made to access a resource. The set of possible actions that can be defined in XACML is restricted. The ordering and hierarchy information is missing from PDL.
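XACML defines several rule-combining algorithms; the sketch below illustrates only a first-applicable-style evaluation over simplified rule objects, not XACML syntax. Rules are tried in the author-specified order, and the first rule whose target and condition match determines the effect:

```python
# Sketch of XACML-style evaluation with a first-applicable combining
# algorithm. Rule and request shapes are illustrative, not XACML syntax.

def first_applicable(rules, request):
    for rule in rules:                    # author-specified order matters
        target_matches = all(request.get(k) == v
                             for k, v in rule["target"].items())
        if target_matches and rule["condition"](request):
            return rule["effect"]         # "Permit" or "Deny"
    return "NotApplicable"                # no rule matched the request
```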
4.2.5. ACPL The Autonomic Computing Policy Language (ACPL) from IBM [ACPL] is an attempt to support distributed IT systems management by extending the Policy Core Information Model (PCIM) model [PCIM, PCIME]. In PCIM, each policy rule can specify a condition and an action (that is, if the condition is true, then the action is executed). In addition, each policy rule has a role that specifies devices to which the policy rule should be applied, and an integer priority that specifies the relative priority of the policy rule in a policy group. ACPL supports two different formats to specify policy rules: an XML format for easy distribution and support from standard XML tooling for common functions such as parsing and validation, and a simple text format for writing policies using a text editor. Unlike most other policy languages, ACPL allows the user to extend the language syntax by defining new operations. The XML version of ACPL has a one-to-one correspondence to the text version of ACPL.
The text version of ACPL has been subsequently standardized [SPL] by DMTF into the CIM-SPL (Simple Policy Language for CIM). CIM-SPL has been designed to comply with the CIM Policy Model and it fully incorporates CIM constructs. Its design also includes many features of, and lessons learned from, the design and use of the other languages described earlier. It also provides a vehicle for highlighting relevant aspects of CIM and CIM policy model. In the next section, we will study CIM-SPL in detail.
4.3. CIM-SPL In this section, we will make extensive use of CIM terminology. For a basic understanding of this terminology, we refer the reader to CIM standards and tutorials available from the DMTF site [CIM].
4.3.1. CIM-SPL Policy Rules

To understand CIM-SPL policy rules, let us consider the following example: If the file system is 85% full then expand the storage pool by 25%.
Two elementary questions arise from this policy:

• What kind of data is available to determine that the file system is 85% full, and how can we access it?
• What kind of operations are available to manipulate the storage pool, and how can we access them?

In the context of CIM there are natural answers to these questions. The data available is contained in the CIM data structures and can be accessed via a CIM Object Manager (CIMOM). The operations are methods defined in CIM Classes and methods defined for manipulating CIM Objects (for example, methods for creating a new CIM instance or for setting the value of a property). These operations correspond, respectively, to the extrinsic and the intrinsic operations described in the Web-Based Enterprise Management (WBEM) standards and can be accessed using a WBEM implementation. Returning to our example, a file system can be represented in CIM as an instance of the CIM_LocalFileSystem class which has, among others, two properties, AvailableSpace and FileSystemSize. These properties can be used to
CIM-SPL
find the percentage of space in use. In the CIM model of the system that contains the file system, there also exists an object that is an instance of the CIM_StorageConfigurationService class. This object can be accessed by traversing CIM associations starting at the CIM_LocalFileSystem object. Then we can use the CreateOrModifyElementFromStoragePool method of the CIM_StorageConfigurationService object to expand the storage pool.

We now give a general overview of the syntax of CIM-SPL policy rules. A CIM-SPL policy rule follows this schema:

import <MOF Name>::<CIM Class Name>;
policy {
    declaration { ... }
    condition { ... }
    decision { ... }
}
Each CIM-SPL policy is written under the scope of a single CIM Object, referred to as the anchor object of the policy. All other CIM Objects referenced by a CIM-SPL policy should be accessible by traversing CIM associations starting from the anchor object of the policy. The CIM Class specification in the import statement unambiguously defines the class of the anchor object for the policy. CIM uses the Managed Object Format (MOF) to publish class schemas; the import statement contains the name of a MOF file followed by the name of a CIM class described in that MOF file.

There is an optional condition that any anchor object must satisfy before a policy is evaluated on the object. This condition is a simple conjunction of equalities and inequalities expressed in terms of the properties of the class. All unqualified identifiers referenced in the rest of the policy refer to the properties of the anchor object. To refer to the anchor object itself, we either use the reserved identifier self or use the name of the CIM class directly. Other objects can be accessed by traversing associations.

The declaration section is optional; it defines macros and constants to simplify notation in the condition, in case long CIM names make it cumbersome to write simple expressions. The condition and decision sections correspond, respectively, to the if-condition and then-action clauses of a policy rule and are discussed next.
Chapter 4 • Policy Languages
4.3.1.1. The Condition Section
The condition section of a policy rule contains a Boolean expression similar to a Java boolean expression. We allow the standard boolean operators, namely AND, OR, and NOT. It also allows the comparison of arithmetic expressions using operators such as <, ==, !=, and so on. In these comparisons, it allows arithmetic expressions on each side involving operations such as +, –, *, /, and so on. It can also compare strings and provides many predefined string and calendar operations. In general, subexpressions of any CIM intrinsic type can be part of a condition, as long as the overall condition is a boolean expression. The following is an example of a condition that uses two properties from the CIM_LocalFileSystem class:

    import CIM X XX XXX::CIM_LocalFileSystem
    policy {
        condition {
            AvailableSpace/FileSystemSize < 0.15
        }
        decision { . . . }
    }
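How a policy engine might evaluate this condition can be sketched in Python. The dictionary-based anchor object below is only an illustration; a real engine would read these properties through a CIM client rather than a dictionary.

```python
# Sketch: evaluating the condition above against an anchor object.
# The dictionary stands in for a CIM_LocalFileSystem instance.

def condition_holds(anchor):
    """True when less than 15% of the file system is free."""
    return anchor["AvailableSpace"] / anchor["FileSystemSize"] < 0.15

nearly_full = {"AvailableSpace": 10 * 2**30, "FileSystemSize": 100 * 2**30}
half_free = {"AvailableSpace": 50 * 2**30, "FileSystemSize": 100 * 2**30}
print(condition_holds(nearly_full))  # True  (only 10% free)
print(condition_holds(half_free))    # False (50% free)
```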
Because CIM_LocalFileSystem is specified in the import statement of this policy, the anchor object for it should be an instance of CIM_LocalFileSystem. In other words, the policy server will evaluate this policy against an anchor object that is an instance of CIM_LocalFileSystem. The policy condition is true if the available space of the anchor local file system (provided by the AvailableSpace property) divided by the total file system size (provided by the FileSystemSize property) is smaller than 0.15.

4.3.1.2. The Decision Section
As mentioned earlier, an action in a decision will most likely be a method invocation on an object. The following is an example of such an invocation:

    import CIM X XX XXX::CIM_StorageConfigurationService
    policy {
        condition { . . . }
        decision {
            self.CreateReplica(. . .)
        }
    }
In this example, CreateReplica is a method of the CIM_StorageConfigurationService class, and it will be invoked if the condition part of the rule is true.
In addition to the basic action invocations, CIM-SPL allows composition of basic actions into complex actions, similar to that of the Ponder policy language [POND]. In Ponder, basic actions can be combined using two types of control structures:

• Serial execution—There are two modes of serial execution. In a sequence of two actions, either the second action is executed only if the first action has completed successfully, or the second action is executed only if the first action has failed.

• Concurrent execution—Similarly, there are two modes of execution in the concurrent case. In a block of actions, either all actions are executed concurrently, or at least one action in the block must be executed.

These execution modes are specified as part of the action statement. Finally, an action in CIM-SPL can in turn invoke another policy. This is usually referred to as cascaded policies. The invocation occurs simply by using a reference to a CIM Policy object in place of a basic action in the decision. The call must have as its argument a collection of objects. The specified policy is evaluated once for each object in the collection, using the object as an anchor object. The class of these objects must match the class specified in the import statement of the invoked policy.

4.3.1.3. Data Types and Operators
CIM-SPL supports all intrinsic data types defined by the CIM Meta Schema, as well as one-dimensional arrays of an intrinsic type. CIM-SPL has all conventional arithmetic, boolean, and casting operators, along with operators for string and date-time manipulation. CIM-SPL also has operations that apply to arrays, such as finding the minimum and maximum elements of an array, checking whether a given value appears or does not appear in an array, and computing averages, medians, and so on, of the elements of an array. A unique feature of CIM-SPL is that it has built-in operators that allow traversal of CIM associations. One such operator is the collect operator, with the following signature:

    collect(<root object>, <association class>, <source property>,
            <destination property>, <result class>, <filter expression>)
As an example, assume that the variable mySwitch has a reference to a SAN switch, and we want to collect references to all fibre channel ports in that switch in an array of references. We can use the collect operator as follows:

    collect(mySwitch, CIM_SystemDevice, GroupComponent,
            PartComponent, CIM_FCPort, true)
In the CIM model, a SAN switch and a fibre channel port in that switch are associated with each other through an instance of the association CIM_SystemDevice. The CIM_SystemDevice association has two properties: GroupComponent, which is a reference to a CIM_System (in this case, the SAN switch), and PartComponent, which is a reference to a CIM_LogicalDevice (in this case, the fibre channel port). Thus, the first argument of the collect operator is the root of the traversal, and the second argument is the traversed association. The third argument is the name of the property that holds a reference to the first argument, and the fourth argument is the name of the property that holds a reference to the other end of the association traversal. Thus, the collect expression given here will collect all the instances of CIM_SystemDevice that have a reference to the mySwitch object in the property GroupComponent. It will then collect references pointed to by the PartComponent property in all such instances in an array. The fifth argument of the collect operator specifies the class of the collected references, and can be null if there is no ambiguity concerning the possible classes of the collected references. Finally, the last argument specifies a Boolean expression that must be satisfied by objects referenced by the elements of the resulting array. The last argument acts as a filter to exclude certain references from being returned. For example, if we were only interested in ports of a switch that have unknown type, we could use the following expression:

    collect(self, CIM_SystemDevice, GroupComponent,
            PartComponent, CIM_FCPort, PortType == 0)
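The semantics of collect can be mimicked over an in-memory model of association instances. The Python sketch below is an illustration of the operator's behavior, not of any real CIM client API; the dictionaries stand in for CIM objects and association instances.

```python
# Sketch: the collect operator over an in-memory model of CIM instances.
# A real engine would traverse associations through a CIM client.

def collect(root, associations, assoc_class, src_prop, dst_prop,
            result_class=None, predicate=lambda obj: True):
    """Follow association instances of assoc_class whose src_prop
    references root; return the dst_prop targets, optionally filtered
    by class and by a boolean predicate."""
    out = []
    for assoc in associations:
        if assoc["class"] != assoc_class or assoc[src_prop] is not root:
            continue
        target = assoc[dst_prop]
        if result_class and target["class"] != result_class:
            continue
        if predicate(target):
            out.append(target)
    return out

switch = {"class": "CIM_ComputerSystem", "name": "mySwitch"}
ports = [{"class": "CIM_FCPort", "PortType": t} for t in (0, 1, 0)]
links = [{"class": "CIM_SystemDevice", "GroupComponent": switch,
          "PartComponent": p} for p in ports]

unknown = collect(switch, links, "CIM_SystemDevice",
                  "GroupComponent", "PartComponent", "CIM_FCPort",
                  lambda p: p["PortType"] == 0)
print(len(unknown))  # 2 ports of unknown type
```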
By composing several collect operations, we can traverse multiple associations and collect system components that are related to a single root instance by complex relationships.
4.3.2. Policy Groups

Effective policy management requires hierarchical organization of policies. As described earlier, a policy hierarchy may be constructed for several reasons—for example, to group policies by functionality, to group policies by the components and subcomponents to which they apply, or to group policies by their relative importance. A policy group consists of policy rules and other policy groups. In CIM-SPL, the definition of a policy group follows this schema:

    import <MOF Name>::<CIM Class Name>
    strategy <execution strategy>
    . . .

If <...> then include traffic for monitoring.
If <...> and <application == yyy> then do not include traffic for monitoring.

These policies need to be translated into a set of configuration statements that can be provided to the different servers. The self-configuration architecture shown in Figure 6.5 consists of a policy definition tool, a policy repository, and a policy agent that runs at each of the servers. The policy agent on each server is responsible for retrieving the policies that are relevant to the server it is running on, and for converting them into the configuration of the local server that is used for marking the packets outbound from the server. The task of the policy definition tool is to convert the classification policy specified for each customer into a set of lower-level policies that can be consumed by policy agents and used to automatically configure the marking
behavior of the servers. The policy definition tool consists of three components. The first one takes the input for the classification you saw earlier; the second keeps track of the different servers that belong to each customer; and the third translates the policies provided by the user into a set of lower-level policies that can be converted readily into the configuration of the different servers. The lower-level policy has to specify the precise directions for the different packets that need to be marked by each of the servers. Typically, such policies would have the format

    If <...> and <...> then mark packet header with 0x01
    If <...> then mark packet header with 0x02

The software on the server can use information in the packet headers, such as the local port, the executable used to invoke the process sending the packets, the remote destination and port, or the transport protocol being used in a packet, to make the right marking. These conditions and actions can be expressed as part of a policy representation—for example, as a derived part of the CIM information model described in Chapter 3.

The task of translating from the higher-level policy to the lower-level policy is performed by the translation module of the policy definition tool. The translation is done by determining the set of applications that need to be marked differently from the default. Each application is associated with a profile that describes how packets belonging to that application can be identified—for example, by using a specific port or port range, by using a specific executable, or by sending packets to a specific destination (such as the management subnet in the hosting center). The translation module simply has to translate the application profiles into a set of lower-level policies. The translation module groups all the policies belonging to a specific customer into a single set of policies. The lower-level policies are converted into a canonical format and stored in a policy repository.
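The translation module's job of turning application profiles into low-level marking policies can be sketched as follows. The profile fields (port, executable, destination) and the mark values are assumptions for illustration, not a product schema.

```python
# Sketch: translating per-customer application profiles into low-level
# marking policies. Field names and mark values are illustrative.

DEFAULT_MARK = 0x01  # assumed default marking for unlisted traffic

def translate(customer, profiles):
    """Produce one low-level marking rule per application profile."""
    rules = []
    for app in profiles:
        conditions = {k: app[k]
                      for k in ("port", "executable", "destination")
                      if app.get(k) is not None}
        rules.append({"customer": customer,
                      "conditions": conditions,
                      "mark": app.get("mark", DEFAULT_MARK)})
    return rules

profiles = [
    {"name": "backup", "port": 873, "mark": 0x02},
    {"name": "mgmt", "destination": "10.0.0.0/24", "mark": 0x03},
]
rules = translate("customerA", profiles)
print(len(rules))  # 2: one rule per profile
```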
Such a policy repository may be implemented in many ways; common techniques include using a directory accessed via LDAP (the Lightweight Directory Access Protocol), a database accessed using remote procedure calls, or a Web server with specific applications. The agents running on the servers are provided the identity of the customer who owns the server. Each agent contacts the policy repository, and obtains
the policies defined by the customer. Using the terminology of the CIM model, the identity of the customer is used as the PolicyRole, which selects the policies applicable to the server. The policy agent then converts the low-level policies into a specific configuration of the local machine.
6.4.2. Variations on the Architecture

The preceding policy architecture is but one implementation in which policies can be used to determine the amount of bandwidth used by different customers. The marking may be done by means of an external box in some cases—for example, when all applications can be readily identified by fields carried in the Internet Protocol header. This confines the changes to a single device for each customer, but requires an additional device per customer.

Instead of using a dedicated agent on each of the servers, the policy definition tool can generate the right configuration for each server locally and send it to the right server. This eliminates the need to write special software for each server. This can also be augmented by eliminating the policy repository—instead of storing the policies (or the configuration) in a repository, the policy definition tool simply provides all the configurations directly to the servers.

Another variation would be to automatically throttle the traffic emanating from a customer when the bandwidth limits are violated. This requires the ability to limit the traffic emanating from each customer subnetwork, either by putting a rate-controlling device at the egress point of the network or by using a rate-control function at each of the servers. The amount of permissible bandwidth can be determined on the basis of policies. A policy may indicate that a customer subnet be throttled once the maximum possible bandwidth is reached, or that the customer be notified and may need to pay more for exceeding the maximum allowed bandwidth.

In all of these variations, policies defined at the higher level, where they are easier to specify, are used to generate device configurations. The configuration of the lower-level devices, which can be many, is largely automated, and the approach can therefore result in significant simplification of the management complexity in hosted servers.
6.5. Summary

This chapter presented the general problem of configuration management in large-scale distributed systems and motivated the use of policy-based configuration management in such systems. Policy-based configuration management can be used to reduce configuration errors, leading to improved availability, performance, and system security. It can be used to automatically configure devices using high-level user-specified goals, to fine-tune configuration parameters of a system, or to ensure that the system configuration follows known best practices.

We presented the general steps to design a policy-based configuration checker for storage area networks (SANs). For more details, we refer the reader to [AGLV]. Due to the complex constraints imposed on a SAN deployment, automated verification of SAN configurations provides substantial value to the administrator who needs to validate correctness after the addition of a device or a configuration change, or before a new plan is deployed. Although conceptually simple, designing a configuration checking system teaches us various practical problems that occur when designing a policy-based management system. They include collecting meaningful English-language policies, translating them into concrete policy rules, encoding them using a proper information model and an unambiguous policy language and data representation, collecting the information necessary for policy evaluation, and defining the interaction of the policy enforcement module with the managed system (that is, the SAN).
Endnotes

1. We will use server as well as host as a synonym for host server.
Chapter 7
Policy-Based Fault Management
This chapter describes the use of policies for managing faults in IT systems. A fault or an error in an IT system disrupts the continued and efficient operation of the system. As components in an IT system become more dependent on each other for their operation, a single fault in a component may cause many other components connected to it to experience problems in their operation. Consequently, even a single fault may cause a large number of alarms to be generated that can easily overwhelm the operator. The task of a fault management system is to take the alarms raised by various system components, diagnose the underlying root-cause fault for the alarms, and then take corrective steps to fix the fault.

Policies can be used in a variety of ways to simplify and automate some or all of the tasks in fault management in IT systems. In this chapter, we will look at the application of policies to the different aspects of fault management. We begin with an overview of how fault management is performed in an enterprise system and in a large telecommunications network.
7.1. Fault Management Overview

The general approach toward fault management can be viewed logically as conforming to the loop shown in Figure 7.1. The first stage of this loop consists of monitoring the available information from different components in the system. The monitored information will include various error and diagnostic
information that may be generated by different components of the system. This task can be performed via different types of agents developed for specific managed elements. For example, a system administrator may have an agent that monitors the health of an application or a series of applications in a transaction system. Another type of agent may probe the availability of a device or a certain port via ping or Telnet. There can also be agents that query SNMP MIB databases [SNMP], or that listen for system trap events or system error events.

In the second stage of fault management, the monitored information is converted into a canonical format that is amenable to analysis. In general, there is little consistency in the format of different agents' output. For example, an agent may generate a simple comma-separated-value file in which each column has agent-specific semantics. For monitored data from different data sources to be analyzed and combined, it needs to be transformed into a common format.

In the third stage of the cycle, all the gathered information is analyzed. Analysis of the monitored data consists of removing duplicate alerts, reducing the volume of the monitored information to the maximum extent possible, correlating various pieces of information with each other, comparing them with known patterns, and determining the root cause of the various alerts that have been generated. To remove duplicates, we need a unique ID that can indicate that a set of alerts was actually introduced by a single fault. To diagnose a root cause in a networked system, it is commonly necessary to understand the underlying topology. As a simple example, if all the devices behind some gateway become unreachable at about the same time as the gateway itself, one can suspect that the gateway has gone down, instead of all the individual devices and the gateway having problems.
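The gateway example amounts to a topology-aware suppression rule. A minimal Python sketch follows, assuming a simple device-to-gateway map and alarms identified by device name (both are illustrative assumptions):

```python
# Sketch: topology-aware root cause analysis for the gateway example.

def root_causes(alarms, gateway_of):
    """Collapse unreachable-device alarms onto a failed gateway.

    alarms: set of device names currently reported unreachable.
    gateway_of: maps each device to the gateway it sits behind.
    Alarms for devices behind an unreachable gateway are treated as
    symptoms of the gateway failure and suppressed.
    """
    down_gateways = {d for d in alarms if d in gateway_of.values()}
    causes = set()
    for device in alarms:
        if gateway_of.get(device) in down_gateways:
            continue  # symptom of the gateway failure, not a root cause
        causes.add(device)
    return causes

topology = {"hostA": "gw1", "hostB": "gw1", "hostC": "gw2"}
alarms = {"gw1", "hostA", "hostB", "hostC"}
print(sorted(root_causes(alarms, topology)))  # ['gw1', 'hostC']
```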
In the fourth stage of the fault management process, the remedial steps to correct the fault are determined and undertaken. This may require manual processes, such as calling a local support person to look at the problem, or remotely logging in to a server or a router to troubleshoot.
Figure 7.1 General Process of Fault Management (a loop through Monitoring, Storing, Analysis, and Remedying)
Before the advent of policy-based fault management, it was common practice to perform many of the functions in these stages manually. This was especially true for the analysis and remedy determination stages. A policy-based fault management approach can automate the process to some extent in all the stages, and provide the structure of a system that can automatically react to component failures and other types of fault management tasks.

In actual products, the logical process described earlier for fault management can be implemented in different ways. In particular, the use of policies to simplify and automate the process of fault management depends on the details of the implementation. In this chapter, we describe two representative implementations of fault management products to illustrate such differences. The first implementation is fault management in computer networks, and the second is fault management in Web-based applications.
7.1.1. Fault Management in Networks

The first example is that of large-scale network management, which is geared toward managing failures in the operation of large-scale enterprise networks or telecommunications networks. The typical fault management setup in such environments is shown in Figure 7.2. In this scenario, the monitoring stage is implemented by a variety of software modules called probes. Each probe is capable of reading a given network device, or more precisely a set of network devices that support an interface implemented by the probe. One probe may be able to read any fault information available in the standard device MIB, whereas another probe may be able to read proprietary fault information provided by a given class of Cisco switches, and a third probe may support an interface to retrieve and parse the system log files from another class of devices.

In practice, there are several types of probes, including ping probes, file probes, trap probes, and sniffer probes. The file probe parses specific files related to logs and system information. The trap probe listens for SNMP traps sent by devices, such as cold start of a device, link up or down, authentication failure, or EGP (Exterior Gateway Protocol) neighbor loss. The sniffer probe may extract IP addresses or MAC addresses and their class-of-service information from packet inspection. These probes can work with other agents, for example, a discovery agent for the ARP cache to determine the IP-address-to-MAC-address mapping.
Figure 7.2 Fault Management in Communication Networks (probes feed a format conversion and deduplication layer into a data store, which analytics components and a graphical user interface consume)
On top of that, the network management tool can discover various types of network information using protocol-specific probes. This level of information includes Ethernet switch connectivity, layer 3 connectivity, ATM (Asynchronous Transfer Mode) devices, MPLS (Multiprotocol Label Switching) devices, NAT (Network Address Translation) gateways, containment information, and so on. The Ethernet switch connectivity can be discovered by switch-specific protocols (for example, SRP, or Spatial Reuse Protocol, for Cisco routers) and standard protocols such as Telnet or SNMP (Simple Network Management Protocol). The layer 3 connectivity information can be discovered from different protocols used in communications networks, such as BGP (Border Gateway Protocol) information, routing tables, IP forwarding tables, IP backup routes, the traceroute tool, frame relay connections, HSRP (Hot Standby Router Protocol), or VRRP (Virtual Router Redundancy Protocol). Discovering ATM device connectivity and MPLS devices is typically done via vendor-specific protocols. NAT gateway discovery is enabled by downloading static NAT translations via Telnet from the device.
Containment discovery is also done via vendor-specific protocols such as STP (Spanning Tree Protocol), CDP (Cisco Discovery Protocol) [CDP], and so on.

Probes feed the information into a data store, which may be logically centralized but in fact may be a cluster of hierarchical database systems. For fast event handling, the data store may be implemented as a real-time in-memory database specialized in large-scale, efficient event processing. Separate from the event database, many network management systems maintain a data store that contains topology information to build a network model for root cause analysis.

When the probes store the collected information in the data store, the format conversion layer changes the information from its native format into the record format of the data store. Prior to storing the data, some event processing may be applied to the monitored data. For example, duplicate alerts will be removed to reduce the volume of records entering the data store. Similarly, the network management system may also perform postprocessing after the raw data has been stored in the database. This may include associating physical-level events with information from a location database or directory server, so that the original event is enriched with the postal address of the building where the faulty device is located, the contact information of the person in charge of managing the device, and so on.

The information in the data store can be processed by a variety of analytic plug-ins. Each analytic plug-in is responsible for querying and processing the fault information stored in the data store. The analytic plug-ins can be used for a variety of functions, such as correlating fault information with information stored in other databases (for example, interacting with a domain name server to convert all IP addresses into their fully qualified domain names).
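The deduplication and enrichment steps described here can be sketched in Python. The event fields, the fault_id used as the deduplication key, and the location database layout are all illustrative assumptions:

```python
# Sketch: pre-storage deduplication and post-storage enrichment.

def deduplicate(events):
    """Keep the first event per fault ID, counting repeats."""
    seen = {}
    for ev in events:
        key = ev["fault_id"]
        if key in seen:
            seen[key]["count"] += 1
        else:
            seen[key] = dict(ev, count=1)
    return list(seen.values())

def enrich(events, location_db):
    """Attach location and contact details looked up by device name."""
    for ev in events:
        ev.update(location_db.get(ev["device"], {}))
    return events

events = [{"fault_id": 7, "device": "sw3"},
          {"fault_id": 7, "device": "sw3"},
          {"fault_id": 9, "device": "sw4"}]
db = {"sw3": {"address": "Bldg 12", "contact": "ops@example.com"}}
result = enrich(deduplicate(events), db)
print(len(result))  # 2 distinct faults; the first carries count == 2
```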
The analytic plug-ins can also use the information in the topology database to highlight the most likely cause of certain problems. This can be used to modify the records in the database, or to process the events for visual presentation (for example, to create new records that highlight areas of the network where more than a threshold number of fault events have been received, and to suppress alerts on devices that are temporarily inaccessible). The logic used for processing and analyzing the events depends on the specific installation of the product and its analytic plug-ins. The analytic plug-ins may also start remedial processes for some of the fault conditions that are amenable to automated remediation.

The final component of the network management system is the presentation of visualization information to the administrator. The system administrator can manually start a remedial action according to the output of the analysis components.
7.1.2. Fault Management in Web-Based Applications

The second example is that of a management system for distributed Web-based applications, where some of the errors in the system are automatically detected and, in some cases, corrected based on a recipe to ensure continued operation of the application. The IT infrastructure for a Web-based application (shown in Figure 7.3) typically consists of several servers running different software systems. A common scenario is a three-tier system, with one tier consisting of Web servers (for example, Apache or Microsoft IIS), one consisting of application servers (for example, Apache Tomcat or IBM WebSphere Application Server), and one consisting of database systems (for example, IBM DB2) in the back-end. The Web servers are responsible for handling the HTTP requests from the clients. The application servers implement the business logic to handle interactions with clients, support transactions, and create dynamic Web content. The database systems store all the persistent data related to the Web-based applications. In most cases, there is more than one instance of a server in each tier.

In this scenario, faults may manifest in either hardware or software during operation, and corrective actions need to be taken to remedy them. Common errors include those due to improper configuration (for example, misconfiguration during a server upgrade or the addition of new racks, a missing or wrong version of a Java class library at an application server, too few threads provisioned at an application server, or an incorrectly specified parameter value at the Web server), errors due to memory leaks, and those due to the hardware failure of one of the servers or the intervening network. The goal of the fault management system is to monitor the error messages that appear in the log files of the different applications, understand the root cause of the error messages, and then take remedial actions.
To analyze the messages for a root cause, the error messages from the different log files need to be converted into a canonical format. For this, the log collection component periodically collects logs from the different servers and database systems. After collecting the log information, a converter handles the differences in the syntax of different device and software logs, and stores the events in the Common Base Event (CBE) format. The Common Base Event format is a canonical representation of the different types of error notifications generated by different software components. The canonical event stream is then analyzed by a symptom analyzer, which uses a symptom database as its knowledge base. The symptom database is an index that stores the event patterns seen in a system due to a potential set of
underlying causes. The symptom analyzer matches the events that have been observed to the event patterns stored in the symptom database and selects the most likely cause for an error message, along with a recommended corrective action, such as changing the configuration of a system or restarting a system. These corrective actions are implemented by a system configurator.
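The matching step of the symptom analyzer can be sketched as a best-overlap search over the symptom database. The pattern representation (sets of event signatures) and the scoring rule below are illustrative assumptions, not the actual product algorithm:

```python
# Sketch: symptom matching as a best-overlap search.

def most_likely_symptom(observed, symptom_db):
    """Return the symptom whose event pattern best matches the observed
    events, or None when nothing matches at all."""
    best, best_score = None, 0.0
    for symptom in symptom_db:
        pattern = symptom["pattern"]
        score = len(observed & pattern) / len(pattern)
        if score > best_score:
            best, best_score = symptom, score
    return best

db = [
    {"pattern": {"OutOfMemoryError", "GC overhead"},
     "cause": "memory leak", "action": "restart application server"},
    {"pattern": {"ClassNotFoundException"},
     "cause": "missing class library", "action": "fix the classpath"},
]
match = most_likely_symptom({"OutOfMemoryError", "GC overhead"}, db)
print(match["cause"])  # memory leak
```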
Figure 7.3 Fault Management in Autonomic Computing (a log collector gathers logs from the Web server, app server, and database; a Common Base Event converter feeds a symptom analyzer backed by a symptom database, whose recommendations drive a system configurator)
Relating this Web application fault management system back to the logical architecture in Figure 7.1, we can see that the log collector provides the monitoring step, the common base event converter provides the canonical conversion step, the symptom analyzer provides the analysis step, and the remedying step is performed by the system configurator.
7.2. Policy-Based Fault Management

There are many possible applications of policy in the effective management of faults in IT systems. Policy can be used in all stages of the fault management process for a variety of functions, such as:
• Managing the process of monitoring the fault information

• Controlling the conversion of fault information into a canonical format

• Reducing the number of events that need to be processed and enhancing the monitored information with the help of other data sources

• Influencing the approach used for diagnosing the root cause of failures

• Controlling the manner in which remedial actions are performed

In the next subsections, we will give a brief description of how policies can be used in each of these functions.
7.2.1. Policy-Based Acquisition of Fault Information

The acquisition of fault information is the first step in the fault management process. The acquisition process needs to obtain information from a variety of devices, some of which may support standards such as SNMP to provide information, some of which may have applications that provide fault information in application logs, and some of which may have agents that can asynchronously send fault information to the management server. Assuming that we have the right set of software components that can retrieve the fault information from the different devices, the role of policies is to determine when and what type of information ought to be collected for managing faults in the system.

Policies used for data acquisition would tend to be defined at the lowest of the different layers of policies defined in Chapter 2, “Policy Lifecycle—Creation, Distribution, and Enforcement.” They would define constraints on when specific information available in a device is related to a fault and ought to be acquired. Data acquisition policies may also place limits on how much of the available throughput of the network should be used for data acquisition purposes. As a result, policies defined for data acquisition may be either metric constraint policies or action policies. Some examples of policies that might be specified for information acquisition include the following:

A data acquisition component (for example, a probe) should not use more than 2% of the maximum network bandwidth available.

Data acquisition from application servers in subnet 9.2.10.0–9.2.10.255 should be done at least once every hour.

SNMP-based probes should not be used to obtain network information from devices such as servers and Windows machines.
Only error information of severity critical should be obtained from 9 AM–5 PM on the last day of the month in subnet 9.2.10.0–9.2.10.255, which is used by accounting for end-of-month data fulfillment.
The policies used for fault data acquisition can thus be divided into two classes:

• Resource Management Policies: These are metric constraint policies that limit how many resources are used for data acquisition or how frequently to acquire data. They may also define different constraints for different areas of the network or computer system being monitored, or for different times of day.

• Probe Selection Policies: Some devices provide multiple interfaces for providing monitored information and may have a preference for one type of probe or data acquisition system. Probe selection policies specify the data acquisition mechanism used for obtaining fault information from the different devices.

For each of these two classes, policies can be defined as extensions to the CIM policy model using extensions to the PolicyCondition and PolicyAction classes defined in the information model. These can then be represented in a policy language of choice (for example, CIM-SPL or Ponder) and enforced by a packaged software module.
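The two classes of acquisition policies might be encoded as simple condition-action rules, as in the following Python sketch. The thresholds and device types come from the examples above; the function names and return values are hypothetical:

```python
# Sketch: fault-data acquisition policies as condition-action rules.

def allowed_bandwidth(max_bandwidth, fraction=0.02):
    """Resource management policy: cap probe traffic at 2% of capacity."""
    return max_bandwidth * fraction

def select_probe(device_type):
    """Probe selection policy: avoid SNMP probes on servers and Windows
    machines; fall back to SNMP elsewhere."""
    if device_type in ("server", "windows"):
        return "log-file probe"
    return "snmp probe"

print(allowed_bandwidth(1000))  # probe budget for a 1000-unit link
print(select_probe("windows"))  # log-file probe
```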
7.2.2. Policy-Based Format Conversion

The diversity in the nature of fault information produced by devices is one of the leading causes of complexity in systems management. The content and format of the fault information produced by each different implementation of a device or a software system can vary greatly. Furthermore, in real life, it is common to have custom-made applications that generate their own types of logs containing fault information. As a result, the information gathered tends to arrive in a large number of formats and needs to be converted to a canonical format. Policies can be used to automate the format conversion process.

Policies used for format conversion tend to be defined at the lowest level of the different layers of policies defined in Chapter 2, and would typically be expressed as action policies. Some examples of policies that can be specified for format conversion include the following:

• If the CPU workload exceeds 40%, and the severity of the fault is less than critical, conversion can be a low-priority task.
• If the error message type indicates “malformed request,” then the message can be discarded.
• If the message is from a device with an IP address in the range 9.2.10.0–9.2.10.255, use format conversion routine A; otherwise, use format conversion routine B.
• If the time-of-day stamp is not present in the incoming data, then use the current time of day at the receiving server.
The policies used for format conversion can thus be divided into several classes:

• Performance Management Policies: These policies would indicate what to do when the number of requests coming into the management system is excessive. They may defer conversion of some requests, or they may require moving some requests to another instance of a management system.
• Error-Handling Policies: These policies would define what to do when some of the incoming fields in a request are invalid or improperly formed. They would also define what corrections, if any, ought to be applied to data that is missing.
• Discard Policies: These policies would define the conditions under which incoming alert messages can be discarded. Incoming error notifications may be discarded in a variety of conditions—for example, if a notification is a duplicate of another, or if a notification is just a regular status update message. These policies are a special case of the volume reduction policies that will be described in the next section.
• Selection Policies: These policies would identify the types of conversion routines to be applied. Conversion routines would differ depending on the type of device from which the information is obtained. These policies would associate some identification of the device that generated the error message (for example, IP address or application type) and then select the type of conversion procedures to be invoked based on that identity.

From an architectural and lifecycle perspective, the implementation of format conversion policies is relatively simple. Among the different stages of the policy lifecycle described in Chapter 2, the most significant components in the conversion process are the task of policy creation and the task of policy
enforcement. In most implementations, conversion policies are used on a central server, so the policy distribution aspect is trivial.
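The policy classes above can be sketched as a single conversion step. The following Python fragment applies a discard policy, an error-handling policy (filling a missing timestamp with the receive time), and a selection policy (choosing a routine by source subnet). The event fields and routine names are assumptions for the example, not part of any specific product.

```python
import ipaddress
from datetime import datetime, timezone

SUBNET_A = ipaddress.ip_network("9.2.10.0/24")

def convert_a(event):  # hypothetical "format conversion routine A"
    return {**event, "format": "A"}

def convert_b(event):  # hypothetical "format conversion routine B"
    return {**event, "format": "B"}

def apply_conversion_policies(event: dict):
    # Discard policy: drop malformed requests outright.
    if event.get("type") == "malformed request":
        return None
    # Error-handling policy: fill a missing timestamp with the receive time.
    if "timestamp" not in event:
        event["timestamp"] = datetime.now(timezone.utc).isoformat()
    # Selection policy: choose the conversion routine by source subnet.
    if ipaddress.ip_address(event["source"]) in SUBNET_A:
        return convert_a(event)
    return convert_b(event)
```

Because the policies are data-driven conditions rather than hard-coded branches, the same dispatcher can be reconfigured by editing the policy definitions alone.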
7.2.3. Policy-Based Event Volume Reduction

Most system components will generate fault events when they encounter unusual conditions that prevent them from operating as expected. Event volume reduction is a process that reduces the number of fault events that need to be stored, processed, and analyzed by the fault management system. The goal of event reduction is to drop redundant or duplicate events and possibly consolidate multiple events into a single one.

Policies for event reduction would tend to be defined at the lower two levels of the policy layering architecture described in Chapter 2. They would define conditions under which an event may be dropped or combined with others. The reduction in volume may happen before events are stored in the database, or it may be performed by an analytics plug-in. Regardless of the exact location where volume reduction is done, policies for event reduction would tend to be action policies. A few examples of such policies follow:

• Any event of type t whose node-id and event-type fields match those of an existing record, and whose time-of-day stamp is within one minute of the existing time-of-day stamp, should be discarded.
• If more than five records indicating that system utilization has exceeded 75% are received, replace them with a single record marking the earliest time and the latest time of such records.
• If a node has reported the failure of its neighbor, suppress the failure notification of the same node by other neighbors.
• Replace each sequence of event-type 1, event-type 2, and event-type 3 occurring within 30 seconds with an instance of event-sequence type 4.
The policies used for volume reduction can thus be divided into the following classes:

• Duplicate Elimination Policies: These policies identify when two events can be considered duplicates of each other. These tend to be action policies in which the condition part identifies the fields of the record used to determine whether an event is a duplicate of another, and the action part specifies how the original event record must be updated.
• Event Combination Policies: These policies define the conditions under which events ought to be combined and reduced. The policies define which patterns of event sequences can be combined. The combination of multiple events may result in a new event.

In general, event volume reduction policies can be used in an event correlation module, which may allow the user to define sophisticated event correlation rules. These correlation rules can be used as an evaluation trigger that will either eliminate duplicate entries, build a relationship between multiple events, or combine several events to generate a single event.
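A duplicate elimination policy like the first example above (same node-id and event-type within one minute) can be sketched in a few lines. The tuple layout and the 60-second window are illustrative assumptions.

```python
WINDOW_S = 60  # "within one minute of the existing time-of-day stamp"

def deduplicate(events):
    """events: time-ordered list of (node_id, event_type, timestamp_s).
    Returns the list with window-duplicates discarded."""
    kept, last_seen = [], {}
    for node, etype, ts in events:
        key = (node, etype)
        if key in last_seen and ts - last_seen[key] < WINDOW_S:
            continue                  # duplicate within the window: drop it
        last_seen[key] = ts           # remember the last event we kept
        kept.append((node, etype, ts))
    return kept
```

An event arriving 30 seconds after an identical one is dropped, while one arriving 90 seconds later starts a new record.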
7.2.4. Policy-Based Root Cause Analysis

The root-cause analysis (RCA) process deals with the task of observing a sequence of events and understanding which fault may have caused the observed sequence of events to be generated. There are several mechanisms that can be used to develop a root cause analysis system, each with its own set of advantages and disadvantages.

One approach is to specify rules, typically in the form of decision-tree logic, that determine when and which event in a sequence represents the root cause. The symptom database referred to at the beginning of this chapter, in the autonomic computing implementation of fault management, uses such an approach.

A second approach for root cause analysis is to use a codebook. A codebook identifies event patterns that correspond to a root cause. The task of a codebook-based root cause analysis engine is thus to search for correlations of the observed event sequence to the event patterns specified in the codebook, and to declare a root cause if the correlation exceeds a certain threshold. Both approaches, rules and codebook, can be considered specialized forms of root cause analysis policies.

Other root cause analysis techniques use system topology in addition to the observed event sequence to find the root cause. For example, in the network management domain, the IBM Netcool product suite [NCIM] stores the network topology tree as seen from the perspective of a monitor that sends regular pings to various network devices. If the pings to all devices in a topology subtree fail, then the root cause is declared to be the root node of that subtree.

The policies used for these root cause analysis techniques can be put in the following classes:

• RCA Engine Firing Policies: These policies specify when the root cause analysis engine should be activated. The root cause analysis engine may
be activated according to a schedule, or it may be triggered by the arrival of a specific event or event pattern. The RCA engine may also be fired as the action of another policy in the fault-management system.
• Device Selection Policies: These policies specify the devices examined in a particular round of analysis by the engine, depending on what triggered the analysis. The root cause analysis engine evaluates all events coming from the set of devices specified by the device selection policies and characterizes each as either a dependent event or a root cause event.
• Processing Policies: These policies specify how the incoming event stream should be processed in view of the output from the RCA engine. After the RCA engine marks an event either as a root cause or as a dependent event, different actions have to be taken to modify the event so that its further processing and distribution to the operators can be controlled. Processing policies may specify that events be dropped (similar to the event volume reduction policies discussed earlier) or modified in certain fields. The processing policies may also specify that the RCA engine should be fired again with another event as a trigger, or they may generate additional events that are inserted into the event stream.
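The topology-based heuristic described above (declare the root of a subtree to be the root cause when pings to every device in that subtree fail) can be sketched as follows. The parent-to-children dictionary representation is an assumption for the example, not the Netcool data model.

```python
def subtree(topology, node):
    """All nodes reachable from `node` in a {parent: [children]} tree."""
    nodes, stack = [], [node]
    while stack:
        n = stack.pop()
        nodes.append(n)
        stack.extend(topology.get(n, []))
    return nodes

def root_cause(topology, node, ping_ok):
    """Declare `node` the root cause if its entire subtree is unreachable;
    otherwise return None (the failure is only partial)."""
    nodes = subtree(topology, node)
    if all(not ping_ok[n] for n in nodes):
        return node
    return None
```

If a router and both servers behind it stop answering pings, the router is the root cause; if any device in the subtree still answers, no root cause is declared for that subtree.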
7.2.5. Policy-Based Remedial Action

After a fault management system has identified a remedial action to be performed, policies have an important role to play in the automation of that action. Common remedies for fixing faults in systems include the reboot or restart of a device or application, upgrading the software of an application, and changing the configuration of a device. The remedial actions may be determined manually or by using an automated root cause analyzer that can recommend a remedial action.

It is natural to think that after a remedial action has been determined, it ought to be applied as soon as possible. However, this is not always the case. As an example, a common problem in many applications and some network devices is the presence of a slow but persistent memory leak. As a result, the performance of the application or the network device will gradually degrade over time, resulting in error alerts from any performance monitoring system that may be installed. The corrective action in these cases may be the reboot of the affected device. However, such a reboot may not be advisable immediately when the problem is first diagnosed because doing so may disrupt the operation during the peak business
hours. It may be better to let the affected system run at degraded performance during the peak hours, and then reboot it during off-peak hours. Similarly, some remedial actions may be performed only after obtaining a manual intervention or a required set of approvals. A policy-based system allows a system administrator to specify conditions under which a remedial action is deferred so that it can be performed without adversely affecting operations.

Policies impacting remedial actions would tend to be action policies or alert policies. The typical action would be to enable the operation to be undertaken immediately, enable the operation to be undertaken after some amount of time, or send an alert notification to another system. The condition part of the policy would include attributes such as the time of day, the type of remedial action being invoked, and the category or properties of the target system on which the remedial action is being taken. These policies can be used to determine whether a remedial action is safe to take.

As an example of using policies for undertaking remedial action, let us look at the use of policies in controlling the remedial action of rebooting servers during different time periods. Assuming that we are using the CIM-SPL policy language to represent the policies for remedial action, one possible set of policies may look like the following:

    policy {
        ;; Policy Rule 1
        condition { ServerGroup == WAS && Operation == Shutdown &&
                    TimePeriod == WorkHour }
        decision { DisAllow }
    }
    policy {
        ;; Policy Rule 2
        condition { ServerGroup == DB/2 && Operation == Shutdown &&
                    TimePeriod == WorkHour }
        decision { Defer(4 hours) }
    }
    policy {
        ;; Policy Rule 3
        condition { ServerGroup == DB/2 && Operation == Upgrade &&
                    TimePeriod == WorkHour }
        decision { Allow }
    }
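A minimal sketch of how an enforcement point might evaluate rules like the three above follows. The tuple-keyed lookup table, the decision strings, and the default of allowing unmatched operations are assumptions made for illustration; they are not part of CIM-SPL.

```python
RULES = {
    # (server group, operation, time period) -> (decision, defer seconds)
    ("WAS", "Shutdown", "WorkHour"): ("DisAllow", None),
    ("DB/2", "Shutdown", "WorkHour"): ("Defer", 4 * 3600),  # defer 4 hours
    ("DB/2", "Upgrade", "WorkHour"): ("Allow", None),
}

def decide(server_group, operation, time_period, default=("Allow", None)):
    """Look up the remedial-action decision for a proposed operation."""
    return RULES.get((server_group, operation, time_period), default)
```

A shutdown of a WAS server during working hours is disallowed outright, while a DB/2 shutdown is deferred until off-peak hours rather than rejected.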
In the rest of this chapter, we will describe a representative architecture of policy-based fault management of telecommunication networks in more detail as an example.
7.3. Architecture of a Policy-Based Fault Management System

Figure 7.4 shows a representative architecture of a policy-based fault management system used for monitoring large telecommunication networks. We note that this architecture illustrates the logical structure of a network management system in general, without referring to a specific network management solution. Unlike the SAN configuration checker discussed in Chapter 6, “Policy-Based Configuration Management,” this architecture is harder to arrange logically in the IETF/DMTF Policy Architecture. One reason is the pervasive use of policy in almost all aspects of fault management, as described earlier in this chapter.

The fault management system uses a variety of probes and monitors to gather fault management information from the underlying network. The probes may gather data from network devices, log files, databases, or through programmatic APIs such as CORBA or web-services interfaces. Probes are run centrally on a server that is co-located with the event server. As discussed earlier, the probes need to be configured appropriately so that they do not consume too many resources on the server and network. They also need to be configured so that each information source of interest in the network gets probed by an appropriate set of probes. Finally, the probes need to perform format conversion so that the collected information reaches the event server in a canonical format. For example, the event server may require that each event have a “summary” field. The probes may take multiple fields of the underlying information gathered from a device (say, the name of the enterprise that the device belongs to and the device status) to generate a summary of the event. All these configuration tasks can be made less tedious and less prone to error by using policy-based management as discussed earlier in this chapter.
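The probe-side conversion step just described, building the "summary" field from several raw device fields, can be sketched as follows. The raw field names are assumptions for the example; a real probe would map whatever fields its device exposes.

```python
def to_canonical(raw: dict) -> dict:
    """Convert a raw device record into the canonical event format,
    synthesizing the 'summary' field the event server expects."""
    return {
        "summary": f'{raw["enterprise"]}: {raw["status"]}',
        "source": raw["address"],
    }
```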
To perform the automation and root cause analysis, the fault management system needs to know which devices exist in the network and how they connect to each other. This is done via discovery plug-ins. The discovery plug-ins are capable of querying various network devices using standard interfaces and protocols (for example, telneting into a router and gathering configuration information, using SNMP traps [SNMP], or using a proprietary protocol such as the Cisco Discovery Protocol [CDP]). These plug-ins continually gather data from the network and feed it into a topology builder that stitches together all the discovered data to create a network topology model.
[Figure 7.4: Representative Architecture of Fault Management System. A user interface (event list and network visualization) serves the network admin; management policy, the topology model, RCA, and a data store surround a central event server/processor, which is fed by probes, SNMP, ping, and network discovery from the managed network, and by a data source adapter connected to a third-party system.]
The centerpiece of this architecture is the combination of an event server and an event processor. The event server gathers events generated by probes and other data sources, and it serves as the centralized repository of all events important from the fault management perspective in the network. The event processor enriches existing events in the event server, generates new events or takes actions based on the event patterns observed in the event server, creates relationships between events stored in the event server, and so on. Clearly, the event server needs to be customized to suit the networking environment in which it is deployed. This is done by using policies extensively.

The root cause analysis engine of the fault management system uses both the event stream and the topology model to correlate events and find the root cause of observed event patterns. As discussed earlier, the topology-based RCA engine uses firing policies to decide when to start a round of root cause analysis, and
device selection policies to find out which devices are examined in a particular round of analysis. Finally, it uses policies to correlate events to each other using the topology information and feeds the enriched events (for example, after marking events as root cause or dependent events) back to the event server.

The system administrator interacts with the system via a web-based GUI. There could be multiple system administrators for a large telecommunication network. Each system administrator is assigned a role that determines his or her view of the fault management system. For example, one system administrator may be responsible for troubleshooting all routers of a particular manufacturer in the network, while another may be responsible for the overall health of the network in a particular geographical region. The customization of the administration GUI is again done by using policies: a different view of the topology and associated events is shown to each system administrator according to his or her defined role. Similarly, each system administrator can specify policies that control how probes, discovery, the event processor, and RCA operate within the confines of that administrative domain.

Finally, we note that a large telecommunication network may generate tens of millions of events each day, overwhelming the centralized event server and event processor. In such cases, a tiered architecture may be deployed to reduce the event volume that reaches the top-level event processor while covering a large-scale network deployment, as shown in Figure 7.5. In this case, multiple instances of the event server and processor are deployed in the first tier on a regional basis, and the event processors work to reduce the volume of events going into the centralized server. The first tier can be very effective at several tasks, such as event deduplication, event enrichment from third-party data, and combining multiple events into a single event.
The regional event servers then feed the national event servers, which finally feed their output to a centralized event server and processor instance. The communication between different event servers is done by using event gateways. The event gateways are customized by using configuration policies, which specify which events should be sent to the next tier.

Thus, policies are pervasive in the fault management systems currently being used in the management of large telecommunication networks. Unfortunately, many of these management systems do not take a systematic approach to policies as outlined in the earlier chapters of this book. For example, it is all too common for multiple system administrators to write conflicting policies to control the behavior of probes or event processors. This results in unexpected behavior that can take days to debug. Current fault management products are now being enhanced with various policy technologies to remove these inefficiencies.
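A gateway configuration policy of the kind described above can be sketched as a forwarding predicate. The severity scale, field names, and the choice to forward only root-cause events at or above a threshold are illustrative assumptions, not from any product.

```python
SEVERITY = {"info": 0, "warning": 1, "minor": 2, "major": 3, "critical": 4}

def forward_to_next_tier(event: dict, min_severity: str = "major") -> bool:
    """Gateway policy: forward only root-cause events whose severity is
    at or above the configured threshold."""
    return (event.get("root_cause", False)
            and SEVERITY[event["severity"]] >= SEVERITY[min_severity])
```

Dependent events and low-severity noise are absorbed in the regional tier, so only the events that matter reach the national and central servers.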
[Figure 7.5: Scalable Event Handling by Fault Management System. First-tier event server/processor instances, each fed by probes, SNMP, and ping, forward events through gateways to higher-tier event server/processor instances, which in turn feed a top-level event server/processor.]
7.4. Summary

This chapter presented some of the different applications of policies in the task of fault management in IT systems. Policies can be used to simplify, coordinate, and manage several steps in the process of monitoring and detecting faults, converting the monitored data into a canonical format, analyzing the problem using various analytic plug-ins and logic, and taking remedial actions.

We illustrated policy-based fault management using various examples, including a large-scale network management system that may be used to monitor an enterprise-wide network or telecommunication networks. We also looked at the fault monitoring problem in a multitier Web-based application environment, where certain symptoms (for example, entries in log files) can be searched and matched to known problems to provide a hint to the system administrator for a corrective action. The use of policies can automate many of the tasks in the logical fault management process and help attain the vision of a self-correcting system that automatically handles issues arising due to failures or errors in its operation.
Endnote
1. For more description of policy-based networking, see [VERPN, STRA].
Chapter 8
Policy-Based Security Management
Security is a wide area, covering several aspects related to the operation of computer systems and networks. It deals with three basic aspects of computer systems operations: confidentiality, integrity, and availability.
Confidentiality is the requirement that a piece of information stored on a computer system, or a flow of information among computers on the network, be available only to users authorized to access that information or flow. To preserve the confidentiality of information, computer systems and networks need to provide mechanisms for access control, for authenticating the identity of entities accessing the network, and for encrypting any information stored on computers or transmitted on the network.

Integrity is the requirement that a computer system or network is operating in a mode and performing the functions expected of it by the administrator and authorized users. It requires that no unauthorized changes have been made to a system’s configuration, installed software, or processes running on the system.

Availability is the requirement that the computer system is accessible and can perform the functions that are expected of it by authorized users. In other words, availability is the task of ensuring that no one intentionally denies the service offered by the system to authorized users.

Policies are ubiquitous in security management. A security policy defines what it means, from the administrator’s perspective, for a system or an organization to be secure. For a computer system, a security policy defines the constraints on the behavior of
the system elements, constraints on the offered functions, and constraints on the data flow among them so that certain security criteria are met. A security policy would define constraints on the confidentiality, integrity, and availability of the various elements of a computer system. In this chapter, we first provide a brief overview of security management for computer systems and networks, and then discuss various applications of policy technologies to simplify the security management of computer systems and networks.
8.1. Overview of Security Management

Computer security deals with providing the three features of security—confidentiality, integrity, and availability—in the operations of computer systems and networks. Providing these aspects requires having processes, software, and hardware systems that can work together to enable these three features of security across all the information, services, and applications present in the environment. As part of security management, a computer system or network administrator needs to perform various tasks. We list some of them next.

Confidentiality-Related Tasks:
• Provide support for authentication: That is, have a method by which anyone using the system can be identified correctly. This requires ensuring that means are provided to assign users unique identities, and that mechanisms exist for authentication—validating that the person claiming to have some identity is indeed the person with that identity.
• Provide support for access control: That is, have methods by which computer system resources and information are accessible only to those users authorized to access that resource or information.
• Encryption: That is, have methods and means by which information stored on a computer system (for example, on a disk or file) is protected if it accidentally falls into the wrong hands, and by which any eavesdropper is prevented from listening in on information exchanged by two parties on a network.
Integrity-Related Tasks:
• Malware prevention: That is, have methods and means by which any viruses, Trojan horses, or malicious components of software and hardware are prevented from being installed on the computer systems, and detected if they manage to get installed on the system.
• Intrusion Detection: That is, identify attempts by unauthorized entities to enter the system by violating or abusing the privileges provided by the access control and security mechanisms.
• Compliance: That is, have methods and means by which any guidelines, policies, or constraints imposed on the system are enforced properly, and methods to validate that the system complies with the same.

Availability-Related Tasks:
• Prevent Denial of Service Attacks: That is, have methods and means by which any attempts to thwart the availability of service are prevented.

Although the preceding list is not a comprehensive catalog of all the tasks related to the security of computer systems and networks, it provides a general sense of the complexity and magnitude of the problems associated with the secure operation of systems. Security management is closely related to policy management: in all the preceding tasks, policies can be used to manage different aspects of the security of distributed computer systems and networks. In the next section, we look at some of the ways in which policies can ease the task of security management in various scenarios.
8.2. Policy Applications in Security

Policy-based schemes can be used for a variety of purposes when managing security. Some of the common applications of policies for managing the security aspects of computer systems are as follows:

• Policies are defined to determine access control guidelines about who can and cannot use the system. The system then enforces these policy controls through the configuration of the security gateways deployed within the network.
• Policies are defined as high-level guidelines on the organization of system behavior. These policies are then translated to underlying access control guidelines.
• Policies are defined to declare how the system administrators and system elements should react to an external threat. The system then uses these policies to respond automatically to any detected security threats.
• Policies are defined to specify the level of security required for communications among different elements of a computer network. The system then automatically enforces these requirements when secure information flows need to be created.

These applications are described in more detail, in specific contexts, in the following subsections.
8.2.1. Policy-Driven Access Control

Although access control policies in general define restrictions on who can access specific devices in a network, we will consider the application of access control policies to the case of web-based resources accessed via an intermediate proxy responsible for maintaining and enforcing the security policies. The concepts of access control presented in this example can be generalized to many other instances where access control needs to be managed.

The specific instance of access control that we consider is shown in Figure 8.1. It shows a data center that comprises different applications that are accessible to users via an external network such as the Internet. These applications are referred to as resources in the figure. The access by users to the resources is facilitated by a proxy. The proxy is responsible for ensuring that each user is able to access the applications he or she is authorized for, and that there are no unauthorized accesses. To ensure that only the proper accesses are allowed, the proxy uses an access control list to determine which user is allowed to access which resource. An access control list is a set of tuples, where each tuple consists of a user, a resource, and a set of permissions that the user has with respect to the resource.
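The (user, resource, permissions) tuples and the proxy's check can be sketched in a few lines of Python. The user names, resource names, and permission strings are illustrative, not from any product.

```python
# Access control list: (user, resource) -> set of permissions.
ACL = {
    ("alice", "resource1"): {"read", "write"},
    ("bob", "resource1"): {"read"},
}

def is_allowed(user: str, resource: str, permission: str) -> bool:
    """Proxy-side check: is this (user, resource, permission)
    combination present in the access control list?"""
    return permission in ACL.get((user, resource), set())
```

Any combination not explicitly listed is denied, which matches the constraint-policy reading of an access control list discussed below.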
[Figure 8.1: Access Control in a Data Center. Users A, B, and C reach the data center over the Internet through a proxy, which consults an access control list before granting access to Resources 1, 2, and 3.]
Access control lists can be viewed as an instance of policies when you consider the system configuration to be the set of users accessing any resources. An access control list prevents some combinations of users and resources from occurring in the system, allowing only the combinations it permits. In this sense, the access control list can be viewed as a special case of implementing a configuration constraint policy.

In real systems, the number of users and the number of resources can be on the order of thousands, and manually specifying the full access control list is tedious, error-prone, and inefficient. As a result, in practice, mechanisms to simplify the specification of access control lists are commonly used. There are two common ways in which access control list management can be simplified: one is through labeling and the other is through role-based access control.

8.2.1.1. Labeled Access Control Models
In the most basic model of access control ([Biba], [Bell]), access to resources is controlled by assigning labels to each resource and to each user. Each label can be viewed as a security classification of the resource. The assignment of a label to the user allows the user to have access to all resources that are assigned to
that label. As an example, one could classify all resources into the labels of “Unclassified,” “Secret,” and “Top Secret,” and allow users to have the same set of labels. A user with the “Secret” label would be able to access all resources that are labeled as “Secret.”

In many common usage scenarios, the labels have an inherent order among them. As an example, one can define an increasing level of restriction across the “Unclassified,” “Secret,” and “Top Secret” categories. Any user with a label of “Secret” could access “Unclassified” or “Secret” resources, whereas any user with a label of “Top Secret” could access resources with any of the three labels. Although a label-based scheme is simple and intuitive, it is not effective when we need to support fine-grained access control or when the mapping between the users and the resources changes dynamically.

8.2.1.2. Role-Based Access Control
Role-Based Access Control (RBAC) [OSBO] is a widely used access control mechanism. RBAC is based on the concept of roles, which are labels assigned to users and are independent of any labels assigned to the resources. The RBAC model reflects the realization that a subject’s access rights should be based not on the identity of the individual, but on the role that the individual plays in an organization. For example, only members of the human resources department will have the right to see personnel records, and only managers can modify them. RBAC would describe the access rights of the members of the human resources department, and thus any individual assigned to that department will automatically get those rights. The concept of roles significantly simplifies the management of access control policies because adding or removing permissions from a role simultaneously changes the access rights of all users assigned to that role.

8.2.1.3. Hierarchical Organization
Organizing the roles, users, and resources in a hierarchical manner often reduces the complexity of managing access control policies. In general, roles, user groups, and resources can be organized in a tree-like hierarchy, often patterned after a real-life structure—for example, all the members of a group have the same role. Access control lists can refer to any node in the user hierarchy tree and any node in the resource hierarchy tree when declaring permissions.
The effective access control rules available to any role or user are obtained by combining all the access control policies specified in the roles between the given role and the root of the role hierarchy tree. At any node, access to a resource may be allowed or denied. The combination of all these access control rules determines the access control list of any specific user.
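Combining allow/deny rules along the path from the root of the role hierarchy to a given role can be sketched as follows. The tree shape, the resource names, and the choice to let deeper (more specific) nodes override the root are assumptions for illustration, not the book's prescription:

```python
# A small role tree: child -> parent (None marks the root).
ROLE_PARENT = {"employee": None, "engineering": "employee", "intern": "engineering"}

# Per-node rules: resource -> True (allow) / False (deny).
ROLE_RULES = {
    "employee":    {"wiki": True, "source-code": False},
    "engineering": {"source-code": True},
    "intern":      {"production-db": False},
}

def effective_access(role, resource, default=False):
    # Collect the chain from the given role up to the root.
    chain = []
    while role is not None:
        chain.append(role)
        role = ROLE_PARENT[role]
    # Apply rules root-first so deeper nodes override earlier decisions.
    decision = default
    for node in reversed(chain):
        if resource in ROLE_RULES.get(node, {}):
            decision = ROLE_RULES[node][resource]
    return decision
```

Here "engineering" re-allows "source-code" that "employee" denies, so an intern inherits access to it while remaining denied the production database.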
8.2.2. Higher-Level Access Policies

The organization of users into roles, the grouping of resources, and the creation of hierarchies of roles and resources are all steps toward simplifying the task of access control into a higher-level representation that provides an abstraction of the access control in a system. In the actual implementation, this abstraction is removed for efficiency reasons and the control is implemented as access control lists that can be processed efficiently by computing elements. At a higher level of organization, access control guidelines and restrictions can be provided as broader policies that are specified independently of the actual resources or users, and more in terms of applications and services. Two examples of such higher-level policies are the concepts of separation of duty and the Chinese wall.

Separation of duty is the requirement that different steps of a process or business workflow be done by different users or roles. As an example, let us consider a workflow that implements payment requests for services in an enterprise. The workflow consists of three steps: the initiation of a payment request, the approval of the payment request, and the processing of the request by making the payment to the required person. A separation of duty principle would require the person performing step 1, the initiation of the request, to be different from the person performing step 2, the approval of the payment request. Separation of duty prevents inadvertent escalation of privileges—for example, separating the duty of users who access resources defined by a set of role-based access control policies from that of the administrator who is authorized to modify the access control policies prevents a user from modifying his or her own access control policies to attain access to an expanded set of resources.

A Chinese wall is a requirement to separate functions among different parts of an organization.
The definition of the Chinese wall policy was motivated by financial applications where a financial analyst must not provide advice to an institution if he or she has insider knowledge of a competitor, but the analysts are free to advise institutions that are not in competition. The Chinese wall requirement can be viewed as a specific case of separation of duty.
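The separation of duty requirement for the payment workflow can be checked mechanically. This sketch assumes a simple audit log of (step, user) pairs for one workflow instance, which is not a format defined by the book:

```python
def violates_separation_of_duty(audit_log):
    """audit_log: list of (step, user) pairs for one workflow instance.

    Returns True if the same user both initiated and approved the
    payment request (the two steps that must be kept separate).
    """
    performers = {}
    for step, user in audit_log:
        performers.setdefault(step, set()).add(user)
    # A violation exists if any user appears in both steps.
    return bool(performers.get("initiate", set()) & performers.get("approve", set()))
```

A compliance checker could run such a predicate over completed workflow instances, or an enforcement point could apply it before allowing the approval step to proceed.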
Such higher-level policy guidelines need to be enforced in the operation of the underlying access control mechanisms that exist within an enterprise. Enforcing them requires a policy architecture similar to that described in Chapter 2, “Policy Lifecycle—Creation, Distribution, and Enforcement,” and many of the transformation algorithms described in Chapter 5, “Policy Transformation and Analysis.” Defining the higher-level policies hides details present at the lower level, and thus may introduce conflicts that are not detectable at the higher level but can be identified using the analysis and conflict resolution algorithms described in Chapter 5. Higher-level policies can be used either to derive or generate access control rules or, most commonly, to validate the existing set of access control rules for compliance. Such a validation is conceptually very similar to the configuration validation process described in Chapter 6, “Policy-Based Configuration Management.”
8.2.3. Policy-Based Self-Protection

An important application of policies to security management is the facilitation of self-protection in the system. Most computer systems and networks are subject to different types of attacks, both from the outside (external attacks) and from the inside (insider attacks). To react to these attacks, several security mechanisms such as firewalls, anti-spam filters, and intrusion detection systems have been devised. However, when a new threat is observed (for example, a new type of virus is detected), a human administrator often needs to step in and alter the system so that it can handle the new threat more effectively. Policies in many cases can be used to automate the behavior of the system so as to reduce the amount of manual intervention required. Chapter 1, “Policy Definition and Usage Scenarios,” provided a brief overview of the general approach to self-protection using policies. In this chapter, we look at some specific applications of policies to build a self-protecting system.

The first instance of policy-based self-protection that we look at is that of the updates required for preventing and detecting malware on a personal computer. To provide a computer system with the maximum protection from viruses, Trojan horses, keyloggers, and other types of malicious code lurking on the Internet, it is common practice to turn on automatic updates of threat profiles on personal computers and systems. Nevertheless, in some instances, an update of the operating system or installed components may cause some existing applications to fail to perform as intended. Thus, each of the available updates needs to be examined carefully by an administrator before
it is downloaded and installed. This step can be quite an onerous chore for the IT department of a large enterprise, which is required to look after the proper operation of the computers of hundreds of thousands of employees. Policies provide a way to automate the function of the administrator in properly updating the configuration of many machines, reducing the number of machines that need special attention from the IT department. The architecture of self-protection that can be used for such an automation mechanism is shown in Figure 8.2.
Figure 8.2 Architecture for Self-Protection of Personal Computers. (The figure shows an upgrade controller on the local machine, which consults the enterprisewide upgrade policies on the intranet before applying updates from the OS and application upgrade sites on the Internet.)
As shown in the figure, a piece of software for controlling software upgrades is deployed on each of the enterprise personal computers. Because most enterprises tend to roll out a standard configuration as the initial install image for their managed personal computers, this software can be incorporated as part of that standard configuration. The upgrade controller runs on each computer periodically (for example, every time the computer is booted up or at some fixed time every day) and checks which pending updates are required on the personal computer. A list of such updates is available from the providers of the operating system or software packages, and is usually maintained on the local machine by a software module that itself runs periodically on the personal computer. In the figure, it is shown as the software upgrade monitor program on the personal computer.
166
Chapter 8
•
Policy-Based Security Management
The enterprise IT department publishes a list of current policies that enumerates which upgrades ought to be installed and which ought not to be. The determination of the upgrades that are safe to perform is made by the IT department after checking the proper operation of the applications on the personal computer with the specific upgrades. These policies could be expressed in a language such as CIM-SPL, where the action part of the policy states whether the upgrade ought to be allowed or disallowed, and the condition part includes the set of applications currently installed on the personal computer, the operating system of the personal computer, and the identity of the update that needs to be applied. Some examples of such policies are listed next:

• If the OS is Windows XP with Service Pack 2 installed, and the wireless driver upgrade request is for version 5.2, then allow the upgrade.
• If the OS is Windows XP, the upgrade is to Service Pack 2, and the machine has Adobe Photoshop v8.0 installed, then disallow the upgrade.
• If the OS is Linux and the upgrade is for the Firefox browser, then allow it.
• If the OS is OS/2®, then block all upgrades.
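The example policies above can be rendered as condition-action pairs evaluated in order. In the book these would be written in CIM-SPL, so the Python encoding, the machine-profile fields, and the first-match semantics here are all illustrative assumptions:

```python
# Each policy is (condition over machine profile and upgrade id, verdict).
# Evaluated first-match; an unmatched request falls to the default.
POLICIES = [
    (lambda m, up: m["os"] == "Windows XP" and "Service Pack 2" in m["installed"]
        and up == "wireless-driver-5.2", "allow"),
    (lambda m, up: m["os"] == "Windows XP" and up == "Service Pack 2"
        and "Adobe Photoshop 8.0" in m["installed"], "disallow"),
    (lambda m, up: m["os"] == "Linux" and up.startswith("firefox"), "allow"),
    (lambda m, up: m["os"] == "OS/2", "disallow"),
]

def decide(machine, upgrade, default="refer-to-admin"):
    """machine: {'os': str, 'installed': set of package names}."""
    for condition, verdict in POLICIES:
        if condition(machine, upgrade):
            return verdict
    return default
```

The default of referring unmatched requests to an administrator reflects the chapter's point that policies reduce, rather than eliminate, the machines needing special attention.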
An administrator may want to disallow requests if he or she has determined that some upgrades may cause malfunctions in the operation of some software. In the preceding example, it is assumed that Adobe Photoshop v8.0 will not work properly after an upgrade of Windows XP to Service Pack 2, and the upgrade is therefore disallowed.

Another example of a self-protecting system is the protection of enterprise networks by quarantining systems that fail compliance checks. It is an unfortunate fact of modern network-based computing that some laptops get infected with malicious code despite all precautions. This may happen because users take their laptops to unprotected networks such as hotspots at a coffee shop, or are lured by malicious sites into inadvertently installing a malicious piece of software on their machines. When such infected machines are brought back into the enterprise network, the malicious code tries to detect other machines on the network and spread the infection. A policy-based scheme can be used to automate the process of quarantining such infected machines and protecting the rest of the network. The architecture for the system is shown in Figure 8.3.
Policy Applications in Security
167
Figure 8.3 Architecture for Self-Protection of an Enterprise Network. (The figure shows a notebook computer with a compliance agent connecting to the intranet through a network access point; a noncompliance report flows to the network management module, which issues a quarantine command to the network access device.)
Each personal computer in the enterprise has a compliance agent that checks for the presence of malicious software. The compliance agent can keep its list of malicious software current by contacting a server for an updated list—a process frequently used in many types of anti-virus programs. When the compliance agent detects a potential infection, it informs a network management system of the machine, its access point into the network, and the identity of the malicious software detected on the machine. The network management system examines the identity of the detected malicious software and determines the action that it needs to take for the infected machine. The potential actions are to quarantine the machine entirely from the network; to allow the machine access to a limited part of the enterprise network, where the only access is to instructions and software for removing the malicious code; or simply to notify the owner of the machine about the presence of the malicious code. The last option may be used for malware that does not attempt to infect other communicating machines—for example, unauthorized adware. The selection between the first two options depends on the nature of the malicious software detected. After consulting the policy for dealing with the machine, the network management system can change the configuration of the access point of the device to restrict access of the infected machine to the enterprise network. This can be implemented in a variety of ways in current networking technology,
including the use of IP packet filtering to restrict network access from the infected machine, or the use of virtual LANs to restrict the infected machine to communicating with a limited (or null) set of other machines on the network.
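The choice among quarantine, restricted access, and owner notification can be sketched as a simple policy lookup keyed by the identity of the detected malware. The malware categories, report fields, and default action below are invented for illustration:

```python
# Policy table for the network management module (categories are assumed).
MALWARE_POLICY = {
    "worm":   "quarantine",    # spreads to other machines: cut off entirely
    "trojan": "restrict",      # allow access only to cleanup instructions/software
    "adware": "notify-owner",  # does not infect other machines
}

def action_for(noncompliance_report, default="quarantine"):
    """noncompliance_report: {'machine', 'access_point', 'malware'}.

    Unknown malware falls to the most conservative default action.
    """
    return MALWARE_POLICY.get(noncompliance_report["malware"], default)
```

The selected action would then be carried out by reconfiguring the machine's network access point, for example through packet filters or VLAN assignment as described above.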
8.2.4. Policy-Based Communication Assurance

Another application of policies is to simplify and automate the configuration of the security aspects of computer communications. Although many such applications of security configuration simplification exist, we consider the case of managing the encryption of communication based on the IP security protocol as an example.

IP security (or simply IPsec) is a complex protocol that allows the establishment of secure communication channels among two or more parties communicating using the Internet Protocol. It can be used to support many different business needs among enterprises, such as establishing secure virtual private networks and extranets for business partners, or providing remote access to the enterprise for employees working from home. Nevertheless, the protocol is complex to configure and requires the specification of the various modes in which the security parameters for such communications can be established. The policy-based communication assurance paradigm enables administrators and operators to simplify the process of establishing such secure communication channels. Instead of configuring complex security protocols and options, the administrator can specify the required security properties at a higher level, reducing the possibility that errors are introduced into the system through user mistakes. We describe the process for IPsec-based communication assurance in the next section. We note that an analogous method of using policies to simplify and specify the configuration of security options can be applied to other protocols that allow secure communications, such as Secure Sockets Layer (SSL), Transport Layer Security (TLS)—a variation of SSL—and secure communications enabled by newer schemes and paradigms such as Service Oriented Architecture.
8.3. Policy-Based Security Assurance for IPsec Protocol

The objective of security assurance using policies is to allow users to manage the configuration of an IPsec-enabled network in terms of the business needs of the user. The user describes the high-level security assurance requirements
for communication in terms of their business needs. These requirements are then translated into the IPsec-specific communication control policies for the individual firewalls and gateways. The communication control policies are then converted into the device-specific configurations of the devices, using the configuration management paradigm described in Chapter 6. We discuss the case of policy-based security assurance in four major subsections. In the first subsection, we discuss the typical business needs that IPsec-based networks would need to fulfill. The next subsection discusses the communication control policy specifications that are required by the IPsec protocol suite. This is followed by a subsection that describes the architecture and operation of the security assurance tool. The final subsection provides an overview of a policy information model which, when rendered into XML or another policy expression language as described in Chapter 4, “Policy Languages,” can be used for implementing the security assurance tool.
8.3.1. Business Needs Satisfied by the Security Assurance Tool

The security assurance tool is designed to satisfy three types of business needs that a network operator may encounter.

Virtual Private Network (VPN). An enterprise is composed of many different subnetworks located in different geographies. A private network is a network made up of communication links that are owned or leased by the enterprise and that interconnect its different subnetworks. Private networks are secure and have predictable performance, but are expensive to own and operate. At the present time, almost all the subnetworks of an enterprise are connected to the Internet. Thus, a lower-cost option for establishing a private network is to establish secure channels through the Internet. The network is not truly private, but is made to look like a private network by encrypting the traffic that travels over the Internet. Hence it is called a virtual private network, or VPN.

Business Partner Network (BPN). Many enterprise networks need to work with members of different organizations. When a close working relationship exists between an enterprise and its partners, the enterprise may want to provide its business partners with access to some of its machines. As an example, an automobile manufacturer may want to provide access to its inventory database to its franchised dealers so that they can track the status of their orders, or give its parts suppliers and contractors access to a shared application that allows them to bid for supplying components online. The enterprise does not
expose all of its machines to its business partners, but would like to expose some of its applications and machines to them. In cases where a legacy application without a web-based interface needs to be exposed to a business partner, an IPsec-based solution that shares a selected subset of the network with the partners in a restricted manner is more appropriate than exposing that application on the Internet.

Remote Access Network (RAN). The remote access network is a service offering geared toward enterprise networks that enables their employees to access the network securely from the open Internet. An employee would be allowed access to the company’s intranet when dialing in from home or from a hotel while traveling. Such a service usually tunnels through the Internet, tunneling over the HTTP protocol if necessary, and provides the employees with unrestricted access to the corporate intranet.

Each of these business needs may be satisfied by many different secure communication protocols, such as IPsec, SSL, or TLS.
8.3.2. Communication Control Policies for IPsec Protocol

Like most common security protocols, the IPsec protocol operates in two phases. In the first phase, the communicating parties shake hands and validate each other’s identities using mechanisms such as the validation of public key certificates issued to the communicating parties by a trusted certificate authority. The other important aspect of the first phase is the negotiation of the encryption scheme to be used for the actual data transfer—for example, the communicating parties may want to negotiate a private key among themselves and encrypt voluminous data transfers using the private key. Private key-based encryption is usually much more efficient than public key-based encryption schemes. The second phase of the communication is the actual transfer of the encrypted data.

As a result, several complex configuration parameters and communication control policies need to be established for IPsec communication. For each phase of communication, two sets of parameters are required: one specifying the different configuration parameters for that phase, and the other specifying the parameters for the specific validation/encryption schemes for that phase (also known as the transforms, for how the transmitted data will be transformed). Low-level communication control policies for IPsec define what the values of the parameters and transforms for each phase of communication ought to be when a
secure communication request is established from different partners. In general, any combination of phase one and phase two parameters and transforms can be used with a given remote communication party. These low-level communication policies are usually expressed in the “If Condition then Action” information models for policies described in Chapter 3, “Policy Information Model.” To simplify the task of specifying communication control policies, we can envision a combination of different parameters and transforms specified as a security class. In that case, a communication control policy can be viewed as an instance of a CIM PolicyRule where the condition part consists of the parameters identifying the remote communication party, and the action part is the identification of the right security class to be used. With such a mapping, the information model of the low-level communication control policy can be represented using the UML diagram shown in Figure 8.4.

Figure 8.4 Communication Control Policies for IPsec. (The diagram shows a Security Policy Rule associating Communication Tunnels—identified by source subnets and ports, destination subnets and port, and protocol—with a named Security Class, which in turn aggregates Phase One and Phase Two Parameters and Transforms.)
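The PolicyRule mapping just described, with conditions matching the remote party and actions naming a security class, can be sketched as follows. The class names, transform identifiers, and subnet prefixes are all invented, and real IPsec configuration involves many more parameters than shown:

```python
# A security class bundles the phase-one and phase-two choices
# (identifiers here are placeholders, not real transform names).
SECURITY_CLASSES = {
    "gold":   {"phase1": "ike-aes256-sha2", "phase2": "esp-aes256"},
    "silver": {"phase1": "ike-aes128-sha1", "phase2": "esp-aes128"},
}

# Condition-action rules: the condition identifies the remote party
# (here by subnet prefix), the action names the security class.
RULES = [
    ("10.1.", "gold"),    # partner A's subnets
    ("10.2.", "silver"),  # partner B's subnets
]

def security_class_for(remote_ip):
    for prefix, cls in RULES:
        if remote_ip.startswith(prefix):
            return SECURITY_CLASSES[cls]
    return None  # no matching rule: handle per a fallback policy
```

An expert administrator would define the security classes once; the per-partner rules then stay short and readable, which is the point of the abstraction.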
After the low-level communication control policies for IPsec-based communications are obtained, the policy-based configuration approach described in Chapter 6 can be used for converting the policies into the configuration of IPsec gateways and firewalls. Therefore, we focus on the task of obtaining the definitions of the
IPsec communication control policies from the definition of the business needs (VPNs, business partner networks, or remote access needs) of a user.
8.3.3. Generating the Communication Control Policies

The key goal of security assurance simplification is to determine the IPsec communication control policies from a specification of the business needs of secure communications (for example, the definition of the virtual private networks, business partner networks, or remote access users) in a manner that gives a high degree of assurance that the security requirements of the different types of access will be met. This process converts a business need specification, which may not necessarily follow the condition-action model of policies, into a set of policies that does. To do so, we assume that an expert security administrator is able to define a set of security classes (that is, collections of lower-level parameters and transforms) that enable secure communications. The task of defining the business needs then maps down to determining which of the different security classes should be used with each of the business-level notions of VPNs, BPNs, and RANs.

A VPN is specified by enumerating its members; that is, the addresses of the subnetworks that are to be connected into a VPN. Each subnetwork would be supported by one or more IPsec gateways that act as termination points for IPsec tunnels. Thus, a VPN can be specified fully by enumerating the enterprise subnetworks and the security gateway(s) associated with them. A business partner network is specified by enumerating the list of partners, the subnetworks of the partners that are allowed access, and the subnetwork in the enterprise to which the business partners are provided access. The partners and their subnetworks may be identified by the IP domain names of the partner networks, along with the associated IPsec gateways of each partner. Remote access users do not have a static IP address or an IP domain name. They can be identified by user names and the address of the remote access server that they use for communication.
Given the specifications of a VPN, a business partner, or a remote access user, the set of point-to-point IPsec communication tunnels that need to be established can be readily determined. For a VPN, it would be the n(n-1)/2 tunnels that are possible among all the n IPsec gateways in the VPN. For a business partner network, it would be the n tunnels among the IPsec gateway at the
enterprise and each of the IPsec gateways of the n business partners, and for a remote user it would be one tunnel between the remote user and the IPsec gateway of the enterprise. The remaining task then is the identification of the security class that needs to be associated with each of the tunnels. To identify the security class to be associated with each tunnel, one could use the design-time transformation schemes described in Chapter 5. One possible approach is policy transformation using static rules. Each static rule would define the security class to be used for a specific VPN, business partner, or remote user based on the attributes of the members. Thus, the combination of transformation schemes can translate the specification of business-level abstractions into the security policies to be installed for communications at different devices. In conjunction with the techniques described for configuration management in Chapter 6, the system can be used to perform self-configuration of the security parameters at different devices.
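The tunnel enumeration described above, n(n-1)/2 tunnels for a VPN over n gateways, n for a BPN, and one for a remote user, is straightforward to sketch; the gateway identifiers are placeholders:

```python
from itertools import combinations

def vpn_tunnels(gateways):
    """All gateway pairs: n gateways yield n*(n-1)/2 tunnels."""
    return list(combinations(sorted(gateways), 2))

def bpn_tunnels(enterprise_gw, partner_gws):
    """One tunnel from the enterprise gateway to each of the n partners."""
    return [(enterprise_gw, p) for p in partner_gws]

def ran_tunnels(enterprise_gw, remote_user):
    """A single tunnel from the remote user to the enterprise gateway."""
    return [(enterprise_gw, remote_user)]
```

Each resulting tunnel would then be assigned a security class by the static transformation rules, completing the translation from business-level specification to communication control policies.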
8.4. Summary

Security management and policy-based security form a large area of research and development with very important practical applications. In this chapter, we have focused our discussion on some applications of policy technologies to the task of managing security in computer systems and networks. We have shown how policy-based techniques can be used for building self-protection mechanisms in computer systems as well as for handling security at a business-level abstraction. We have also discussed the application of policies in the domain of access control. Lastly, we discussed secure communication using IPsec, its application in VPNs, BPNs, and RANs, and how policy-based mechanisms can simplify the management of IPsec-based security. These are but a few examples of the many different ways in which policies can be used to improve the security of computer systems and networks.
Chapter 9
Related Topics
There are several topics related to policies that fall slightly outside the scope of the use of policies for self-management. In this chapter, we take a brief look at some of those subjects and explain their relationship to policy-based self-management.
9.1. Production Rules

We have seen that many policies are naturally expressed using condition/action rules. This is very similar to production rule systems in the artificial intelligence area [KlaP]. The classic computational model of a production rule system takes as input a set of objects and stores them locally in what is called its working memory. The conditions of the rules are evaluated against the working memory, and a subset of the rules whose conditions evaluate to true is selected for execution using some sort of priority over the rules. The execution of a rule results in the insertion or deletion of an object from the working memory. This process is repeated until none of the conditions of the rules is satisfied or the actions of the rules no longer modify the working memory. The output is the content of the working memory. Thus, the premise of a production rule system is to perform a computation based on the interactions of the rules.1 These systems were inspired by the logical syllogism modus ponens, which says that if A implies B, and we know A, then we can deduce B.
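The classic computational model just described can be captured in a few lines. This sketch omits conflict resolution (rule priorities) and uses invented rule content in the spirit of the stuffy-nose example mentioned later in the chapter:

```python
def run(rules, working_memory):
    """Forward-chain until a fixed point.

    rules: list of (condition_facts, added_facts) pairs of frozensets/sets;
    a rule fires when its condition facts are all in working memory and
    it would add at least one new fact.
    """
    wm = set(working_memory)
    changed = True
    while changed:
        changed = False
        for condition, additions in rules:
            if condition <= wm and not additions <= wm:
                wm |= additions  # modus ponens: facts imply new facts
                changed = True
    return wm

# Illustrative rule content (not from any real expert system).
RULES = [
    ({"stuffy-nose", "sore-throat"}, {"cold-or-flu"}),
    ({"cold-or-flu"}, {"recommend-rest"}),
]
```

Note how the second rule fires only because the first one added "cold-or-flu" to working memory: the computation arises from rule interactions, which is exactly what distinguishes production systems from the independently specified policies discussed below.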
The rules together with the computational model provide an implementation of logical inferencing. The output does not have a direct effect on the external world, and if it has any effect, the rule system does not wait for this effect to take place (the effect is asynchronous). A large amount of effort has been dedicated to finding fast algorithms to detect which conditions become true in each iteration [FORG]. This is important when the number of rules to be evaluated in each cycle is large. There has also been research on how to specify priority relations between rules (for example, [AgCL, JaMM]). Typically, computation is done using a few hundred rules; some large systems may have thousands of rules.

A policy, in contrast to a production rule, is a constraint on the states that a system can take. The specification of these constraints allows the development of self-managing components. It is quite feasible that the implementations of some policy-based self-managing components leverage software based on production rules. However, even in such implementations, the focus of rule-based policy implementations is almost never on exploiting modus ponens computation. Most of the computation needed to trigger a policy is done not by the policy itself but by the system where the policy is being applied. It is always possible to externalize what causes a policy rule to fire, even though for efficiency reasons some policy systems may trigger rules internally (for example, on timer events or event correlations calculated inside the policy system). Thus, the main emphasis of rule-based policy systems is to provide flexible mechanisms to interact with the environment.
In contrast to production rules, which are meant to interact with each other, policies are usually specified independently of each other; they are implemented to constrain the external environment, and the intent is for each policy to be implemented individually, regardless of the other policies. Although in practice policies may interact and conflict with each other, this is not the main intent of a policy-based system. Therefore, policy-based systems should provide guidance on how to address interdependencies and conflicts. A production rule system can be used in different ways, and one possible way is to use it as a policy evaluation engine. In other words, one can write policies so that they look like production rules, and use a production rule system to evaluate them. In many cases, that may use only a fraction of the capabilities of the production rule system. However, if a production rule system is already available as a component in the environment,
it may be an expedient way to implement policies. With that specific mode of usage, a production system can be viewed as a way to implement policies. Regardless of the preceding usage model, the rules of a production system are not policies in the same sense as policies for self-management. They are simply rules that capture some logical relationship, typically an element of knowledge (for example, “If a patient has a stuffy nose, sneezing, sore throat, and cough, then the patient may have a cold or the flu”).
9.2. Business Rules and Processes

Business rules are a written definition of a business’s policies and practices. A standard reference defines a business rule as “a statement that defines or constrains some aspect of the business. It is intended to assert business structure or to control or influence the behavior of the business.” Business rules technology has been developed with the goal of providing declarative specifications of business processes. Business rules are analogous to policies for self-management, except that the domain of application is different. They are used for the development of constructs such as pricing guidelines and for customizing the behavior of applications dealing with business processes such as sales, purchasing, and procurement. Business rules may often be combined with production rules with modus ponens computation to enable complex decision making. Sometimes business rules are implemented as relational database triggers [Ceri]. A relational database trigger is a subroutine or procedure that is called by a database management system when a qualifying condition becomes true due to an action such as an insertion, deletion, or modification of the database.

A business process is a specification of the procedures to be undertaken by the employees of an organization when performing a business function. A business process may be specified as a sequence, with branching if necessary, of the steps that need to be undertaken. Business process steps are usually high-level declarations of the operations to be undertaken by humans and usually include the execution of various cooperative and coordinated activities. For example, in a mail order store, a business process describes the steps from the point that a client starts interacting with the interface that lets him or her order an item to the time that the item is delivered, and it includes the process of returning the item if necessary.
Chapter 9 • Related Topics
The implementation of a business process goes beyond business rules, and systems that support the development of this kind of process are called workflow systems. Workflow management systems provide a framework for the specification, analysis, implementation, and monitoring of workflows. The specification of a workflow pertains to the scheduling of tasks, including sequences of tasks, concurrent execution of activities, and choices (that is, conditional executions) based on the business policies and procedures. Business rules may implement individual tasks, but they are not supposed to involve more than one database transaction; workflows usually cover several transactions.

How are workflows related to policies? It is clear that workflows can be constrained by policies. In the mail-ordering example, the workflow must implement the return policies of the store. There can also be discount policies, credit policies, and so on, implemented in the workflow. Thus, different modules of the workflow implement different policies. This agrees with a common view that internal tasks in the workflow can be implemented as business rules. Furthermore, a workflow can be used to describe the action part of a policy, where the policy specifies when the workflow ought to be initiated. At a higher level, there might be policies that cut across several tasks in the workflow. For example, “An order from a gold class client must be delivered in less than three days.” Enforcement of this policy affects the processing of the order throughout the whole workflow.

In addition, some policy implementations require human intervention. Take the following example: “For nongold class clients, verify that a returned item has been shipped with the original packaging and is in good condition before processing a full refund. For gold class clients, process the refund immediately.” In this policy, a human supervisor makes the determination of good condition and original packaging.
Some human processing is also required when entering the return information into a database or computer system. Isolating the right set of attributes to define these policies, and providing the right abstraction for the actions that enforce them, requires that some actions be carried out by humans who are not under the direct control of the implementation. Hence, correct enforcement depends on the collaboration of an independent system. In general, these independent systems are not necessarily humans, and there can be different degrees of dependency. Business rules and business processes are important for understanding and implementing the business-level policies of an enterprise, but they do not directly contribute to the usage of policies for the self-management of IT systems.
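The refund policy above can be sketched as a workflow step whose outcome depends on the class of client. The class names, queue mechanics, and return-value strings are all hypothetical; the point is that for nongold clients the workflow must pause and wait for a judgment the system cannot make itself.

```python
from dataclasses import dataclass

@dataclass
class ReturnRequest:
    client: str
    gold: bool
    item: str

# Returns awaiting the human inspection step: the policy cannot be enforced
# fully automatically for nongold clients.
inspection_queue: list = []

def process_return(req: ReturnRequest) -> str:
    if req.gold:
        return "refund processed"    # gold class: refund immediately
    inspection_queue.append(req)     # others: defer to a human supervisor
    return "awaiting inspection"

def record_inspection(req: ReturnRequest, good_condition: bool,
                      original_packaging: bool) -> str:
    """The supervisor's judgment is entered into the system after the fact."""
    inspection_queue.remove(req)
    return ("refund processed"
            if good_condition and original_packaging else "refund denied")
```

A real workflow engine would persist the queue and notify the supervisor, but the split between the automated path and the human-dependent path would look the same.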
9.3. IT Processes

Closely associated with the concept of business processes is the concept of information technology (IT) processes. Whereas business processes deal with aspects of the general business of an enterprise, IT processes deal with the procedures and methods required to operate, use, and manage the IT infrastructure of an enterprise. The Information Technology Infrastructure Library (ITIL) is perhaps the best-known standard today for best practices in the support of IT services [ITIL]. ITIL is a specification of several types of processes that need to be undertaken to deliver an IT service and to support the operation of IT services. Under the broad umbrella of IT service support and IT delivery is a variety of processes dealing with topics such as incident management, problem management, change management, and configuration management.

The ITIL incident management process is a set of best practices dealing with such issues as how to restore a service when some disruptive incident occurs, and the ITIL problem management process deals with such issues as how to diagnose the root cause of the problem resulting from the disruptive incident. The ITIL change management process is a set of best practices to ensure that a change to the IT infrastructure follows standard methods and procedures—for example, that proper documentation is done and requisite approvals are obtained. These practices are designed to ensure that changes have the lowest possible impact on the services being provided by the system, while increasing user and IT personnel productivity and reducing the cost of the IT infrastructure as a result of the changes. The ITIL configuration management process is a set of best practices to track and record all the configuration items in IT components, systems, and processes.
Proactive policy-based configuration management, fault management, and security management as described in this book can be considered components of the different ITIL processes. For example, proactive configuration checking should be performed as part of the change management process to ensure that the new configuration resulting from proposed or completed changes does not violate any of the known best practices or configuration constraints. It can also be used to make sure that the proposed configuration meets all the objectives for which the changes are being performed.
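Proactive configuration checking of this kind can be sketched as evaluating the post-change configuration against declarative constraints before the change is applied. The constraints below (redundant paths, a firmware floor, zoning enabled) are invented examples, not from any particular product or the SAN checker of Chapter 6.

```python
# Each best-practice constraint is a (description, predicate) pair; a proposed
# configuration is vetted before the change-management process applies it.
CONSTRAINTS = [
    ("redundant paths required", lambda cfg: cfg.get("paths", 0) >= 2),
    ("firmware at or above minimum", lambda cfg: cfg.get("firmware", 0) >= 3),
    ("zoning must be enabled", lambda cfg: cfg.get("zoning", False)),
]

def check_proposed(current: dict, change: dict) -> list:
    """Compute the configuration after the change and report any violations."""
    proposed = {**current, **change}  # configuration as it would be post-change
    return [desc for desc, ok in CONSTRAINTS if not ok(proposed)]

current = {"paths": 2, "firmware": 4, "zoning": True}
print(check_proposed(current, {"paths": 1}))  # change would break redundancy
```

Because the check runs on the computed future configuration rather than the live one, the violation is caught before the change-management process approves the change.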
9.4. Event Correlation and Notification Systems

In network management, detecting the cause of network failures and performance degradation is commonly done by finding correlations between network events. All network elements today provide some type of alarm notification mechanism based on events. Event correlation systems try to filter the flood of events and transform them into complex events (or situations). Event notification services are used as intermediaries to support event interaction as an abstraction for process interactions in distributed systems (see the section on policies in fault event handling in Chapter 7, “Policy-Based Fault Management”).

There are many commercial and research implementations of event correlation and notification systems (see, for example, [CaRW, GrKP] and the references therein). One thing common to all of them is that they define an event definition language. These languages come in many syntactic flavors, but they usually support the following types of operations:

• Events have attributes, and events can be filtered using properties of the attribute values.
• Given a set of events, you can detect when
   • All events in the set have occurred, in any order
   • At least one event has occurred
   • All events have occurred in a particular sequence
• Some combination of the previous rules

The advantage of having an event language is that, on one hand, it abstracts time out of the specification of event correlation (for example, the user does not need to refer to the times when the events occurred to say that one event occurred before another). On the other hand, knowing in advance the temporal relationships used to compare events leads to more efficient implementations. Most importantly, we note that many of these languages define temporal operators that can be described using an appropriate first-order logic formula. Central to any event notification and correlation system is an efficient implementation of the temporal operators it supports, because in many cases these systems are required to handle thousands of events per minute.
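The three detection operators listed above can be sketched directly over a list of event names. This is only an illustration of the semantics; real event languages add attribute filters, time windows, and far more efficient incremental matching, and the event names used here are invented.

```python
def all_any_order(pattern, stream):
    """True if every event in `pattern` occurs somewhere in `stream`."""
    return set(pattern) <= set(stream)

def at_least_one(pattern, stream):
    """True if at least one event in `pattern` occurs in `stream`."""
    return bool(set(pattern) & set(stream))

def in_sequence(pattern, stream):
    """True if the events of `pattern` occur in `stream` in that order
    (membership tests on the iterator consume it, giving a subsequence check)."""
    it = iter(stream)
    return all(e in it for e in pattern)

stream = ["link_down", "cpu_high", "link_up", "timeout"]
print(all_any_order(["link_down", "timeout"], stream))  # True
print(in_sequence(["link_up", "link_down"], stream))    # False: wrong order
```

Note how none of the operators mentions timestamps: ordering is implicit in the stream, which is exactly the abstraction of time that the text describes.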
Event correlation is related to the domain of policies—for example, to ECA-style policies. The evaluation of active predicates is the detection of events, and the more sophisticated the support available to policies from event notification systems, the more sophisticated are the policies that can be written. Therefore, any support or enhancement to event correlation or notification technology translates into support and enhancement of policy systems. From the point of view of policies for distributed systems management, event correlation and notification systems implement useful basic capabilities on the stream of events generated by the underlying system, and can be used for management purposes.

Table 9.1 compares some of the rule-based technologies discussed in this chapter; namely, policies for distributed systems, business rules, event correlation systems, and production rules. The comparison is done with respect to five characteristics, all affecting or related to their implementation. These characteristics have been selected to emphasize the differences among the technologies.

Table 9.1 Comparison Between Policies, Business Rules, Event Correlation, and Production Rules

Characteristic         | Policies for Distributed Systems                                       | Business Rules                      | Event Correlation                        | Production Rules
Specification          | Goals and rules                                                        | Rules                               | Rules                                    | Rules
Execution Model        | Declarative: simultaneous evaluation of conditions with no side effects | Sequential evaluation               | Declarative                              | Fixpoint evaluation
Infrastructure Support | Policy-enabled distributed system                                      | Database management system          | Policy-enabled event management          | Domain-specific expertise
Performance            | Moderate to high: distributed system                                   | Moderate: database                  | High: distributed system                 | Calculation based
Linkage                | Bidirectional; multiple distributed elements                           | Bidirectional, mostly to the database | Mostly monodirectional, to receive events | Calculation input data
The first characteristic is the manner of specification. Regardless of the syntax of the languages employed in the different systems, all of them use rules as the means to denote behavior, with the exception of policies for distributed systems, which also use low-level goals or guidelines.

The second characteristic is the execution model. The main characteristic driving the implementation of policies for distributed systems is that the interpretation of a policy is independent of all other policies. The evaluation of conditions should not have side effects, and the actions are not executed by the policy; they are meant to be given to the policy-enabled system, which decides what to do with them. In business rule systems, rules are evaluated sequentially, and the result of one rule can affect the triggering of the following rules. Event correlation rules are specialized policy rules; thus, they are based on the same evaluation model as policies for distributed systems. Production rule-based systems (for example, expert systems) have many evaluation methods, but all share the characteristic that they evaluate the rules sequentially, simultaneously, or in hybrid ways over a working set of objects until a fixpoint is reached; that is, until the working set does not change further. We restate that this process might not terminate.

The third characteristic is infrastructure support. Except for production rule-based systems, none of these technologies is used in isolation. Policies for distributed systems require the orchestration of the whole system, manifested through the implementation of a policy-based infrastructure (see Chapter 2, “Policy Lifecycle—Creation, Distribution, and Enforcement”). For business rules, the emphasis is on having access to a database and transaction system. In event correlation systems, the conditions are mostly limited to detection of the events. Production rule-based systems behave more like traditional programs with a single input/output model; the difficult part of their implementation is that codification of the rules requires expert knowledge.

The fourth characteristic is performance. There are situations where the evaluation of policies must be done under very strict real-time constraints. Systems built under these requirements may compromise expressibility for performance, and we can easily place event correlation systems in this category. For business rules, database management is expected to dominate performance. For production rules, performance is difficult to predict: it is domain specific and depends on the problem being solved. Hard problems (that is, NP or harder) are often solved using rule systems [Bara].
The fifth characteristic is linkage, which is tied to infrastructure support. In policies for distributed systems, the managed elements must be able to provide information to the policy and accept action requests from the policy infrastructure. This is achieved either by standardizing communication with the system elements or by having the policy infrastructure support a variety of communication mechanisms. Business rules again need robust linkage to database systems, and event correlation systems need to be able to listen to as many types of events as possible. For production rules, the linkage is not explicit and depends on how they have been programmed.

We note that the definitions and interpretations that we use here for the various systems and their characteristics, although prevalent in the literature, are not unique. This is mostly because actual implementations may mix aspects of the different systems: a policy-based infrastructure might use an event correlation system, and business rules might be expressed in terms of production rules for making inferences about the operation of a business. This has led to considerable confusion when comparing the various technologies.
9.5. Service Level Agreements

A service level agreement (SLA) is a formal definition of the relationship between a service provider and its customer. SLAs are used in many industries and typically specify what a customer can expect from the service provider. Service level agreements are often used when corporations outsource functions considered outside the scope of their own core competencies to third-party service providers. The operation and maintenance of computer networks is outsourced by many companies to third-party network providers, making SLA support an important subject in the context of computer networks. An SLA might read as follows: The service provider must be able to process at least 1000 transactions per second with a response time of less than 150 milliseconds.

A service level agreement would typically contain the following information:
• A description of the nature of the service to be provided
• The expected performance level of the service—specifically its reliability and responsiveness
• The procedure for reporting problems with the service
• The timeframe for response and problem resolution
• The process for monitoring and reporting the service level
• The consequences of the service provider not meeting its service level obligations

An SLA generally provides an overview of the different things that can go wrong with the provided service and the remedial actions in such cases. Depending on the specific situation, some of the parts just listed may not be present.

Three common approaches are used to support and manage service level agreements signed by a provider of IT services. The first takes the model of an insurance company toward monitoring and supporting SLAs; the second uses configuration and provisioning techniques to support SLAs; and the third takes a more dynamic and adaptive approach toward supporting service level agreements.

In the insurance approach, the service provider makes its best attempt to satisfy the performance, availability, and responsiveness objectives specified in the service level agreement according to its normal operating procedures. Typically, all customers get the same level of service. The provider monitors its service to assess its compliance with the SLA objectives. The objectives in the SLA are set so that they are unlikely to be violated during the normal operation of the system. If the objectives are not met, the service provider pays the penalty charges specified in the agreement. A service provider calculates the financial risk associated with service level violation and offers an objective for which the underlying risk is an acceptable business risk.

In the provisioning approach, the service provider may sign up for different levels of service objectives with different customers. It allocates different resources to different customers, allocating enough resources to meet the objectives promised to each customer.
The ability to provide different service levels and to provision resources to meet the objectives for individual customers is the key characteristic of this approach. It requires a translation from SLA objectives to the policies that need to be enforced in the system, which can be done in a variety of ways [AIB] [VermB].

In the adaptive approach, the service provider dynamically modifies the configuration of the system used to support the customer when monitoring indicates that the service objectives provided to the customer are in danger
of being violated. This step reduces the probability that the service objectives will actually be violated, but does not eliminate it altogether. This approach may also be more resource efficient than the provisioning approach because, in the ideal case, it dedicates resources only when there is demand.

In the second and third approaches toward SLA management, policies provide a mechanism to support SLAs. Policies dictate how much resource to allocate to each customer during the initial configuration, as well as the conditions under which additional resources need to be provided at run-time. Thus, policies and SLAs are related, yet quite distinct, concepts in the management of computer systems.
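For the example SLA above (at least 1000 transactions per second with a response time under 150 milliseconds), the insurance approach's monitoring and penalty computation can be sketched as follows. The per-violation penalty and the sample format are invented for illustration.

```python
# SLA objectives from the example in the text; the penalty schedule is invented.
SLA = {"min_tps": 1000, "max_response_ms": 150}
PENALTY_PER_VIOLATION = 500.0

def compliant(sample: dict) -> bool:
    """One monitoring sample meets the SLA only if both objectives hold."""
    return (sample["tps"] >= SLA["min_tps"]
            and sample["response_ms"] < SLA["max_response_ms"])

def assess(samples: list) -> tuple:
    """Insurance approach: monitor, count violations, compute the penalty owed."""
    violations = [s for s in samples if not compliant(s)]
    return len(violations), len(violations) * PENALTY_PER_VIOLATION

samples = [
    {"tps": 1200, "response_ms": 90},
    {"tps": 800,  "response_ms": 90},    # throughput objective missed
    {"tps": 1500, "response_ms": 200},   # response-time objective missed
]
print(assess(samples))  # (2, 1000.0)
```

In the provisioning and adaptive approaches, the same monitoring data would instead feed policies that reallocate or reconfigure resources before a violation occurs, rather than simply tallying penalties afterward.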
9.6. Regulatory Compliance

Many regulations and laws govern the operation of companies in different nations, and companies must comply with them in order to do business. Many of these laws and regulations have a direct impact on the operation of an IT system. As an example, the following regulation is part of the privacy rule issued by the U.S. Department of Health and Human Services to implement the privacy and security requirements for the electronic exchange of health information in the “Health Insurance Portability and Accountability Act” passed by the U.S. Congress in 1996:

There are no restrictions on the use or disclosure of de-identified health information. The following identifiers of the individual or of relatives, employers, or household members of the individual must be removed to achieve the “safe harbor” method of de-identification: (A) Names; (B) All geographic subdivisions smaller than state, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code if, according to the current publicly available data from the Bureau of Census (1) the geographic units formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and (2) …

This is about a third of the de-identification rule, yet it is relatively simple to verify when compared to more abstract principles in the rules, such as the “minimum necessary” use and disclosure of information required from all serving entities (for example, health care providers and insurance companies).
Policy-based management provides a way to satisfy these regulatory constraints in an IT environment. As a first step toward this goal, the policies need to be translated into a machine-readable form. Once converted, the configuration of a system can be compared against the policies for compliance checking, and any possible violation flagged. In the operation of a business, it is more important to avoid a violation than to catch it after it happens. Policy-based configuration checking can be used for this purpose by computing the new configuration after a change is applied and then checking the changed configuration for potential violations of the guidelines (see Chapter 6, “Policy-Based Configuration Management,” for an analogous capability for SANs).
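As a small illustration of such a translation, the zip-code portion of the safe-harbor rule quoted above can be rendered machine-readable. The population table below is a made-up stand-in for the Census data the regulation references, and this sketch covers only the zip-code identifier, not the full rule.

```python
# Made-up stand-in for Census populations of three-digit zip prefixes.
PREFIX_POPULATION = {"100": 1_500_000, "059": 12_000}

def deidentify_zip(zip_code: str) -> str:
    """Safe-harbor zip handling: the initial three digits may be retained only
    if the combined population of that prefix exceeds 20,000; otherwise the
    geographic information is suppressed entirely."""
    prefix = zip_code[:3]
    return prefix if PREFIX_POPULATION.get(prefix, 0) > 20_000 else "000"

print(deidentify_zip("10001"))  # "100": populous prefix may be retained
print(deidentify_zip("05901"))  # "000": sparse prefix must be suppressed
```

Once the rule is in this executable form, a compliance checker can run it over every record a proposed configuration would export and flag any field that retains more than the rule allows.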
9.7. Proliferation of Policy-Based Technologies

Recently, a great many systems have appeared purporting to incorporate policy-based technologies. They cover a wide range of application domains, from personalization, compliance, and virtualization to business processes and the semantic Web. A number of standards bodies have developed, or are developing, policy-related standards, including the IETF, DMTF, SNIA, OASIS, OMG, W3C, and GGF. Each approaches policy from a different perspective, and indeed many of them define the term “policy” somewhat differently.²

In this book, we have described policy-based technologies from the perspective of network and systems management; in particular, the application of these technologies to the development of self-managing systems. Even after narrowing the field of discourse to this area, it is difficult to be comprehensive. We have presented, from the point of view of self-managing IT systems, a basic definition of what is meant by policies, how they are used in the self-managing systems context, how they may be represented, what languages have been developed for their specification, the mechanisms and algorithms necessary to implement a policy-based management system, how policies and sets of policies can be analyzed and refined, and mechanisms for policy transformation. We then described how policies have been applied in particular application domains within self-managing systems; namely, policy-based configuration checking, policy-based network management, and policy-based security management. We could have chosen other application areas as examples of how policies can help to provide self-management, but we felt that these cases were the most fully developed at present.
We conclude by noting that the research continues, and continues to increase in almost all aspects of policy-based computing. We expect that in the next several years, there will be a better understanding of the fundamentals of policy technologies, and a greater consensus on the capabilities of the management infrastructures needed for self-management.
Endnotes

1. This rule-based model is the one most typically associated with expert systems [JACK].
2. The IETF, DMTF, and SNIA have joint standards in the policy area.
References
[ACPL] IBM, Autonomic Computing Policy Language, http://dl.alphaworks.ibm.com/technologies/pmac/acpl.pdf, 2005.
[AgCL] R. Agrawal, R. Cochrane, and B. G. Lindsay, “On maintaining priorities in a production rule system,” Proceedings of International Conference on Very Large Data Bases (VLDB), pages 479–487, 1991.
[AGLV] D. Agrawal, J. Giles, K-W. Lee, K. Voruganti, and K. Filali-Adib, “Policy-Based Validation of SAN Configuration,” Proceedings of IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY), pages 77–86, 2004.
[AgrG] D. Agrawal, J. Giles, K-W. Lee, and J. Lobo, “Policy ratification,” Proceedings of IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY), pages 223–232, 2005.
[AgrGL] D. Agrawal, K-W. Lee, and J. Lobo, “Policy-based management of networked computing systems,” IEEE Communications, 43(10):69–75, 2005.
[AIB] I. Aib, M. Salle, C. Bartolini, A. Boulmakoul, and R. Boutaba, “Business Aware Policy Based Management,” Proceedings of IEEE/IFIP Workshop on Business Driven IT Management (BDIM 2006), 2006.
[APPL] K. Appleby et al., “Oceano-SLA based management of a computing utility,” Proceedings of 2001 IEEE/IFIP International Symposium on Integrated Network Management (IM), pages 855–868, 2001.
[AUTO] Autonomic Computing: IBM’s Perspective on the State of Information Technology, http://www.research.ibm.com/autonomic/manifesto/autonomic_computing.pdf.
[BanL] A. K. Bandara, E. Lupu, and A. Russo, “Using event calculus to formalise policy specification and analysis,” Proceedings of IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY), pages 26–35, 2003.
[Bara] C. Baral, Knowledge Representation, Reasoning and Declarative Problem Solving, Cambridge University Press, 2003.
[BeaG] M. Bearden, S. Garg, and W. Lee, “Integrating Goal Specification in Policy-Based Management,” Proceedings of IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY), pages 153–170, 2001.
[Bell] D. E. Bell and L. J. LaPadula, “Secure Computer System: Unified Exposition and Multics Interpretation,” MTR-2997, The MITRE Corporation, March 1976.
[Biba] K. J. Biba, “Integrity Considerations for Secure Computer Systems,” MTR-3153, The MITRE Corporation, April 1977.
[BroF] L. Brownston, R. Farrell, E. Kant, and N. Martin, Programming Expert Systems in OPS5: An Introduction to Rule-Based Programming, Addison-Wesley, 1985.
[Brog] W. Brogan, Modern Control Theory, Prentice Hall, October 1990.
[BunL] H. Kleine Büning, U. Löwen, and S. Schmitgen, “Inconsistency of production systems,” Journal of Data and Knowledge Engineering, 3:245–260, 1988/89.
[CANA] Cisco, Active Network Abstraction, http://www.cisco.com/go/ana, 2008.
[CaRW] A. Carzaniga, D. S. Rosenblum, and A. Wolf, “Design and evaluation of a wide-area event notification service,” ACM Transactions on Computer Systems (TOCS), 19(3):332–383, 2001.
[CDP] Cisco, CDP-Cisco Discovery Protocol, http://www.cisco.com/en/US/products/hw/switches/ps663/products_tech_note09186a0080094713.shtml#cdp.
[Ceri] S. Ceri, R. Cochrane, and J. Widom, “Practical applications of triggers and constraints: Success and lingering issues (10-year award),” Proceedings of International Conference on Very Large Data Bases (VLDB), 2000.
[ChaF] M. Charalambides, P. Flegkas, G. Pavlou, A. K. Bandara, E. C. Lupu, A. Russo, N. Dulay, M. Sloman, and J. Rubio-Loyola, “Policy conflict analysis for quality of service management,” Proceedings of IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY), pages 99–108, 2005.
[CHEN] Peter P. Chen, “The Entity-Relationship Model - Toward a Unified View of Data,” ACM Transactions on Database Systems (TODS), 1(1):9–36, 1976.
[ChoL] J. Chomicki, J. Lobo, and S. Naqvi, “Conflict resolution using logic programming,” IEEE Transactions on Data and Knowledge Engineering, 15(1):244–249, 2003.
[ChoT] J. Chomicki and D. Toman, “Temporal logic in information systems,” Logics for Databases and Information Systems, Kluwer Academic Publishers, pages 31–70, 1998.
[CIM] DMTF, Common Information Model (CIM) Standards, http://www.dmtf.org/standards/cim, 2008.
[CJDA] C. J. Date, What not How, Addison Wesley, 2000.
[CODD] E. F. Codd, “A relational model of data for large shared data banks,” Communications of the ACM (CACM), 13(6):377–387, 1970.
[COST] D. Agrawal, D. Olshefski, and D. Verma, Cost Conversant Classification of Objects, US Patent 6928445, August 2005.
[CQL] DMTF, CIM Query Language Specification, Version 1.0.0h edition, 2006.
[Dant] G. B. Dantzig, Linear Programming and Extensions, Princeton University Press, 1963.
[DiaG] Y. Diao, N. Gandhi, J. Hellerstein, S. Parekh, and D. Tilbury, “Using MIMO Feedback Control to Enforce Policies for Interrelated Metrics with Application to the Apache Web Server,” Proceedings of IEEE/IFIP Network Operations & Management Symposium (NOMS), 2002.
[DIFF] D. Black, S. Blake, M. Carlson, et al., An Architecture for Differentiated Services, IETF RFC 2475, December 1998.
[DudH] R. Duda, P. Hart, and D. Stork, Pattern Classification, John Wiley & Sons, second edition, 2001.
[FORG] C. Forgy, “Rete: A fast algorithm for the many pattern/many object pattern match problem,” Artificial Intelligence, 19(1):17–37, 1982.
[GARTNER] J. Pescatore, Taxonomy of Software Vulnerabilities, The Gartner Group, 2003.
[GrKP] R. E. Gruber, B. Krishnamurthy, and E. Panagos, “High-level constructs in the READY event notification system,” Proceedings of ACM SIGOPS European Workshop, 1998.
[Hayk] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, July 1998.
[HopU] J. E. Hopcroft and J. D. Ullman, Introduction to Automata Theory, Languages and Computation, Addison Wesley, 1979.
[HOSM] T. Howes and M. Smith, LDAP: Programming Directory-Enabled Applications with Lightweight Directory Access Protocol, Macmillan Technical Publishing, 1997.
[IoaS] Y. E. Ioannidis and T. K. Sellis, “Supporting inconsistent rules in database systems,” Journal of Intelligent Information Systems, 1(3/4), 1992.
[ITIL] ITIL – IT Infrastructure Library, http://www.itil-officialsite.com, 2008.
[JACK] P. Jackson, Introduction to Expert Systems, Addison-Wesley, 1998.
[JagM] H. V. Jagadish, A. O. Mendelzon, and I. S. Mumick, “Managing conflicts between rules,” Proceedings of ACM SIGACT/SIGMOD Symposium on Principles of Database Systems, pages 192–201, 1996.
[Kagal] L. Kagal, “Rei: A Policy Language for the Me-Centric Project,” Technical Report, HP Labs, September 2002.
[KlaP] D. Klahr, P. Langley, and R. Neches, Production System Models of Learning and Development, The MIT Press, 1987.
[LupS] E. C. Lupu and M. Sloman, “Conflict analysis for management policies,” Proceedings of IFIP/IEEE International Symposium on Integrated Network Management (IM), pages 430–443, 1997.
[MERR] Merriam-Webster, editor, Merriam-Webster OnLine, Merriam-Webster Inc., 2004.
[MILA] N. Mitra and Y. Lafon, SOAP Version 1.2 Part 0: Primer, http://www.w3.org/TR/2006/PER-soap12-part0-20061219/, December 2006.
[MofS] J. D. Moffett and M. S. Sloman, “Policy conflict analysis in distributed system management,” Journal of Organizational Computing, 4(1):11–22, 1994.
[MonB] M. Casassa Mont, A. Baldwin, and C. Goh, “POWER Prototype: Towards Integrated Policy-Based Management,” Technical Report, HP Labs, 1999.
[NCIM] IBM, Tivoli Network Manager IP Edition, http://www.ibm.com/software/tivoli/products/netcool-precision-ip, 2008.
[OASI] OASIS, Extensible Access Control Markup Language, http://www.oasis-open.org/committees/download.php/2406/oasis-xacml-1.0.pdf, 2004.
[OCL] OMG, Object Constraint Language, http://www.omg.org/technology/documents/formal/ocl.htm, 2007.
[OSBO] S. Osborn, R. Sandhu, and Q. Munawer, “Configuring role-based access control to enforce mandatory and discretionary access control policies,” ACM Transactions on Information and System Security, 3(2):85–106, 2000.
[P3P] L. Cranor et al., The Platform for Privacy Preferences 1.1 (P3P1.1) Specification, W3C Working Group Note, http://www.w3.org/TR/P3P11/, November 2006.
[PCIM] B. Moore, E. Ellesson, J. Strassner, and A. Westerinen, “Policy Core Information Model – Version 1 Specification,” RFC 3060, February 2001.
[PCIME] B. Moore, “Policy Core Information Model (PCIM) Extensions,” RFC 3460, January 2003.
[PDL] J. Lobo, R. Bhatia, and S. Naqvi, “A policy description language,” Proceedings of AAAI, pages 291–298, 1999.
[POND] N. Damianou, N. Dulay, E. Lupu, and M. Sloman, “The Ponder Policy Specification Language,” Proceedings of IEEE Workshop on Policies for Distributed Systems and Networks (POLICY), pages 18–38, 2001.
[ShaT] C. Shankar, V. Talwar, S. Iyer, Y. Chen, D. Milojicic, and R. Campbell, “Specification-enhanced policies for automated management of changes in IT systems,” Proceedings of USENIX Large Installation System Administration Conference (LISA), pages 101–116, 2006.
[SISU] A. Silberschatz, H. F. Korth, and S. Sudarshan, Database System Concepts, 3rd edition, McGraw-Hill, 1999.
[SMIS] SNIA, SNIA Standard – Storage Management Initiative Specification (SMI-S), Version 1.0.0, rev 2 edition, 2005.
[SNMP] William Stallings, SNMP, SNMPv2, SNMPv3, and RMON 1 and 2, Addison-Wesley Professional, 1998.
[SPL] DMTF, CIM Simplified Policy Language (CIM-SPL), Version 1.4.5 edition, http://www.dmtf.org/apps/org/workgroup/policy/, 2006.
[SRIN] R. Srinivasan, RPC: Remote Procedure Call Protocol Specification Version 2, IETF RFC 1831, http://www.ietf.org/rfc/rfc1831.txt, 1995.
[STRA] J. Strassner, Policy-Based Network Management: Solutions for the Next Generation, Morgan Kaufmann, August 2003.
[SuCh] L. Su, D. Chadwick, A. Basden, and J. Cunningham, “Automated Decomposition of Access Control Policies,” Proceedings of IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY), 2005.
[SUSE] Linux Server Management: SUSE Linux Enterprise Server 10, http://www.novell.com/products/server/management.html, 2008.
[UML] Object Management Group, Unified Modeling Language, http://www.uml.org/, 2008.
[Ved] A. Vedamuthu et al., Web Services Policy 1.5 – Framework, W3C Recommendation, http://www.w3.org/TR/ws-policy/, 2007.
[Verm] D. Verma, “Simplifying Network Administration Using Policy-Based Management,” IEEE Network Magazine, 16(2):20–26, March/April 2002.
[VermB] D. Verma, M. Beigi, and R. Jennings, “Policy Based SLA Management in Enterprise Networks,” Proceedings of IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY), pages 137–152, 2001.
[VERPN] D. Verma, Policy Based Networking: Architecture and Algorithms, New Riders Publications, November 2000.
[WHITE] B. White et al., “Communications Server for z/OS V1R9 TCP/IP Implementation Volume 4: Security and Policy-Based Networking,” IBM Redbook SG24-7535-00, Part 3.0, Policy-Based Networking, April 2008.
[WSDM] OASIS, Web Services Distributed Management (WSDM), http://www.oasis-open.org/committees/wsdm, 2008.
[YANKEE] Z. Kerravala, Enterprise Computing & Networking, The Yankee Group, 2004.
Index
A
abstract policy layer, 22
access control, 14, 158-160
  enterprise network policies, 28-30
  higher-level access policies, 163-164
  labeled access control models, 161
  policy-based communication assurance, 168
  policy-based self protection, 164-168
  RBAC, 162
access control lists, 161
accounting management, 116
ACPL (Autonomic Computing Policy Language), 81-82
acquisition of fault information, 146-147
action policies, 8, 15
actions, 5
adaptive approach to SLA management, 184
administration GUI, 155
administrative domains, policy distribution, 38, 41
alert policies, 9
analytic plug-ins, 143
analytical models for design-time transformation, 96
analyzing root cause of fault events, 150-151
anchor objects, 83
application servers, 144
applications for security management, 159
  higher-level access policies, 163-164
  policy-based communication assurance, 168
  policy-based self protection, 164-168
  policy-driven access control, 160-162
arbitrated loop, 122
architecture
  of policy-based fault management systems, 153
    administration GUI, 155
    discovery plug-ins, 153
    event server, 154
    probes, 153
    root cause analysis engine, 154
    tiered architecture, 155
  of policy-based SAN configuration checker, 128-131
arguments, modus ponens, 175
associations, 87
asynchronous replies to policy guidance requests, 44
attributes
  of target systems, 4
  within relational models, 52
authentication, 158
authorization policies
  conflicts, checking for, 107
  rules, 76-77
availability, 157
  related tasks, 159

B
backward feature generation scheme, 103
behavior, 5
BPNs (business partner networks), 169
building a PBMS, 17-19
business processes, 177-178
business rules, 177-178
Business Rules Systems, 2

C
canonical representation of error notifications, 144
cardinality, 53
cascaded policies, 85
case-based reasoning
  data preprocessing, 102
  data value clustering, 101
  policy transformation, 99-103
change management, ITIL change management process, 179
checking for conflicting policies, 106, 109
Chinese wall policy, 163
CIM (Common Information Model) Policy Model, 51, 62-66, 69, 89
  policy enforcement, decision execution, 49
CIM-SPL
  example policy, 89-91
  policy groups, 87-89
  policy rules, 82-83
    condition section, 84
    data types, 85-86
    decision section, 84-85
    operators, 85-86
  strategy statements, 89
CIM-style policy rules, specifying, 79-80
classes, 52
  ManagedElement class, 63
  Policy class (CIM Policy Model), 64-65
  PolicyActionStructure class (CIM Policy Model), 65
  PolicyGroup class (CIM Policy Model), 67-69
  PolicyRule class (CIM Policy Model), 67-69
  PolicySet class (CIM Policy Model), 66
codebook approach to root cause analysis, 150
collection policies, 126
collections of policy rules, 60
communication control policies
  generating, 172-173
  for IPsec protocol, 170-172
comparing rule-based technologies, 181
  execution model, 182
  infrastructure support, 182
  linkage, 183
  manner of specification, 182
  performance, 182
  XACML and PDL, 81
complex policy evaluation algorithms, 47
component-level policies, 24
components, role-based policy grouping, 36-37
concrete policy layer, 23
concurrent execution (Ponder), 85
condition section of CIM-SPL policy rules, 84
condition-action information model, 56-57
condition-action policies
  conflicting policies, checking for, 107
  rules, 9
conditions, checking for overlap, 108-109
confidentiality, 157
  related tasks, 158
configuration attributes, 7
configuration best practices, 120
configuration constraint policies, 7
configuration management, 116-117
  in hosted server environment, 131-133
    self-configuration, architecture, 133-136
  ITIL configuration management process, 179
  policy-based, 118
    example SAN configuration, 122, 125-127
    goals, identifying, 118-119
    system configuration, policy-based checking, 120-121
    system configuration, policy-based tuning, 119-120
  SAN configuration example
    configuration checking, architecture, 128-131
configuration scripts, policy distribution, 32
conflicting policies, detecting, 39, 62, 93, 106-109
  overlapping conditions, checking for, 108-109
  resolving, 109-110
  what-if analysis, performing, 112-113
connectivity graph policies, 126
constraints, 5, 9
  enforcing, 72
control theory, 105
conversion of fault information into canonical format, 147-149
coverage checking, 111-112
CQL (CIM Query Language), 79
  CIM-style policy rules, specifying, 79-80
creation tools for QoS policies, 26, 30-31

D
data gathering phase of policy enforcement, 45-46
data models. See information models
data normalization, 103-104
data preprocessing for case-based reasoning, 102
data types (CIM-SPL), 85-86
data value clustering for case-based reasoning, 101
decision section of CIM-SPL policy rules, 84-85
declarative nature of policy languages, 72-73
default policies, 111
DEN (Directory Enabled Networks), 35
deployment policy layer, 24
describing information models, 52
design-time policy transformation, 95
  data normalization, 103-104
  using analytical models, 96
  using case-based reasoning, 99-103
  using policy lookup table, 97-99
  using static rules, 96-97
detecting conflicting policies, 93
device selection policies, 151
Differentiated Services, 26
dimensionality reduction, 102
  via feature selection, 102
  via principal component analysis, 103
discard policies, 148
discovery plug-ins, 153
distributed IT systems, configuration management, 116-117
  policy-based, 118-122, 125-127
distributing policies, 31
  administrative domains, 38, 41
  configuration scripts, 32
  pub-sub, 33-35
  repositories, 33-35
    component role-based policy grouping, 36-37
DMTF (Distributed Management Task Force), 17-18
  CIM, 62
  CIM-SPL
    policy groups, 87-89
    policy rules, 82-86
DoS attacks, 159
duplicate elimination policies, 149

E
E-R (entity-relationship) model, 52
ECA (event-condition-action) information model, 74
  rules, 9
encryption, 158
enforcement context, 41, 44-45
enforcing policies, 41
  constraints, 72
  data gathering phase, 45-46
  decision execution, 49
  enforcement context, 44-45
  evaluation trigger, 42-44
  policy evaluation phase, 46
    complex evaluation algorithms, 47
    generic Boolean expression evaluators, 47
    table-based algorithms, 47
    tree-based evaluation, 48
enterprise networks, access policies, 28-30
entities, 52
error handling policies, 148
evaluation trigger, 41-44
event combination policies, 150
event correlation systems, 180
event notification services, 180
event servers, 154
event volume reduction, 149-150
event-based evaluation triggers, 43
event-condition-action information model, 59
events
  monitored time, 74
  specifying in PDL, 75
example CIM-SPL policy, 89-91
executable policy layer, 24
explicit requests for evaluation triggers, 43

F
failure handling during policy enforcement, 49
fault management, 115, 139
  analytic plug-ins, 143
  event notification services, 180
  in networks, 141-143
  in web-based applications, 144-145
    canonical representation of error notifications, 144
  policy-based, 145
    acquisition of fault information, 146-147
    conversion of fault information into canonical format, 147-149
    event volume reduction, 149-150
    remedial actions, 151-152
    root-cause analysis process, 150-151
  probes, 141-143
FCAPS, 115
  configuration management, 116-117
    in hosted server environment, 131-136
    policy-based, 118-122, 125-127
  fault management, 139
    in networks, 141-143
    in web-based applications, 144-145
    policy-based, 145-152
  security management, 158
    applications, 159
    higher-level access policies, 163-164
    policy-based communication assurance, 168
    policy-based security assurance for IPsec, 168-172
    policy-based self protection, 164-168
    policy-driven access control, 160-162
feature selection, dimensionality reduction, 102
firewalls, access control policies, 14

G
generating communication control policies, 172-173
generic Boolean expression evaluators, 47
glossary of standardized terms, 30
  normalizing between different policy sets, 38
goal policies, 8
grouping
  policy components, 37
  policy rules, 60

H
HBAs (host bus adapters), 122
HCI (Human Computer Interface) design principles
  applying to policy creation module, 30-31
high-level user-specified policies, 22
higher-level access policies, 163-164
HIPAA (Health Insurance Portability and Accountability Act), 185
holistic view of policies, 25
hosted server environments, policy-based configuration management, 131-133
  self configuration, architecture, 133-136
hyperrectangles, 98

I
IETF/DMTF Policy Architecture, 17-18
if-condition-then-action policy rule, 64
impact of policies on systems, performing what-if analysis, 112-113
implementation policy layer, 23
information models, 18, 55
  CIM Policy Model, 62-66, 69
  condition-action information model, 56-57
  describing, 52
  event-condition-action information model, 59
  mode-subject-action-target information model, 59
  priority, 62
infrastructure support of rule-based technologies, comparing, 182
inheritance, 53
insurance approach to SLA management, 184
integrity, 157
  related tasks, 159
intercomponent policies, 126
intracomponent policies, 126
intrusion detection, 159
IPsec
  communication control policies, generating, 172-173
  policy-based security assurance, 168
    communication control policies, 170-172
IT processes, 179
ITIL (Information Technology Infrastructure Library), 179

J–K–L
JDBC (Java Database Connector) library, 35
k-nearest neighbor clustering for case-based reasoning, 101
labeled access control models, 161
layered policy architecture
  abstract policy layer, 22
  concrete policy layer, 23
  deployment policy layer, 24
  high-level user-specified policies, 22
  user policy specification layer, 22
legislation
  HIPAA (Health Insurance Portability and Accountability Act), 185
  Money Laundering Suppression Act, 3
Level 1 access, 28
Level 2 access, 28
Level 3 access, 28
Level 4 access, 28
linkage of rule-based technologies, comparing, 183
low-level network access policies, 29

M
malware prevention, 159
ManagedElement class, 63
merging policies, conflict detection, 39
messaging systems, pub-sub policy distribution, 33-35
meta-policies, 77
  conflict resolution, 110
methods, 52
metric constraint policies, 7-8
  self-optimizing systems, 15
MIBs, 141
Microsoft 2000 Exchange server policies, 3
mode-subject-action-target information model, 59
modus ponens, 175
MOF (Managed Object Format) files, 63, 83
Money Laundering Suppression Act (1994), 3
monitored data, fault management process, 140
monitored time events, 74
Monte Carlo simulation, 113
multiparty policy negotiation, 40

N
network management systems, 141-143
network traffic security policies, 13
networks, fault management, 141-143
neural network model, 105
normalizing data, 103-104
normalizing policies, 38

O
OASIS (Organization for the Advancement of Structured Information Standards), 14
  XACML, 81
object-oriented model, 52
obligation policies, checking for conflicts, 108
obligations, 76
OCL (Object Constraint Language), 22
  expressions, 77
operators (CIM-SPL), 85-86
overlapping conditions between policies, checking for, 108-109

P
PBMS (policy-based management systems), 1
  building, 17-19
  component-level policies, 24
  enforcement context, 44-45
  evaluation triggers, 41-44
  fault management, 139
    in networks, 141-143
    in web-based applications, 144-145
    policy-based, 145-152
  policy enforcement
    data gathering phase, 45-46
    decision execution, 49
    policy evaluation phase, 46-48
  security management, 158
    applications, 159
    higher-level access policies, 163-164
    policy-based communication assurance, 168
    policy-based self protection, 164-168
    policy-driven access control, 160-162
PCA (principal component analysis), 103
PCIM (Policy Core Information Model), 51, 81
PDL (Policy Description Language), 73
  events, specifying, 75
  propositions, syntax, 74
PDP (policy decision point), 18
PEP (policy enforcement point), 18
performance management, 116
  policies, 148
performance of rule-based technologies, comparing, 182
policies, 2
  abstract, 22
  action policies, 8
  actions, 5
  alert policies, 9
  behavior, 5
  cascaded, 85
  compliance, 159
  configuration constraint policies, 7
  conflict detection, 39, 62, 93, 106, 109
    resolving conflicts, 109-110
  constraints, 5, 9
  coverage checking, 111-112
  creating, 30-31
  distribution process, 31
    administrative domains, 38, 41
    configuration scripts, 32
    pub-sub, 33-35
    repository-based, 33-37
  ECA information model, 74
  enforcing, 41
    policy enforcement context, 44-45
    policy evaluation trigger, 42-44
  enterprise network access, 28-30
  high-level user specified policies, 22
  holistic view of, 25
  layered policy architecture, user policy specification layer, 22
  metric constraint policies, 7-8
  normalizing, 38
  privacy policies, 27
  QoS, 25
    creation tools, 26
  target system, 4
    in self-configuring systems, 11
  versus production rules, 176
  and workflows, 178
Policy class of CIM Policy Model, 64-65
policy definition tools, 134
policy enforcement
  data gathering phase, 45-46
  decision execution, 49
  policy evaluation phase, 46
    complex evaluation algorithms, 47
    generic Boolean expression evaluators, 47
    table-based algorithms, 47
    tree-based evaluation, 48
policy groups, 78, 87-89
  associations, 87
  managing, 61
  scope of, 61
policy information models, 9, 51, 54-55
  CIM, 62-66, 69
  condition-action information model, 56-57
  describing, 52
  event-condition-action information model, 59
  mode-subject-action-target information model, 59
  priority, 62
policy languages, 71
  ACPL, 81-82
  CIM-SPL
    example policy, 89-91
    policy groups, 87-89
    policy rules, 82-86
    strategy statements, 89
  CQL, 79
    CIM-style policy rules, specifying, 79-80
  declarative nature of, 72-73
  PDL, 73-75
  Ponder
    authorization policy rules, 76-77
    meta-policies, 77
    policy groups, 78
    role policies, 78-79
  XACML, 81
policy lifecycle, 22-24
policy lookup tables, design-time transformation, 97-99
policy profiles, 128
policy rules (CIM-SPL), 82-83
  condition section, 84
  data types, 85-86
  decision section, 84-85
  operators, 85-86
policy synchronization, 95
policy transformation, 15, 94
  conflicting policies, performing what-if analysis, 112
  design-time transformation, 95
    data normalization, 103-104
    using analytical models, 96
    using case-based reasoning, 99-103
    using policy lookup table, 97-99
    using static rules, 96-97
  real-time transformation, 95, 104-106
    transforming business-level objectives into system configuration parameters, 105
policy-based communication assurance, 168
policy-based configuration management, 118
  example SAN configuration
    configuration checking, 122, 125, 128-131
    policy modeling and representation, 125-127
  goals, identifying, 118-119
  in hosted server environment, 131-133
    self configuration, architecture, 133-136
  system configuration, policy-based checking, 120-121
  system configuration, policy-based tuning, 119-120
policy-based fault management, 145
  acquisition of fault information, 146-147
  conversion of fault information into canonical format, 147-149
  event volume reduction, 149-150
  remedial actions, 151-152
  root-cause analysis process, 150-151
  system architecture, 153
    administration GUI, 155
    discovery plug-ins, 153
    event server, 154
    probes, 153
    root cause analysis engine, 154
    tiered architecture, 155
policy-based management, 1
policy-based security assurance for IPsec, 168
  communication control policies, 170-172
policy-based self protection, 164-168
PolicyActionStructure class (CIM Policy Model), 65
PolicyGroup class (CIM Policy Model), 67-69
PolicyRule class (CIM Policy Model), 67-69
PolicySet class (CIM Policy Model), 66
Ponder
  authorization policy rules, 76-77
  concurrent execution, 85
  meta-policies, 77
  policy groups, 78
  role policies, 78-79
  serial execution, 85
principal component analysis, dimensionality reduction, 103
prioritizing conflicting policies, 109
priority of policy rules, 62
privacy policies, 27
probes, 141-143, 153
  selection policies, acquiring fault information, 147
processing policies, 151
production rule system, 175-177
proliferation of policy-based technologies, 186
propositions, PDL, 74
provisioning approach to SLA management, 184
pub-sub (publication-subscription) policy distribution, 33-35

Q–R
QoS (quality of service), 25
  policy creation tools, 26
  SLAs, 183
quarantining systems, 166
RANs (remote access networks), 170
RBAC (role-based access control), 162
RCA (root-cause analysis)
  engine, 151
  firing policies, 150
real-time policy transformation, 95, 104-106. See also design-time policy transformation
  transforming business-level objectives into system configuration parameters, 105
reducing fault event volume, 149-150
refinement templates, 113
regulatory compliance, 185-186
relational database triggers, 177
relational models, 52
relevant policies, 41
remedial actions for fault events, 151-152
repositories, policy distribution, 33-35
  component role-based policy grouping, 36-37
resolving policy conflicts, 39, 109-110
resource management policies, acquiring fault information, 147
RFCs (Requests for Comments), 18
role policies, 78-79
root cause analysis engine, 154
rule-based policy systems, 176, 181
  execution model, comparing, 182
  infrastructure support, comparing, 182
  linkage, comparing, 183
  manner of specification, comparing, 182
  performance, comparing, 182
rules, business rules, 177-178

S
SANs (storage area networks)
  collection policies, 126
  configuration management, 128-131
  connectivity graph policies, 126
  intercomponent policies, 126
  intracomponent policies, 126
  policy-based configuration management
    configuration checking, 122, 125
    policy modeling and representation, 125-127
schedule-based evaluation triggers, 43
scope of policy groups, 61
scripts, enabling self configuration, 11
security assurance tools, services provided by, 169-170
security management, 116, 158
  applications, 159
  higher-level access policies, 163-164
  policy-based communication assurance, 168
  policy-based security assurance for IPsec, 168
    communication control policies, 170-172
  policy-based self protection, 164-168
  policy-driven access control, 160-162
selection policies, 148
self-configuring systems, 10-12
  in hosted server environments, 133-136
self-healing systems, 16
self-optimizing systems, 15
self-protecting systems, 13
  access control policies, 14
serial execution (Ponder), 85
shared keys, policy negotiation, 40
SLAs (service-level agreements), 183
SMI-S (Storage Management Initiative-Specification), 124-125
SNIA (Storage Networking Industry Association), 124
SNMP (Simple Network Management Protocol)
  MIBs, 141
  traps, 141
SOA (service-oriented architectures), 35
solicited policy guidance requests, 44
standards organizations
  DMTF, 17-18
    CIM, 62
  IETF, 17-18
static rules for design-time transformation, 96-97
strategy statements (CIM-SPL), 89
symptom database, 150
synchronous replies to policy guidance requests, 44
syntax of authorization policy rules (Ponder), 76-77
system metrics, 7

T
table-based policy evaluation algorithms, 47
target systems, 4
  in self-configuring systems, 11
terminology, normalizing, 38
tiered architecture for policy-based fault management systems, 155
transforming policies. See policy transformation
translation modules, 135
traps, 141
tree-based policy evaluation process, 48
two-dimensional hyperspace, 98

U–V–W
UML (Unified Modeling Language), 52
unsolicited policy guidance requests, 44
user interface of policy creation module, 30-31
user policy specification layer, 22
variables, policy coverage checking, 111
VPNs (virtual private networks), 169
Web servers, 144
web systems, fault management, 144-145
what-if analysis, performing, 112-113
workflow systems, 178
working memory, production rules, 175-177

X–Y–Z
XACML (eXtensible Access Control Markup Language), 14, 55, 81