English Pages 202 [204] Year 2017
MODELLING BUSINESS INFORMATION
Entity relationship and class modelling for business analysts Keith Gordon
As the roles of Data and Business Analysts become more intertwined, this book is timely. Businesses often fail to recognise that information is a key resource, and are confused by how it is presented or overwhelmed by its complexity during use. Keith brings to the forefront of the reader's mind the importance of communicating and analysing the relationship between Business, Information, Systems and Data, and the value in developing models cooperatively, gaining 'consensus, not perfection' from stakeholders. Simple everyday examples and analogies are used to support the reader's understanding and make the subject more relatable. I enjoyed reading the book and completing the exercises. An excellent learning aid for Analysts who are new to modelling or need reminding of good practice.
Katie Walsh, Business Analyst and Mentor

Anyone interested in a thoughtful, well-done text on how to do high-quality business analytical data modelling should definitely proceed with this book.
David Hay, Essential Strategies International, CEO

Modelling Business Information provides an introduction to data modelling, to the nomenclature used by common modelling techniques, and to techniques for representing common patterns. This is a useful book for business analysts who are creating the information model as well as for business and IT users who need to understand a data model.
Keith W. Hare, JCC Consulting, Inc., Senior Consultant

Keith Gordon's wonderfully compact yet thorough introduction to business-friendly information modelling is a terrific contribution to the field. Globally, there's a surge of interest in data modelling as a powerful tool for improving communication, especially with professionals who used to think business-oriented entity relationship modelling didn't need to be in their tool kits. Business analysts, Agile developers, data scientists, big data specialists, and other professionals will all benefit from Keith's work.
Alec Sharp, Senior Consultant, Clariteq

Modelling Business Information by Keith Gordon is aimed at those who are new to business analysis or information modelling. In his writing, Keith draws on a wealth of experience in information management, both as a practitioner and as a lecturer with the Open University. The first six chapters provide an accessible and clear foundation in the topic, covering the reasons for developing information models, the basic elements of entity-relationship diagrams, how to develop an information model from basic information requirements, and finally how to normalise existing data. I particularly like that it uses two graphical notations: the Barker-Ellis notation, noted for its readability, and the ubiquitous Unified Modeling Language notation, which helps to demonstrate that entity-relationship models can be developed in different notations. This first part of the book also takes care to cover the syllabus for the Data Analysis certificate that is part of the scheme for the BCS Advanced International Diploma in Business Analysis. The second part of the book covers a range of more advanced topics, from naming conventions and yet more entity-relationship model notations to considerations of quality in information models, corporate data models and modelling for business intelligence applications; it finally looks at data and database topics, including an overview of SQL, before moving on to database design and optimisation. Overall, the book provides an excellent grounding in the full range of topics related to information modelling.
Matthew West, Director, Information Junction
BCS, THE CHARTERED INSTITUTE FOR IT BCS, The Chartered Institute for IT, champions the global IT profession and the interests of individuals engaged in that profession for the benefit of all. We promote wider social and economic progress through the advancement of information technology science and practice. We bring together industry, academics, practitioners and government to share knowledge, promote new thinking, inform the design of new curricula, shape public policy and inform the public. Our vision is to be a world-class organisation for IT. Our 75,000-strong membership includes practitioners, businesses, academics and students in the UK and internationally. We deliver a range of professional development tools for practitioners and employees. A leading IT qualification body, we offer a range of widely recognised qualifications. Further Information BCS, The Chartered Institute for IT, First Floor, Block D, North Star House, North Star Avenue, Swindon, SN2 1FA, UK. T +44 (0) 1793 417 424 F +44 (0) 1793 417 444 www.bcs.org/contact http://shop.bcs.org/
© 2017 BCS Learning & Development Ltd

The right of Keith Gordon to be identified as author of this work has been asserted by him in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted by the Copyright, Designs and Patents Act 1988, no part of this publication may be reproduced, stored or transmitted in any form or by any means, except with the prior permission in writing of the publisher, or in the case of reprographic reproduction, in accordance with the terms of the licences issued by the Copyright Licensing Agency. Enquiries for permission to reproduce material outside those terms should be directed to the publisher.

All trademarks, registered names etc. acknowledged in this publication are the property of their respective owners. BCS and the BCS logo are the registered trademarks of the British Computer Society, charity number 292786 (BCS).

Published by BCS Learning & Development Ltd, a wholly owned subsidiary of BCS, The Chartered Institute for IT, First Floor, Block D, North Star House, North Star Avenue, Swindon, SN2 1FA, UK. www.bcs.org

Paperback ISBN: 9781780173535
PDF ISBN-13: 9781780173542
EPUB ISBN-13: 9781780173559
Kindle ISBN-13: 9781780173566
British Cataloguing in Publication Data. A CIP catalogue record for this book is available at the British Library. Disclaimer: The views expressed in this book are those of the authors and do not necessarily reflect the views of the Institute or BCS Learning & Development Ltd except where explicitly stated as such. Although every care has been taken by the authors and BCS Learning & Development Ltd in the preparation of the publication, no warranty is given by the authors or BCS Learning & Development Ltd as publisher as to the accuracy or completeness of the information contained within it and neither the authors nor BCS Learning & Development Ltd shall be responsible or liable for any loss or damage whatsoever arising by virtue of such information or any instructions or advice contained within this publication or by any of the aforementioned. Typeset by Lapiz Digital Services, Chennai, India.
CONTENTS
List of figures and tables
About the Author
Foreword
Acknowledgements
Glossary
Introduction

PART 1 THE BASICS

1. WHY BUSINESS ANALYSTS SHOULD MODEL INFORMATION
  What is business analysis?
  Information and data
  The importance for a business analyst of understanding information needs
  The role of models in business analysis
  Data models and data
  Entity relationship modelling
  Class modelling
  Use of data models in business analysis
  What makes a good data model?
  Introducing data analysis

2. MODELLING THE THINGS OF INTEREST TO THE BUSINESS AND THE RELATIONSHIPS BETWEEN THEM
  Entities and objects
  Naming of entity types and object classes
  Introduction to relationships and associations
  Relationship notation in entity relationship models
  Association notation in UML class models
  Degrees of cardinality and optionality
  Multiple relationships and associations
  Recursive relationships and reflexive associations
  Exercises for Chapter 2

3. MODELLING MORE COMPLEX RELATIONSHIPS
  The problems with many-to-many relationships and associations
  Resolving entity relationship model many-to-many relationships
  Resolving class model many-to-many associations
  The 'bill of materials' structure
  Mutually exclusive relationships and associations
  Generalisation and specialisation in entity relationship models
  Generalisation and specialisation in class models
  Aggregation and composition
  Exercises for Chapter 3

4. DRAWING AND VALIDATING INFORMATION MODEL DIAGRAMS
  The model drawing process
  Identifying the entity types or the object classes
  Identifying the relationships or associations
  Drawing the initial diagram
  Validating the diagram
  Exercises for Chapter 4

5. RECORDING INFORMATION ABOUT THINGS
  Revisiting entity types, object classes, relationships and associations
  Introduction to attributes
  The naming of attributes
  Entity type, object class or attribute?
  Unique identifiers
  Domains
  The UML extended attribute notation
  Showing operations on class models
  Exercises for Chapter 5

6. RATIONALISING DATA USING NORMALISATION
  What is normalisation?
  The relational model of data
  The rules of normalisation
  Starting the normalisation process
  First normal form
  Second normal form
  Third normal form
  The third normal form data model
  Candidate keys, primary keys and alternate keys
  The relationship of normalisation to modelling
  Exercises for Chapter 6

PART 2 SUPPLEMENTARY MATERIAL

7. OTHER MODELLING NOTATIONS
  The IDEF1X notation
  The Information Engineering notation
  The Chen notation
  Comparison of the notations

8. THE NAMING OF ARTEFACTS ON INFORMATION MODELS
  The naming of entity types or object classes
  The naming of domains
  The naming of attributes
  The naming of relationships in Ellis-Barker entity relationship models
  The naming of associations on UML class models

9. INFORMATION MODEL QUALITY
  Genericity and specificity in models
  The nine characteristics of a good data model
  The six principles of high quality data models
  The five dimensions of data model quality
  The layout of models

10. CORPORATE INFORMATION AND DATA MODELS
  The problems
  Principles for the development of a corporate model

11. DATA AND DATABASES
  The data landscape
  Databases

12. BUSINESS INTELLIGENCE
  The data warehouse
  The multidimensional model of data
  Dimensional modelling

13. ADVANCES IN SQL (OR WHY BUSINESS ANALYSTS SHOULD NOT BE IN THE WEEDS)
  The basics of SQL
  New SQL data types
  The future
  Implications for business analysts and information modellers

14. TAKING A REQUIREMENTS INFORMATION MODEL INTO DATABASE DESIGN
  First-cut database design stage
  Optimised database design stage

APPENDICES
  Appendix A: Table of equivalences
  Appendix B: Bibliography
  Appendix C: Solutions to the exercises
  Index
LIST OF FIGURES AND TABLES
Figure 1.1 Three levels of system
Figure 1.2 The relationship between data and information
Figure 1.3 A rich picture
Figure 1.4 A business activity model
Figure 1.5 A business process model
Figure 1.6 A use case diagram
Figure 1.7 Requirements engineering in context
Figure 1.8 An example entity relationship model using the Ellis-Barker notation
Figure 1.9 An example of a UML class model
Figure 2.1 The vehicle hire company using Ellis-Barker notation
Figure 2.2 The vehicle hire company using UML class model notation
Figure 2.3 A relationship in an entity relationship model
Figure 2.4 An association in a UML class model
Figure 2.5 The use of role names
Figure 2.6 One-to-many (1:n) optional–mandatory relationship and association
Figure 2.7 One-to-many (1:n) mandatory–optional relationship and association
Figure 2.8 One-to-one (1:1) optional–mandatory relationship and association
Figure 2.9 Many-to-many (m:n) optional–optional relationship and association
Figure 2.10 Modelling the 'one-way' hire situation
Figure 2.11 Employee supervision
Figure 3.1 Employees and branches
Figure 3.2 Introducing the ASSIGNMENT entity type
Figure 3.3 Introducing the ASSIGNMENT object class
Figure 3.4 Introducing the ASSIGNMENT association class
Figure 3.5 Introducing products within products
Figure 3.6 The bill of materials structure in Ellis-Barker notation
Figure 3.7 The bill of materials structure in UML class model notation
Figure 3.8 Employee supervision in a matrix organisation
Figure 3.9 Employee supervision in a matrix organisation resolved
Figure 3.10 The vehicle hire company as shown in Figure 2.1
Figure 3.11 The introduction of an exclusive arc
Figure 3.12 The introduction of the {xor} constraint
Figure 3.13 An example of a supertype–subtype hierarchy
Figure 3.14 Alternative depiction of a supertype–subtype hierarchy
Figure 3.15 A UML superclass–subclass hierarchy
Figure 3.16 Alternative notation for a UML superclass–subclass hierarchy
Figure 3.17 A UML class model with multiple superclass–subclass hierarchies
Figure 3.18 Aggregation using Ellis-Barker notation
Figure 3.19 An example of the use of the aggregation symbol in a UML class model
Figure 3.20 An example of the use of the composition symbol in a UML class model
Figure 3.21 Composition using Ellis-Barker notation
Figure 4.1 The model drawing process
Figure 4.2 A 'relationship matrix'
Figure 4.3 The initial Ellis-Barker entity relationship model
Figure 4.4 The initial UML class model
Figure 4.5 The first data navigation path
Figure 4.6 The second data navigation path
Figure 4.7 The revised Ellis-Barker entity relationship model
Figure 4.8 The revised UML class model
Figure 4.9 Partial high-level process map
Figure 4.10 Completed CRUD matrix
Figure 4.11 The final Ellis-Barker entity relationship model
Figure 4.12 The final UML class model
Figure 5.1 The previous models
Figure 5.2 Attribute types shown on an Ellis-Barker entity relationship model
Figure 5.3 Attributes shown on a UML class model
Figure 5.4 EMPLOYEE expanded (shown in Ellis-Barker entity relationship notation)
Figure 5.5 EMPLOYEE expanded (shown in UML class modelling notation)
Figure 5.6 Unique identifiers on an Ellis-Barker entity relationship model
Figure 5.7 The UML class
Figure 5.8 The UML extended attribute notation
Figure 5.9 The UML operations notation
Figure 6.1 Relational tables
Figure 6.2 The staff record form
Figure 6.3 Normalisation form completed to UNF
Figure 6.4 Normalisation form completed to 1NF
Figure 6.5 Normalisation form completed to 2NF
Figure 6.6 Normalisation form completed to 3NF
Figure 6.7 The third normal form data model
Figure 7.1 The model of the business scenario in Ellis-Barker notation
Figure 7.2 The model of the business scenario in UML class model notation
Figure 7.3 The model of the business scenario in IDEF1X notation
Figure 7.4 The model of the business scenario in Information Engineering notation
Figure 7.5 The model of the business scenario using Chen's notation
Figure 7.6 Comparison of the relationship notations
Figure 9.1 An example of the replacement of roles by entity types
Figure 9.2 The generic to specific continuum
Figure 9.3 The cost-balance of flexible design
Figure 9.4 The five dimensions of data model quality
Figure 11.1 The data landscape
Figure 11.2 Example data arranged in tables and columns
Figure 11.3 The database chronology
Figure 11.4 Hierarchical database schema
Figure 11.5 Hierarchical database occurrences
Figure 11.6 Network database schema
Figure 11.7 Network database occurrences
Figure 12.1 A multidimensional data model
Figure 12.2 A typical 'star' schema for a data warehouse
Figure 12.3 A 'snowflake' schema
Figure 12.4 A 'galaxy' schema
Figure 13.1 The original 'workshop' model
Figure 13.2 The third normal form model
Figure 13.3 The final model

Table 4.1 Identified entity types or object classes
Table 8.1 Examples of formal attribute names
Table A.1 Table of equivalences
ABOUT THE AUTHOR
Keith Gordon was a professional soldier for 38 years, joining the Army straight from school at 16 and retiring on his 55th birthday. During his service with the Royal Armoured Corps, the Royal Corps of Signals and the Royal Army Educational Corps (now the Educational and Training Services Branch of the Adjutant General's Corps) he gained a Higher National Certificate in Electrical, Electronic and Telecommunications Engineering, a Certificate in Education from the Institute of Education of the University of London, a Bachelor of Arts from the Open University and a Master of Science in Design of Information Systems from Cranfield Institute of Technology.

The Master of Science course, held at the Royal Military College of Science, was unclear about what sort of information system the students were supposed to be designing. Was it a business system to be used in the non-operational world of the military? Was it a command and control information system to be used on the battlefield? Was it a real-time system to be used in areas such as weapon control? Or was it a management information system? The course did, however, cover some really useful stuff. On the technical side, this included programming in Ada and Coral-66, languages designed for embedded and real-time systems. The students also studied Soft Systems Methodology (the academic lead for the course had researched his doctorate at Lancaster University under the supervision of Professor Peter Checkland) and looked, in particular, at the work of Professor Brian Wilson, who specialised in applying Soft Systems Methodology to the development of information systems. The Structured Systems Analysis and Design Method (SSADM) also formed a substantial part of the course. SSADM is now called 'Business System Development' and was the impetus for the BCS Business System Development scheme, which includes the Business Analysis and Solution Development diplomas and used to include a Data Management diploma.
Following the Design of Information Systems course, Keith spent three years as a consultant in the Army School of Training Support, where he looked into and procured computer systems for use in education and training – computer-based training (CBT). This role was part researcher and part business analyst. The next two years were spent as the Senior Education Officer in the Army's apprentice college for the training of apprentice soldier chefs. In 1992, he was posted to the Ministry of Defence and joined a new team of four officers and a civil servant 'doing data management' for the Army. In 1995, he was promoted to Lieutenant Colonel and became the head of that team until he retired from the Army in 1998.
He is now an independent consultant and lecturer specialising in data management and business analysis. As well as developing and teaching commercial courses, he was for a number of years a tutor for the Open University, tutoring general computing and database courses in its undergraduate and postgraduate programmes. He is a Chartered Member of BCS, The Chartered Institute for IT, a Member of the Chartered Institute of Personnel and Development and a Fellow of the Institution of Engineering and Technology. He holds the Diploma in Business Systems Development, specialising in data management, from BCS – formerly the Information Systems Examination Board (ISEB) – and he is now an examiner for the Business Systems Development scheme. Nominated by the British Standards Institution (BSI), he represents the UK on the international standards committee ISO/IEC JTC1 SC32 WG2 (Information Technology – Data Management and Interchange – Metadata). In this role, he has contributed to the development of ISO/IEC 11179 (Metadata registries) and ISO/IEC 19763 (Metamodel framework for interoperability). For a number of years, Keith was the secretary of the BCS Data Management Specialist Group and, as a founder member, was a committee member of the UK chapter of DAMA International, the worldwide association of data management professionals.
FOREWORD
Business analysts have a curiosity about the business environment. They are keen to understand how processes can be improved, how customers can be given better service and, ultimately, how their organisations can be successful. As analysts, though, if we want to understand how things work and can be improved, we can't look at processes alone. The other key dimension that underlies all of these aspects, and provides a firm foundation for the organisation's work, is the information that makes the business operate.

Information is the lifeblood of organisations and the people working within them. Let's think about what information offers:
- evidence for root cause analysis;
- a basis for decisions;
- measures for evaluating performance;
- tangible indications of opportunities;
- parameters for applying business rules.

Information can address all of these areas and more, and provide a means of challenging assumptions and opinions. Surely this is a much better approach than employing gut feel or inventing ideas to suit personal agendas. However, if businesses require information to operate effectively, they need a clear understanding of their data. If processes are to be efficient and effective, decision-making is to be precise and customer service is to be of the highest standard, the data needs to be accurate, accessible and available.

Data is at the heart of business. It forms the basis for providing essential information, including:
- who our customers are and what work they have done with us;
- the nature and characteristics of our products and services;
- the details regarding our financial situation and staff.

If organisations understand the importance of data, and work with it effectively, they can succeed in today's world of high expectations and intense competition. If organisations fail to acquire, record, manage and utilise data, then business failure will surely follow.
Aside from the effective operation of the enterprise, there are also the opportunities that data can clarify or make available to forward-looking, receptive organisations. Data can be interpreted to offer information about the changing nature of the business environment. For example, new service requirements, the demographic make-up of customers and areas where product customisation is desired can all provide opportunities for the organisation to learn and grow. We often talk about the learning organisation, but becoming one relies on receiving good feedback (the data) and acting upon it (the processes).

Over the last couple of decades, though, it has felt as if data were a secondary dimension, with process improvement taking centre stage. There has been a move to almost ignore data requirements within organisations and to focus on the business processes and the customer experience. Those of us who have worked as data analysts, modellers and managers have long feared the effect this approach could have, predicting that the impact of a reduced focus on data would eventually be recognised and hoping that it wouldn't be at too high a cost. The advent of popular memes such as 'big data' has certainly brought data back to the forefront, but there is also the issue of smaller, everyday data: the data that makes the wheels go round rather than completely reinventing the wheel. This data may not help us to predict the innovations of the future, but without it the organisation can't operate and will fail to identify where change is required.

A popular misconception has been that analysing data is 'difficult' and 'technical', and should be the responsibility of those working as software architects or developers rather than the business analysts engaging with business stakeholders. The reality that information and data reflect business requirements seems to have been lost somewhere along the way.
Yet, if organisations are to 'learn' and benefit from informed feedback so that they can respond and grow, they have to understand that data is important and should be handled with care. There needs to be an appreciation that the data reflects the operations, policies and rules of the organisation and, while these may be embedded within software, they originate from people making decisions, including business managers and staff, external customers and suppliers, and regulatory agencies. In other words, data is not a technical domain; it is something everyone needs to appreciate, and the analysis of data needs to be conducted by those with business understanding. There should be people within the organisation who have the expertise and insight to elicit data requirements, analyse the structure and semantics of data, build clear models of the data and manage the data resource.

We are in an insecure world where there is increasing recognition of the importance of data and the need to ensure data security and protection. That brings me to this book. Those of us who have long lamented the limited focus on data that we regularly encounter should congratulate Keith Gordon for providing such a comprehensive, clear and practical resource. The topics covered take us through the process of eliciting, modelling and validating data. The key approaches to representing and understanding data are explained, including the often-overlooked topic of data normalisation. All in all, the book provides the extensive guidance that the business analyst, and anyone else requiring an understanding of data analysis, needs in order to work effectively with data.

The book helps anyone new to the world of data to learn the techniques and principles behind successful data analysis. The breadth of the book also helps those with experience in data analysis to encounter new ideas, brush up and broaden their knowledge, and deepen their understanding. Modelling Business Information encourages readers to understand that data is not just about modelling for the technical solution; it is concerned with understanding the organisation, the rules it applies, and the information it needs. In other words, data analysis is a business discipline and the work to understand data should be performed by those with a business mindset. Organisations require business analysts who can help the business staff to articulate data requirements and ensure that information needs can be met.

Tomorrow's business world needs data to be collected, governed and analysed in order to be an effective resource for organisations. This book helps organisations to do this. You should read it and use it as a key business resource.

Debra Paul
Managing Director, Assist Knowledge Development
June 2017
ACKNOWLEDGEMENTS
Understanding and documenting the information needs of the business are an essential part of data management, so my real introduction to modelling these concepts and things came when I joined the Army's data management team. Among the people to whom I owe a debt of gratitude are my colleagues in that team, Ian Nielsen, Martin Richley, Duncan Broad and Tim Scarlett, as well as the consultants who helped us develop our 'corporate data model' and design the resulting database: David Gradwell, Ken Allen, Ron Segal (who is now in New Zealand), Elaine Senior and the wonderful Harry Ellis, who has done so much to help the world understand and document information requirements through his pioneering work on data modelling and data management. Later members of that team included, among others, Mark Thurlow, Lucy Finney and Peter Lawson.

Over the last 20 years or so I have had many interesting conversations about information and data modelling (yes, it is possible!) with, among others: Bob Walker and Gene Simaitis, both formerly of the Institute for Defense Analyses in Washington, DC; Mike Newton and Steven Self from the Open University; Hajime Horiuchi and Masao Okabe from Japan and Ray Gates from Canada, all colleagues in ISO/IEC JTC1 SC32 WG2; Matthew West, author of Developing High Quality Data Models, formerly of Shell and now also a colleague in ISO/IEC JTC1 SC32 WG2; David Hay from Houston, USA, author of many books including Data Model Patterns: Conventions of Thought, Requirements Analysis, and UML and Data Modeling: A Reconciliation; and Alec Sharp from Canada, conference speaker and co-author of Workflow Modeling: Tools for Process Improvement and Application Development. Special mention must go to David Beaumont from Stehle Associates, my constant sounding board for ideas over the last 18 years.
Homing in on this book, I need to thank Terri Lydiard, a fellow BCS examiner, who reviewed an early draft of Chapter 1, and Keith Hare, of JCC Consulting in Granville, Ohio, USA, and convenor of ISO/IEC JTC1 SC32 WG3, who reviewed Chapters 11 and 13. Both provided valuable comments, which I have tried to incorporate. Thanks are also due to Ian Borthwick and Rebecca Youé of BCS, who have been responsible for getting this book into print.

Finally, a massive thank you to the back-up team at home, my wife, Vivienne. She has found it difficult to understand why I am not wandering around a golf course or sitting in an armchair by the fire, instead of enjoying myself running around the world attending meetings, teaching, examining, writing books and generally getting involved in things because of my total inability to say 'No'. I have promised Vivienne that I will look up the word 'Retirement' in the dictionary one day, but not yet.
GLOSSARY
aggregation (Class modelling) A special form of association that specifies a whole–part relationship between an object class representing the aggregate (whole) and another object class representing the component part.
alternate key (Relational data analysis) A candidate key that is not selected to be the primary key for the relation.
artefact A diagram or supporting description providing a representation of the system of interest.
association (Class modelling) A business link between two object classes. The link is required in order to navigate from one class to another.
association class (Class modelling) An object class that has both association and object class properties.
attribute (Entity relationship modelling and class modelling) A named characteristic of an entity type or object class whose values serve to qualify, identify, classify, quantify or express the state of an instance of that entity type or object class.
big data A data set, or a collection of data sets, with characteristics (for example, volume, velocity, variety, variability, veracity) that for a particular problem domain at a given point in time cannot be efficiently processed using current/existing/established/traditional technologies and techniques in order to extract value.
candidate key (Relational data analysis) An attribute, or a set of attributes, that provides the ability to uniquely identify a tuple in a relation without referring to any other data, such that no two tuples in a relation can have the same value, or set of values, for their candidate keys.
cardinality (Entity relationship modelling and class modelling) The degree of occurrence indicated on a relationship between two entity types or an association between two object classes. The cardinality reflects part of the business rules for a relationship or association.
CASE Acronym for computer-aided software engineering – a combination of software tools that assist computer development staff to engineer and maintain software systems, normally within the framework of a structured method.
Chen notation (Entity relationship modelling) The original entity–relationship modelling notation.
MODELLING BUSINESS INFORMATION
class model (Class modelling) A technique from the Unified Modeling Language (UML). A class model describes, using graphics and documentation, the classes in a system and their associations with each other. Within business analysis the classes are limited to the things of significance about which information needs to be held in support of business operations.
column The logical structure within a table of a relational database management system (RDBMS) that corresponds to the attribute in the relational model of data.
composite identifier (Entity relationship modelling) A unique identifier formed from a combination of attributes.
composite key (Relational data analysis) A candidate key comprising more than one attribute.
composition (Class modelling) A form of aggregation which requires that a part instance be included in at most one composite at a time, and that the composite object is responsible for the creation and destruction of the parts.
conceptual data model A detailed model that captures the overall structure of organisational data while being independent of any database management system or other implementation consideration – it is normally represented using entity types, relationships and attributes with additional business rules and constraints that define how the data is to be used.
corporate data model A conceptual data model whose scope extends beyond one application system.
data A reinterpretable representation of information in a formalised manner suitable for communication, interpretation or processing.
data analysis The process of understanding and documenting in a data model the information (or data) requirements of a business or business area; data analysis is a part of business analysis.
data mining The process of finding significant, previously unknown and potentially valuable knowledge hidden in data.
data model (i) An abstract, self-contained logical definition of the data structures and associated operators that make up the abstract machine with which users interact (such as the relational model of data). (ii) A model of the persistent data of some enterprise (such as an entity–relationship model or class model of the data required to support a business or business area).
data modelling The task of developing a data model that represents the persistent data of some enterprise.
data type A constraint on a data value that specifies its intrinsic nature, such as numeric, alphanumeric or date.
data warehouse A specialised database containing consolidated historical data drawn from a number of existing databases to support strategic decision-making.
database (i) An organised way of keeping records in a computer system. (ii) A collection of data files under the control of a database management system.
database management system (DBMS) A software application that is used to create, maintain and provide controlled access to databases.
described domain (Entity relationship modelling and class modelling) A domain that is specified by a description or specification, such as a rule, a procedure or a range (i.e. interval); a domain that is not enumerated.
domain (Entity relationship modelling and class modelling) A named pool (or set) of values from which an instance of an attribute must take its value; a domain provides a set of business validation rules, format constraints and other properties for one or more attributes.
Ellis-Barker notation (Entity relationship modelling) A modelling notation designed by Harry Ellis and Richard Barker while working at the consultancy company CACI with business users in mind so as to reduce interactions with those users. This notation was later used by the Oracle Corporation and by the UK Government’s Central Computer and Telecommunications Agency (CCTA) for its Structured Systems Analysis and Design Method (SSADM).
entity (Entity relationship modelling) A named thing of significance about which information needs to be held in support of business operations.
entity occurrence (Entity relationship modelling) A single instance of an entity within an entity type.
entity relationship model (Entity relationship modelling) A data model based on entity types and their attributes and relationships.
entity subtype (Entity relationship modelling) A subset of the instances of an entity type, known as the supertype, that share common attributes or relationships distinct from other subsets of the supertype.
entity type (Entity relationship modelling) An element of a data model that represents a set of characteristics common to a collection of entities that are instances of the type.
enumerated domain (Entity relationship modelling and class modelling) A domain that is specified by a list of all its permitted values.
first normal form (1NF or FNF) (Relational data analysis) A relation is in first normal form if all the values taken by the attributes of that relation are atomic or scalar values – the attributes are single-valued or, alternatively, there are no repeating groups of attributes.
foreign key (Relational data analysis) One or more attributes in a relation that implement a many-to-one relationship that the relation has with another relation or with itself (the reference); the values of the foreign key in a tuple must match the values of the primary key in one of the tuples in the referenced relation.
hierarchic identifier (Entity relationship modelling) A unique identifier where at least one element of the identifier is a relationship; a hierarchic identifier may be either a combination of relationships or a combination of attribute(s) and relationship(s).
hierarchic key (Relational data analysis) A candidate key comprising more than one attribute where part of the candidate key is a foreign key.
IDEF1X (Entity relationship modelling) An entity relationship modelling notation from the family of ICAM (Integrated Computer-Aided Manufacturing) Definition Languages (IDEF) used by the US Federal Government.
information (i) Something communicated to a person. (ii) Knowledge concerning objects, such as facts, events, things, processes or ideas, including concepts, which have a particular meaning within a certain context.
Information Engineering notation (Entity relationship modelling) An entity relationship modelling notation that is one of the techniques used in Information Engineering, a methodology developed by James Martin and Clive Finkelstein in the late 1970s.
master data management The authoritative, reliable foundation for data used across many applications and constituencies with the goal to provide a single version of the truth.
metadata Data about data – that is, data describing the structure, content or use of some other data.
multimedia data Data representing documents, audio (sound), still images (pictures) and moving images (video).
multiplicity (Class modelling) A statement, consisting of a lower-bound (or minimum) and upper-bound (or maximum) of the form ‘minimum..maximum’, of the number of elements that may exist in a collection; when applied to an association it represents the cardinality and optionality of the association and when applied to an attribute it represents the optionality of the attribute.
normal form (Relational data analysis) A state of a relation that can be determined by applying simple rules regarding dependencies to that relation.
normalisation (Relational data analysis) Another name for relational data analysis.
object (Class modelling) A construct within a system for which a set of attributes and operations can be specified; an instance of a particular object class.
object class (Class modelling) A definition of a set of objects that share the same attributes, operations and associations.
object-orientation A software-development strategy based on the concept that systems should be built from a collection of reusable components called objects that encompass both data and functionality.
object subclass (Class modelling) A subset of the instances of an object class, known as the superclass, that share common attributes and associations distinct from other subsets of the superclass.
ODMG Abbreviation for the Object Data Management Group, a body that has produced a specification for object-oriented databases.
OLAP Acronym for online analytical processing – a set of techniques that can be applied to data to support strategic decision-making.
OLTP Abbreviation for online transactional processing – data processing that supports operational procedures.
operation (Class modelling) A set of actions performed on the data within an object.
optionality (Entity relationship modelling and class modelling) The ability of an instance of an entity type or object class to exist without being linked to an instance of the related entity type or object class. The optionality reflects part of the business rules for a relationship or association.
permitted value (Entity relationship modelling and class modelling) One of the explicit set of values that comprise an enumerated domain.
primary key (Relational data analysis) The candidate key that is selected to enforce uniqueness of tuples in a relation.
RDBMS Abbreviation for relational database management system – a database management system whose logical constructs are derived from the relational model of data. Most relational database management systems available are based on the SQL database language and have the table as their principal logical construct.
relation (Relational data analysis) The basic structure in the relational model of data – formally a set of tuples, but informally visualised as a table with rows and columns.
relational data analysis A technique of transforming complex data structures into simple, stable data structures that obey the rules of relational data design, leading to increased flexibility and reduced data duplication and redundancy – also known as normalisation.
relational model of data A model of data that has the relation as its main logical construct.
relationship (Entity relationship modelling) A named set of characteristics common to a collection of connections between instances of two or more entity types, or between instances of one entity type and other instances of the same entity type.
schema A description of the overall structure of a database expressed in a data definition language (such as the data definition component of SQL).
second normal form (2NF or SNF) (Relational data analysis) A relation is in second normal form if it is in first normal form and every non-key attribute is fully dependent on the primary key – there are no part-key dependencies.
simple identifier (Entity relationship modelling) A unique identifier formed from a single attribute.
simple key (Relational data analysis) A candidate key comprising just one attribute.
SQL Originally, SQL stood for structured query language. Now, the letters SQL have no meaning attributed to them. SQL is the database language defined in the ISO/IEC 9075 set of international standards, the latest edition of which was published in 2016. The language contains the constructs necessary for data definition, data querying and data manipulation. Most vendors of relational database management systems use a version of SQL that approximates to that specified in the standards.
structured data Data that has a high level of organisation in that it conforms to specified data types and relationships and is managed by technology that allows for querying and reporting, such as data within relational databases and spreadsheets.
surrogate identifier (Entity relationship modelling) An artificial (i.e. not real world) unique identifier formed from an attribute or a combination of attributes that are either system-generated or allocated by a user.
table The logical structure used by a relational database management system (RDBMS) that corresponds to the relation in the relational model of data – the table is the main structure in SQL.
third normal form (3NF or TNF) (Relational data analysis) A relation is in third normal form if it is in second normal form and no transitive dependencies exist.
tuple (Relational data analysis) A construct in the relational model of data that is equivalent to a row in a table or an occurrence of an entity – it contains all the attribute values for each instance represented by the relation.
Unified Modeling Language (UML) A set of diagramming notations for systems analysis and design based on object-oriented concepts.
unique identifier (Entity relationship modelling) An attribute, a combination of attributes, a combination of relationships or a combination of attribute(s) and relationship(s) that provides the ability for each entity to be uniquely identifiable so that each instance of an entity type is distinctly identifiable from all other instances of that entity type.
unstructured data Computerised information which does not have a data structure that is easily readable by a machine, including audio, video and unstructured text such as the body of a word-processed document – effectively this is the same as multimedia data.
validation rule (Entity relationship modelling and class modelling) A statement of the validation that may be applied to a described domain; this statement may be a reference to a data type to be applied to attributes, a range of values, or a ‘format mask’, or any other expression that constrains the domain.
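As a rough illustration of the candidate key and foreign key entries above, the following Python sketch assumes that a relation can be held as a list of tuples, each tuple being a dict of attribute names to values. The EMPLOYEE and DEPARTMENT relations, their attributes and the two helper functions are hypothetical examples invented for this sketch, and the checks cover only what the definitions state: uniqueness of key values, and foreign key values matching a primary key value.

```python
from typing import Any

# Hypothetical relations: each tuple is sketched as a dict of attribute -> value.
Relation = list[dict[str, Any]]

DEPARTMENT: Relation = [
    {"dept_code": "SAL", "dept_name": "Sales"},
    {"dept_code": "PRD", "dept_name": "Production"},
]

EMPLOYEE: Relation = [
    {"emp_no": 101, "name": "John", "dept_code": "SAL"},
    {"emp_no": 102, "name": "Sue", "dept_code": "PRD"},
    {"emp_no": 103, "name": "Ann", "dept_code": "SAL"},
    {"emp_no": 104, "name": "Raj", "dept_code": "HR"},  # no matching DEPARTMENT tuple
]


def is_candidate_key(relation: Relation, attributes: tuple[str, ...]) -> bool:
    """True if no two tuples share the same value, or set of values, for `attributes`.

    Only uniqueness is tested; minimality is not checked.
    """
    seen: set[tuple[Any, ...]] = set()
    for row in relation:
        value = tuple(row[a] for a in attributes)
        if value in seen:
            return False
        seen.add(value)
    return True


def foreign_key_violations(
    referencing: Relation, fk: str, referenced: Relation, pk: str
) -> list[dict[str, Any]]:
    """Return the tuples whose foreign key value matches no primary key value."""
    pk_values = {row[pk] for row in referenced}
    return [row for row in referencing if row[fk] not in pk_values]


print(is_candidate_key(EMPLOYEE, ("emp_no",)))     # True
print(is_candidate_key(EMPLOYEE, ("dept_code",)))  # False: 'SAL' appears twice
orphans = foreign_key_violations(EMPLOYEE, "dept_code", DEPARTMENT, "dept_code")
print([row["emp_no"] for row in orphans])          # [104]
```

A sample of data can show that an attribute is not a candidate key, but it cannot prove that one is: whether an attribute uniquely identifies a tuple is a rule about all the data the business could ever hold.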
INTRODUCTION
In my previous book,1 I looked at how information, and its cousin, data, should be managed as an enterprise-wide resource. In this book, I am looking at the role of business analysts in understanding and documenting the information that needs to be recorded in an information system or its supporting information technology (IT) system to meet the needs of the business for the storage and retrieval of information. The first part of the book (Part 1, The Basics, Chapters 1 to 6) covers the requirements for the Data Analysis certificate that is part of the scheme for the BCS Advanced International Diploma in Business Analysis.2 The book will, therefore, be of immediate interest to anybody who is studying for this certificate. It should also be of interest to all business analysts as I have tried to set out how an entity relationship model (also known as an entity relationship diagram) or a UML class model (also known as a class diagram) can help a business analyst understand the information needs of a particular business area and then help communicate that understanding, both to the business users and, finally, to the systems developers. The second part of the book (Part 2, Supplementary Material, Chapters 7 to 14) provides extra information that I believe should be of interest to business analysts. These chapters are followed by three appendices. Appendix A provides a table to show the equivalence between the concepts used in the various parts of the book. Appendix B provides a bibliography and Appendix C provides solutions to the exercises introduced in Part 1.
1 Principles of Data Management: Facilitating Information Sharing, Second Edition (BCS, 2013).
2 http://certifications.bcs.org/category/18428
PART 1: THE BASICS
The first part of the book (Chapters 1 to 6), which provides a general introduction to entity relationship modelling and UML class modelling, covers the requirements for the Data Analysis certificate that is part of the scheme for the BCS Advanced International Diploma in Business Analysis. Chapter 1, Why business analysts should model information, provides an introduction to business analysis, systems, information, data and modelling and why these topics come together within the development of requirements for an IT system. The notations used within the book, the Ellis-Barker entity relationship notation and the UML class diagram notation, are introduced. The chapter finishes with a discussion of data analysis. Chapter 2, Modelling the things of interest to the business and the relationships between them, introduces the basic modelling concept of the entity to represent something of interest to the business about which information needs to be recorded and the related concept of the entity type, the representation of a group of entity occurrences with common characteristics. The relationships between entity types are also introduced. Alongside the introduction of these concepts, the comparable concepts of object, object class and association are also introduced. Chapter 3, Modelling more complex relationships, explores some of the more complex relationships that can exist between entity types or object classes. The topics covered are the resolution of many-to-many relationships and associations (including the oddity known as the ‘Bill of Materials’ structure), mutually exclusive relationships and associations, which leads to generalisation and specialisation, and, finally, a quick look at aggregation and composition. Chapter 4, Drawing and validating information model diagrams, introduces a process for drawing an information model diagram.
It then considers two techniques for validating an information model – the data navigation path and the Create-Read-Update-Delete (CRUD) matrix. Chapter 5, Recording information about things, introduces the related concepts of the attribute, the unique identifier and the domain, and their representation on both entity relationship models and class models. The object-oriented concept of the operation is also introduced at the end of the chapter. Chapter 6, Rationalising data using normalisation, involves a change of direction as we look at the process of relational data analysis (or normalisation). We need to look at the theory of the relational model of data – the ‘model’ that underpins all of the database management systems that use the SQL database language. Having understood the theory, we then look at the process of normalisation and the production of a ‘third normal form’ model. These chapters should be read sequentially, from Chapter 1 through to Chapter 6. Revision exercises are provided at the end of Chapters 2 to 6.
1 WHY BUSINESS ANALYSTS SHOULD MODEL INFORMATION
This chapter provides an introduction to business analysis, systems, information, data and modelling and why these topics come together within the development of requirements for an IT system. The chapter finishes with a discussion of data analysis.
WHAT IS BUSINESS ANALYSIS?
Business analysis is a discipline that has been evolving for about 20 years. Its main purpose is to ensure that there is alignment between business needs and business change solutions. Many of these business change solutions involve the development of new – or the enhancement of existing – information technology (commonly abbreviated to IT) systems.
There is no fixed route to becoming a business analyst. Some business analysts have a strong information technology background and have developed an understanding of business in general and their business organisation in particular. Other business analysts have a strong business background and, where a solution involving information technology is concerned, they need to have obtained an understanding both of the capabilities provided by information technology and of how an information technology system is developed.
The word ‘system’ appears in the two preceding paragraphs because it is important for the business analyst to grasp hold of ‘systems thinking’. Whether the proposed business change solution involves the use of information technology or not, the business analyst is working with or specifying the requirements for systems. These systems may be business systems, information systems or information technology systems.
So, what is a system? Professor Michael C. Jackson of the University of Hull has defined a system as a complex whole the functioning of which depends on its parts and the interactions between those parts.3 Using this definition, the term ‘system’ can be applied to a hard, designed system such as a central heating system or to a soft, or human activity, system such as a business organisation. A central heating system consists of a boiler, radiators, pipes and, importantly, a thermostat to keep the whole system under control. This is a ‘complex whole’, the functioning of which depends on all of those parts working together.
3 Systems Thinking: Creative Holism for Managers (2003), page 3, Michael C. Jackson.
A business, whether in the private, public or not-for-profit sectors of the economy, consists of people (employees, suppliers and customers), organisations (headquarters, branches, departments) and processes (including ordering and receiving goods and selling goods). All businesses require information to manage their people, organisations and processes and, for most businesses these days, there is information technology providing support for the management of that information. So, a business can be seen as another ‘complex whole’, the functioning of which depends on the parts, the people, the organisations, the processes, the information and the technology, working together to achieve the goals of the business. There will also be checks and balances to ensure that the business remains effective and efficient – the equivalent of the thermostat in the central heating system. Any business can, therefore, be considered as a system – a business system. Any system can have a number of subsystems, so a business system can also have subsystems. One of the important subsystems of a business system is the information system, or set of information systems, which supports the business by managing the business’s information. I define an information system as: a system that gets the right information to the right person in the right place at the right time. We need, therefore, to think about information systemically. If there is a requirement for Sue in production to receive details of an important sales order as soon as it arrives in the business, then we need to arrange for that to happen. It could be that the arrangement is for the salesman, John, who has just completed the sale, to walk along the corridor to production to tell Sue about the sales order. The right information (details of the sales order) is being delivered to the right person (Sue) in the right place (the production department) at the right time (immediately) without the use of any technology. 
We have a technology-free information system! Yes, that is possible, but that information system still needs defining, developing, implementing and maintaining. Most modern businesses require the information system to be supported by an information technology system – a collection of hardware, software and networks that function together to store and retrieve information. The business analyst needs to think in terms of three levels of system: the business system itself; the subsystem that handles the information for the business (the information system); and the sub-subsystem that provides the technology to support the information system (the IT system). This is shown diagrammatically in Figure 1.1.
Figure 1.1 Three levels of system (diagram: the IT system within the information system, within the business system)
Information is a key business resource in all businesses, even if the senior managers of most businesses fail to recognise that fact.
INFORMATION AND DATA
In my previous book, I explained the relationship between information and data. I repeat that explanation here, slightly edited, because this book is about modelling business information with a view to storing that information as data within an information system or an information technology system.
An often-heard definition of information is that it is ‘data placed in context’. This implies that some information is the result of the translation of some data using some processing activity, and some communication protocol, into an agreed format that is identifiable to the user. In other words, if data has some meaning attributed to it, it becomes information.
For example, what do the figures ‘190267’ represent? Presented as ‘19/02/67’, it would probably make sense to assume that they represent a date. Presented on a screen with other details of an employee of a company, such as name and address, in a field that is labelled ‘Date of Birth’ the meaning becomes obvious. Similarly, presented as ‘190267 metres’, it immediately becomes obvious that this is a long distance between two places but, for this to really make sense, the start point and the end point have to be specified as well as, perhaps, a number of intermediate points specifying the route.
While these examples demonstrate the relationship between data and information, they do not provide a clear definition of either data or information. There are many definitions of data available in dictionaries and textbooks, but the essence of most of these definitions is the understanding that data is ‘facts, events, transactions and similar that have been recorded’. Furthermore, the definition of information is usually based on this definition of data. Information is seen as data in context or data that has been processed and communicated so that it can be used by its recipient. The idea that data is a set of recorded facts is found in many books on computing.
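The ‘190267’ example above can be sketched in Python to show that the same stored characters yield different information depending on the interpretation applied to them. The function names, the day-month-year reading and the ‘no future birth dates’ rule are assumptions made for this illustration only. The sketch also shows why context matters: Python’s %y directive reads ‘67’ as 2067, and only the knowledge that the field holds a date of birth leads to 1967.

```python
from datetime import date, datetime

RAW = "190267"  # the data: six characters with no meaning attached


def as_ddmmyy_date(data: str) -> date:
    """Interpret the data as a DDMMYY date.

    Python's %y maps two-digit years 00-68 to 2000-2068 and 69-99 to 1969-1999.
    """
    return datetime.strptime(data, "%d%m%y").date()


def as_date_of_birth(data: str) -> date:
    """Interpret the data as a date of birth, using an assumed business rule:
    a birth date cannot be in the future, so such a date is moved back a century."""
    d = as_ddmmyy_date(data)
    if d > date.today():
        d = d.replace(year=d.year - 100)
    return d


def as_distance_km(data: str) -> float:
    """Interpret the data as a distance in metres and express it in kilometres."""
    return int(data) / 1000


print(as_ddmmyy_date(RAW))    # 2067-02-19
print(as_date_of_birth(RAW))  # 1967-02-19
print(as_distance_km(RAW))    # 190.267
```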
However, this concept of data as recorded facts is used beyond the computing and information systems communities. It is, for example, also the concept used by statisticians. Indeed, the definition of data given in Webster’s 1828 Dictionary – published well before the introduction of computers – is ‘things given, or admitted; quantities, principles or facts given, known, or admitted, by which to find things or results unknown’.4
However, starting the development of our definitions by looking at data first appears to be starting at the wrong point. It is information that is important to the business, and it is there that our definitions, and our discussion about the relationship between information and data, should really start. We start by considering the everyday usage of information – something communicated to a person – and, with that, we can find a definition of data that is relevant to business analysts. That definition is found in ISO/IEC 2382-1:1993 (Information Technology – Vocabulary – Part 1: Fundamental Terms), which states that data is a reinterpretable representation of information in a formalised manner suitable for communication, interpretation or processing.
There is a note attached to this definition in the ISO/IEC standard which states that data can be processed by human or automatic means; so, this definition covers all forms of data but, importantly, includes data held in information systems used to support the activities of an organisation at all levels: operational, managerial and strategic.
4 See www.webstersdictionary1828.com
Figure 1.2 The relationship between data and information (diagram labels: Subject of information (knowledge about objects, etc.); Information; Representation of information; Data; Storage and Processing; Interpretation of data)
Figure 1.2 provides an overview of the relationship between data and information in the context of an information technology system. The user of the system extracts the required information from their overall knowledge and inputs the information into the system. As it enters the system, it is converted into data so that it can be stored and processed. When another system user requires that information to be retrieved, the data is interpreted – that is, it has meaning applied to it – so that it can be of use to the user.
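The flow just described can be sketched in a few lines of Python, assuming a record stored as plain strings (ISO 8601 text for the date) and a small, hypothetical metadata table, FIELDS, that supplies each field’s label and data type. The field keys, labels and function names are all invented for this illustration; the point is only that the stored values become information again when that metadata is applied to them on retrieval.

```python
from datetime import date

# Hypothetical metadata: the label and data type that give each stored field its meaning.
FIELDS: dict[str, tuple[str, type]] = {
    "f1": ("Employee name", str),
    "f2": ("Date of Birth", date),
}


def to_data(name: str, born: date) -> dict[str, str]:
    """Represent the information as data: plain strings in a formalised layout."""
    return {"f1": name, "f2": born.isoformat()}


def interpret(record: dict[str, str]) -> dict[str, object]:
    """Apply the metadata so that the stored data becomes information again."""
    info: dict[str, object] = {}
    for key, raw in record.items():
        label, kind = FIELDS[key]
        info[label] = date.fromisoformat(raw) if kind is date else raw
    return info


stored = to_data("Sue", date(1967, 2, 19))
print(stored)             # {'f1': 'Sue', 'f2': '1967-02-19'}
print(interpret(stored))  # {'Employee name': 'Sue', 'Date of Birth': datetime.date(1967, 2, 19)}
```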
THE IMPORTANCE FOR A BUSINESS ANALYST OF UNDERSTANDING INFORMATION NEEDS
Understanding the information needed by the business, and its representation, data, is vitally important if we are to develop effective information systems and the information technology systems to support them. In the BCS publication Business Analysis, now in its third edition, the need for a business analyst to take a holistic view of the business is stressed, where the holistic view is defined as encompassing people, organisations, processes, information and technology (the emphasis on information is mine). Yet, as an examiner, I am constantly coming across practising or aspiring business analysts who believe that understanding and documenting the information needed by the business is nothing to do with them. There appears to be a view that if the processes are sorted out the information will look after itself. I think this is wrong. We are, after all, concerned with information systems or information technology systems, not processing systems.
THE ROLE OF MODELS IN BUSINESS ANALYSIS
Models in various forms play an important role in business analysis. As described later in the chapter, models help the business analyst understand and communicate requirements. Modelling is an essential competence for a business analyst. When trying to understand how a business is currently running, we could draw a rich picture (see Figure 1.3).
Figure 1.3 A rich picture (diagram elements: Managing Director, Sales Director, Finance Director, Manufacturing, External Suppliers, Call Centre, Sales Force, IT/IS Department, Old Order Processing System, New CRM System and Customers; speech bubbles include ‘We are the experts’, ‘Trouble brewing’ and ‘Where’s our order?’)
When trying to understand what a business should be doing, we can draw a business activity model (see Figure 1.4). Both of these valuable techniques are derived from Peter Checkland’s Soft Systems Methodology.5
5 Checkland, P. (1981) Systems Thinking, Systems Practice. John Wiley & Sons, Chichester, UK provides a useful insight into the use of Soft Systems Methodology. For a shorter read try Checkland, P., Poulter, J. (2006) Learning for Action: A Short Definitive Account of Soft Systems Methodology and its use for Practitioners, Teachers and Students, John Wiley & Sons, Chichester, UK.
Figure 1.4 A business activity model (activities: Define target customers; Decide seasonal fashion range; Manage staff; Define sales targets; Procure stock; Manage branch network; Distribute stock to branches; Sell fashion; Monitor sales volume; Monitor profit margins; Monitor staff performance; Take control action)
When trying to understand the current business processes, we will probably draw a set of ‘as-is’ business process models (see Figure 1.5). We will then draw a series of ‘to-be’ business process models and discuss those with the business.
Figure 1.5 A business process model (diagram labels: Fulfil Order; swimlanes Salesman, Sales Administration, Warehouse and Sales Ledger; activities Receive order, Check customer status, Return order to client [status not satisfactory], Record Order [status satisfactory], Despatch goods and Despatch invoice)
As we move on to consider the functionality that an information technology system has to support, we might start drawing a use case diagram (see Figure 1.6).
Figure 1.6 A use case diagram (system: Sales Order Fulfilment; actors: Salesman, Warehouse Team, Sales Administrator and Sales Ledger; use cases: Receive order, Check customer status, Record order, Despatch goods, Despatch invoice and Record payment)
All of these models have two main roles. The first of these roles is to help the analyst understand the situation they are analysing as they carry out the analysis. Secondly, once developed, they help the analyst communicate that understanding back to the business (and, in the process, maybe demonstrate their misunderstanding, leading to a correction to the model) and, in the case of an information technology system development or enhancement, forward to the developers of that system. When an information technology system is being developed, the process of eliciting the requirements from the business, analysing those requirements to ensure that they are good quality requirements, validating the requirements with the business, managing and documenting the requirements is known as ‘requirements engineering’. This is shown in context in Figure 1.7.
Figure 1.7 Requirements engineering in context (Business Information Requirements, Requirements Engineering and System Development, with models used on one side to communicate with the business and on the other side to communicate with the developers)
As you can see, we can think of requirements engineering as the filling in a sandwich, with the business and its requirements on one side and the system developers on the other.
When eliciting requirements from the business, models can be used in workshops or other interactions with the business to aid understanding. It can sometimes be useful to sketch out a model on a whiteboard or flipchart while talking to the users. Models are vital when you are asking the business to validate a set of requirements: ‘a picture paints a thousand words’. Models are also essential when passing requirements to, or discussing them with, those who will be responsible for developing the system. It is not just that ‘IT people’ like to see models; models often express ideas and requirements much more clearly than is possible in text alone. The trick, of course, is to use the same models when validating requirements with the business and when passing those requirements to the developers. ‘Business people’ do not have the time to learn complicated modelling notations and syntax, so for them the models must be easy to understand. The developers, on the other hand, want the models to be complete, clear, concise and unambiguous, and this can lead to the use of a complex set of notational elements. In some circumstances it can also lead to the use of particular technical constructs that, from a business perspective, are unnecessary. If we want to use the same models to communicate with both the business and the developers, we need modelling notations and conventions that are easy for the business to understand and yet detailed enough to convey the requirements to the system developers completely, concisely, clearly and unambiguously.
DATA MODELS AND DATA
We have seen a range of models that are in the business analyst’s toolkit. None of those we have seen so far truly helps us to understand what information a system needs to hold to enable it to carry out its functions. These information requirements are modelled with a model called, confusingly, a data model. In fact, data models can have two roles. Firstly, they can be used to specify what data (or information) the business needs recorded within the system. Here the model is a complete, concise and unambiguous statement of the information requirements of the business for the system under consideration. This model is the responsibility of the business analyst. Secondly, they can be used to specify how the data is to be stored and organised within the system, so that it can be retrieved and analysed. Here the model is a specification for the design of the database of the system. This model (or, more probably, a set of models) is the responsibility of the database designer within the system development team. The ‘what’ model is a conceptual model, which is also known as a Computer-Independent Model (CIM). It represents the things of interest to the business about which information needs to be recorded, the specific information about those things that the business needs recorded, and any business relationships that exist between those things of interest. A model at this level can be considered as encapsulating the rules of the business, for example:
• ‘people or organisations cannot be recorded as customers until they have placed an order’;
• ‘customers can place many orders’; and
• ‘customers must be allocated a customer number and we must record their name and address’.
The system development team then develop a series of ‘how’ data models that lead to the design of the system’s database. The first model the developers produce will be a logical model (often known as a Logical Data Model (LDM) or a Platform-Independent Model (PIM)), and the final model will represent the actual physical design of the database (known as a Physical Data Model (PDM) or a Platform-Specific Model (PSM)).
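The three example rules above are the kind of statement a conceptual model captures. The sketch below shows how a developer might later check them in Python. It is illustrative only and assumes hypothetical `Customer` and `Order` classes and a hypothetical `rule_violations` function, none of which come from the book. The conceptual model itself says nothing about how the rules are enforced.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Order:
    number: str


@dataclass
class Customer:
    number: str   # rule 3: a customer number must be allocated
    name: str     # rule 3: the name must be recorded
    address: str  # rule 3: the address must be recorded
    # rule 2: a customer can place many orders
    orders: List[Order] = field(default_factory=list)


def rule_violations(customer: Customer) -> List[str]:
    """Return a description of each example business rule this record breaks."""
    problems = []
    # Rule 1: not a customer until at least one order has been placed.
    if not customer.orders:
        problems.append("no order placed yet")
    # Rule 3: number, name and address are all required.
    for attribute in ("number", "name", "address"):
        if not getattr(customer, attribute).strip():
            problems.append(f"missing {attribute}")
    return problems


print(rule_violations(Customer("C001", "Acme Ltd", "1 High Street", [Order("O1"), Order("O2")])))  # []
print(rule_violations(Customer("C002", "", "2 Low Road")))  # ['no order placed yet', 'missing name']
```

The checks and the data structure are a design decision for the development team. The business analyst's job is to state the rules unambiguously in the model.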
ENTITY RELATIONSHIP MODELLING
Data models have been around for well over 50 years. The earliest data modelling notation that I know of is the Bachman diagram. This notation was developed by Charles Bachman, one of the early database management system pioneers, to show the structure of a required database in the days before the advent of the modern ‘relational’ database. Entity relationship modelling, until recently the most common form of data modelling, was first introduced by Peter Chen.⁶ There are, however, many ‘flavours’ of entity relationship model. In this book, I am going to stick to just one of these entity relationship modelling notations: the notation developed by Harry Ellis and Richard Barker in the early 1980s when they worked for CACI, a UK-based consultancy company. Unsurprisingly, we will refer to this as the Ellis-Barker notation. I will, however, discuss some other common notations in Chapter 7. The Ellis-Barker notation was specifically designed with business users in mind. In fact, in Richard Barker’s own words, they were ‘striving for even greater accuracy in systems analysis, while minimising redundant interactions with the users’.⁷ The notation was used by the Oracle Corporation in its computer-aided software engineering (CASE) tool and was later adopted, in a truncated form, by the UK Government’s Central Computer and Telecommunications Agency (CCTA) for its Structured Systems Analysis and Design Method (SSADM). Richard Barker described the use of this notation in a book⁸ he wrote while working for the Oracle Corporation. An example of an entity relationship model drawn using the Ellis-Barker notation is shown in Figure 1.8. This model shows that the business concerned is interested in recording information about its products, its customers and the orders placed by those customers. Each of those orders has a number of ‘order lines’ (the items on the order) and a number of statuses.
⁶ Chen, P.P.S. (1976) The Entity–Relationship Model: Toward a Unified View of Data. ACM Transactions on Database Systems.
⁷ In the Foreword to Hay, D.C. (1996) Data Model Patterns: Conventions of Thought. Dorset House.
⁸ Barker, R. (1990) CASE*Method: Entity Relationship Modelling. Addison-Wesley.
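Figure 1.8 An example entity relationship model using the Ellis-Barker notation (entity types, with attributes marked (m): CUSTOMER: (m) number, (m) name, (m) address; ORDER: (m) number, (m) date; ORDER LINE: (m) number, (m) ordered quantity; ORDER STATUS: (m) designation, (m) effective date; PRODUCT: (m) code, (m) designation, (m) retail cost. Relationships: ORDER placed by CUSTOMER / CUSTOMER placer of ORDER; ORDER LINE part of ORDER / ORDER comprised of ORDER LINE; ORDER STATUS for ORDER / ORDER described with ORDER STATUS; ORDER LINE order for PRODUCT / PRODUCT subject of ORDER LINE)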
There are a number of advantages to using the Ellis-Barker notation within business analysis. Specifically, the notation:
• was designed for use with business users – it names things in a way that the business will understand;
• avoids the use of technical components that have no relevance to the business user and, if included (as in most other notations), would confuse the business user;
• provides a limited, consistent set of symbols;
• with its supporting documentation, provides a complete, concise, clear and unambiguous statement of the information requirements.
In addition, the Ellis-Barker notation is well known in the United Kingdom, having been adopted for use within the Structured Systems Analysis and Design Method (SSADM). Although no specific notation is mentioned in the syllabus for the BCS Data Analysis certificate, it is, I believe, the notation that the originators of that syllabus had in mind for entity relationship modelling.
CLASS MODELLING
In the late 1980s and early 1990s there were a number of disparate modelling initiatives devised to cope with the design of systems based on the object-oriented programming paradigm. In the mid-1990s three proponents of their own notations, Grady Booch, Ivar Jacobson and James Rumbaugh (known as the three amigos), came together to create the Unified Modeling Language (UML)™. In 1997 UML was adopted as a standard by the Object Management Group (OMG),⁹ and in 2005 UML was also published by the International Organization for Standardization (ISO)¹⁰ as an approved ISO standard. The current version of the UML standard includes 14 diagram types, but we are only interested in one of those: the ‘class diagram’, which is used to model information and data. We will refer to the models developed using this diagramming notation as ‘class models’. An example of a UML class model is shown in Figure 1.9. This example uses the same business scenario as Figure 1.8, which is drawn using the Ellis-Barker notation.
⁹ See www.omg.org/
¹⁰ See www.iso.org/
Figure 1.9 An example of a UML class model (classes CUSTOMER: number, name, address; ORDER: number, date; ORDER LINE: number, ordered quantity; ORDER STATUS: designation, effective date; PRODUCT: code, designation, retail cost; every attribute has multiplicity [1..1]. Associations ‘places’ (CUSTOMER to ORDER), ‘part of’ (ORDER LINE to ORDER), ‘describes’ (ORDER STATUS to ORDER) and ‘order for’ (ORDER LINE to PRODUCT), annotated with multiplicities such as 1..1, 1..* and 0..*)
USE OF DATA MODELS IN BUSINESS ANALYSIS
Because this book is aimed at business analysts, the focus will be on Computer-Independent (or conceptual) Models. These are the models where the ‘what’, the information or data that is required, is specified. There will, however, be some consideration of the other, ‘how’, models. The shape and form of the ‘how’ models will be determined by the database management system that is to be used to manage the database. Many types of database management system are available, so the business analyst should not assume any particular type of database when developing the ‘what’ model as part of requirements engineering. When modelling, the business analyst's job is to concentrate on developing a model that represents the information that the business needs to be recorded. The business analyst should not include anything in that model that depends on a specific implementation. The example models in Figures 1.8 and 1.9 might give the impression that information or data modelling is a simple task leading to relatively simple databases. But these are just simple examples. In practice, information or data models can be very large – I have seen some models that completely cover the walls of a six-person office. Developing an information or data model is not a trivial task. Having said that, information or data modelling should be part of the ‘consultancy toolkit’ of all business analysts. Developing an information or data model does not mean that we are just documenting in ‘boxes and lines’ what we find in the business. We need to apply the business analyst’s enquiring mind to analyse what we find. As with all requirements elicitation, as we develop the model we are going to find unanswered questions. By asking the business questions about the model we can gradually refine it so that the end result is useful to the business and allows the needs of the business and its systems to be met. Developing an information or data model is an iterative process that helps us to understand the information that the business needs to record. This process also helps us uncover the business rules and, more importantly, any exceptions to the rules that need to be handled.
WHAT MAKES A GOOD DATA MODEL?
The simple answer to this is that a good model should express the totality of the information requirements of the business clearly, concisely and unambiguously. As with any set of requirements, all of the requirements should be included, there should be no overlapping or conflicting requirements, and no requirements should be hidden within other requirements. When modelling the processes of the business, the business analyst will think in terms of two sets of models: the ‘as-is’ models and the ‘to-be’ models. The ‘as-is’ models are the result of pure analysis: the documentation of the current situation using boxes and lines. When developing the ‘to-be’ models the business analyst is straying from pure analysis into the field of synthesis or design, albeit using the same set of box and line constructs as for the ‘as-is’ models. The ‘as-is’ models are developed to help us come up with the ‘to-be’ models. The final form of the ‘to-be’ models (the designs) will depend very much on the experience and creativity of the business analyst who is doing the modelling. The same is true when modelling information or data, although we do not normally produce an ‘as-is’ information or data model. The only time we would produce an ‘as-is’ model is when we are reverse-engineering an existing database or carrying out relational data analysis on existing documents, reports or input screens. Even so, the purpose of the reverse-engineering or relational data analysis is to influence a model of the information requirements to be met by a future information system (or set of information systems). Such a model is a ‘to-be’ model. All information models that are the output of the requirements engineering process can, therefore, be considered as the start of the design process for the future information system or systems.
The experience and creativity of the modeller will impact not only on the model itself but also on the design of any future database developed from the model. We will look at the subject of data model quality in more detail in Chapter 9.
INTRODUCING DATA ANALYSIS
‘Data analysis’ is a difficult term because it has two meanings. One meaning of the term is the analysis of an existing collection of data to find patterns, trends or hidden information. The result is some insight that will be useful to the business.
The other meaning is the analysis of a business domain to understand the information or data that needs to be recorded in an information system to meet business needs. The result is a data model that may lead to a database design. Some business analysts may be involved in analysing data in the first sense, especially when that data could be used in strategic decision-making or as input into a business case. However, it is the second meaning that is of interest to us in this book. Business analysis helps us understand business requirements. Some of these requirements are information (or data) requirements. Data analysis helps us understand those information requirements, which are probably the most important of the overall business requirements for an information system. Data analysis is not separate from business analysis; it is an essential part of business analysis. Like many other analysis or modelling activities, data analysis and modelling can be approached from a top-down perspective or from a bottom-up perspective. The top-down approach involves starting with a blank sheet of paper (or, preferably, a clean whiteboard) and using an appropriate set of requirements elicitation techniques (interviews, workshops, observation, among others) to find out what information the business needs to achieve its goals. This then forms the basis of the information model, whether it is an Ellis-Barker entity relationship model or a UML class model. We will look at the development of Ellis-Barker entity relationship models and UML class models in Chapters 2 to 5. The bottom-up approach involves looking at existing ‘data sources’ to build a model that represents the known information requirements. These sources could be existing databases, screens and reports for existing information systems or, in a paper-based system, the documents and records that are maintained. Business analysts should use both approaches and then compare the results. They will seldom match.
Some detail, absolutely vital to the business, may have been missed when looking at things from the top down. Some new requirements not handled by the current system will probably have been missed when looking at things from the bottom up. Almost certainly, the analyst will need to develop a model that is a composite of the top-down model and the bottom-up model. When taking a bottom-up approach, we can use a formal technique called relational data analysis (also called normalisation), or we can simply rely on our intuition. Most data modellers use the formal technique when they start data modelling and move to a more informal approach as they gain experience. We will look at relational data analysis in Chapter 6.
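At its simplest, comparing the top-down and bottom-up results is a comparison of two sets. The sketch below works only at the level of entity type names; a real comparison would also cover attributes and relationships. Both sets are hypothetical, including the LOYALTY SCHEME entity type, and so is the `compare` function.

```python
from typing import Dict, Set

# Hypothetical findings, for illustration only.
top_down = {"CUSTOMER", "ORDER", "PRODUCT", "LOYALTY SCHEME"}
bottom_up = {"CUSTOMER", "ORDER", "ORDER LINE", "PRODUCT"}


def compare(top_down: Set[str], bottom_up: Set[str]) -> Dict[str, Set[str]]:
    """Group entity type names by which approach found them."""
    return {
        # Found both ways: the least contentious part of the model.
        "agreed": top_down & bottom_up,
        # Possible new requirements not handled by the current system.
        "top_down_only": top_down - bottom_up,
        # Detail in existing data sources that the top-down work missed.
        "bottom_up_only": bottom_up - top_down,
    }


for group, names in compare(top_down, bottom_up).items():
    print(group, sorted(names))
```

Each name outside the ‘agreed’ group is a question to take back to the business before it goes into the composite model.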
2 MODELLING THE THINGS OF INTEREST TO THE BUSINESS AND THE RELATIONSHIPS BETWEEN THEM
This chapter introduces the basic modelling concept of the entity, to represent something of interest to the business about which information needs to be recorded, and the related concept of the entity type, the representation of a group of entities with common characteristics. The relationships between entity types are also introduced. Alongside the introduction of these concepts, the comparable concepts of object, object class and association are also introduced.
ENTITIES AND OBJECTS
Put simply, an information or data model is a model of the things of interest to a business about which information needs to be recorded. Figures 2.1 and 2.2 are two models showing some of the things of interest to a vehicle hire company about which that company will need to record information. They show that the things of interest are the actual vehicles that the company has to hire, the ‘vehicle types’ that describe those vehicles, the people who are the hirers and the agreements made for the hire of the vehicles. Figure 2.1 is drawn using the Ellis-Barker entity relationship notation.
Figure 2.1 The vehicle hire company using Ellis-Barker notation (entity types HIRE AGREEMENT, PERSON, VEHICLE and VEHICLE TYPE; relationships: HIRE AGREEMENT made with PERSON / PERSON hirer in HIRE AGREEMENT; HIRE AGREEMENT made for VEHICLE / VEHICLE hired through HIRE AGREEMENT; VEHICLE described by VEHICLE TYPE / VEHICLE TYPE description of VEHICLE)
Figure 2.2 is drawn using the UML class model notation.
Figure 2.2 The vehicle hire company using UML class model notation
Both notations do the same thing but in slightly different ways. In entity relationship modelling, the ‘things of interest to the business about which information needs to be recorded’ are known as entities. In UML class modelling, the ‘things of interest to the business about which information needs to be recorded’ are known as objects. These things can be tangible things such as people, places, buildings, equipment or assets. They can also be more abstract concepts such as contracts, agreements or transactions. In modelling, however, we are not thinking about the individual entities or objects. Instead we are thinking about the group or class of things that have the same characteristics. In entity relationship modelling, a group or class of entities that have the same characteristics is known as an entity type. The individual entities within the entity type are known as entity occurrences. In class modelling, a group or class of objects that have the same characteristics is known as an object class. For example:
• The vehicle with vehicle registration number KW64 CNV (an entity occurrence or an object) is an instance of the entity type or object class VEHICLE.
• Miss Patricia Johnson (a second entity occurrence or object), who wishes to hire a vehicle, is an instance of the entity type or object class PERSON.
• The agreement made with Miss Patricia Johnson to hire KW64 CNV (a third entity occurrence or object) is an instance of the entity type or object class HIRE AGREEMENT.
There are many things or concepts within the real world about which information may be kept. If an organisation captured information on everything that existed, that organisation would find itself swamped with information, most of which it would never use. As business analysts, we only want to represent, as entity types or object classes, those things about which the organisation needs to keep information to support its role or purpose. In an enterprise whose core business is the hire of vehicles, any system to support that core business will need to record information about its vehicles, its hirers and the hire agreements. It will also need information about the suppliers of its vehicles. If it is a national or international enterprise it will also need to keep information about its branches and its employees. As it is hiring out vehicles it will need to maintain those vehicles. If that is carried out in-house it will need to keep information about the facilities and the mechanics. If the maintenance is outsourced it will need to keep information about the garages or other maintenance facilities it uses. To support all of this, the company will have a number of accounts and information will be needed about the transactions passing through those accounts. The company that manufactures the vehicles will have a different set of things about which it needs to keep information: its products (the vehicles), the parts and raw materials that are used to manufacture the vehicles, its stock levels, the scheduling of work on its production line, its organisational structure, its employees and its accounts and transactions.
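The distinction between an entity type (or object class) and its occurrences (or objects) maps directly onto a class and its instances in a programming language. The sketch below is illustrative only. The attribute names `registration_number`, `name`, `hirer` and `vehicle` are assumptions, and the Python class names follow the language's own conventions, which are a matter for developers rather than for the conceptual model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vehicle:
    """Entity type / object class VEHICLE: what all vehicles have in common."""
    registration_number: str


@dataclass(frozen=True)
class Person:
    """Entity type / object class PERSON."""
    name: str


@dataclass(frozen=True)
class HireAgreement:
    """Entity type / object class HIRE AGREEMENT."""
    hirer: Person
    vehicle: Vehicle


# Entity occurrences / objects: individual instances of each type.
kw64_cnv = Vehicle("KW64 CNV")
patricia_johnson = Person("Miss Patricia Johnson")
agreement = HireAgreement(hirer=patricia_johnson, vehicle=kw64_cnv)

print(isinstance(kw64_cnv, Vehicle))           # True
print(agreement.vehicle.registration_number)   # KW64 CNV
```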
NAMING OF ENTITY TYPES AND OBJECT CLASSES
For a model to be meaningful to the business users, the names of our entity types and object classes must also be meaningful to the business. To help with the communication with the business, these names should use the terminology of the business and should describe the thing of interest about which we will be storing information (the ‘what’). Names that describe ‘how’ we are storing that information, and which could end up being the solution, should be avoided. By convention they are always singular nouns (or noun phrases). Examples of acceptable entity type and object class names are:
• VEHICLE (not VEHICLES);
• VEHICLE TYPE (not TYPE OF VEHICLE);
• PERSON (not PEOPLE nor PERSON FILE nor PERSON TABLE nor PERSON RECORD);
• HIRE AGREEMENT (not HIRE AGREEMENTS nor HIRE HISTORY).
Most modellers show entity type names and object class names using upper case (HIRE AGREEMENT). Others use mixed case (Hire Agreement). Either is acceptable providing a consistent approach is maintained. On a Computer-Independent Model, or conceptual model, business analysts should use standard English with normal spaces. Naming using ‘technical’ formats, such as those that use underscores (HIRE_AGREEMENT or Hire_Agreement) or camel case (HireAgreement), should not be used on conceptual models as they do not follow normal business conventions. These technical approaches to naming should be the preserve of the system developers. While it is important to use names that are meaningful to the business, consistency in naming is also very important. As business users are presented with a number of models they will find it easier to understand those models if there is a consistent approach to naming.
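Some of the naming guidance above is mechanical enough to check automatically. The sketch below uses a hypothetical `naming_problems` helper and a list of storage words taken from the PERSON FILE / TABLE / RECORD examples. It cannot judge whether a name uses the business's own terminology. It also deliberately does not test for plurals, because legitimate singular names such as ORDER STATUS also end in S.

```python
import re
from typing import List

# Words that describe how information is stored rather than what it is.
# Assumption: taken from the PERSON FILE / TABLE / RECORD examples; not exhaustive.
STORAGE_WORDS = {"FILE", "TABLE", "RECORD"}


def naming_problems(name: str) -> List[str]:
    """Flag mechanical breaches of the conceptual-model naming guidance."""
    problems = []
    if "_" in name:
        problems.append("uses underscores; use normal spaces")
    # A lower-case letter immediately followed by an upper-case one suggests camel case.
    if re.search(r"[a-z][A-Z]", name):
        problems.append("uses camel case; use separate words")
    words = set(name.upper().replace("_", " ").split())
    if words & STORAGE_WORDS:
        problems.append("describes how data is stored, not what it is")
    return problems


for candidate in ["HIRE AGREEMENT", "Hire Agreement", "HIRE_AGREEMENT", "HireAgreement", "PERSON TABLE"]:
    print(candidate, naming_problems(candidate))
```

Both upper case and mixed case with spaces pass, consistent with the guidance that either is acceptable if used consistently.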
INTRODUCTION TO RELATIONSHIPS AND ASSOCIATIONS
Entity types and object classes represent business concepts that are the things about which the business needs to keep information. Within the business these ‘things’ will be associated in ways that are specific to that business. In our vehicle hire business, vehicles are associated with hire agreements and hire agreements are, conversely, associated with vehicles. Also, hire agreements are associated with the person who is hiring the vehicle and those people are, conversely, associated with hire agreements. These associations between our entity types or our object classes are important to the business because they reflect the rules, conditions or constraints of that business. During the development of the model the business analyst must discover these associations between entity types and object classes. These associations form an important part of the information that needs to be recorded about each of the ‘things’ that we have identified as entity types or object classes. The choice of entity types and object classes is restricted to those things (and only those things) in which the business is interested and about which information needs to be recorded. The same applies to relationships and associations. There may be many ways in which the business concepts represented by entity types or object classes are related in the real world. However, the only ones that should be included in the model are those of sufficient interest to the business that information about them needs to be recorded. In an entity relationship model an association between two entity types is called a relationship. In a class model a relationship between two object classes is called an association. There is no conceptual difference between a relationship in an entity relationship model and an association in a class model. They are the same thing; the only difference is in the notation.
RELATIONSHIP NOTATION IN ENTITY RELATIONSHIP MODELS
Figure 2.3 shows a part of our earlier entity relationship model. The line drawn between the entity types HIRE AGREEMENT and VEHICLE represents a relationship between those entity types.
Figure 2.3 A relationship in an entity relationship model (HIRE AGREEMENT made for VEHICLE; VEHICLE hired through HIRE AGREEMENT)
These models need to be interpreted both by business people, who are required to negotiate or approve the information requirements to be met by the system, and by technical people, who have to implement the system. It is important that the models are interpreted unambiguously, and a formal method of ‘reading’ these relationships contributes greatly to this unambiguous understanding. For example, consider the relationship that is introduced in Figure 2.3. Reading this relationship from right to left – from VEHICLE to HIRE AGREEMENT – we have the sentence:
Each VEHICLE may be hired through one or more HIRE AGREEMENTS
Capitals mark the entity type names in this sentence, which is constructed using the following rules:
• The word ‘Each’ is used because the box with the word ‘VEHICLE’ inside it is an entity type (that is, it represents all instances of the type – all the vehicles), but we want to refer to a single instance of the type (that is, a single vehicle).
• ‘Each’ is followed by the name of the entity type at the end from which we are starting the sentence – in this case, VEHICLE.
• The term ‘may be’ is used because not every vehicle has to be hired (some might have just been purchased and not yet hired) – this is represented on the diagram by a dashed line at the VEHICLE end of the relationship; a dashed line is always read as ‘may be’.
• ‘hired through’ comes from the name of the relationship (which is also known as the ‘link phrase’) at the VEHICLE end of the relationship.
• The term ‘one or more’ is used because there is an inverted three-pronged arrowhead (known as a crow’s foot) at the HIRE AGREEMENT end of the relationship; a crow’s foot is always read as ‘one or more’.
• The sentence ends with the name of the entity type at the far end of the line, made plural so that the sentence reads easily – in this case, HIRE AGREEMENTS.
Each relationship should also be read in the opposite direction. Reading the relationship from left to right – from HIRE AGREEMENT to VEHICLE – we have the sentence:
Each HIRE AGREEMENT must be made for one and only one VEHICLE
This sentence is constructed as follows:
• ‘Each’ is used because we want to refer to a single instance of the type.
• HIRE AGREEMENT comes from the name of the left-hand entity type.
• The term ‘must be’ is used because every hire agreement has to be made for a vehicle – this is represented on the diagram by the solid line at the HIRE AGREEMENT end of the relationship, and a solid line is always read as ‘must be’.
• ‘made for’ comes from the name (link phrase) of the relationship at the HIRE AGREEMENT end of the relationship.
• The term ‘one and only one’ is used because there is no crow’s foot at the far end, the VEHICLE end, of the relationship. The absence of a crow’s foot is always read as ‘one and only one’ – in this case we are assuming that there is a separate hire agreement for each vehicle that is hired (which should, of course, be confirmed with the business).
• VEHICLE comes from the name of the right-hand entity type.
In data modelling parlance, this relationship is known as a one-to-many relationship. A vehicle can be associated with many hire agreements (Each VEHICLE may be hired through one or more HIRE AGREEMENTS), but through this relationship a hire agreement can be associated with only one vehicle (Each HIRE AGREEMENT must be made for one and only one VEHICLE). The approach to naming relationships used with this notation has two advantages in models developed during business analysis. Firstly, to be able to interpret a model the business user has only to remember four facts:
• A solid line is always read as ‘must be’.
• A dashed line is always read as ‘may be’.
• The presence of a crow’s foot is always read as ‘one or more’.
• The absence of a crow’s foot is always read as ‘one and only one’.
Secondly, this approach provides a complete, concise and unambiguous statement of the nature of the relationship. This enables each relationship to be considered as a representation of a business rule. A common failing is to produce a data model without naming the relationships. The names are there to describe the ‘nature’ of the relationship: a vehicle is hired through a hire agreement. A relationship without a name makes no more sense than an entity type without a name. Everybody involved in data modelling recognises that entity types should be named (otherwise you have an empty box on your model – ‘What does that represent?’), yet models with unnamed relationships are quite common. If relationships are not named, there is a distinct possibility that the reader’s understanding of the nature of a relationship will differ from the modeller’s.
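Because the four reading rules are purely mechanical, reading a relationship can be sketched as a small function. This is an illustration only. `RelationshipEnd` and `read` are hypothetical names. The plural of each entity type name is supplied by hand rather than derived, since English plurals are irregular (PERSON, PEOPLE).

```python
from dataclasses import dataclass


@dataclass
class RelationshipEnd:
    """One end of an Ellis-Barker relationship line."""
    entity_type: str   # e.g. "VEHICLE"
    plural: str        # plural used when the far end has a crow's foot
    link_phrase: str   # link phrase written at this end, e.g. "hired through"
    solid_line: bool   # solid line at this end -> 'must be'; dashed -> 'may be'
    crows_foot: bool   # crow's foot drawn at this end


def read(near: RelationshipEnd, far: RelationshipEnd) -> str:
    """Read the relationship starting from the near end, using the four rules."""
    verb = "must be" if near.solid_line else "may be"
    if far.crows_foot:
        return f"Each {near.entity_type} {verb} {near.link_phrase} one or more {far.plural}"
    return f"Each {near.entity_type} {verb} {near.link_phrase} one and only one {far.entity_type}"


# Figure 2.3: dashed line at VEHICLE, solid line and crow's foot at HIRE AGREEMENT.
vehicle = RelationshipEnd("VEHICLE", "VEHICLES", "hired through", solid_line=False, crows_foot=False)
hire_agreement = RelationshipEnd("HIRE AGREEMENT", "HIRE AGREEMENTS", "made for", solid_line=True, crows_foot=True)

print(read(vehicle, hire_agreement))  # Each VEHICLE may be hired through one or more HIRE AGREEMENTS
print(read(hire_agreement, vehicle))  # Each HIRE AGREEMENT must be made for one and only one VEHICLE
```

The line style at the near end and the crow's foot at the far end are all the function needs. This is why a business user can read any relationship after learning just four facts.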
ASSOCIATION NOTATION IN UML CLASS MODELS
The association in the class model in Figure 2.4 represents the same relationship.
Figure 2.4 An association in a UML class model (VEHICLE 1..1, ‘is hired through’, 0..* HIRE AGREEMENT)
In the same way that a relationship on an entity relationship model indicates that the business sees some business association between entities that are instances of the entity types involved in the relationship, an association on a class model indicates that the business sees some business relationship between objects that are instances of the object classes involved in the association. They are identical concepts. However, there the similarity ends as the only common feature of both the association and the relationship is the line, with an association shown as a line joining the two object classes. As with a relationship on an entity relationship model, it is good practice to give a name to an association on a class model, with this name representing the business nature of the association. The association joining the VEHICLE object class to the HIRE AGREEMENT object class is named ‘is hired through’. This name is annotated with a small triangle that points in the direction that this name should be read. Since the triangle points towards the HIRE AGREEMENT object class it shows that a vehicle is hired through a hire agreement. Note that in entity relationship modelling the convention is to use prepositional phrases (such as ‘composed of one or more PARTS’) or gerund phrases (such as ‘referencing one and only one PERSON’) that can follow ‘may be’ or ‘must be’, whereas the convention on class models is to use active verb phrases for association names (such as ‘is hired through HIRE AGREEMENTS’).
MODELLING THE THINGS OF INTEREST TO THE BUSINESS AND THE RELATIONSHIPS BETWEEN THEM
UML allows an alternative approach to the naming of associations. This alternative approach, illustrated in Figure 2.5, is to use ‘role names’ to annotate the associations, where these role names describe the role that the instances of the classes play with respect to the associations. Role names consist of nouns or noun phrases.
Figure 2.5 The use of role names [diagram: VEHICLE, role name 'hired vehicle' (1..1), associated with HIRE AGREEMENT, role name 'hiring hire agreement' (0..*)]
Instead of using crow’s feet and solid and dotted lines, UML uses a concept called ‘multiplicity’ to annotate its associations. The ‘0..*’ multiplicity at the HIRE AGREEMENT end of the association shown in Figures 2.4 and 2.5 represents that each vehicle is hired through zero to many hire agreements. This is equivalent to ‘each vehicle may be hired through one or more hire agreements’. Each multiplicity has a minimum value (sometimes called the ‘lower bound’) and a maximum value (sometimes called the ‘upper bound’), with these values separated by two dots. In the case of our ‘0..*’, the minimum value or lower bound is ‘zero’, represented by ‘0’, and the maximum value or upper bound is ‘many’, represented by ‘*’. UML allows the ‘0..*’ multiplicity to be replaced by the symbol ‘*’. This leaves the reader to infer that the minimum value is zero. This is not good practice, especially when dealing with business users. The ‘1..1’ multiplicity at the VEHICLE end of this association represents that each hire agreement is for only one vehicle, equivalent to ‘each hire agreement must be for one and only one vehicle’. UML allows the ‘1..1’ multiplicity to be replaced by the number ‘1’. Again, this is not good practice as it leaves the reader to infer that the minimum value equals the maximum value.
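Multiplicities can be checked mechanically. The sketch below is a minimal illustration, assuming a hypothetical `Multiplicity` class that stores a lower and an upper bound, with `None` standing for '*'. It expands the two shorthands described above, '*' and '1', into their explicit forms. It also checks whether a given number of linked objects is allowed, including integer bounds such as '0..3'.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Multiplicity:
    """A UML multiplicity: lower bound and upper bound (None means '*', i.e. many)."""
    lower: int
    upper: Optional[int]

    @classmethod
    def parse(cls, text: str) -> "Multiplicity":
        text = text.strip()
        if ".." in text:
            low, high = text.split("..", 1)
        elif text == "*":
            # Shorthand: '*' leaves the reader to infer a lower bound of zero ('0..*').
            low, high = "0", "*"
        else:
            # Shorthand: a single integer such as '1' means '1..1'.
            low, high = text, text
        lower = int(low)
        upper = None if high == "*" else int(high)
        if lower < 0 or (upper is not None and upper < lower):
            raise ValueError(f"invalid multiplicity: {text!r}")
        return cls(lower, upper)

    def allows(self, count: int) -> bool:
        """True if `count` linked objects satisfies both the lower and the upper bound."""
        return count >= self.lower and (self.upper is None or count <= self.upper)

    def __str__(self) -> str:
        # Always print the explicit 'lower..upper' form.
        return f"{self.lower}..{'*' if self.upper is None else self.upper}"
```

For example, `str(Multiplicity.parse("*"))` gives back `0..*`. Printing the explicit form avoids asking business users to infer the missing lower bound.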
MODELLING BUSINESS INFORMATION
DEGREES OF CARDINALITY AND OPTIONALITY
Cardinality is a term used in mathematical set theory. The cardinality of a set is a measure of the number of elements of the set. When modelling information, it is important that we document, for each entity occurrence or object, how many instances of the related entity type or object class can or must be related to that instance. In the relationship shown in Figure 2.3, for each vehicle there may be many hire agreements and for each hire agreement there can only be one related vehicle. The fact that there are ‘many’ instances at one end of the relationship and only ‘one’ instance at the other end is known as the cardinality of the relationship. When modelling information, only two measures of cardinality are normally used: ‘one’ and ‘many’. Hence, the relationship in the entity relationship model in Figure 2.3 between the VEHICLE entity type and the HIRE AGREEMENT entity type is known as a ‘one-to-many relationship’. UML, however, allows an integer, such as ‘3’, to be used instead of ‘1’ (one) or ‘*’ (many). The multiplicity ‘0..3’ means that the lower bound is ‘zero’ and the upper bound is ‘three’ – there cannot be more than three associated objects but there may be none at all. Similarly, the multiplicity ‘5..10’ means that there are at least five associated objects but there cannot be more than 10. The association in Figures 2.4 and 2.5 between the VEHICLE object class and the HIRE AGREEMENT object class is similarly known as a ‘one-to-many association’.
Optionality is where we consider whether an instance of an entity type or object class can exist without being linked to an instance of the related entity type or object class. In the relationship in Figure 2.3 between HIRE AGREEMENT and VEHICLE we see that for each and every hire agreement there must be an associated vehicle – you cannot have a hire agreement without a vehicle.
In our entity relationship notation, the relationship was ‘each HIRE AGREEMENT must be made for one and only one VEHICLE’. Hence, the relationship from HIRE AGREEMENT to VEHICLE is a mandatory relationship. The equivalent UML class model association in Figures 2.4 and 2.5 was read as ‘each HIRE AGREEMENT is for only one VEHICLE’. Similarly, the association from HIRE AGREEMENT to VEHICLE is a mandatory association. Considering the relationship shown in Figure 2.3 again, we see that it is not necessary for each vehicle to have an associated hire agreement – the hire company could have just purchased a vehicle that has not yet been hired. In the entity relationship model the relationship was ‘each VEHICLE may be hired through one or more HIRE AGREEMENTS’. Hence, the relationship from VEHICLE to HIRE AGREEMENT is an optional relationship. The equivalent UML class model association in Figures 2.4 and 2.5 was read from VEHICLE to HIRE AGREEMENT as ‘each VEHICLE is hired through zero to many HIRE AGREEMENTS’. Similarly, the association from VEHICLE to HIRE AGREEMENT is an optional association.
The cardinality and optionality of a relationship in an entity relationship model or an association in a UML class model reflect the business rules that the relationship or association is trying to represent. Relationships and associations can have a cardinality of ‘one-to-many’, ‘many-to-one’, ‘one-to-one’ and ‘many-to-many’. They can also be ‘optional–mandatory’, ‘mandatory–optional’, ‘optional–optional’ and ‘mandatory–mandatory’. Combining each of the four cardinalities with each of the four optionalities gives 16 possible combinations for relationships and associations. Four of these are shown in Figures 2.6 to 2.9.
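The naming of these combinations can be derived from the multiplicities at the two ends of an association. The sketch below is a minimal illustration, assuming explicit 'lower..upper' multiplicities; the function names `bounds` and `classify` are hypothetical. Cardinality comes from the upper bounds. The optionality of one class comes from the lower bound written at the *other* end. Each VEHICLE may have no hire agreements, and this is shown by the '0' at the HIRE AGREEMENT end.

```python
from itertools import product

CARDINALITIES = ["one-to-many", "many-to-one", "one-to-one", "many-to-many"]
OPTIONALITIES = ["optional–mandatory", "mandatory–optional",
                 "optional–optional", "mandatory–mandatory"]

# Every cardinality can be paired with every optionality: 4 x 4 = 16 combinations.
COMBINATIONS = list(product(CARDINALITIES, OPTIONALITIES))


def bounds(text: str) -> tuple[int, float]:
    """Split an explicit 'lower..upper' multiplicity; '*' (many) becomes infinity."""
    low, high = text.split("..")
    return int(low), (float("inf") if high == "*" else int(high))


def classify(mult_at_a: str, mult_at_b: str) -> tuple[str, str]:
    """Name the cardinality and optionality of an association between A and B.

    mult_at_a is written at A's end (how many As relate to one B);
    mult_at_b is written at B's end (how many Bs relate to one A).
    """
    a_low, a_high = bounds(mult_at_a)
    b_low, b_high = bounds(mult_at_b)
    # Only 'one' and 'many' are distinguished, so an upper bound such as 3 counts as many.
    cardinality = f"{'one' if a_high == 1 else 'many'}-to-{'one' if b_high == 1 else 'many'}"
    # Whether an A must be linked to a B depends on the lower bound at the B end.
    a_optionality = "optional" if b_low == 0 else "mandatory"
    b_optionality = "optional" if a_low == 0 else "mandatory"
    return cardinality, f"{a_optionality}–{b_optionality}"
```

For the VEHICLE and HIRE AGREEMENT association of Figure 2.4, `classify("1..1", "0..*")` returns `("one-to-many", "optional–mandatory")`.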
Figure 2.6 One-to-many (1:n) optional–mandatory relationship and association [diagrams: entity relationship VEHICLE 'hired through' / HIRE AGREEMENT 'made for'; UML VEHICLE (1..1) 'is hired through' HIRE AGREEMENT (0..*)]
As we saw earlier, the relationship shown in Figure 2.6 is read as:
Each VEHICLE may be hired through one or more HIRE AGREEMENTS
Each HIRE AGREEMENT must be made for one and only one VEHICLE
The equivalent association is read as:
Each VEHICLE is hired through zero to many HIRE AGREEMENTS
Each HIRE AGREEMENT is for only one VEHICLE
The one-to-many, optional–mandatory relationship or association is the most common form of relationship or association on information models.
The relationship in Figure 2.7 is read as:
Each ALBUM must be comprised of one or more RECORDED SONGS
Each RECORDED SONG may be included in one and only one ALBUM
The equivalent association is read as:
Each ALBUM comprises one to many RECORDED SONGS
Each RECORDED SONG is included in zero to one ALBUM
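The formal sentences above follow a fixed pattern, so they can be generated from the facts recorded about each end. The sketch below is illustrative, assuming two hypothetical helpers: `er_reading` for the Ellis-Barker form and `uml_reading` for the class model form. Plurals are passed in explicitly rather than derived, because names such as BRANCHES do not follow a simple rule. The multiplicity table covers only the four multiplicities used in this chapter.

```python
ER_QUANTITY = {False: "one and only one", True: "one or more"}
UML_QUANTITY = {"1..1": "only one", "0..1": "zero to one",
                "0..*": "zero to many", "1..*": "one to many"}


def er_reading(subject: str, mandatory: bool, name: str, many: bool,
               target: str, target_plural: str) -> str:
    """Ellis-Barker sentence: 'Each A must be/may be <name> one and only one/one or more B'."""
    modal = "must be" if mandatory else "may be"
    noun = target_plural if many else target
    return f"Each {subject} {modal} {name} {ER_QUANTITY[many]} {noun}"


def uml_reading(subject: str, name: str, multiplicity: str,
                target: str, target_plural: str) -> str:
    """Class model sentence built from the multiplicity at the target end."""
    noun = target_plural if multiplicity.endswith("*") else target
    return f"Each {subject} {name} {UML_QUANTITY[multiplicity]} {noun}"
```

For example, `er_reading("VEHICLE", False, "hired through", True, "HIRE AGREEMENT", "HIRE AGREEMENTS")` produces "Each VEHICLE may be hired through one or more HIRE AGREEMENTS".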
Figure 2.7 One-to-many (1:n) mandatory–optional relationship and association [diagrams: entity relationship ALBUM 'comprised of' / RECORDED SONG 'included in'; UML ALBUM (0..1) 'comprises' RECORDED SONG (1..*)]
The one-to-many, mandatory–optional relationship or association is very rare but can be quite useful. It can be used when the ‘thing’ at the one end is an invented thing (such as an album), but the ‘thing’ at the many end can already exist (such as a recorded song).
Figure 2.8 One-to-one (1:1) optional–mandatory relationship and association [diagrams: entity relationship PROJECT 'managed by' / PROJECT MANAGER 'manager of'; UML PROJECT MANAGER (0..1) 'manages' PROJECT (1..1)]
The relationship in Figure 2.8 is read as:
Each PROJECT may be managed by one and only one PROJECT MANAGER
Each PROJECT MANAGER must be manager of one and only one PROJECT
The equivalent association is read as:
Each PROJECT is managed by zero to one PROJECT MANAGER
Each PROJECT MANAGER manages only one PROJECT
The one-to-one relationship or association is reasonably rare and further analysis will often lead to a one-to-many relationship or association or even a many-to-many relationship or association (see below). A mandatory–mandatory, one-to-one relationship or association is very rare and will nearly always turn out to be wrong.
Figure 2.9 Many-to-many (m:n) optional–optional relationship and association [diagrams: entity relationship DOCTOR 'treater of' / PATIENT 'treated by'; UML DOCTOR (0..*) 'treats' PATIENT (0..*)]
The relationship in Figure 2.9 is read as:
Each DOCTOR may be treater of one or more PATIENTS
Each PATIENT may be treated by one or more DOCTORS
The equivalent association is read as:
Each DOCTOR treats zero to many PATIENTS
Each PATIENT is treated by zero to many DOCTORS
There is a false perception that the many-to-many relationship or association cannot exist. This false perception comes from one particular database implementation (the relational or SQL database). The information models developed during business analysis should concentrate on what information is to be held and not how that information is to be held. Many-to-many relationships or associations are totally acceptable on business analysis information models.
The optional–optional, many-to-many relationship or association often appears on a model during the early stages of analysis. Further analysis will generally suggest that the original optional–optional, many-to-many relationship or association is hiding some information that is required by the business and needs to be included on the model. This idea will be investigated in Chapter 3.
MULTIPLE RELATIONSHIPS AND ASSOCIATIONS
It is possible for there to be more than one relationship or association between the same two entity types or object classes. For example, our vehicle hire company can allow one-way hires, where a vehicle is collected from one branch and delivered back to another branch. There are, therefore, two distinct relationships or associations between the entity type or object class HIRE AGREEMENT and a new entity type or object class called BRANCH: the relationship or association to the branch from which the vehicle that is the subject of the hire agreement is collected at the start of the hire, and the relationship or association to the branch to which the same vehicle is delivered at the end of the hire. These will, in most cases, be the same branch, but in the case of a one-way hire, could be different branches.
Figure 2.10 Modelling the ‘one-way’ hire situation [diagrams: entity relationship HIRE AGREEMENT 'for vehicle collected from' / BRANCH 'collection point for vehicle hired through', and HIRE AGREEMENT 'for vehicle delivered to' / BRANCH 'reception point for vehicle hired through'; UML HIRE AGREEMENT (0..*) 'is for vehicle collected from' BRANCH (1..1), and HIRE AGREEMENT (0..*) 'is for vehicle delivered to' BRANCH (1..1)]
This concept is shown in Figure 2.10. The relationships shown in Figure 2.10 are read as:
Each BRANCH may be collection point for vehicle hired through one or more HIRE AGREEMENTS
Each HIRE AGREEMENT must be for vehicle collected from one and only one BRANCH
and, for the second relationship:
Each BRANCH may be reception point for vehicle hired through one or more HIRE AGREEMENTS
Each HIRE AGREEMENT must be for vehicle delivered to one and only one BRANCH
Similarly, the associations on the UML class model are read as:
Each BRANCH is collection point for vehicle hired through zero to many HIRE AGREEMENTS
Each HIRE AGREEMENT is for vehicle collected from only one BRANCH
and:
Each BRANCH is reception point for vehicle hired through zero to many HIRE AGREEMENTS
Each HIRE AGREEMENT is for vehicle delivered to only one BRANCH
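The two associations stay distinct in an implementation, because each one carries its own business meaning. The sketch below is minimal, assuming each association becomes its own attribute on a hypothetical `HireAgreement` class. Both attributes are mandatory and single-valued, matching the 1..1 multiplicities. A one-way hire is then simply an agreement whose two branches differ. The class, attribute and function names and the branch names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    name: str


@dataclass(frozen=True)
class HireAgreement:
    agreement_no: str
    collected_from: Branch   # association 'is for vehicle collected from' (exactly one BRANCH)
    delivered_to: Branch     # association 'is for vehicle delivered to' (exactly one BRANCH)

    @property
    def is_one_way(self) -> bool:
        """A one-way hire uses a different branch at each end of the hire."""
        return self.collected_from != self.delivered_to


def collections_at(branch: Branch, agreements: list[HireAgreement]) -> list[HireAgreement]:
    """Navigate from BRANCH as 'collection point for' zero to many HIRE AGREEMENTS."""
    return [a for a in agreements if a.collected_from == branch]


def deliveries_to(branch: Branch, agreements: list[HireAgreement]) -> list[HireAgreement]:
    """Navigate from BRANCH as 'reception point for' zero to many HIRE AGREEMENTS."""
    return [a for a in agreements if a.delivered_to == branch]
```

With this design, both branch associations can be navigated independently. A branch can appear in either role, in both, or in neither.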
RECURSIVE RELATIONSHIPS AND REFLEXIVE ASSOCIATIONS
Sometimes a business relationship will exist between instances of the same entity type or object class. The most typical business situation where this occurs is where one employee supervises a number of other employees, although there are many other situations where this can occur. Such a relationship in an entity relationship model is formally known as a recursive relationship while the equivalent in a UML class model is formally known as a reflexive association. To most modellers, however, this is known as a ‘pig’s ear’ relationship or association. This concept is shown in Figure 2.11.
Figure 2.11 Employee supervision [diagrams: recursive relationship on EMPLOYEE 'supervisor of' / 'supervised by'; reflexive association EMPLOYEE (0..1) 'supervises' EMPLOYEE (0..*)]
The relationship is read as:
Each EMPLOYEE may be supervisor of one or more other EMPLOYEES
Each EMPLOYEE may be supervised by one and only one other EMPLOYEE
Note the inclusion of the word ‘other’ to indicate that the relationship or association is to a different instance of EMPLOYEE. The class model association is read as:
Each EMPLOYEE supervises zero to many other EMPLOYEES
Each EMPLOYEE is supervised by zero or one other EMPLOYEE
Note the optionality. In the entity relationship model it is optional at both ends, indicated by the dotted line, and in the class model the multiplicities are zero to one (‘0..1’) at one end and zero to many (‘0..*’) at the other end of the association. This is typical of a recursive relationship or reflexive association that represents management or supervisory relationships. There is always someone at the top of an organisation who is not supervised by anybody. Similarly, there will be employees at the bottom of the organisation who do not supervise anyone.
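The optional–optional supervision structure maps naturally onto a mapping from each employee to at most one supervisor. The sketch below is minimal, assuming a hypothetical dictionary in which `None` marks an employee who is not supervised by anybody. A dictionary key can hold only one value, which enforces the '0..1' end. Any number of keys can share a supervisor, which gives the '0..*' end. The employee names are sample data.

```python
from typing import Optional

SupervisedBy = dict[str, Optional[str]]   # employee -> their single supervisor, or None


def validate(supervised_by: SupervisedBy) -> None:
    """Enforce 'supervised by ... other EMPLOYEE' and that every supervisor is an employee."""
    for employee, boss in supervised_by.items():
        if boss == employee:
            raise ValueError(f"{employee} cannot supervise themselves")
        if boss is not None and boss not in supervised_by:
            raise ValueError(f"unknown supervisor {boss!r}")


def unsupervised(supervised_by: SupervisedBy) -> list[str]:
    """Employees at the top of the organisation (optional 'supervised by' end)."""
    return sorted(e for e, boss in supervised_by.items() if boss is None)


def supervises_nobody(supervised_by: SupervisedBy) -> list[str]:
    """Employees at the bottom of the organisation (optional 'supervisor of' end)."""
    bosses = {boss for boss in supervised_by.values() if boss is not None}
    return sorted(e for e in supervised_by if e not in bosses)


def direct_reports(supervised_by: SupervisedBy, boss: str) -> list[str]:
    """The zero to many other EMPLOYEES that `boss` supervises."""
    return sorted(e for e, b in supervised_by.items() if b == boss)
```

With sample data such as `{"Asha": None, "Ben": "Asha", "Chen": "Asha", "Dana": "Ben"}`, Asha is the only unsupervised employee. Chen and Dana supervise nobody. This illustrates why both ends of the relationship are optional.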
EXERCISES FOR CHAPTER 2
2.1 Write down the formal sentences that represent the relationship in the following diagram:
[diagram: PROJECT 'responsibility of' / DEPARTMENT 'responsible for']
2.2 Redraw the following UML class model using Ellis-Barker entity relationship notation:
[diagram: ACCOUNT (0..*) 'is held by' INDIVIDUAL (1..*)]
2.3 Redraw the following Ellis-Barker entity relationship using UML class model notation, using role names instead of direct association naming:
[diagram: COURSE 'taught by' / EMPLOYEE 'allocated as instructor for']
2.4 Draw partial information models using BOTH Ellis-Barker entity relationship notation and UML class model notation to represent the following situations.
a) A sales person is always employed within a single sales region; a sales region can be staffed with many sales persons.
b) A flight must always be from one airport and must always be to an airport (which may, or may not be, the same airport); an airport can be the departure airport or the arrival airport for a flight.
c) Within business analysis we can think of processes at three levels: at the top level (the organisation level) are the processes; each process comprises a series of tasks; and each task comprises a series of steps. It is possible to identify a process without identifying the tasks within that process. Similarly, it is possible to identify a task without identifying the steps within that task.
3 MODELLING MORE COMPLEX RELATIONSHIPS
This chapter explores some of the more complex relationships that can exist between entity types and object classes. The topics covered are the resolution of many-to-many relationships and associations (including the oddity known as the ‘Bill of Materials’ structure), mutually exclusive relationships and associations, which lead to generalisation and specialisation, and, finally, a quick look at aggregation and composition.
THE PROBLEMS WITH MANY-TO-MANY RELATIONSHIPS AND ASSOCIATIONS
Although many-to-many relationships and associations are legitimate constructs to have on a model, they can pose a problem as they may be hiding some important information that the business should be interested in recording. It is the job of the analyst to test each of these many-to-many relationships and associations to see what information, if any, they are hiding. In our vehicle hire business example, consider the situation where each branch has a number of employees working at that branch and, during their time with the company, an employee can be located at a number of different branches. The model for this is shown in Figure 3.1.
Figure 3.1 Employees and branches [diagrams: entity relationship EMPLOYEE 'assigned to' / BRANCH 'location for'; UML BRANCH (1..*) 'employs' EMPLOYEE (0..*)]
The relationship between EMPLOYEE and BRANCH is read as:
Each EMPLOYEE must be assigned to one or more BRANCHES
Each BRANCH may be location for one or more EMPLOYEES
The equivalent class model association is read as:
Each EMPLOYEE is employed by one to many BRANCHES
Each BRANCH employs zero to many EMPLOYEES
In this example, the entity type or object class EMPLOYEE can record information about each employee, such as their name, their address and their grade. Similarly, the entity type or object class BRANCH can record information about each branch, such as the name of the branch and the address of the branch. Hidden in this relationship or association is the need for some information that is neither specifically about employees nor specifically about branches; this is information about the assignments of the employees to the branches. Such information could include, for example, the start and end dates of the assignments, the role while at the branch and whether it is a permanent or temporary assignment. Effectively we need to record information about the relationship between the instances of the entity types or object classes rather than about the instances of the entity types or object classes themselves. The way out of this problem is to replace the relationship or association by a new entity type or object class, with appropriate relationships or associations, to hold the information we require. This is known within the modelling community as ‘resolving the many-to-many’.
RESOLVING ENTITY RELATIONSHIP MODEL MANY-TO-MANY RELATIONSHIPS
Resolving the many-to-many relationship in Figure 3.1 means that we need to replace the relationship with a new entity type which, in this case, it makes sense to call ASSIGNMENT. This new entity type will then need relationships to both the EMPLOYEE and BRANCH entity types. The resolution is shown in Figure 3.2. The relationships shown in Figure 3.2 are read as:
Each EMPLOYEE must be assigned through one or more ASSIGNMENTS
Each ASSIGNMENT must be of one and only one EMPLOYEE
and
Each BRANCH may be location for employee assigned through one or more ASSIGNMENTS
Each ASSIGNMENT must be to one and only one BRANCH
Figure 3.2 Introducing the ASSIGNMENT entity type [diagram: ASSIGNMENT 'of' / EMPLOYEE 'assigned through'; ASSIGNMENT 'to' / BRANCH 'location for employee assigned through']
We give a special name to entity types such as the ASSIGNMENT entity type. Such an entity type is known as an associative entity type. Associative entity types are sometimes (incorrectly) referred to by other names, such as ‘link entities’. The first thing we need to do when resolving a many-to-many relationship is decide on a name for the associative entity type. Sometimes that is easy, as in this case, because there is a business concept, an assignment, about which we need to record information. Other times it is more difficult, and then we would need to resort to an artificial name such as EMPLOYEE BRANCH ASSOCIATION. Having decided on the name of the associative entity type the next thing we need to do is consider the nature of the relationships from the associative entity type to the original entity types. In the original relationship, each branch could be the location for many employees, so now a branch may be associated with many assignments. Hence ‘Each BRANCH may be … one or more ASSIGNMENTS’. In fact, the optionality at the BRANCH end of the new relationship is exactly the same as the optionality at the BRANCH end of the original relationship. Also in the original relationship, each employee must be assigned to at least one branch, so now an employee must be the subject of at least one assignment, and over time, maybe more than one assignment. Hence ‘Each EMPLOYEE must be … one or more ASSIGNMENTS’. As before, the optionality at the EMPLOYEE end of the new relationship is exactly the same as the optionality at the EMPLOYEE end of the original relationship. For an assignment to exist there must be an employee being assigned and a branch to which they are being assigned. Hence the optionality at the ASSIGNMENT end of both relationships is mandatory and ‘Each ASSIGNMENT must be … EMPLOYEE’ and ‘Each ASSIGNMENT must be … BRANCH’. 
Now we have decided:
• the optionality at the BRANCH entity type end of the relationship from the ASSIGNMENT associative entity type (may be);
• the optionality at the EMPLOYEE entity type end of the relationship from the ASSIGNMENT associative entity type (must be);
• the optionality at the ASSIGNMENT associative entity type end of the relationship from the BRANCH entity type (must be);
• the optionality at the ASSIGNMENT associative entity type end of the relationship from the EMPLOYEE entity type (must be);
• the cardinality for the relationship to the ASSIGNMENT associative entity type from the BRANCH entity type (one or more);
• the cardinality for the relationship to the ASSIGNMENT associative entity type from the EMPLOYEE entity type (one or more).
The only thing we now need to consider is the cardinality at the BRANCH and EMPLOYEE ends of the relationships from ASSIGNMENT. In each case, we need to consider whether more than one branch can be associated with a single assignment (highly unlikely) or whether more than one employee can be associated with a single assignment. While it is also unlikely that more than one employee can be associated with a single assignment it does depend on the business definition of an assignment. Can a group of employees all be assigned to a branch on the same day for the same length of time? Or is an assignment always of a single employee to a single branch? Since there is no crow’s foot at the EMPLOYEE end of the relationship from ASSIGNMENT we can assume that the definition of an assignment concerns a single employee being assigned to a single branch. If the definition of an assignment allowed for more than one employee to be involved in that assignment there would be another many-to-many relationship from ASSIGNMENT to EMPLOYEE and that would need to be analysed to see if it should be resolved.
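The resolved structure can be sketched as data. The example below is a minimal illustration, assuming hypothetical `Employee`, `Branch` and `Assignment` classes. Each assignment refers to exactly one employee and exactly one branch. It also carries the information about the assignment itself: the start and end dates, the role and whether it is permanent. The original many-to-many relationship is then recovered by navigating through the assignments. The names, dates and roles are sample data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Employee:
    name: str


@dataclass(frozen=True)
class Branch:
    name: str


@dataclass(frozen=True)
class Assignment:
    """Associative entity: of exactly one EMPLOYEE, to exactly one BRANCH."""
    employee: Employee
    branch: Branch
    start: date
    end: Optional[date]   # None while the assignment is still current
    role: str
    permanent: bool


def branches_of(employee: Employee, assignments: list[Assignment]) -> set[Branch]:
    """The original 'assigned to' navigation, now made through ASSIGNMENT."""
    return {a.branch for a in assignments if a.employee == employee}


def employees_at(branch: Branch, assignments: list[Assignment]) -> set[Employee]:
    """The original 'location for' navigation, now made through ASSIGNMENT."""
    return {a.employee for a in assignments if a.branch == branch}


def unassigned(employees: list[Employee], assignments: list[Assignment]) -> list[Employee]:
    """Employees breaking 'each EMPLOYEE must be assigned through one or more ASSIGNMENTS'."""
    assigned = {a.employee for a in assignments}
    return [e for e in employees if e not in assigned]
```

Under this design, a question such as "when did Ann move to York?" is answered from ASSIGNMENT. Neither EMPLOYEE nor BRANCH could hold that information on its own.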
RESOLVING CLASS MODEL MANY-TO-MANY ASSOCIATIONS
The equivalent resolution of the relationship in Figure 3.1 is shown in the UML class model in Figure 3.3.
Figure 3.3 Introducing the ASSIGNMENT object class [diagram: ASSIGNMENT (0..*) 'is to' BRANCH (1..1); ASSIGNMENT (1..*) 'is of' EMPLOYEE (1..1)]
UML allows a special type of class, called an association class, to be attached to a many-to-many association, avoiding the need to resolve that many-to-many association. The use of an association class in our example is shown in Figure 3.4. Note the dashed line that attaches the association class to the many-to-many association itself.
Figure 3.4 Introducing the ASSIGNMENT association class [diagram: BRANCH (1..*) 'employs' EMPLOYEE (0..*), with the association class ASSIGNMENT attached to the association by a dashed line]
THE ‘BILL OF MATERIALS’ STRUCTURE
It is very common for recursive relationships or reflexive associations to be ‘many-to-many’ and, thus, to need resolving. The resultant structure is known as a bill of materials structure. Consider the recursive many-to-many relationship and the equivalent reflexive many-to-many association shown in Figure 3.5.
Figure 3.5 Introducing products within products [diagrams: recursive relationship on PRODUCT 'comprised of' / 'used within'; reflexive association on PRODUCT 'comprises', with multiplicity 0..* at both ends]
The situation depicted here is a manufacturing enterprise, perhaps the manufacturer of some of the vehicles that our example company hires. This manufacturer has the products that are manufactured (such as the trucks, vans and cars) and the products that are used in their manufacture (such as the chassis, the engines and the wheels). Engines, in turn, are made up of other products (valves, bolts and so on). And some of these smaller products will be used in more than one of the larger products. As well as selling the trucks, vans and cars, the manufacturer will also be selling the wheels, engines, valves, bolts and other components to organisations that are carrying out repairs and maintenance and even to the end-users who want to maintain their own vehicles. Hence, we have the many-to-many relationship or association. Applying the standard rules for resolving many-to-many relationships and associations, we get the structures shown in Figures 3.6 and 3.7.
Figure 3.6 The bill of materials structure in Ellis-Barker notation [diagram: PRODUCT COMPONENT SPECIFICATION related twice to PRODUCT; relationship labels 'for', 'specified with', 'specifies use as component of' and 'specified as component in']
Figure 3.7 The bill of materials structure in UML class model notation [diagram: PRODUCT (1..1) 'comprises' PRODUCT COMPONENT SPECIFICATION (0..*); PRODUCT (1..1) 'is used within' PRODUCT COMPONENT SPECIFICATION (0..*)]
So, why is this familiar pattern known as a bill of materials structure? ‘Bill of materials’ is a term in common use in the manufacturing industry, where a bill of materials is a list of the raw materials, subassemblies, components, parts (and the quantities of each) needed to manufacture an end product.
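The structure earns its name when the specifications are "exploded" into the full list of parts and quantities needed for one end product. The sketch below is minimal, assuming each PRODUCT COMPONENT SPECIFICATION instance is held as a hypothetical (assembly, component, quantity) tuple. Each tuple plays the two roles shown in Figures 3.6 and 3.7. The product names and quantities are invented sample data, and the structure is assumed to contain no cycles, because a product cannot be a component of itself.

```python
from collections import Counter
from typing import Optional

# Hypothetical PRODUCT COMPONENT SPECIFICATION instances:
# (assembly PRODUCT, component PRODUCT, quantity of the component in one assembly)
SPECIFICATIONS = [
    ("CAR", "ENGINE", 1),
    ("CAR", "WHEEL", 5),        # four road wheels plus the spare
    ("VAN", "ENGINE", 1),
    ("VAN", "WHEEL", 5),
    ("ENGINE", "VALVE", 16),
    ("ENGINE", "BOLT", 40),
    ("WHEEL", "BOLT", 5),
]


def components_of(product: str, specs) -> list[tuple[str, int]]:
    """One level down: the products that `product` comprises, with quantities."""
    return [(comp, qty) for (asm, comp, qty) in specs if asm == product]


def used_within(product: str, specs) -> list[str]:
    """One level up: the products in which `product` is used (a 'where-used' list)."""
    return sorted(asm for (asm, comp, _) in specs if comp == product)


def explode(product: str, specs, multiplier: int = 1,
            totals: Optional[Counter] = None) -> Counter:
    """Total quantity of every lower-level product needed to make one `product`."""
    if totals is None:
        totals = Counter()
    for comp, qty in components_of(product, specs):
        totals[comp] += qty * multiplier
        explode(comp, specs, qty * multiplier, totals)   # descend into sub-assemblies
    return totals
```

With this data, `explode("CAR", SPECIFICATIONS)` counts 65 bolts for one car: 40 in the engine and 5 in each of the 5 wheels. `used_within("ENGINE", SPECIFICATIONS)` lists both CAR and VAN, showing why the relationship is many-to-many.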
This is exactly what the example illustrates. There are the instances of PRODUCT: the things you can actually touch, the finished products of the manufacturing process (the trucks, the vans and the cars) and the components used within them, such as the wheels and the engines. The choice of name for the associative entity type and object class is particularly difficult. There is a tendency to call it something like PRODUCT COMPONENT, or even just COMPONENT, but this would be wrong because both of these names suggest that instances are also tangible things. But you cannot touch the instances of the associative entity type and object class – it is purely a concept, saying that, for example, there is one engine in a car and a car has four wheels (or five, if you count the spare). The information recorded in the associative entity type or object class is the fact that a particular type of wheel (an instance of PRODUCT) is included in a particular model of car (another instance of PRODUCT) and that there are four or five wheels, as appropriate, in that particular model of car. This explains the choice of name (PRODUCT COMPONENT SPECIFICATION) for the associative entity type and object class in the example.
The bill of materials structure in information modelling is not restricted to manufacturing. It can appear when modelling requirements in a number of different circumstances within organisations. For example, consider a business that is organised using a matrix structure.[11] When a matrix organisation structure is used, each employee can have more than one supervisor at the same time. Under these circumstances the structure we saw earlier in Figure 2.11 needs to be adapted to recognise the many-to-many status of the supervisory relationship, as shown in Figure 3.8.
Figure 3.8 Employee supervision in a matrix organisation [diagrams: recursive relationship on EMPLOYEE 'supervisor of' / 'supervised by'; reflexive association on EMPLOYEE 'supervises', with multiplicity 0..* at both ends]
If we need information about these supervisions, such as start and end dates and the status of the supervision, then that many-to-many relationship needs resolving. This is shown in Figure 3.9. Despite the fact that this has nothing to do with manufacturing and the ‘list of the raw materials, subassemblies, components, parts (and the quantities of each) needed to manufacture an end product’, this is still commonly called a ‘bill of materials’ structure within the modelling community.
[11] An organisational structure in which the reporting relationships are set up as a grid, or matrix, rather than in the traditional hierarchy; for example, an employee may report both to a functional manager and to a product or project manager.
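Figure 3.9 Employee supervision in a matrix organisation resolved [diagrams: entity relationship EMPLOYEE SUPERVISION ASSIGNMENT 'of' / EMPLOYEE 'cited as supervised employee in', and EMPLOYEE SUPERVISION ASSIGNMENT 'by' / EMPLOYEE 'cited as supervising employee in'; UML EMPLOYEE (1..1) 'is supervised through' EMPLOYEE SUPERVISION ASSIGNMENT (0..*), and EMPLOYEE (1..1) 'is supervisor through' EMPLOYEE SUPERVISION ASSIGNMENT (0..*)]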
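Resolving the reflexive many-to-many works in the same way as the EMPLOYEE and BRANCH case. The only difference is that both links of the associative point at the same class. The sketch below is minimal, assuming a hypothetical `SupervisionAssignment` class that holds the supervising and supervised employee, the start and end dates, and a status. Those are the kinds of information mentioned above. The employee names and status values are sample data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SupervisionAssignment:
    """One EMPLOYEE SUPERVISION ASSIGNMENT linking two *different* employees."""
    supervisor: str          # 'by' one and only one EMPLOYEE
    supervised: str          # 'of' one and only one EMPLOYEE
    start: date
    status: str              # hypothetical status values, e.g. "confirmed"
    end: Optional[date] = None

    def __post_init__(self):
        if self.supervisor == self.supervised:
            raise ValueError("an employee cannot supervise themselves")

    def current_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)


def supervisors_of(employee: str, links: list[SupervisionAssignment], day: date) -> list[str]:
    """In a matrix organisation this can return more than one supervisor."""
    return sorted(a.supervisor for a in links if a.supervised == employee and a.current_on(day))


def supervisees_of(employee: str, links: list[SupervisionAssignment], day: date) -> list[str]:
    return sorted(a.supervised for a in links if a.supervisor == employee and a.current_on(day))
```

Because the dates live on the assignment, the same pair of employees can be linked more than once over time without any change to EMPLOYEE.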
MUTUALLY EXCLUSIVE RELATIONSHIPS AND ASSOCIATIONS
Figure 3.10 repeats the Ellis-Barker model for the vehicle hire company that we first saw in Figure 2.1.
Figure 3.10 The vehicle hire company as shown in Figure 2.1 [diagram: HIRE AGREEMENT 'made with' / PERSON 'hirer in'; HIRE AGREEMENT 'made for' / VEHICLE 'hired through'; VEHICLE 'described by' / VEHICLE TYPE 'description of']
In this model, all of the hirers were people, but what would happen if some of the hirers could be companies? Now we would have to introduce a new entity type, COMPANY, and have a similar relationship between HIRE AGREEMENT and COMPANY as we have between HIRE AGREEMENT and PERSON. Normally, if an entity type is related to two or more other entity types, all relationships apply for all instances of that entity type. In this case, however, only one of the two relationships between HIRE AGREEMENT and COMPANY and between HIRE AGREEMENT and PERSON can apply for any one instance of HIRE AGREEMENT. A hire agreement is made with a person or a company but not with both. These two relationships are mutually exclusive.
We recognise that relationships are mutually exclusive by using an arc (called an exclusive arc) that crosses the ‘either/or’ relationship lines, as shown in Figure 3.11.
Figure 3.11 The introduction of an exclusive arc [diagram: as Figure 3.10, plus HIRE AGREEMENT 'made with' / COMPANY 'hirer in'; an exclusive arc crosses the two 'made with' relationships from HIRE AGREEMENT to PERSON and to COMPANY]
These relationships are read as:
Each PERSON may be hirer in one or more HIRE AGREEMENTS
Each COMPANY may be hirer in one or more HIRE AGREEMENTS
Each HIRE AGREEMENT must be made with one and only one PERSON OR made with one and only one COMPANY
In Figure 3.11 dots were placed where the exclusive arc intersects the relationship lines. These dots are not necessary in this simple situation but can be helpful when developing larger, more complex models to show which relationships are in which exclusive group of relationships represented by each exclusive arc. UML allows constraints between associations to be included on a class model and the most common form of constraint is the ‘exclusive-or’ constraint, which is indicated by an ‘{xor}’ symbol. The associations involved in the constraint are joined by a dashed line. This is shown in Figure 3.12. Although exclusive arcs and {xor} constraints are very useful, particularly in the early stages of modelling, they can be hiding some important facts that should be exposed on the model. In our vehicle hire company, for instance, there may be a need to record different information about a hire agreement made with a person from that needed for a hire agreement made with a company. This leads us on to a consideration of the concepts of generalisation and specialisation.
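An exclusive arc or {xor} constraint is a rule that can be checked whenever a hire agreement is recorded. The sketch below is minimal, assuming hypothetical `Person`, `Company` and `HireAgreement` classes in which both hirer links are optional attributes. The constraint insists that exactly one of them is filled in. A hire agreement with no hirer, or with both a person and a company, is rejected.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Company:
    name: str


@dataclass(frozen=True)
class HireAgreement:
    agreement_no: str
    person: Optional[Person] = None     # 'made with' PERSON
    company: Optional[Company] = None   # 'made with' COMPANY

    def __post_init__(self):
        # Exclusive arc / {xor}: exactly one of the two relationships must apply.
        if (self.person is None) == (self.company is None):
            raise ValueError("a HIRE AGREEMENT must be made with one PERSON "
                             "or one COMPANY, but not both")

    @property
    def hirer(self) -> Union[Person, Company]:
        return self.person if self.person is not None else self.company
```

This sketch also shows the limitation discussed above. Any information that applies only to company hires would have to sit, unused, on every personal hire agreement as well.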
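Figure 3.12 The introduction of the {xor} constraint [diagram: VEHICLE TYPE (1..1) 'describes' VEHICLE (1..*); VEHICLE (1..1) 'is hired through' HIRE AGREEMENT (0..*); HIRE AGREEMENT (0..*) 'is made with' PERSON (1..1) and HIRE AGREEMENT (0..*) 'is made with' COMPANY (1..1), with the two 'is made with' associations joined by a dashed line labelled {xor}]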
GENERALISATION AND SPECIALISATION IN ENTITY RELATIONSHIP MODELS
Specialisation, and its reverse, generalisation, is where we recognise that the entity occurrences (the instances) of one of our entity types have a number of distinct subsets, with the occurrences in each subset having some characteristics in common. This is illustrated in Figure 3.13 where we recognise that there are two subsets of hire agreement, those that are personal hire agreements and those that are company hire agreements. We have created a separate entity type for each of those subsets, known as a subtype, which is drawn within the supertype, the superset of the subsets. This ‘boxes within boxes’ depiction, currently unique to the Ellis-Barker notation, is very easy for business users to understand. Subtypes depict the way that the business classifies the different things of interest to the business. In our example, HIRE AGREEMENT is the supertype with two subtypes: PERSONAL HIRE AGREEMENT and COMPANY HIRE AGREEMENT. When we start with HIRE AGREEMENT and recognise that there are subtypes such as PERSONAL HIRE AGREEMENT and COMPANY HIRE AGREEMENT we say that we are employing specialisation. If we start with entity types such as PERSONAL HIRE AGREEMENT and COMPANY HIRE AGREEMENT and recognise that there is really a supertype called HIRE AGREEMENT we say that we are employing generalisation. The resulting structure is also called a supertype–subtype hierarchy.
MODELLING BUSINESS INFORMATION
Figure 3.13 An example of a supertype–subtype hierarchy (HIRE AGREEMENT drawn around its subtypes PERSONAL HIRE AGREEMENT, made with a PERSON, and COMPANY HIRE AGREEMENT, made with a COMPANY; each HIRE AGREEMENT is made for a VEHICLE, which is described by a VEHICLE TYPE)
Each instance of PERSONAL HIRE AGREEMENT is also an instance of HIRE AGREEMENT. Likewise, each instance of COMPANY HIRE AGREEMENT is also an instance of HIRE AGREEMENT. In the real world there is, in each case, only one actual occurrence of a hire agreement, but it simultaneously exhibits the characteristics of both a HIRE AGREEMENT (the supertype) and either a PERSONAL HIRE AGREEMENT or a COMPANY HIRE AGREEMENT (the appropriate subtype). Each instance of HIRE AGREEMENT must, therefore, also be seen as an instance of PERSONAL HIRE AGREEMENT or an instance of COMPANY HIRE AGREEMENT. It follows that all of the properties of the HIRE AGREEMENT entity type (the supertype) apply to each and every instance of the PERSONAL HIRE AGREEMENT entity type and to each and every instance of the COMPANY HIRE AGREEMENT entity type (the subtypes). We say that the subtypes inherit all of the properties of the supertype. Three fundamental rules apply to supertype–subtype hierarchies when using the Ellis-Barker entity relationship notation:
• Subtypes are disjoint (or mutually exclusive): a personal hire agreement cannot also be a company hire agreement, and a company hire agreement cannot also be a personal hire agreement.
• The hierarchy is complete: all instances of HIRE AGREEMENT are either instances of PERSONAL HIRE AGREEMENT or instances of COMPANY HIRE AGREEMENT. If there are other types of hire agreement, they must be shown on the diagram as subtypes of HIRE AGREEMENT.
• Subtypes are immutable: a personal hire agreement cannot change into a company hire agreement, and a company hire agreement cannot change into a personal hire agreement. It is sometimes said that the hierarchy is static.
Although these rules can make life difficult for the modeller, they impose a discipline that pays off when the rules are followed: it leads to clearer, less ambiguous models, which in turn lead to more stable and flexible database designs. An alternative depiction of supertypes and subtypes, which uses mandatory–mandatory one-to-one relationships and an exclusive arc, is shown in Figure 3.14. This depiction is now seldom used because the ‘boxes within boxes’ depiction is superior.
Figure 3.14 Alternative depiction of a supertype–subtype hierarchy (PERSONAL HIRE AGREEMENT and COMPANY HIRE AGREEMENT each linked to HIRE AGREEMENT by an unnamed one-to-one relationship under an exclusive arc, with PERSON and COMPANY each shown as hirer in hire agreements)
Note that these one-to-one relationships are unnamed – the only circumstance where this is acceptable. These relationships are read as:
Each PERSONAL HIRE AGREEMENT must be one and only one HIRE AGREEMENT
Each COMPANY HIRE AGREEMENT must be one and only one HIRE AGREEMENT
Each HIRE AGREEMENT must be one and only one PERSONAL HIRE AGREEMENT OR one and only one COMPANY HIRE AGREEMENT
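The three Ellis-Barker rules can be mimicked, loosely, with Python class inheritance. This is a sketch under our own assumptions: the attributes `agreement_no`, `start_date`, `driving_licence_no` and `company_account_no` are hypothetical; completeness is approximated by refusing to create a bare HIRE AGREEMENT; disjointness is checked by `subtype_of`, which expects exactly one subtype to match; and immutability is modelled by convention (we never change an object's class after creation) plus `frozen=True`, which blocks reassignment of attributes.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class HireAgreement:
    """Supertype: every subtype inherits these attributes."""
    agreement_no: str  # hypothetical attribute
    start_date: str    # hypothetical attribute

    def __new__(cls, *args, **kwargs):
        # Complete: an instance cannot be 'just' a HIRE AGREEMENT.
        if cls is HireAgreement:
            raise TypeError("a HIRE AGREEMENT must be a PERSONAL or a COMPANY HIRE AGREEMENT")
        return super().__new__(cls)


@dataclass(frozen=True)
class PersonalHireAgreement(HireAgreement):
    driving_licence_no: str  # hypothetical subtype-specific attribute


@dataclass(frozen=True)
class CompanyHireAgreement(HireAgreement):
    company_account_no: str  # hypothetical subtype-specific attribute


SUBTYPES = (PersonalHireAgreement, CompanyHireAgreement)


def subtype_of(agreement: HireAgreement) -> type:
    """Disjoint and complete: exactly one subtype applies to each instance."""
    matches = [t for t in SUBTYPES if isinstance(agreement, t)]
    if len(matches) != 1:
        raise ValueError(f"{agreement!r} matches {len(matches)} subtypes")
    return matches[0]


pha = PersonalHireAgreement("HA-1", "2017-03-01", "SMITH901")
print(subtype_of(pha).__name__, pha.start_date)  # start_date is inherited
try:
    pha.start_date = "2017-04-01"  # frozen: attributes cannot be reassigned
except FrozenInstanceError:
    print("instances are frozen")
```

The subtype-specific attributes are the point of the exercise: they are the ‘different information’ that the exclusive arc on its own could not express.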
GENERALISATION AND SPECIALISATION IN CLASS MODELS
UML class models have a concept similar to that used in entity relationship models, with object classes known as superclasses and subclasses, as shown in Figure 3.15. The subclasses, PERSONAL HIRE AGREEMENT and COMPANY HIRE AGREEMENT, are linked to their superclass, HIRE AGREEMENT, by a branching arrow.
Figure 3.15 A UML superclass–subclass hierarchy (VEHICLE TYPE ‘describes’ VEHICLE; VEHICLE ‘is hired through’ HIRE AGREEMENT; PERSON and COMPANY each linked by an ‘is made with’ association; the subclasses PERSONAL HIRE AGREEMENT and COMPANY HIRE AGREEMENT joined to HIRE AGREEMENT by a single branching arrow)
UML also allows an alternative notation. Here each subclass is linked to the superclass using its own arrowhead. This is shown in Figure 3.16.
Figure 3.16 Alternative notation for a UML superclass–subclass hierarchy (the same classes and associations as Figure 3.15, but with each subclass linked to HIRE AGREEMENT by its own arrowhead)
This alternative notation, with one arrowhead per subclass, can at times be confusing, especially as UML allows multiple superclass–subclass hierarchies. An example of multiple superclass–subclass hierarchies is shown in Figure 3.17. Here we see that an employee, represented by an instance of the EMPLOYEE class, may be classified by their gender, by their role within the company and by whether or not they are a branch manager (their status). To represent this, we have three specialisation hierarchies, which UML calls generalisation sets. The discriminators ‘role’, ‘gender’ and ‘status’ are used to distinguish between these generalisation sets.
Figure 3.17 A UML class model with multiple superclass–subclass hierarchies (EMPLOYEE with three generalisation sets: gender {complete, dynamic}, with subclasses FEMALE EMPLOYEE and MALE EMPLOYEE; role, with subclasses ADMINISTRATOR and SALES PERSON; and status, with subclass BRANCH MANAGER)
The rules applied by UML generalisation sets differ from those that apply when using the Ellis-Barker entity relationship notation. The UML rules for generalisation sets are as follows:
• Generalisation sets can be ‘disjoint’ or ‘overlapping’. When a generalisation set is disjoint, the instances of any one subclass within a superclass are mutually exclusive with the instances of each of the other subclasses specified for that superclass; when a generalisation set is overlapping, the instances of the subclasses within a superclass are not mutually exclusive.
• Generalisation sets can be ‘complete’ or ‘incomplete’. When a generalisation set is complete, every instance of the superclass is also an instance of one of the subclasses specified for that superclass; when a generalisation set is incomplete, some instances of the superclass may not be instances of any of the subclasses specified for that superclass.
• Generalisation sets can be ‘static’ or ‘dynamic’. When a generalisation set is static, the instances of the subclasses within a superclass are immutable: an instance of one subclass cannot at some later date change to become an instance of another subclass within the same superclass. When a generalisation set is dynamic, the instances are not immutable: an instance of one subclass can change to become an instance of another subclass within the same superclass at some time in the future.
The UML default for a generalisation set is ‘disjoint’, ‘incomplete’ and ‘static’, and UML class models are annotated to indicate the nature of any generalisation set that departs from these defaults. In the example, all employees are either male or female, so the gender generalisation set is annotated as ‘complete’. Also, employees cannot be both male and female at the same time, but it is possible for them to change their gender over time, so the generalisation set is annotated as ‘dynamic’.
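The three pairs of properties can be read as checks on which instances belong to which subclass. The following Python sketch is our own illustration, not something UML itself defines: it takes two snapshots of subclass membership at two points in time and reports any breach of the disjoint, complete and static properties. The employee ids are invented, and the defaults mirror the UML defaults described above.

```python
from dataclasses import dataclass
from itertools import combinations

Snapshot = dict[str, set[str]]  # subclass name -> ids of the instances in it


@dataclass
class GeneralisationSet:
    discriminator: str
    disjoint: bool = True   # UML default: disjoint
    complete: bool = False  # UML default: incomplete
    static: bool = True     # UML default: static

    def check(self, all_ids: set[str], before: Snapshot, after: Snapshot) -> list[str]:
        """List violations of this set's constraints across two snapshots in time."""
        problems: list[str] = []
        for label, snap in (("before", before), ("after", after)):
            if self.disjoint:
                for (a, ids_a), (b, ids_b) in combinations(snap.items(), 2):
                    for i in sorted(ids_a & ids_b):
                        problems.append(f"{label}: {i} is both {a} and {b}")
            if self.complete:
                for i in sorted(all_ids - set().union(*snap.values())):
                    problems.append(f"{label}: {i} is in no {self.discriminator} subclass")
        if self.static:
            for i in sorted(all_ids):
                was = {s for s, ids in before.items() if i in ids}
                now = {s for s, ids in after.items() if i in ids}
                if was and now != was:
                    problems.append(f"{i} changed {self.discriminator} from {sorted(was)} to {sorted(now)}")
        return problems


employees = {"e1", "e2", "e3"}
before = {"FEMALE EMPLOYEE": {"e1"}, "MALE EMPLOYEE": {"e2", "e3"}}
after = {"FEMALE EMPLOYEE": {"e1", "e3"}, "MALE EMPLOYEE": {"e2"}}

gender = GeneralisationSet("gender", complete=True, static=False)  # {complete, dynamic}
print(gender.check(employees, before, after))  # prints []
```

Declaring the same set with the default `static=True` would instead report that e3 changed gender, which is exactly why Figure 3.17 annotates the gender set as ‘dynamic’.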
AGGREGATION AND COMPOSITION
Aggregation and composition are real-world concepts that have implications for future database design, so they should be shown on our information models. In the non-technical world the words ‘aggregation’ and ‘composition’ are near synonyms (in Roget’s Thesaurus both are shown as synonyms of ‘assemblage’) and are often used interchangeably, but they have specific and distinct meanings for us. The Ellis-Barker entity relationship modelling notation has no special symbols for aggregation and composition, so the fact that aggregation or composition is involved can only be shown through the choice of appropriate optionality and names for the relationship concerned. UML does, however, have special symbols for aggregation and composition. Aggregation is a special form of relationship or association that specifies a whole–part relationship between the aggregate (whole) and its component parts, where those component parts can exist independently of whether the aggregate exists. So, for example, if the branches of our vehicle hire company are organised into areas under an area manager, the areas are the aggregates, which can be represented by a new AREA object class, and the branches are the component parts, represented by the BRANCH object class. An Ellis-Barker representation of the aggregation of branches within areas is shown in Figure 3.18.
Figure 3.18 Aggregation using Ellis-Barker notation (each BRANCH ‘part of’ an AREA; each AREA an ‘aggregation of’ BRANCHes)
In a UML class model, the fact that the instances of AREA are aggregates can be represented by the addition of a white diamond at the AREA end of the association to BRANCH. This is shown in Figure 3.19. Composition is a stronger form of aggregation where the life of the component parts is dependent on the existence of the whole. Consider a situation where a company is receiving orders for its products or services. We will call these sales orders, with the sales orders (the wholes) represented in Figure 3.20 by the SALES ORDER object class. Each of these sales orders can be for more than one product or service. The specification of the products or services ordered we will call ‘sales order items’ (the component parts) and these are represented by the SALES ORDER ITEM object class. The existence of a sales order item is dependent on the existence of the sales order of which it is a part. If the sales order is deleted, all of the sales order items must also be deleted.
Figure 3.19 An example of the use of the aggregation symbol in a UML class model (AREA 0..1 to BRANCH 0..*, association ‘is within’, with a white diamond at the AREA end)
In a UML class model, the fact that the instances of SALES ORDER are composed of instances of SALES ORDER ITEM can be represented by the addition of a black diamond at the SALES ORDER end of the association to SALES ORDER ITEM.
Figure 3.20 An example of the use of the composition symbol in a UML class model (SALES ORDER 1..1 to SALES ORDER ITEM 1..*, association ‘is included on’, with a black diamond at the SALES ORDER end)
An Ellis-Barker representation of the composition of sales order items within sales orders is shown in Figure 3.21.
Figure 3.21 Composition using Ellis-Barker notation (each SALES ORDER ITEM ‘part of’ a SALES ORDER; each SALES ORDER ‘composed of’ SALES ORDER ITEMs)
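The practical difference between the white and black diamonds shows up when a whole is deleted. In this hypothetical Python sketch (the `Store` class and the branch and order identifiers are invented for illustration), deleting an area leaves its branches in place, whereas deleting a sales order cascades to its sales order items.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory store holding only identifiers."""
    branches: set[str] = field(default_factory=set)
    areas: dict[str, set[str]] = field(default_factory=dict)         # AREA -> its BRANCHes
    sales_order_items: set[str] = field(default_factory=set)
    sales_orders: dict[str, set[str]] = field(default_factory=dict)  # SALES ORDER -> its items

    def delete_area(self, area: str) -> None:
        # Aggregation (white diamond): the branches exist independently of the area.
        self.areas.pop(area)

    def delete_sales_order(self, order_no: str) -> None:
        # Composition (black diamond): the items cannot outlive their order.
        for item in self.sales_orders.pop(order_no):
            self.sales_order_items.discard(item)


store = Store(
    branches={"Leeds", "York"},
    areas={"North": {"Leeds", "York"}},
    sales_order_items={"SO1/1", "SO1/2"},
    sales_orders={"SO1": {"SO1/1", "SO1/2"}},
)
store.delete_area("North")
store.delete_sales_order("SO1")
print(sorted(store.branches), sorted(store.sales_order_items))
```

In a relational design the same distinction typically surfaces as a cascading delete on the composed parts, but that choice belongs to database design rather than to the business information model.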
Aggregation and composition, and the associated UML symbols, can be difficult concepts for business users to understand. These concepts should be used with care, especially on models that need to be discussed with business users. In fact, a good case can be made for business analysts not to use the UML symbols for aggregation (the white diamond) and composition (the black diamond), since some business users find them confusing and they add little to the model that cannot be explained in other ways. These symbols are mentioned here for completeness: you may sometimes come across them in a UML class model prepared by someone else.
EXERCISES FOR CHAPTER 3
3.1 Using Ellis-Barker entity relationship notation, resolve the many-to-many relationship in the following diagram:
(Diagram: EMPLOYEE and PROJECT linked by a many-to-many relationship named ‘assigned to’ and ‘assigned with’)
3.2 Using UML class model notation, resolve the many-to-many association in the following diagram:
(Diagram: FLIGHT and PASSENGER linked by the association ‘is booked on’, with multiplicities 1..* and 0..*)
3.3 Using Ellis-Barker entity relationship notation, resolve the many-to-many relationship in the following diagram:
(Diagram: a single entity type, FOOD ITEM, with a many-to-many relationship to itself, labelled ‘part of’, ‘ingredient within’ and ‘comprised of’)
3.4 Draw partial information model diagrams using both Ellis-Barker entity relationship notation and UML class model notation to represent the following situations. In each case, any many-to-many relationships or associations should be resolved. Any mutually exclusive relationships or associations should be shown. Any subtypes or subclasses should also be shown.
a A student must register for at least one course, but can register for more than one course; a course can have many students registered for that course.
b An actor can play roles in many films; a film can have many actors playing roles.
c An employee can be assigned to a department or to a project; projects and departments can each have many employees assigned to them.
d A bank provides business accounts, which can only be held by business organisations, and personal accounts, which can only be held by individuals; some personal accounts, joint personal accounts, may have an additional holder; business organisations or individuals can hold more than one appropriate account.
3.5 A business can receive orders from external organisations or from its own branches. Draw one or more partial information model diagrams using the Ellis-Barker entity relationship notation to represent this situation. Any mutually exclusive relationships or associations should be shown. Any subtypes or subclasses should also be shown. (Note: there are a number of solutions to this exercise depending on the assumptions that are made.)
4 DRAWING AND VALIDATING INFORMATION MODEL DIAGRAMS
This chapter introduces a process for drawing an information model. It then considers two techniques for validating an information model – the data navigation path (sometimes called a query access path or an enquiry access path) and the CRUD (Create-Read-Update-Delete) matrix.
THE MODEL DRAWING PROCESS
There are many different approaches to drawing an information model, and more experienced modellers will draw on many of them during any set of modelling sessions. What all of these approaches have in common, however, is that none of them will allow you to arrive at a final model in one go. The very act of modelling raises further questions, the answers to which will often, if not always, lead to some enhancement of the model. As an analyst, you should be doing more than simply documenting what you are told using ‘boxes and lines’; you should be fully analysing the situation to determine the deep underlying nature of the information being used by the business. I recall that, back in the early 1990s, when I was relatively new to information and data modelling (well before the widespread use of drawing packages and tools to support modelling), I naively asked one of the consultants then working with us when I would know that a model was complete. The answer I got was ‘When the Tipp-Ex [correction fluid] on the model is so thick that it is able to stand on its edge’. The implication is that even an experienced modeller has to rework a model many, many times, and this is as true today as it was 25 years ago. That is all very well, but it does not really help someone who is new to modelling. What is needed is a ‘recipe’ from which a modeller can cook an acceptable model. Just such a recipe is illustrated in Figure 4.1. There are seven steps to this process:
(1) Using a scenario or some other textual description of the situation, identify the entity types or object classes.
(2) Identify the direct relationships or associations between the identified entity types or object classes.
(3) Draw an initial diagram of entity types/object classes and relationships/associations.
(4) Validate the diagram, probably using a data navigation path or a CRUD matrix.
(5) Update the diagram as necessary as a result of the validation.
(6) Revalidate the diagram.
(7) Iterate through steps (4) to (6) as many times as necessary until all of the information requirements have been identified.
Figure 4.1 The model drawing process (a staircase of steps (1) to (7): identify the entity types or object classes; identify the direct relationships or associations; draw an initial diagram; validate the diagram; update the diagram; revalidate the diagram; iterate through steps (4) to (6))
IDENTIFYING THE ENTITY TYPES OR THE OBJECT CLASSES
As we saw earlier, entities and objects are identifiable things of interest to the business about which information needs to be recorded. These entities and objects can be tangible things such as people, places, buildings, equipment or assets, or intangible concepts such as contracts or transactions. Our models do not, however, show entities or objects; they show entity types or object classes. An entity type is defined as a template for a set of entities with the same characteristics, while an object class is similarly defined as a template for a set of objects with the same characteristics. When we identify entity types we are also identifying object classes. To identify entity types or object classes we need to identify those sets of ‘things of interest to the business’ that have the same characteristics. Each of these must be something about which the business needs to record information, and each instance must be individually identifiable, for example because it has a unique name or serial number (within the business under consideration).
Now let us consider our vehicle hire business replenishing its stock of vehicles for hire. The business will issue a purchase order to one of its suppliers. There is a business rule that information is not recorded about suppliers until a purchase order has been placed upon the supplier. Each purchase order is for a quantity of vehicles, which may or may not all be of the same model, with each vehicle type (or model) being documented as a separate item on the purchase order. Upon acceptance of a purchase order, the supplier will fulfil the order placed upon it by making a delivery, or a number of deliveries, of vehicles. On receipt of a delivery, each of the items delivered (the vehicles) will be checked to make sure that (i) it correctly corresponds with an item that was included in a purchase order, and (ii) it is acceptable, for example that there is no damage to the vehicle. So, what are the ‘things of interest’ in this scenario that will form our entity types or object classes? One simple way to find them is to look for the nouns, or noun phrases. This little scenario has seven potential entity types or object classes. At this point it is also useful to start developing a definition for each of our potential entity types or object classes: these help to clarify our understanding and prevent misunderstandings later on. The identified entity types or object classes and their definitions are shown in Table 4.1.
Table 4.1 Identified entity types or object classes (entity type or object class: definition)
SUPPLIER: A legal entity (company or organisation) that supplies us with goods and/or services.
PURCHASE ORDER: An authorisation issued by us to a supplier indicating the types, quantities and agreed prices for the supply of goods or services.
PURCHASE ORDER ITEM: An essential component part of a purchase order being the specification on that purchase order of the type, quantity and agreed price for the supply of a particular good or service.
VEHICLE TYPE: A particular brand or model of vehicle sold by a manufacturer identified with a particular body type, a particular engine type, a particular transmission type and a particular first year of production.
DELIVERY: A formal hand-over of goods or services to us by a supplier or their agent.
DELIVERY ITEM: An essential component part of a delivery being a particular good or service that has been delivered as part of that delivery.
DELIVERY ITEM RECONCILIATION: An agreement, or otherwise, that a delivered item correctly corresponds with a purchase order item and that it is acceptable to us, being of appropriate quality and without damage.
IDENTIFYING THE RELATIONSHIPS OR ASSOCIATIONS
Step 2 of the process is to identify the relationships or associations between the entity types and object classes. There will be many relationships or associations, but some of these will be indirect relationships or associations: those that link two entity types or object classes via another entity type or object class. From a modelling point of view we are not interested in those. We are only interested in the direct relationships or associations, those that directly link one entity type or object class to another. One approach to identifying relationships or associations is to produce a matrix such as the one in Figure 4.2. This approach is only really feasible for relatively small models, but it does provide a starting point for a ‘new modeller’.
Figure 4.2 A ‘relationship matrix’ (SUPPLIER, PURCHASE ORDER, PURCHASE ORDER ITEM, VEHICLE TYPE, DELIVERY, DELIVERY ITEM and DELIVERY ITEM RECONCILIATION listed against each other)
Wherever there is a direct link between two entity types or object classes, the appropriate cell in the matrix is marked. At this stage it is sometimes useful to think of a verb or verb phrase that describes the business link that the relationship or association represents; this will later help us to name the relationship or association. For example:
• Each purchase order is raised against a supplier (and only one supplier).
• Each purchase order item is part of a purchase order (and only one purchase order).
• Each purchase order item is for a vehicle type (and only one vehicle type).
• Each delivery is originated by a supplier (and only one supplier).
• Each delivery item is part of a delivery (and only one delivery).
• Each delivery item reconciliation is of a delivery item (and only one delivery item).
• Each delivery item reconciliation is against a purchase order item (and only one purchase order item).
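For a small model the relationship matrix can also be built mechanically from the list of direct links above. This Python sketch assumes a symmetric matrix (both cells marked for each link), which may differ from the layout of Figure 4.2; the verb phrases come from the bullet list, and any entity type left with no marked cell would be flagged as isolated.

```python
ENTITY_TYPES = [
    "SUPPLIER", "PURCHASE ORDER", "PURCHASE ORDER ITEM", "VEHICLE TYPE",
    "DELIVERY", "DELIVERY ITEM", "DELIVERY ITEM RECONCILIATION",
]

# (from, verb phrase, to) for each direct link in the bullet list above
DIRECT_LINKS = [
    ("PURCHASE ORDER", "is raised against", "SUPPLIER"),
    ("PURCHASE ORDER ITEM", "is part of", "PURCHASE ORDER"),
    ("PURCHASE ORDER ITEM", "is for", "VEHICLE TYPE"),
    ("DELIVERY", "is originated by", "SUPPLIER"),
    ("DELIVERY ITEM", "is part of", "DELIVERY"),
    ("DELIVERY ITEM RECONCILIATION", "is of", "DELIVERY ITEM"),
    ("DELIVERY ITEM RECONCILIATION", "is against", "PURCHASE ORDER ITEM"),
]


def relationship_matrix(types: list[str], links: list[tuple[str, str, str]]) -> list[list[str]]:
    """Mark the cell for each direct link in both directions (a symmetric matrix)."""
    index = {t: i for i, t in enumerate(types)}
    matrix = [[" "] * len(types) for _ in types]
    for a, _verb, b in links:
        matrix[index[a]][index[b]] = "X"
        matrix[index[b]][index[a]] = "X"
    return matrix


matrix = relationship_matrix(ENTITY_TYPES, DIRECT_LINKS)
for name, row in zip(ENTITY_TYPES, matrix):
    print(f"{name:<30}|" + "|".join(row) + "|")

isolated = [t for t, row in zip(ENTITY_TYPES, matrix) if "X" not in row]
print("isolated entity types:", isolated)
```

An isolated entity type is not necessarily wrong, but it is a prompt to ask the business how that thing of interest relates to everything else.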
DRAWING THE INITIAL DIAGRAM
Step 3 of our process is to make a first attempt at drawing our model diagram. Although this sounds easy, it is probably the most difficult step. Not only do we need to decide on a layout, but we also need to consider the cardinalities and optionalities, which we generally ignored when we were identifying the relationships and associations. Our first attempt at the Ellis-Barker entity relationship model is at Figure 4.3 and the equivalent UML class model is at Figure 4.4.
Figure 4.3 The initial Ellis-Barker entity relationship model (SUPPLIER, PURCHASE ORDER, PURCHASE ORDER ITEM, VEHICLE TYPE, DELIVERY, DELIVERY ITEM and DELIVERY ITEM RECONCILIATION, linked by the seven direct relationships identified above)
Figure 4.4 The initial UML class model (the same seven classes and associations as Figure 4.3, with multiplicities matching the relationship readings that follow)
At this stage, it is useful to consider the nature of each of the relationships and ensure that our selection of optionality and cardinality is correct.
The relationship or association from SUPPLIER to PURCHASE ORDER:
Each SUPPLIER must be subject of one or more PURCHASE ORDERS
Each PURCHASE ORDER must be raised against one and only one SUPPLIER
This agrees with what we were told: a purchase order is issued to one supplier, and there is a business rule that information is not recorded about suppliers until a purchase order has been placed upon the supplier.
The relationship or association from PURCHASE ORDER to PURCHASE ORDER ITEM:
Each PURCHASE ORDER must be composed of one or more PURCHASE ORDER ITEMS
Each PURCHASE ORDER ITEM must be part of one and only one PURCHASE ORDER
A purchase order item was defined as an essential component part of a purchase order, so a purchase order must have at least one purchase order item and each purchase order item must be part of a single purchase order.
The relationship or association from VEHICLE TYPE to PURCHASE ORDER ITEM:
Each VEHICLE TYPE may be cited in one or more PURCHASE ORDER ITEMS
Each PURCHASE ORDER ITEM may be for one and only one VEHICLE TYPE
We were told that, when replenishing its vehicle stock, the business issues purchase orders for vehicles, with each vehicle type (or model) documented as a separate item on the purchase order. On the other hand, a purchase order was defined so that it could include goods or services other than vehicles. So not every purchase order item has to be for the ordering of vehicles, hence the optional nature of this relationship.
The relationship or association from SUPPLIER to DELIVERY:
Each SUPPLIER may be originator of one or more DELIVERIES
Each DELIVERY must be originated by one and only one SUPPLIER
We were told that a supplier will fulfil the orders placed upon it by making a number of deliveries (hence ‘one or more’). The relationship is optional for the supplier because a supplier is recorded as soon as a purchase order has been placed upon it, before it has made any delivery.
The relationship or association from DELIVERY to DELIVERY ITEM:
Each DELIVERY must be composed of one or more DELIVERY ITEMS
Each DELIVERY ITEM must be part of one and only one DELIVERY
A delivery consists of a number of delivery items (the vehicles). Without at least one delivery item there is no delivery, hence each delivery must be composed of one or more delivery items.
The relationship or association from DELIVERY ITEM to DELIVERY ITEM RECONCILIATION:
Each DELIVERY ITEM may be subject of one and only one DELIVERY ITEM RECONCILIATION
Each DELIVERY ITEM RECONCILIATION must be of one and only one DELIVERY ITEM
We were told that, on receipt of a delivery, each of the items delivered (the vehicles) will be checked (reconciled) to make sure that (i) it correctly corresponds with an item that was included in a purchase order, and (ii) it is acceptable, that is, undamaged. Each delivery item (a vehicle), therefore, has at most one reconciliation, which is a separate concept from the delivery item itself, and each delivery item reconciliation must be linked to a specific delivery item. The relationship is optional for the delivery item because the reconciliation may not take place until some time after the delivery has been taken.
The relationship or association from PURCHASE ORDER ITEM to DELIVERY ITEM RECONCILIATION:
Each PURCHASE ORDER ITEM may be object of one or more DELIVERY ITEM RECONCILIATIONS
Each DELIVERY ITEM RECONCILIATION must be against one and only one PURCHASE ORDER ITEM
We were told that one of the purposes of delivery item reconciliation is to check that the delivery item corresponds with an item that was included in a purchase order (a purchase order item). Each purchase order item (an order for a number of vehicles of a particular type) can, therefore, be reconciled with many delivery items (individual vehicles), but each delivery item reconciliation must be against a single purchase order item. The relationship is optional for the purchase order item because, at any one time, there can be purchase order items that are awaiting delivery or reconciliation.
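Because each reading follows a fixed pattern (‘Each A must be/may be name one and only one/one or more B’), the sentences can be generated from a description of the relationship ends, which is a handy way to check a diagram against the agreed wording. The Python sketch below is an illustration only: the `End` structure and the naive pluralisation rule are our own assumptions, and the relationship data are transcribed from the readings above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class End:
    entity: str     # entity type at this end
    name: str       # relationship name read from this end
    optional: bool  # True -> 'may be', False -> 'must be'
    many: bool      # True -> 'one or more', False -> 'one and only one'


def plural(entity: str) -> str:
    # Naive rule that happens to suit these names (DELIVERY -> DELIVERIES).
    return entity[:-1] + "IES" if entity.endswith("Y") else entity + "S"


def reading(end: End, other: str) -> str:
    """Compose the Ellis-Barker sentence for one end of a relationship."""
    modality = "may be" if end.optional else "must be"
    degree = f"one or more {plural(other)}" if end.many else f"one and only one {other}"
    return f"Each {end.entity} {modality} {end.name} {degree}"


RELATIONSHIPS = [  # (one end, other end), transcribed from the readings above
    (End("SUPPLIER", "subject of", False, True), End("PURCHASE ORDER", "raised against", False, False)),
    (End("PURCHASE ORDER", "composed of", False, True), End("PURCHASE ORDER ITEM", "part of", False, False)),
    (End("VEHICLE TYPE", "cited in", True, True), End("PURCHASE ORDER ITEM", "for", True, False)),
    (End("SUPPLIER", "originator of", True, True), End("DELIVERY", "originated by", False, False)),
    (End("DELIVERY", "composed of", False, True), End("DELIVERY ITEM", "part of", False, False)),
    (End("DELIVERY ITEM", "subject of", True, False), End("DELIVERY ITEM RECONCILIATION", "of", False, False)),
    (End("PURCHASE ORDER ITEM", "object of", True, True), End("DELIVERY ITEM RECONCILIATION", "against", False, False)),
]

for left, right in RELATIONSHIPS:
    print(reading(left, right.entity))
    print(reading(right, left.entity))
```

Reading the generated sentences back to business users is the same check performed by hand above; generating them simply guards against a diagram and its readings drifting apart.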
VALIDATING THE DIAGRAM
Now we come to the issue of validating the diagram, step 4 of our process. One approach would be to talk again to the business to confirm that we have correctly understood and documented the requirements. This could be done informally during the elicitation phase of the requirements engineering process, or more formally as part of requirements validation. Before we move into requirements validation, however, we would hope to have fully understood and documented the requirements, so we would probably already have used some other validation technique. We could look at the documentation used within the business and apply the relational data analysis process (see Chapter 6) to produce another model, which we can then compare with our model to highlight any deficiencies. Two other common validation techniques are the development of data navigation paths and the development of CRUD matrices.
Validating the diagram by using data navigation paths
Provided we understand the queries that will be used against the recorded information, we can validate our model by envisaging those queries being applied to data in a database that has been designed using this model. These queries can be documented in a diagram that represents the navigation of the data. Such a diagram is called a data navigation path.12 Consider the following query: Determine the suppliers of vehicles of a particular vehicle type ordered during the month of November 2016. The data navigation path diagram for this query, using a suitable notation, is shown in Figure 4.5.
Figure 4.5 The first data navigation path (using the vehicle type details, enter the set of VEHICLE TYPEs and iterate (*) to find the required vehicle type (o); enter its set of PURCHASE ORDER ITEMs and iterate (*); navigate to each related PURCHASE ORDER, selecting (o) those with a purchase order date within November 2016; navigate to the related SUPPLIER)
Using the vehicle type details, we enter the set of VEHICLE TYPE instances and iterate through those instances (iteration is shown by an asterisk ‘*’ in the top right hand corner). As we iterate through the VEHICLE TYPE instances we query each VEHICLE TYPE instance to find the required vehicle type (the ‘o’ indicates ‘option’). We note from the model that there is a one-to-many association from VEHICLE TYPE to PURCHASE ORDER ITEM, which means that for each instance of VEHICLE TYPE there is a related set of PURCHASE ORDER ITEM instances. Having found our required instance of VEHICLE TYPE we can enter the relevant set of PURCHASE ORDER ITEM instances.
12 These are sometimes also called ‘query access paths’ or ‘enquiry access paths’.
We note from the model that there is a many-to-one association from PURCHASE ORDER ITEM to PURCHASE ORDER. For each instance of PURCHASE ORDER ITEM there is only one related instance of PURCHASE ORDER and, for each PURCHASE ORDER ITEM instance, we navigate to this related instance of PURCHASE ORDER. We inspect each of these instances in turn to see if the date of the purchase order is within November 2016. We note from the model that there is a many-to-one association from PURCHASE ORDER to SUPPLIER – for each instance of PURCHASE ORDER there is only one related instance of SUPPLIER. For each of the PURCHASE ORDER instances with a purchase order date within November 2016 we navigate to the related instance of SUPPLIER. Thus, we determine that we can answer the query ‘determine the suppliers of vehicles of a particular vehicle type ordered during the month of November 2016’. Now let us consider a second query: Determine the branches to which a particular supplier has made deliveries. The data navigation path diagram for this query is shown in Figure 4.6.
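Figure 4.6 The second data navigation path (using the supplier details, enter the set of SUPPLIERs and iterate (*) to find the required supplier (o); enter its set of DELIVERIES and iterate (*); the next step, marked ‘?’, has no entity type or object class to navigate to)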
Using the supplier details, we enter the set of SUPPLIER instances and iterate through them. As we iterate through the SUPPLIER instances we query each one to find the required supplier. We note from the model that there is a one-to-many association from SUPPLIER to DELIVERY: for each instance of SUPPLIER there may be many linked instances of DELIVERY. We navigate to this set of DELIVERY instances that relate to the required supplier and iterate through each instance within this set to find the branch to which the delivery was made.
Now we have a problem, because there is no related entity type or object class to represent the ‘branch’ concept. But ‘branch’ is almost certainly a thing of interest about which the business needs to record information. We therefore need to add a BRANCH entity type or object class to the model, with a relationship or association to DELIVERY, so that:
Each DELIVERY must be to one and only one BRANCH
Each BRANCH may be recipient of one or more DELIVERIES
Our revised Ellis-Barker entity relationship model is at Figure 4.7 and the equivalent UML class model is at Figure 4.8.
Figure 4.7 The revised Ellis-Barker entity relationship model [diagram: entity types VEHICLE TYPE, PURCHASE ORDER, PURCHASE ORDER ITEM, SUPPLIER, DELIVERY, DELIVERY ITEM, DELIVERY ITEM RECONCILIATION and the new BRANCH, with DELIVERY ‘to’ BRANCH and BRANCH ‘recipient of’ DELIVERY]
Validating the diagram by developing a CRUD matrix
Our second common validation technique involves the use of a special kind of matrix, a CRUD matrix. A CRUD matrix recognises that the four basic operations that can be applied to data within a database are operations that:
• create data;
• read data;
• update data;
• delete data.
MODELLING BUSINESS INFORMATION
Figure 4.8 The revised UML class model [diagram: classes SUPPLIER, BRANCH, VEHICLE TYPE, PURCHASE ORDER, PURCHASE ORDER ITEM, DELIVERY, DELIVERY ITEM and DELIVERY ITEM RECONCILIATION, with the new association DELIVERY ‘is made to’ BRANCH and multiplicities on each association]
The reason we develop a CRUD matrix is to compare the entity types or object classes on our model against the business processes to ensure that there are processes in place to create the information before it needs to be read or updated, and that, where appropriate, there are processes to delete the information. A CRUD matrix enables us to spot any deficiencies in our model. A partial high-level process map for our vehicle hire business is shown at Figure 4.9.
Figure 4.9 Partial high-level process map [processes: Open new branch; Identify suppliers; Order new vehicles; Identify vehicles; Receive vehicle delivery; Return rejected vehicle; Agree vehicle rental; Accept vehicle return]
A CRUD matrix has the entity types or object classes along one axis and the processes along the other axis, with the cells annotated with the letters C, R, U or D as appropriate. The CRUD matrix of entity types and object classes against the relevant processes for the ‘vehicle replenishment’ scenario is at Figure 4.10.
Figure 4.10 Completed CRUD matrix [matrix: entity types VEHICLE TYPE, PURCHASE ORDER, PURCHASE ORDER ITEM, SUPPLIER, DELIVERY, DELIVERY ITEM, DELIVERY ITEM RECONCILIATION and BRANCH against the processes Open new branch, Identify suppliers, Identify vehicles, Order new vehicles, Receive vehicle delivery and Return rejected vehicle, with cells marked C, R, U or C,R]
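Part of the inspection of a CRUD matrix can be mechanised. The sketch below is illustrative only and rests on three assumptions. The matrix is held as a Python dictionary keyed by entity type and process name (a hypothetical layout). The processes are listed in the order in which they are executed. The cell values are invented, not those of Figure 4.10. The code flags two kinds of deficiency: an entity type that is read, updated or deleted before any process creates it, and an entity type that is never created at all. It cannot detect a missing entity type, which still needs a conversation with the business.

```python
def crud_problems(processes, matrix):
    """Flag entity types that are used before they are created, or never created.

    processes: process names in execution order.
    matrix: {entity type: {process name: cell text such as "C", "R" or "C,R"}}.
    """
    problems = []
    for entity, cells in matrix.items():
        created = False
        for process in processes:
            ops = set(cells.get(process, "").replace(",", ""))
            if "C" in ops:
                created = True  # a create in the same process precedes its reads
            elif ops & {"R", "U", "D"} and not created:
                problems.append(f"{entity} is used by '{process}' before it is created")
        if not created:
            problems.append(f"{entity} is never created")
    return problems

# Hypothetical fragment of a matrix; DELIVERY has deliberately been given no 'C'.
processes = ["Open new branch", "Identify suppliers",
             "Order new vehicles", "Receive vehicle delivery"]
matrix = {
    "BRANCH": {"Open new branch": "C", "Receive vehicle delivery": "R"},
    "SUPPLIER": {"Identify suppliers": "C", "Order new vehicles": "R"},
    "DELIVERY": {"Receive vehicle delivery": "R"},
}
print(crud_problems(processes, matrix))
```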
The completed CRUD matrix highlights two problems that we had not identified earlier. The first problem is that the original text said that there is a business rule that information is not recorded about suppliers until a purchase order has been placed upon that supplier. This was represented by the relationship:
Each SUPPLIER must be subject of one or more PURCHASE ORDERS
There is, however, a process called ‘Identify suppliers’ that is executed before the ‘Order new vehicles’ process. This implies that information is recorded about the suppliers before the purchase orders are raised and that the relationship should be:
Each SUPPLIER may be subject of one or more PURCHASE ORDERS
The second problem highlighted by the development of this CRUD matrix is that there is a process called ‘Return rejected vehicle’ that can be executed after the ‘Receive vehicle delivery’ process. This implies that there is a missing entity type or object class to handle the return of a vehicle that is rejected after the delivery item reconciliation. Further investigation reveals that this is indeed true: if a vehicle is rejected, the supplier is contacted and asked to arrange collection, and information needs to be recorded about this collection. Thus, a new entity type or object class, REJECTED VEHICLE COLLECTION, related to DELIVERY ITEM RECONCILIATION, is required. The relationship or association will be fully optional because not every instance of REJECTED VEHICLE COLLECTION will be initiated by an instance of DELIVERY ITEM RECONCILIATION, and not every instance of DELIVERY ITEM RECONCILIATION will give rise to an instance of REJECTED VEHICLE COLLECTION. Our final Ellis-Barker entity relationship model is at Figure 4.11 and the equivalent UML class model is at Figure 4.12.
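The difference between ‘must be’ and ‘may be’ in the SUPPLIER relationship shows up when we check how many purchase orders each recorded supplier has. This is a minimal sketch with a hypothetical function name (`suppliers_breaking_rule`), using plain strings for suppliers and purchase orders. It shows that a supplier recorded by ‘Identify suppliers’, before any purchase order exists, is only valid once the relationship is optional.

```python
def suppliers_breaking_rule(suppliers, orders_by_supplier, mandatory):
    """Return the suppliers that violate the SUPPLIER-to-PURCHASE ORDER rule.

    mandatory=True models 'Each SUPPLIER must be subject of one or more
    PURCHASE ORDERS'; mandatory=False models 'may be subject of'.
    """
    minimum = 1 if mandatory else 0
    return [s for s in suppliers if len(orders_by_supplier.get(s, [])) < minimum]

# Invented data: Bolt Cars has been identified but not yet ordered from.
suppliers = ["Acme Motors", "Bolt Cars"]
orders = {"Acme Motors": ["PO-1001"]}
print(suppliers_breaking_rule(suppliers, orders, mandatory=True))
print(suppliers_breaking_rule(suppliers, orders, mandatory=False))
```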
Figure 4.11 The final Ellis-Barker entity relationship model [diagram: the entity types of Figure 4.7 plus REJECTED VEHICLE COLLECTION, which is ‘caused by’ DELIVERY ITEM RECONCILIATION (DELIVERY ITEM RECONCILIATION ‘cause of’ REJECTED VEHICLE COLLECTION)]
Figure 4.12 The final UML class model [diagram: the classes of Figure 4.8 plus REJECTED VEHICLE COLLECTION, associated with DELIVERY ITEM RECONCILIATION (‘causes’, 0..1 at each end); the SUPPLIER end of the association with PURCHASE ORDER now has the PURCHASE ORDER multiplicity 0..*]
Once we have revised our model, we should revalidate and, if necessary, revise again. This process is carried on until we are satisfied that the diagram faithfully meets the information needs of the business area under investigation.
EXERCISES FOR CHAPTER 4
4.1 Draw partial information model diagrams using both Ellis-Barker entity relationship notation and UML class model notation to represent the following business situation:
The Fordland Optical Company provides spectacles to order for a number of clients, most of whom are high street opticians. Each job undertaken by the company is the result of an order from a client; the company does not make spectacles for stock.
The completion of each job is the responsibility of a single employee. When finished, each job comprises a frame, chosen from the range of frames held in stock, and two lenses. The properties of each lens are specified on a prescription; some lenses will be stock lenses, but others will need to be made by an employee, who may be a different employee from the employee responsible for the job.
4.2 The following model is a representation, using Ellis-Barker entity relationship notation, of some of the information needed by Tam Bruce Hardware Ltd, a chain of retail hardware stores located in the south and west of the United Kingdom, to manage their business.
[diagram: entity types SALE TRANSACTION, SALE TRANSACTION ITEM, RETURNED ITEM, STOCK ITEM, STOCK ITEM HOLDING, SUPPLIER, LOCATION, EMPLOYEE, TILL SESSION and TILL, with named relationships between them]
a. Develop a data navigation path diagram to determine whether this model can answer the following query: Determine the employees who sold a particular stock item (SKU2756) on 20 October 2016. Also, determine the stores those employees were working at and the till they were using at the time.
b. Below is a partial high-level process map for Tam Bruce Hardware Ltd.
[process map: Open new store; Identify suppliers; Identify stock; Order new stock; Receive stock; Sell stock; Receive returned stock; Verify stock holding; Issue annual report]
Develop a CRUD matrix to show which information is created, updated or read in each process (ignore the deletion of information for this exercise).
5 RECORDING INFORMATION ABOUT THINGS
This chapter introduces the related concepts of the attribute, the unique identifier and the domain, and their representation on both entity relationship models and class models. The object-oriented concept of the operation is also introduced at the end of the chapter.
REVISITING ENTITY TYPES, OBJECT CLASSES, RELATIONSHIPS AND ASSOCIATIONS
Figure 5.1 shows the type of model we saw at the end of Chapter 3.
Figure 5.1 The previous models [diagrams: an Ellis-Barker entity relationship model and a UML class model of HIRE AGREEMENT (subtypes PERSONAL HIRE AGREEMENT, made with PERSON, and COMPANY HIRE AGREEMENT, made with COMPANY), VEHICLE and VEHICLE TYPE]
Until now we have only seen entity types or object classes and the relationships or associations between them. Entity types or object classes represent the things of interest to the business or organisation about which the business or organisation needs to record information. Examples are:
• The vehicle with vehicle registration number KW64 CNV (an entity occurrence or an object) is an instance of the entity type or object class VEHICLE.
• Miss Patricia Johnson (a second entity occurrence or object), who wishes to hire a vehicle, is an instance of the entity type or object class PERSON.
• The agreement made with Miss Patricia Johnson to hire KW64 CNV (a third entity occurrence or object) is an instance of the entity type or object class HIRE AGREEMENT and also an instance of the entity type or object class PERSONAL HIRE AGREEMENT.
Relationships or associations represent some link between those things that are of interest to the business, so that, for example:
Each VEHICLE may be hired through one or more HIRE AGREEMENTS
Each HIRE AGREEMENT must be made for one and only one VEHICLE
or:
Each VEHICLE is hired through zero to many HIRE AGREEMENTS
Each HIRE AGREEMENT is made for only one VEHICLE
This chapter introduces two other important concepts that we use in modelling information: the attribute and the domain.
INTRODUCTION TO ATTRIBUTES
Relationships and associations provide some of the information that is required by the business or organisation: which hire agreements are made for each vehicle and which vehicle is cited in each hire agreement, for instance. But there is more information we want to know and record about the entity occurrences and objects. For example, we saw above that one of the instances of the entity type or object class VEHICLE has a vehicle registration number of KW64 CNV. Furthermore, all instances of the entity type or object class VEHICLE will have a vehicle registration number. There are also other things that the business will wish to record about all of its vehicles, such as the colour of each vehicle and the date that it was last serviced. These pieces of information (the vehicle registration number, the colour of the vehicle and the date that the vehicle was last serviced) are known as attributes. An attribute is defined as:
a property of an entity or object that is used to qualify, identify, classify, quantify or in some other way express the state of that entity or object.
All instances of an entity type have the same attributes, as do all instances of an object class. Note that the attribute concept is common to both entity relationship modelling and class modelling, although the notations used are different. In the same way that we distinguished entity occurrences (the instances) from entity types (the group of entities with the same characteristics), we draw the same distinction between the ‘attribute occurrence’ (the value) and the ‘attribute type’. The attribute occurrence KW64 CNV is the value taken by the attribute type ‘vehicle registration number’ of one of the instances of the entity type VEHICLE. From here on we shall use the terms ‘attribute’ (instead of ‘attribute type’) and ‘value’ (instead of ‘attribute occurrence’).
Attributes should be shown on models that are to be discussed with the business, because including them makes a model easier for non-technical business people to understand. Figure 5.2 shows the entity relationship model that we previously saw at Figure 3.13, now with attributes (the pieces of information we wish to record) added. It shows our vehicle hire agency with two types of hire agreement: the personal hire agreement, made with persons, and the company hire agreement, made with companies.
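The distinction between an attribute (type) and a value maps loosely onto a class and its fields in a programming language. This is a minimal sketch, assuming a Python dataclass named `Vehicle` as a stand-in for the entity type or object class VEHICLE. The colour ‘silver’ is an invented value, and the service date is simplified to text. The field names are the same for every instance; the values belong to one particular instance.

```python
from dataclasses import dataclass, fields, asdict
from typing import Optional

@dataclass
class Vehicle:
    """Stand-in for the entity type / object class VEHICLE; each field is an attribute."""
    vehicle_registration_number: str
    colour: str
    last_serviced_date: Optional[str] = None  # simplified: a date held as text

# One entity occurrence / object: an instance of VEHICLE.
kw64 = Vehicle(vehicle_registration_number="KW64 CNV", colour="silver")

# The attributes are shared by every instance of the entity type...
attribute_names = [f.name for f in fields(Vehicle)]
# ...whereas the values belong to this particular instance.
values = asdict(kw64)
print(attribute_names)
print(values)
```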
Figure 5.2 Attribute types shown on an Ellis-Barker entity relationship model
[diagram; entity types and attributes as shown:]
HIRE AGREEMENT (made for VEHICLE): (m) start date time, (m) due return date time, (o) actual return date time
PERSONAL HIRE AGREEMENT (subtype of HIRE AGREEMENT, made with PERSON)
COMPANY HIRE AGREEMENT (subtype of HIRE AGREEMENT, made with COMPANY): (o) purchase order number, (o) billing address
PERSON: (m) name, (m) correspondence address, (m) primary telephone number, (o) alternative telephone number
COMPANY: (m) name, (m) correspondence address, (m) telephone number
VEHICLE (described by VEHICLE TYPE): (m) vehicle registration number, (m) colour, (o) last serviced date
VEHICLE TYPE: (m) manufacturer, (m) model name, (m) engine type, (m) transmission type, (m) first sale date, (o) last sale date
The ‘start date time’ attribute of the HIRE AGREEMENT entity type is annotated with the symbol ‘(m)’. This indicates that this attribute is mandatory, which implies that for each and every hire agreement there must be a value recorded for the ‘start date time’ attribute. In other words, the start date and time of the hire of the vehicle by this person or company for which this hire agreement was raised must always be known and recorded. The ‘actual return date time’ attribute of the HIRE AGREEMENT entity type is annotated with the symbol ‘(o)’. This indicates that this attribute is optional, which implies that for each and every hire agreement a value may be recorded for the ‘actual return date time’ attribute. The date and time that the vehicle will be returned at the end of the hire will not be known when information about a hire agreement is first recorded.
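In code, the ‘(m)’ and ‘(o)’ annotations correspond roughly to a required field and an optional field that may be left empty. This is a minimal sketch, assuming a Python dataclass named `HireAgreement` with field names derived from the attribute names. The check in `__post_init__` illustrates the mandatory rule; it is not a prescribed design, and the dates are invented.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class HireAgreement:
    start_date_time: datetime                           # (m) mandatory
    due_return_date_time: datetime                      # (m) mandatory
    actual_return_date_time: Optional[datetime] = None  # (o) unknown when first recorded

    def __post_init__(self):
        # Every hire agreement must have a value for each mandatory attribute.
        for name in ("start_date_time", "due_return_date_time"):
            if getattr(self, name) is None:
                raise ValueError(f"mandatory attribute '{name}' has no value")

# Recorded at the start of the hire: the actual return is not yet known.
agreement = HireAgreement(datetime(2016, 11, 1, 9, 0), datetime(2016, 11, 3, 9, 0))
# Later, when the vehicle comes back, the optional value is filled in.
agreement.actual_return_date_time = datetime(2016, 11, 3, 8, 45)
```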
On entity relationship models, all attributes are ‘single-valued’: an attribute can never take more than one value at a time. There can only be one value of the ‘start date time’ attribute for any instance of the HIRE AGREEMENT entity type. Despite the ‘single-valued attribute’ convention (or rule, as some would say), in modern databases a value assigned to an attribute can be quite complex, so business analysts do not need to get too hung up on the detail. For example, there is a ‘name’ attribute in the PERSON entity type. Some modellers would split this into attributes such as ‘title’, ‘given names’ and ‘family name’, but this is unnecessary unless there is a specific business requirement to break names down in this way. A person’s name might actually be recorded as three separate values for three different attributes, or as a single value of a complex type, but that decision is about how the information is stored, which is for the system developers to consider. The same argument applies to the ‘correspondence address’ attributes in both the PERSON and COMPANY entity types. The equivalent attributed UML class model is shown in Figure 5.3.
Figure 5.3 Attributes shown on a UML class model [diagram: the classes and attributes of Figure 5.2, with each attribute given a multiplicity: [1..1] where Figure 5.2 shows (m) and [0..1] where it shows (o)]
When attributes are added to a class model, the only requirement, according to the UML specification published by the Object Management Group, is that the attributes have names. It is, however, good practice to add a multiplicity for each attribute.
The ‘actual return date time’ attribute of the HIRE AGREEMENT object class is shown with a multiplicity of ‘[0..1]’. The ‘0’, the ‘lower bound’ of this multiplicity, indicates that there can be zero values for this attribute. The ‘1’, the ‘upper bound’, indicates that there can be at most one value for this attribute. From this we can conclude that the ‘actual return date time’ attribute of the HIRE AGREEMENT object class is single-valued and optional. The ‘start date time’ attribute of the HIRE AGREEMENT object class is shown with a multiplicity of ‘[1..1]’. The ‘1’ for the ‘lower bound’ indicates that there must be at least one value for this attribute. At the same time, the ‘1’ for the ‘upper bound’ indicates that there cannot be more than one value for this attribute. From this we can conclude that the ‘start date time’ attribute of the HIRE AGREEMENT object class is single-valued and mandatory. The UML specification also allows multiplicities such as ‘[1..2]’ and ‘[0..5]’, or even ‘[1..*]’ and ‘[0..*]’, where ‘*’ means that there is no upper limit. In the PERSON object class in Figure 5.3 there are two attributes for telephone numbers. One is for the primary contact number. The multiplicity for this attribute is ‘[1..1]’, so we can see that this attribute is mandatory. There is also an attribute for an alternative contact number. This has a multiplicity of ‘[0..1]’, so we can see that this attribute is optional. An alternative way to model this would be to have a single attribute, called, say, ‘contact telephone numbers’, which would have a multiplicity of ‘[1..2]’. Since the ‘lower bound’ is ‘1’, the attribute is mandatory (it must take at least one value), and because the ‘upper bound’ is ‘2’, the attribute may take a second value.
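Reading a multiplicity as a lower and upper bound can be expressed directly in code. This is a minimal sketch with hypothetical function names. It handles only the `[lower..upper]` form used in this chapter, with `*` for an unbounded upper limit. UML also permits shorthand such as `[1]`, which this sketch does not parse. The telephone numbers are invented.

```python
def parse_multiplicity(text):
    """Parse an attribute multiplicity written as '[lower..upper]'.

    Returns (lower, upper), with upper = None for the unbounded '*'.
    """
    lower, upper = text.strip("[]").split("..")
    return int(lower), None if upper == "*" else int(upper)

def is_mandatory(multiplicity):
    # A lower bound of 1 or more means at least one value must be recorded.
    return parse_multiplicity(multiplicity)[0] >= 1

def is_single_valued(multiplicity):
    # An upper bound of 1 means at most one value can be recorded.
    return parse_multiplicity(multiplicity)[1] == 1

def conforms(values, multiplicity):
    """Check that the number of values recorded respects the multiplicity."""
    lower, upper = parse_multiplicity(multiplicity)
    count = len(values)
    return count >= lower and (upper is None or count <= upper)

# 'contact telephone numbers' [1..2]: mandatory, and may take a second value.
print(conforms(["01632 960001"], "[1..2]"))
print(conforms(["01632 960001", "01632 960002"], "[1..2]"))
print(conforms([], "[1..2]"))
```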
THE NAMING OF ATTRIBUTES
As with entity types and object classes, attributes are normally named with a noun or noun phrase. This name should clearly represent the business meaning of the value to be recorded for any instance of the attribute. By convention, an attribute name is normally singular. Again, by convention, the name of the entity type or object class containing the attribute is not normally included in the name of the attribute, but is used with the attribute name when referring to the attribute conversationally. Attribute names must be unique within the entity type or object class, but not necessarily unique within the model. So, we have the attribute ‘name’, not ‘person name’, within the entity type or object class PERSON and we also have an attribute ‘name’ within the entity type or object class COMPANY. In conversation, however, we would call these attributes ‘PERSON name’ and ‘COMPANY name’ respectively. It is advisable not to use abbreviations in the names of attributes as this can lead to confusion when discussing models with the business, especially where the abbreviation is not in common use. There is also a danger that abbreviations can be used inconsistently within a modelling team. A very common example of this is where ‘number’ is abbreviated inconsistently to ‘no’, ‘num’, ‘nmbr’ or ‘nbr’.
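Two of these naming conventions can be checked automatically. Attribute names must be unique within an entity type (but not across the model), and ‘number’ should not be abbreviated. This is a minimal sketch that assumes the model is held as a Python dictionary of attribute names per entity type. The function name and the set of banned abbreviations are hypothetical choices a modelling team might make. Note that ‘name’ appearing in both PERSON and COMPANY is correctly not flagged.

```python
import re
from collections import Counter

# Assumption: the abbreviations of 'number' that a team has agreed to avoid.
NUMBER_ABBREVIATIONS = {"no", "num", "nmbr", "nbr"}

def naming_issues(model):
    """model: {entity type or object class: [attribute names]}."""
    issues = []
    for entity, names in model.items():
        # Names must be unique within an entity type, not across the model.
        for name, count in Counter(names).items():
            if count > 1:
                issues.append(f"{entity}: '{name}' is used {count} times")
        for name in names:
            words = set(re.findall(r"[a-z]+", name.lower()))
            if words & NUMBER_ABBREVIATIONS:
                issues.append(f"{entity}: '{name}' abbreviates 'number'")
    return issues

model = {
    "PERSON": ["name", "primary telephone number"],
    "COMPANY": ["name", "telephone nbr"],
}
print(naming_issues(model))
```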
ENTITY TYPE, OBJECT CLASS OR ATTRIBUTE?
Attributes exist to provide the ability to qualify, identify, classify, quantify or in some other way express the state of an entity occurrence or object. Entity occurrences and objects are instances of entity types and object classes respectively. Both attributes and entity types or object classes are named using nouns or noun phrases. When, therefore, should a particular concept be modelled as an entity type or object class, and when as an attribute? Two factors will influence this decision: the circumstances of the business under review, and the experience of the modeller. There are, however, some simple steps a modeller can take to try to resolve this question. Firstly, if you find that you have an entity type or object class with too many attributes, be suspicious, for this normally indicates that there is more analysis to be done. In this case, you will almost certainly find that some of those attributes really describe things of interest about which the business needs to record information, so they could (or even should) be entity types or object classes in their own right. How many is ‘too many’? One rule of thumb is that more than eight attributes is probably too many. Secondly, you should ensure that all of the attributes of an entity type or object class really represent characteristics or properties of the thing (concept) represented by that entity type or object class. There is always the danger that we give one concept an attribute that should really belong to another concept. We could, for example, create an entity type or object class called EMPLOYEE with attributes such as ‘staff number’, ‘name’, ‘birth date’, ‘residence address’, ‘contact telephone number’, ‘alternative contact telephone number’, ‘national insurance number’, ‘employment start date’, ‘employment end date’, ‘next-of-kin name’, ‘next-of-kin relationship’, ‘next-of-kin address’ and ‘next-of-kin contact telephone number’.
No doubt you can think of information that a business would wish to record about its employees, but here we already have 13 attributes, which is, according to the rule of thumb, too many. So, let us consider those attributes and see where that takes us. The attributes ‘staff number’, ‘employment start date’ and ‘employment end date’ concern a period of employment, but all of the other attributes represent characteristics of the people who are employed. We need, therefore, an entity type or object class called PERSON. Each person who is employed could have more than one period of employment with our vehicle hire company, so there could be more than one value for both ‘employment start date’ and ‘employment end date’. This implies that we need a new entity type or object class called, say, PERSON EMPLOYMENT, with a one-to-many relationship from PERSON to PERSON EMPLOYMENT. A person’s name can change because of marriage, divorce, deed poll (a legal document that proves a change of name) or just a desire to be known by another name. This leads to another new entity type or object class called PERSON NAME, with a one-to-many relationship from PERSON to PERSON NAME. PERSON NAME will probably have attributes to record the date of the change of name (‘effective date’) and the reason for the change (‘change reason’) as well as the name itself (‘name’). Of course, in our business we may only be interested in the current name of any people we employ or do business with, and creating a PERSON NAME entity type or object class would, in these circumstances, be unnecessary. It is important to understand the business needs. There is an attribute of EMPLOYEE called ‘residence address’. More than one employee might live at the same address, and the address (or rather the property represented by the address) could, therefore, become something of interest to the business about which information needs to be recorded. Also, an employee may change their residence. There is now a many-to-many relationship between PERSON and a new entity type or object class called PROPERTY, with an associative entity type or object class to handle the dates that the employee was resident at that property. Finally, there are four attributes associated with a next-of-kin. A next-of-kin is a person, and could even be another person who is also employed by our vehicle hire company. Instead of these attributes there should be a structure to recognise that a person can only have one next-of-kin, but someone designated as a next-of-kin can be a next-of-kin for more than one person. Our entity type or object class of EMPLOYEE with 13 attributes could be shaken out into the Ellis-Barker entity relationship structure shown in Figure 5.4 if there was sufficient business need.
Figure 5.4 EMPLOYEE expanded (shown in Ellis-Barker entity relationship notation)
[diagram; entity types and attributes as shown:]
PERSON: (m) birth date, (m) contact telephone number, (o) alternative contact telephone number, (o) national insurance number
PERSON NAME (of PERSON): (m) name, (o) effective date, (o) change reason
PERSON EMPLOYMENT (of PERSON): (m) staff number, (m) start date, (o) end date
PERSON NEXT-OF-KIN (linking the nominating PERSON and the PERSON nominated as next-of-kin): (m) relationship
PERSON RESIDENCE (of PERSON, at PROPERTY): (m) from date, (o) to date
PROPERTY: (o) apartment number, (m) building name or number, (m) street, (o) locality, (m) town, (m) postcode
Figure 5.5 shows the equivalent UML class model structure.
Figure 5.5 EMPLOYEE expanded (shown in UML class modelling notation) [diagram: classes PERSON, PERSON NAME, PERSON EMPLOYMENT, PERSON NEXT-OF-KIN, PERSON RESIDENCE and PROPERTY with the attributes of Figure 5.4, mandatory attributes shown as [1..1] and optional attributes as [0..1]]
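The ‘more than eight attributes’ rule of thumb is simple enough to apply mechanically. The attribute lists below are taken from the EMPLOYEE example and its expanded structure in Figures 5.4 and 5.5. The dictionary layout and the function name `suspiciously_large` are hypothetical. This is only a heuristic: it signals that more analysis is needed, but says nothing about how to restructure.

```python
# Rule of thumb from the text: more than eight attributes is probably too many.
MAX_ATTRIBUTES = 8

employee = {"EMPLOYEE": [
    "staff number", "name", "birth date", "residence address",
    "contact telephone number", "alternative contact telephone number",
    "national insurance number", "employment start date", "employment end date",
    "next-of-kin name", "next-of-kin relationship", "next-of-kin address",
    "next-of-kin contact telephone number",
]}

# Attributes of the expanded structure in Figures 5.4 and 5.5.
expanded = {
    "PERSON": ["birth date", "contact telephone number",
               "alternative contact telephone number", "national insurance number"],
    "PERSON NAME": ["name", "effective date", "change reason"],
    "PERSON EMPLOYMENT": ["staff number", "start date", "end date"],
    "PERSON NEXT-OF-KIN": ["relationship"],
    "PERSON RESIDENCE": ["from date", "to date"],
    "PROPERTY": ["apartment number", "building name or number", "street",
                 "locality", "town", "postcode"],
}

def suspiciously_large(model, limit=MAX_ATTRIBUTES):
    """Entity types whose attribute count suggests more analysis is needed."""
    return [entity for entity, names in model.items() if len(names) > limit]

print(suspiciously_large(employee))
print(suspiciously_large(expanded))
```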
UNIQUE IDENTIFIERS
An entity type represents a set of entities, where an entity is a thing of interest to the business about which information needs to be recorded. An attribute of an entity type is a property that is used to qualify, identify, classify, quantify or in some other way express the state of an entity. If we wish to record information about an entity and later retrieve that information (why else would we record it?), that entity must be identifiable in some way. Each entity type must, therefore, have some characteristics that make each of its instances uniquely identifiable, so that each instance can be distinguished from all other instances of that entity type. Not surprisingly, those characteristics are known as a unique identifier, and the identification of unique identifiers is an essential part of the modelling of information. A unique identifier for an entity type can be defined as:
an attribute, a combination of attributes, a combination of relationships or a combination of attribute(s) and relationship(s) that provides the ability for each entity to be uniquely identifiable so that each instance of an entity type is distinctly identifiable from all other instances of that entity type.
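Sample data can be used to test a proposed unique identifier, but only in one direction. A duplicate in the data disproves a candidate, whereas unique sample data proves nothing, because a unique identifier reflects a business rule. This sketch assumes instances held as Python dictionaries of attribute values (a hypothetical representation). It treats a missing value as disqualifying, since every element of a unique identifier must be mandatory. The people and telephone numbers are invented: a mother and daughter with the same name who share an address.

```python
def could_be_unique_identifier(instances, attributes):
    """Return False if the sample data contradicts the candidate identifier."""
    seen = set()
    for instance in instances:
        key = tuple(instance.get(name) for name in attributes)
        if any(value is None for value in key):
            return False  # every element of a unique identifier must have a value
        if key in seen:
            return False  # two instances share the same identifying values
        seen.add(key)
    return True

people = [
    {"name": "Patricia Johnson", "correspondence address": "1 High Street",
     "primary telephone number": "01632 960001"},
    {"name": "Patricia Johnson", "correspondence address": "1 High Street",
     "primary telephone number": "01632 960002"},
]
print(could_be_unique_identifier(people, ["name", "correspondence address"]))
```

Adding ‘primary telephone number’ happens to make this particular sample unique. That does not make the combination a unique identifier, because nothing in the business guarantees that two people never share a number.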
In Figure 5.6 unique identifiers have been added to our attributed entity relationship model for our vehicle hire company:
Figure 5.6 Unique identifiers on an Ellis-Barker entity relationship model
[diagram; entity types and attributes as shown, with (#) marking elements of each unique identifier:]
HIRE AGREEMENT (made for VEHICLE): (#) start date time, (m) due return date time, (o) actual return date time
PERSONAL HIRE AGREEMENT (subtype of HIRE AGREEMENT, made with PERSON)
COMPANY HIRE AGREEMENT (subtype of HIRE AGREEMENT, made with COMPANY): (o) purchase order number, (o) billing address
PERSON: (#) identifier, (m) name, (m) correspondence address, (m) primary telephone number, (o) alternative telephone number
COMPANY: (#) identifier, (m) name, (m) correspondence address, (m) telephone number
VEHICLE (described by VEHICLE TYPE): (#) vehicle registration number, (m) colour, (o) last serviced date
VEHICLE TYPE: (#) manufacturer, (#) model name, (#) engine type, (#) transmission type, (#) first sale date, (o) last sale date
The VEHICLE entity type has three attributes: ‘vehicle registration number’, ‘colour’ and ‘last serviced date’. The vehicle hire company will have many vehicles of the same colour, so the ‘colour’ attribute cannot be used as a unique identifier. Also, many vehicles can be serviced on the same date, so the ‘last serviced date’ attribute cannot be used. However, if it is a rule within the business that once a vehicle joins the hire fleet its registration will not change, the ‘vehicle registration number’ attribute can be used as a unique identifier. For the VEHICLE entity type, therefore, we can select the single attribute ‘vehicle registration number’ as the unique identifier. A single attribute used as a unique identifier is known as a simple identifier. We replace the ‘(m)’ symbol of the ‘vehicle registration number’ attribute with the ‘(#)’ symbol to indicate that it is the unique identifier. All elements of a unique identifier are mandatory, so the ‘(#)’ symbol subsumes the ‘(m)’ symbol.
The VEHICLE TYPE entity type has a total of six attributes: ‘manufacturer’, ‘model name’, ‘engine type’, ‘transmission type’, ‘first sale date’ and ‘last sale date’. A Vauxhall Astra is a different type of vehicle from a Vauxhall Corsa. A Vauxhall Corsa with a petrol engine and manual transmission is a different type of vehicle from a Vauxhall Corsa with a diesel engine and automatic transmission. Furthermore, Vauxhall frequently updates the Corsa. From this we can deduce that we need, at least, the combination of the values of the ‘manufacturer’, ‘model name’, ‘engine type’, ‘transmission type’ and ‘first sale date’ attributes to uniquely identify a type of vehicle. All five of these attributes are annotated with the symbol ‘(#)’ to indicate that they are all part of the unique identifier. Where the unique identifier is a combination of two or more attributes, it is known as a composite identifier.
The PERSON entity type has four attributes: ‘name’, ‘correspondence address’, ‘primary telephone number’ and ‘alternative telephone number’. The ‘alternative telephone number’ attribute cannot be used as part of the unique identifier because it is optional; an attribute must be mandatory to be considered as part of a unique identifier. None of the other attributes, ‘name’, ‘correspondence address’ or ‘primary telephone number’, can be guaranteed to be unique for any one person. Equally, there is no combination of mandatory attributes that can be guaranteed to be unique for any one person. It is possible that there are two women known as Patricia Johnson, a mother and daughter, who both use the same address for correspondence. Under these circumstances, where there are no ‘real-world’ identifiers, we create a new attribute, in this case called ‘identifier’. A unique identifier created because there is no real-world identifier is known as a surrogate identifier. Surrogate identifiers are normally, but not exclusively, a single attribute called ‘identifier’.
The HIRE AGREEMENT entity type has only two mandatory attributes: ‘start date time’ and ‘due return date time’. Neither of these is sufficient on its own as a unique identifier. We could create another surrogate identifier, but unique identifiers normally reflect some rule within the business, and surrogate identifiers should be avoided wherever possible. There is, however, a set of characteristics that can uniquely identify a hire agreement: the combination of the vehicle and the date and time that the hire started.
The unique identifier is, therefore, the combination of the ‘start date time’ attribute, now annotated with a ‘(#)’, and the mandatory relationship from the HIRE AGREEMENT entity type to the VEHICLE entity type, which is now marked with a bar to show that it is an identifying relationship. This is known as a hierarchic identifier. The identification of unique identifiers, and their agreement with the business, helps to clarify some of the rules of the business. There is no equivalent concept in information or data models created using the UML class model notation. This is because class models were originally devised to support the development of object-oriented systems, where each object has a system-generated ‘object identifier’ that is hidden from the user.
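The four kinds of unique identifier described above (simple, composite, surrogate and hierarchic) can be contrasted by building the key that each produces. This is a minimal sketch, assuming Python dataclasses with a hypothetical `key()` method on each entity type. The `itertools.count` generator stands in for however a real system would issue surrogate values, and all dates are invented.

```python
import itertools
from dataclasses import dataclass, field
from datetime import date, datetime

@dataclass(frozen=True)
class Vehicle:
    vehicle_registration_number: str
    colour: str

    def key(self):
        # Simple identifier: a single attribute.
        return (self.vehicle_registration_number,)

@dataclass(frozen=True)
class VehicleType:
    manufacturer: str
    model_name: str
    engine_type: str
    transmission_type: str
    first_sale_date: date

    def key(self):
        # Composite identifier: five attributes taken together.
        return (self.manufacturer, self.model_name, self.engine_type,
                self.transmission_type, self.first_sale_date)

_person_ids = itertools.count(1)  # hypothetical source of surrogate values

@dataclass(frozen=True)
class Person:
    name: str
    correspondence_address: str
    # Surrogate identifier: no real-world attribute combination is guaranteed unique.
    identifier: int = field(default_factory=lambda: next(_person_ids))

    def key(self):
        return (self.identifier,)

@dataclass(frozen=True)
class HireAgreement:
    vehicle: Vehicle  # the identifying relationship (the bar on the diagram)
    start_date_time: datetime

    def key(self):
        # Hierarchic identifier: the related vehicle's identifier plus an attribute.
        return self.vehicle.key() + (self.start_date_time,)
```

Two Patricia Johnsons at the same address would receive different `identifier` values. Two hire agreements for KW64 CNV are distinguished only by their start date and time.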
DOMAINS
An attribute exists to provide values that help to qualify, identify, classify, quantify or in some other way express the state of the instances of an entity type or object class. Each attribute value is drawn from a domain, which can be defined as: a named pool of values from which an attribute must take its value.
In other words, a domain provides a set of business validation rules, format constraints and other properties for one or more attributes. A domain can exist as a list of specific values, as a range of values, as a set of qualifications or as any combination of these. In practice, for all instances of an entity type or object class, the values of any one attribute must be taken from a single domain. A domain may, however, provide the values for more than one attribute.
Each domain is one of two types: the enumerated domain, which consists of a fixed list of permitted values, and the described domain (sometimes called a non-enumerated domain), which is dynamic, with values added and deleted as necessary.
The permitted values that make up an enumerated domain are normally obtained from the business. For example, the enumerated domain called ‘VEHICLE COLOUR’ provides values for the ‘colour’ attribute of the VEHICLE entity type or object class. The values the business has decided to use to specify the colours of the vehicles it has available for hire are ‘black’, ‘white’, ‘silver’, ‘blue’, ‘green’, ‘red’ and ‘beige’.
A described domain grows or shrinks as values are added or deleted. For example, the ‘TELEPHONE NUMBER’ domain provides values for the ‘primary telephone number’ and ‘alternative telephone number’ attributes of PERSON and the ‘telephone number’ attribute of COMPANY. When a telephone number is recorded for a new person or company, the ‘TELEPHONE NUMBER’ domain grows. When the record of a person or company is deleted, and the associated telephone numbers are deleted with it, the domain shrinks. A described domain may also have a rule attached to it, often called a validation rule, that constrains its values.
Domains represent a business concept, not a technical one.
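A described domain can be sketched as a pool of values that grows and shrinks with the data, guarded by an optional validation rule. The Python below is a minimal illustration under stated assumptions: the `DescribedDomain` class is hypothetical; it counts how many attribute values use each telephone number, so that a number shared by two records stays in the domain until neither uses it; and the validation rule `looks_like_phone_number` is an invented example, not a rule from the business. A single domain object serves all three telephone number attributes.

```python
import re
from typing import Callable, Optional


class DescribedDomain:
    """A named pool of values that grows and shrinks as values are recorded
    and deleted, optionally constrained by a validation rule."""

    def __init__(self, name: str,
                 validation_rule: Optional[Callable[[str], bool]] = None):
        self.name = name
        self.validation_rule = validation_rule
        self._usage: dict[str, int] = {}  # value -> number of attribute values using it

    def add(self, value: str) -> None:
        # Apply the validation rule, if any, before the value joins the domain
        if self.validation_rule is not None and not self.validation_rule(value):
            raise ValueError(f"{value!r} is not a valid {self.name}")
        self._usage[value] = self._usage.get(value, 0) + 1

    def remove(self, value: str) -> None:
        # The value leaves the domain only when no attribute uses it any more
        self._usage[value] -= 1
        if self._usage[value] == 0:
            del self._usage[value]

    def values(self) -> set[str]:
        return set(self._usage)


def looks_like_phone_number(value: str) -> bool:
    # Hypothetical validation rule: 6 to 20 characters, each a digit or a
    # space, optionally preceded by '+'
    return re.fullmatch(r"\+?[0-9 ]{6,20}", value) is not None


# One domain supplies PERSON.primary telephone number,
# PERSON.alternative telephone number and COMPANY.telephone number.
TELEPHONE_NUMBER = DescribedDomain("TELEPHONE NUMBER", looks_like_phone_number)

TELEPHONE_NUMBER.add("01234 567890")      # a new person's primary telephone number
TELEPHONE_NUMBER.add("020 7946 0000")     # a new company's telephone number
TELEPHONE_NUMBER.remove("020 7946 0000")  # the company's record is deleted
```

The reference count is one possible design choice; a simpler sketch could recompute the domain from the stored records each time it is needed.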
It is important that domains are recognised and documented during business analysis. With entity relationship modelling there is no standard approach to the documentation of domains. This is also true of described domains in class modelling. There is, however, a standard way of representing an enumerated domain on a class model, using a special type of ‘class’ called an ‘enumeration class’. This is shown in Figure 5.7 for the ‘VEHICLE COLOUR’ domain.
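In program code, the nearest counterpart of an enumeration class is an enumerated type. The following is a minimal Python sketch using the standard `enum` module, assuming the seven colours listed for the ‘VEHICLE COLOUR’ domain; the `parse_colour` helper is hypothetical and simply shows that a value outside the enumerated domain is refused.

```python
from enum import Enum


class VehicleColour(Enum):
    """Counterpart of the VEHICLE COLOUR enumeration class: the fixed list
    of colours the business has agreed to use."""
    BLACK = "black"
    WHITE = "white"
    SILVER = "silver"
    BLUE = "blue"
    GREEN = "green"
    RED = "red"
    BEIGE = "beige"


def parse_colour(text: str) -> VehicleColour:
    """Return the permitted colour matching text, ignoring case and
    surrounding spaces; raise ValueError for any value not in the domain."""
    return VehicleColour(text.strip().lower())
```

For example, `parse_colour(" Silver ")` returns `VehicleColour.SILVER`, while `parse_colour("purple")` raises `ValueError`.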
THE UML EXTENDED ATTRIBUTE NOTATION
The UML specification for class models provides a very rich, full notation for attributes. Most of this extended notation is not used by business analysts, but it is shown in Figure 5.8 and described below for completeness. As well as the name of the attribute and its multiplicity, an attribute specification on a class model can include a data type for its values, a default value, any constraints that apply to the values and an indication of whether the attribute is a derived attribute. The data type for the values of the attribute can be a primitive data type, a structured data type or an enumerated type.
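The parts of an extended attribute specification map loosely onto a typed record definition. The Python dataclasses below are a minimal sketch under several assumptions: `str` stands in for String, `datetime.date` for Calendar Date, the fields of `Address` are invented for illustration, [1..1] is shown as a required field and [0..1] as an `Optional` field defaulting to `None`. The constraint that a hire must end after it starts and the derived `hire_period` attribute are hypothetical examples, not rules taken from the model. Python does not enforce type hints at run time; only the explicit check in `__post_init__` is enforced.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from enum import Enum
from typing import Optional


class VehicleColour(Enum):
    # Enumerated type
    BLACK = "black"
    WHITE = "white"
    SILVER = "silver"
    BLUE = "blue"
    GREEN = "green"
    RED = "red"
    BEIGE = "beige"


@dataclass(frozen=True)
class Address:
    # Structured data type (hypothetical fields)
    lines: tuple[str, ...]
    postcode: str


@dataclass
class Person:
    name: str                                           # : String [1..1]
    correspondence_address: Address                     # : Address [1..1]
    primary_telephone_number: str                       # : String [1..1]
    alternative_telephone_number: Optional[str] = None  # : String [0..1]


@dataclass
class Vehicle:
    vehicle_registration_number: str           # : String [1..1]
    colour: VehicleColour                      # : Vehicle Colour [1..1]
    last_serviced_date: Optional[date] = None  # : Calendar Date [0..1]


@dataclass
class HireAgreement:
    start_date_time: datetime                           # [1..1]
    due_return_date_time: datetime                      # [1..1]
    actual_return_date_time: Optional[datetime] = None  # [0..1], default value: none

    def __post_init__(self) -> None:
        # Constraint on the values (hypothetical): the due return must follow the start
        if self.due_return_date_time <= self.start_date_time:
            raise ValueError("due return date time must be after start date time")

    @property
    def hire_period(self) -> timedelta:
        # Derived attribute (hypothetical): calculated from other attributes, not stored
        return self.due_return_date_time - self.start_date_time
```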
Figure 5.7 The UML class
[Diagram: class model for VEHICLE TYPE, VEHICLE, PERSON, COMPANY, HIRE AGREEMENT, PERSONAL HIRE AGREEMENT and COMPANY HIRE AGREEMENT, with each attribute shown with its multiplicity (e.g. ‘last sale date [0..1]’), together with the enumeration class VEHICLE COLOUR listing black, white, silver, blue, green, red and beige.]
Figure 5.8 The UML extended attribute notation
[Diagram: classes VEHICLE TYPE, PERSON, COMPANY and VEHICLE and the VEHICLE COLOUR enumeration, with a data type added to each attribute, e.g. ‘manufacturer : String [1..1]’, ‘first sale date : Calendar Date [1..1]’, ‘correspondence address : Address [1..1]’ and ‘colour : Vehicle Colour [1..1]’.]