Information Modelling: A Pragmatic Approach [1st ed. 2022] 9783030988043, 9783030988050, 303098804X

This textbook provides solid guidance on how to produce information models in practice. Information modeling has become

English Pages 238 [230] Year 2022

Table of contents :
Preface
Contents
1: Introduction
1.1 Aim and Scope
1.2 Approach
1.3 Key Strengths
1.4 Figures, Examples, Exercises and Solutions
1.5 Book Contents
1.6 Conclusion
2: What Is Information?
2.1 Introduction
2.2 Information Situations
Example: Emergency Response
Example: Online Grocery
Exercise: Driverless Cars
2.3 What Information Is and Is Not
Example
Example
Example
Example
Exercise
2.4 The Stands for Relation
Example
Exercise
2.5 Identifiers
Example
Example
Example
Example
Example
Exercise
2.6 Descriptors
Example
Exercise
2.7 Communicative Acts
Example
Example
Example
Example
Examples
Example
Example
Example
Example
Example
Exercise
Example
2.8 Patterns of Information Situations
2.9 Physical and Institutional (Social) Ontology
Example
Example
Example
Example
Example
Example
Example
Example
Exercise
2.10 Conclusion
2.11 Summary
3: Why Model Information?
3.1 Introduction
3.2 A Short History of Information Modelling
3.3 The Notion of a Model
Example
Example
Exercise
3.4 Information Models and Reality
Example
Exercise
Exercise
3.5 What Are Information Models for?
3.6 Investigating the Ontology of Domains
3.7 Conversations for Action
Example
3.8 Visualising Patterns of Information Situations
Example
3.9 Documenting a Pattern of Information Situations
Exercise
3.10 Conclusion
3.11 Summary
4: Information Modelling from First Principles
4.1 Introduction
4.2 Objects and Identifiers
Example
Example
Exercise
4.3 Classification and Instantiation
Exercise
Example
Example
Example
Example
Example
Exercise
4.4 Attribution
Example
Example
Example
4.5 Valuing an Object and Forming an Object Class
Example
Example
Example
4.6 Association
Example
Example
Example
Example
Example
Example
Example
Example
Exercise
4.7 Constraints upon Association
Example
Example
Example
Example
Example
4.8 Generalisation and Specialisation
Example
Example
Example
Example
4.9 Generalisation Hierarchies and Lattices
Example
Example
Example
4.10 Aggregation and Decomposition
Example
Example
4.11 Institutional Ontology as a Sign Lattice
4.12 Conclusion
4.13 Summary
5: Visualising an Information Model
5.1 Introduction
5.2 Why Visualise?
Exercise
5.3 Notations for an Information Model Diagram
5.4 Visualising Classes
Example
Example
5.5 Visualising Relationships of Association
Example
Example
5.6 Visualising Attributes
Example
Exercise
5.7 Visualising Constraints upon Association
Example
Exercise
5.8 Visualising Generalisation
Example
Exercise
5.9 Visualising Aggregation
Example
Exercise
5.10 Institutional Facts to an Information Model Diagram
Example
5.11 Conclusion
5.12 Summary
6: Composing an Information Model from Institutional Facts
6.1 Introduction
6.2 A Pattern of Information Situations
Example
6.3 Unpacking the Content of Messages
Example
Exercise
6.4 Generating Institutional Facts
6.5 Validating an Information Model
6.6 Revising Information Models
Example
Exercise
Example
Example
Example
6.7 Conclusion
6.8 Summary
7: Practical Issues in Information Modelling
7.1 Introduction
7.2 Class, Attribute or Relationship
Example
Example
Example
7.3 Repeating Attributes
Example
7.4 One-to-One Relationships
Example
Example
7.5 When to Generalise and Aggregate
Example
Example
Example
Example
7.6 Strong and Weak Classes
Example
Example
Example
7.7 Recursive and Ternary Relationships
Example
Example
7.8 Modelling Time
Example
7.9 Connection Traps
Example
Example
7.10 Information Model Patterns
Example
Example
Example
7.11 Conclusion
7.12 Summary
8: Information Modelling and Data Systems
8.1 Introduction
8.2 Data and Information
Example
Example
Example
Example
8.3 Data Structures
Example
Example
Exercise
8.4 The Ontological Status of Data Structures
Example
Example
Example
8.5 Data Models
Example
8.6 The Relational Data Model
Exercise
Exercise
Example
8.7 Normalisation
Example
8.8 Turning an Information Model into a Relational Schema
Example
Example
Example
Example
Example
Example
Example
Example
Example
Example
Example
8.9 Visualising Data Structures
8.10 Identifiers and Candidate Keys
Example
8.11 Determinancy Diagrams
Exercise
Example
Example
Example
Exercise
Example
8.12 Conclusion
8.13 Summary
9: Information Modelling in Context
9.1 Introduction
9.2 The Place of Information Modelling Within Business Analysis and Design
9.3 Data and Metadata
Example
Example
Example
Example
9.4 The World Wide Web and Metadata
Example
9.5 Information Modelling and XML
Example
9.6 The Semantic Web
Example
9.7 Big Data
Example
Example
Example
Example
Example
Example
Example
Example
9.8 The Notion of a Data Science
9.9 Conclusion
9.10 Summary
10: Exercises
10.1 Introduction
10.2 Information Classes
Exercise 10.2.1
Exercise 10.2.2
10.3 Classification
Exercise 10.3.1
Exercise 10.3.2
Exercise 10.3.3
10.4 Relationships
Exercise 10.4.1
Exercise 10.4.2
10.5 Attributes
Exercise 10.5.1
Exercise 10.5.2
10.6 Identifiers
Exercise 10.6.1
Exercise 10.6.2
10.7 Constraints on Relationships of Association
Exercise 10.7.1
Exercise 10.7.2
Exercise 10.7.3
Exercise 10.7.4
Exercise 10.7.5
10.8 Generalisation
Exercise 10.8.1
Exercise 10.8.2
Exercise 10.8.3
10.9 Generalisation Hierarchies
Exercise 10.9.1
Exercise 10.9.2
Exercise 10.9.3
10.10 Aggregation
Exercise 10.10.1
Exercise 10.10.2
10.11 Visual notation
Exercise 10.11.1
Exercise 10.11.2
Exercise 10.11.3
Exercise 10.11.4
Exercise 10.11.5
Exercise 10.11.6
10.12 Strong and Weak Classes
Exercise 10.12.1
10.13 Recursive and Ternary Relationships
Exercise 10.13.1
Exercise 10.13.2
Exercise 10.13.3
10.14 Composing an Information Model
Exercise 10.14.1
Exercise 10.14.2
Exercise 10.14.3
Exercise 10.14.4
Exercise 10.14.5
Exercise 10.14.6
Exercise 10.14.7
Exercise 10.14.8
10.15 Modelling Time
Exercise 10.15.1
10.16 Connection Traps
Exercise 10.16.1
Exercise 10.16.2
Appendix: Solutions to Exercises
A.1 Introduction
A.2 Information Classes
Exercise 10.2.1
Exercise 10.2.2
A.3 Classification
Exercise 10.3.1
Exercise 10.3.2
Exercise 10.3.3
A.4 Relationships
Exercise 10.4.1
Exercise 10.4.2
A.5 Attributes
Exercise 10.5.1
Exercise 10.5.2
A.6 Identifiers
Exercise 10.6.1
Exercise 10.6.2
A.7 Constraints on Relationships of Association
Exercise 10.7.1
Exercise 10.7.2
Exercise 10.7.3
Exercise 10.7.4
Exercise 10.7.5
A.8 Generalisation
Exercise 10.8.1
Exercise 10.8.2
Exercise 10.8.3
A.9 Generalisation Hierarchies
Exercise 10.9.1
Exercise 10.9.2
Exercise 10.9.3
A.10 Aggregation
Exercise 10.10.1
Exercise 10.10.2
A.11 Visual Notation
Exercise 10.11.1
Exercise 10.11.2
Exercise 10.11.3
Exercise 10.11.4
Exercise 10.11.5
Exercise 10.11.6
A.12 Strong and Weak Classes
Exercise 10.12.1
A.13 Recursive and Ternary Relationships
Exercise 10.13.1
Exercise 10.13.2
Exercise 10.13.3
A.14 Composing an Information Model
Exercise 10.14.1
Exercise 10.14.2
Exercise 10.14.3
Exercise 10.14.4
Exercise 10.14.5
Exercise 10.14.6
Exercise 10.14.7
Exercise 10.14.8
A.15 Modelling Time
Exercise 10.15.1
A.16 Connection Traps
Exercise 10.16.1
Exercise 10.16.2
Bibliography

Paul Beynon-Davies

Information Modelling: A Pragmatic Approach

Paul Beynon-Davies Cardiff Business School Cardiff University Cardiff, UK

ISBN 978-3-030-98804-3
ISBN 978-3-030-98805-0 (eBook)
https://doi.org/10.1007/978-3-030-98805-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

I have been engaging with issues of data and its use and effects for over 40 years. During all this time, I have always cringed when people from all sectors of society, economy and polity use the terms data and information as synonyms. Surely, I have always thought, there must be a clearer way in which we can both define and utilise the notion of data as well as the notion of information. In recent times, I have been able to provide more focus to resolving my long-standing unease, and to do this, I have had to ask myself some strange questions, such as why we have data in the first place, what data does for us in practice and why people often experience problems with data, even in the age of mass communication and ubiquitous technology.

In a recent book of mine (Beynon-Davies 2021b), I used the following allegory to highlight what I think is the central source of this ambiguity surrounding data and information. This allegory has been told many times in many different quarters, and it goes something like this. Two young fish are swimming along when they meet an older fish. The older fish greets them while passing by, saying, ‘morning, how’s the water?’. Having swum a little further on, one young fish pauses, turns to the other young fish and asks, ‘what’s water?’.

Both data and information are a bit like the young fish’s water: they are an inherent and important part of our surround-world. But because they are mundane and accepted, we all tend to assume that we understand what data is and how it relates to information. For many people, answers to the questions I pose about data seem obvious: we have data because it provides information for us, information enables us to make better decisions and the problems we experience with data are purely down to a lack of good organisation of data or poor processes of data collection. But let us pause for a moment. Are these accepted characterisations true?
Data are represented in data structures, and data structures can certainly be used to inform, but about what? Data structures can be used as collective memory of what has happened in some domain or what is happening, but they can also be used to make things happen in the future. All data structures in some sense mis-inform as well as inform, because in the very nature of creating a memory trace of something or someone, the maker of the data structure makes a decision about what is significant to represent and, as a consequence, what is not. Hence, data structures are not only memory traces; they are also deliberate acts of forgetting. Sometimes, data structures are deliberately created to mis-inform, in the sense that they may be designed either explicitly or implicitly to portray a particular worldview, and such a worldview may be open to question by various groups and individuals in society. This means that data structures in many settings inherently carry with them the ‘politics’ of their creation. Hence, data structures and the way they are made should not always be seen as inevitably beneficial, because they can, frankly, be used for some very evil purposes. Many data structures are also not always useful, in the sense that the making of such records can serve to disable human performance in areas such as decision-making as well as to support it.

These issues I have with the common-sense understanding of data and information carry over into an unease I have always felt with information modelling or, as it used to be called, data modelling (another example of how we tend to treat data and information as somewhat the same thing). I have used information modelling many times within practical work in my engagement with industry and the public sector. However, it was only when I was required to teach this technique to students that I experienced difficulties in explaining to novices how I, as an experienced practitioner of the technique, arrived at a particular information model for a certain set of circumstances. To help resolve these difficulties both for myself and my students, I began investigating a better way of approaching information modelling. This led me in the 1990s (Beynon-Davies 1992) to publish a paper which explored the relationship of information modelling to semiotics, the doctrine of signs. Since that time, I have used the notion of a sign to better position notions of data and information and to explain more clearly how they relate to each other. In essence, data are differences made in some substance by some actor.
Information, in contrast, is an accomplishment made by some actor in his or her encounter with data. For information to exist, it is therefore necessary to have data: information is a set of differences which make a difference to some actor. But data is not ever-present in the world waiting to be ‘collected’ by the actor. Data involves the explicit creation of structures by actors, and in doing so, such actors make decisions not only about what to represent but also about how to represent it. Once such structures are created, there is no guarantee that information will be accomplished by other actors in their encounter with these data structures. For the relationship between data and information to be achieved, there must be a common ontology shared between actors, which enables them to accomplish information with certain data structures. Such an ontology amounts to a set of conventions established amongst a community of actors about what certain data structures communicate.

This more accurate and nuanced understanding of how data relates to information is not just a theoretical exercise; it has practical consequences. It has led us to develop a much more productive way of doing a number of things with information modelling. First, it enables us to explain more clearly the purpose of information modelling, that is, what this technique is actually meant to achieve. Second, it allows us to explain in a much more straightforward manner the key principles of this technique to novice users. Third, and finally, it enables us to provide much more productive guidance on how to undertake this technique in practice. But don’t take my word on this. To see if I am true to my word on these three points, read on.

Rhondda, South Wales, UK, 2022

Paul Beynon-Davies

1 Introduction

1.1 Aim and Scope

The aim of this book is to provide a tutorial introduction to information modelling for use on undergraduate and postgraduate modules in information systems, information technology and computer science, and even on digitally focused modules within business and management. The book will also be of relevance to practitioners looking for a fresh and innovative approach to the design of data systems.

Traditionally, information modelling has been important to technologists tasked with creating data systems of various forms. More recently, it has influenced practices in other areas such as building construction and architecture. The technique is increasingly relevant as a means of understanding the active role that data plays within business and management, and of promoting the planning of business activity around the proper design and management of data systems.

Information modelling has been around for some decades, but there is evidence of its increasing importance:

• This technique is still very important to the contemporary designer of data systems of many forms.
• The technique is also of much use to the modern data analyst/data scientist in establishing the proper context for data analytics.
• Most contemporary academic courses in computer science, information technology and information systems worldwide cover information modelling somewhere in their curriculum.
• The Association for Computing Machinery places information modelling within its guide curricula for software engineering and information systems.
• The British Computer Society offers a qualification in data analysis and places information modelling at the heart of this endeavour.
• The Data Management Association (DAMA) is a professional association for data managers and includes information modelling within its professional body of knowledge.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
P. Beynon-Davies, Information Modelling, https://doi.org/10.1007/978-3-030-98805-0_1

The book takes a fresh and innovative approach to information modelling based on the author’s previous research and consultancy work. This approach considers information modelling to be an exercise in building an account of the communicative practice important to a group of actors attempting to work in a coordinated manner within some institutional context. This means that the standard constructs of information modelling are located clearly within a solid theoretical background of what we refer to as information situations.

1.2 Approach

Information modelling has been an important technique within the armoury of the business analyst for over four decades. It began with the work of Chen in the 1970s, developed through the work of a range of people working on the so-called semantic data models in the 1980s and settled into the object-oriented frameworks of the 1990s through class modelling. Since that time, the technique has stabilised to form a major part of the toolkit of the contemporary business analyst.

However, there is a key problem with information modelling, which was cogently summed up by David Hay way back in the 1990s. He stated the central problem faced by practitioners of information modelling as follows: ‘learning the basics of a modelling technique is not the same as learning how to use and apply it . . . [Information] modelling is particularly complex to learn, because it requires the modeller to gain insights into an organization’s nature that do not come easily’ (Hay 1996).

Part of the attraction of information modelling, at least at its core, is that it uses relatively few constructs. These constructs are also readily imparted to newcomers to the technique. But students and practitioners attempting to model with this simple approach find it extremely difficult to apply effectively when engaging with actual instances of institutional action. I make the claim within this book that many of the problems experienced with the conduct of information modelling in practice are due to a misconceived notion of the proper context for information modelling, which I refer to as information situations, that is, situations in which information is accomplished by institutional actors. Within this book, I propose and demonstrate a way of thinking about the nature of information models in relation to information situations which helps resolve many of the difficulties experienced with conducting information modelling in practice.
This leads me to describe an innovative approach to composing an information model which does justice to this different way of thinking about the relationship between an information model, communicative competence and institutional reality. I locate the basis for many of the practical problems experienced with this popular business analysis and design technique in the pragmatics of information models. Pragmatics as a term has at least two senses appropriate to understanding the nature of information models. In one sense, pragmatics is used to denote a sub-field of linguistics and semiotics, particularly concerned with the relationship between signs and their use within context. It is in this sense that we build a theory of information situations which does justice to a vast amount of literature in this area. In another sense, pragmatics denotes the application of pragmatism, that branch of philosophy associated originally with the work of American scholars Peirce, James, Dewey and Mead. The key principles of pragmatism are that human concepts are defined by their consequences, truth is embodied in practical outcome and learning is controlled inquiry, in which rational thought is interspersed with action. Although there is no direct relationship between pragmatics as a linguistic endeavour and pragmatism as a philosophical orientation, there is evident common ground in the positioning of both knowledge and reality in the centrality of action.

I locate problems experienced with practical information modelling in a misconceived understanding of the relationship between the constructs of an information model and their proper context, namely, institutional reality. This is an issue of pragmatics. But I also wish to focus upon the nature of information models as a way-station to institutional action. In this sense, I consider the pragmatic consequences of an information model to be critical to both its design and use.

To understand any modelling technique, we need to understand three things: constructs, notation and principles of application. All three are described for information modelling in a tutorial manner within this book. The book begins by establishing the bedrock for the student by discussing the nature of information in terms of a theory of information situations. This leads to a discussion of why it is considered important to model information. An account is then created of the key constructs of information modelling (classes, attributes, association relationships, generalisation and aggregation) in terms of this bedrock theory.
Various ways of visualising an information model are then discussed, followed by an account of our innovative way of composing an information model in practice based around an analysis of communicative patterns evident within some current or future domain of organisational action. This leads to a discussion of translating an information model into a design for some data system. As we shall see, this can be undertaken in both a top-down and a bottom-up manner. We then address certain practical issues that arise in the conduct of information modelling. Finally, we discuss the positioning of information modelling within certain areas that influence the modern digital landscape—that of metadata and the semantic web and the developing disciplines of data analytics and data science. Accompanying chapters provide a set of closely integrated exercises and sample solutions.

1.3 Key Strengths

Compared to existing literature in this area, this book has a number of key strengths. First, no prerequisite knowledge is assumed on the part of the reader. Students and practitioners are tutored in the development of information modelling from first principles. The book covers all the core principles of both entity-relationship diagramming and class diagramming—the two major approaches to information modelling.


As we have mentioned, problems with information modelling experienced with traditional approaches are the result of a misconceived notion of the relationship between the constructs of an information model and institutional reality. It is my belief that spending some time unpacking the nature of both data and information up-front for the student and providing some solid theoretical basis for this distinction is critical to getting students to engage effectively with the intricacies of information modelling. Therefore, unlike existing texts in this area, which tend to be largely atheoretical, the proposed book builds a coherent account of information modelling based in strong theory. This theory is introduced in an informal manner and through a number of practical examples of information situations relevant to a range of institutional settings. There is nothing as practical as a good theory. This book therefore provides solid guidance on how to produce information models in practice. The text promotes a practical approach to information modelling based around the analysis of communicative practice within delimited domains of organisation. Numerous examples are peppered throughout the book to illustrate constructs and their application. Detailed exercises in information modelling with solutions are also provided. The author has over 30 years of experience in the field both in teaching the subject and in applying information modelling in practice. He has published a range of texts which impinge upon data and the design of data systems—his textbook on database systems went through three editions. The current text on information modelling forms a companion volume to his existing texts on Business information systems (3rd edition—2020), Business analysis and design (2021a) and Data and society (2021b).

1.4 Figures, Examples, Exercises and Solutions

It is important to recognise that although information modelling is not necessarily a visualisation technique, some form of visualisation is normally expected in its application. Therefore, given the nature of the subject matter, the book contains a substantial number of figures. All figures are drawn by the author. Numerous in-text examples of the concepts of information modelling and their application are included throughout the text. A separate chapter is devoted to a range of exercises which the reader can use to test understanding and application of the technique. A corresponding chapter of solutions is also provided to support learning.

1.5 Book Contents

In this section, we provide a quick overview of each substantive chapter within the book. The chapters are designed to be read in sequence. The early chapters build an account of information modelling from the bedrock of a theory of information situations. Later chapters discuss a number of practical issues concerned with the application of the business analysis and design technique. The conclusion demonstrates a larger context for the application and importance of information modelling.

Chapter 2: What Is Information?

The central claim of this book is that to conduct information modelling effectively, both the student and practitioner need to understand the nature of information. Within this opening chapter, we build a theory of information in terms of situations in which information is accomplished by actors. This enables us to conclude that any successful attempt at information modelling must begin with a close understanding and analysis of the information situations pertinent to the domain in focus. This domain may be an existing domain of communicative action, or it may be an entirely new domain of communicative action.

Chapter 3: Why Model Information?

This chapter considers what a model is and how models relate to notions of institutional reality. Traditional approaches to information modelling, as we shall see, regard the relationship between an information model and reality as one in which reality is made up of things with properties and an information model is composed of formal statements which correspond to objective facts about such things. We shall show that this conception leads to certain problems with the conduct of information modelling. This leads us to present a contrasting account which we believe offers a more sophisticated and accurate representation of the relationship between an information model and reality. We shall show how our framing of an information model provides a better way of considering not only the true purpose of an information model but also how to approach the investigation of institutional domains.

Chapter 4: Information Modelling from First Principles

In this chapter, we build an account of the major constructs of information modelling using our theory of information situations as its bedrock. We start with the notion of an object referred to through an identifier.
This leads us to consider the process of classification, which involves grouping objects that share common characteristics into an information class. Information classes are defined in terms of attributes held to be common amongst a group of objects, but they are also defined in terms of their relationships of association with other classes. Such relationships of association are further defined in terms of certain constraints, known as cardinality and optionality. We then look at two important processes of further abstraction sometimes considered important to modelling institutional ontology with classes: generalisation and aggregation.

Chapter 5: Visualising an Information Model

Within Chaps. 2, 3 and 4, we compose an information model using the canonical form of a series of binary relations, and we use such binary relations to represent a set of institutional facts about the content of communication relevant to the domain in question. However, information modelling originally developed as a diagramming technique meant to aid the work of analysts and designers of data systems of various forms. Within this chapter, we demonstrate various ways of building a visualisation of an information model from an established set of institutional facts.

Chapter 6: Composing an Information Model

Within this book, we view an information model as a model of important aspects of institutional ontology: a model of what actors within some domain deem to exist, how they communicate about such things and how they use such communication to coordinate joint activity. This way of thinking about both the content and the purpose of information modelling allows us to develop a clear way of composing an information model which does justice to some institutional ontology under investigation. Within this chapter, we demonstrate how to build information models either from an analysis of the instrumental communicative practices within some domain or by designing a set of communicative practices for some new domain of action.

Chapter 7: Practical Issues in Information Modelling

In this chapter, we examine a number of practical issues associated with the conduct of information modelling and how these may be resolved. We first consider the issue of interpretive flexibility: the fact that the modeller may choose to model the same thing as a class, attribute or relationship depending upon the institutional context under consideration. The same flexibility applies in the case of using generalisation and aggregation within information modelling. Then, we consider the distinction between strong and weak classes and notions of ternary and recursive relationships. This leads to a discussion of how to include time within an information model and the important problem of connection traps and how to avoid them.

Chapter 8: Information Models and Data Systems

Information modelling is typically directed at the design of some data system.
The architecture of some data system is defined in terms of some data model, of which one of the most popular is the relational data model. The design of some relational database, which is referred to as a schema, is best understood through a visualisation technique known as dependency diagramming. This technique offers a straightforward route for conducting a process important to the design of a relational schema known as normalisation.

Chapter 9: Information Modelling in Context

Within this chapter, we consider the context of information modelling in a number of different senses. First, we consider how information modelling fits within the larger practice of business analysis and design. Second, we consider how information modelling has relevance not only to modelling data but also to the modelling of metadata. This leads us to discuss the way in which information modelling is relevant within the design of Web infrastructure. Third and finally, we consider how an understanding of information modelling is important to building a more nuanced approach to big data as well as to the more overarching and emerging discipline of data science.

1.6 Conclusion

The late novelist Ursula Le Guin in her quartet of fantasy novels (Le Guin 1993) described the world of Earthsea in which magic is a reality. Magic is enacted by key actors in this world, namely, wizards. Such actors spend many decades in learning how to accomplish magic through the use of special words, and the use of such words by these actors allows them to manipulate things in the world of Earthsea. But we as actors in the worlds we build are also very much reliant upon the words we use. In fact, our use of words is not entirely remote from the wizard’s use of words in Earthsea. This is because words are key examples of signs and signs always have two faces. Our use of words as signs allows us to describe our institutional worlds but also to reflect upon these realities. But when we use words, we also construct major aspects of the reality we are describing. This means that we all engage in sign-magic on a daily basis. As we shall see, this book is very much about words or more generally signs. This is because information modelling by its very nature focuses upon the use of signs by actors within institutions of many different forms to get things done.

2 What Is Information?

2.1 Introduction

To model information, we need first to understand as clearly as we can the nature of information. In other words, to build any coherent account of information modelling and how to do it properly, we first have to know what information is and what it is not. This is not as easy as it seems. Part of the problem is that information as a concept is normally taken for granted by the disciplines which most deal with it, such as computer science, information systems, information science and information technology. This attitude of mundane acceptance has also migrated without questioning into the newer areas of big data, data analytics and data science (Chap. 9). Another part of the problem is that the concept of information, as we have argued elsewhere (Beynon-Davies 2013), has many different connotations in a multitude of different literatures. So, to help steer a clearer path through this conceptual murk, we have developed a theory of those situations in which information is clearly present. By using this theory as our guide, we can clearly see what information is and what information is not. This rendering of information situations provides us with a number of advantages over other literature. First, it provides us with a much clearer understanding of the context for information modelling in the sense that we can clearly identify why we are doing information modelling and for what purpose. Second, it allows us to build a much better account of the core constructs of information modelling. Third and finally, it provides for us a much more productive route for describing the proper conduct of information modelling. Using our approach, the newcomer to information modelling can clearly understand how to compose an information model from an analysis of the communicative competence appropriate to some domain of institutional action. Figure 2.1 is an attempt to visualise our theory, developed from a range of the authors’ previous work (Beynon-Davies 2021b). 
The eminent biologist Gregory Bateson (1972) usefully defined information as ‘. . . any difference that makes a difference’. Information, as we shall see, is the difference or set of differences that an encounter with some structure makes to an actor. Information is an accomplishment made by an actor or actors and is very much bound up, as we shall see, with instrumental and formal communication used by actors within institutional settings to get things done. In this chapter, we provide a discussion of our theory’s key components and what such components mean in terms of information and demonstrate how an understanding of information situations provides a more resilient route into information modelling than that offered within traditional accounts of the subject.

Fig. 2.1 A model of information situations
[Figure labels: Environment; actors A1 and A2; structures S1 and S2; transformations T1 and T2; message M1; stages Articulation, Communication and Coordination.]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Beynon-Davies, Information Modelling, https://doi.org/10.1007/978-3-030-98805-0_2

2.2 Information Situations

Figure 2.1 is an illustration of what we refer to as information situations: situations in which information is accomplished. We contend that such situations always consist of a number of essential components (actors, structures, messages and actions), all taking place within some environment. The environment of some information situation is typically, for the information modeller, some domain of institutional action. Let us consider these components in more detail.

Actors

We use the term actor in a deliberately abstract way here to denote anything that can act. Actors transform their environment in some way and include not only humans but also other animals, machines and certain classes of artefact. Two human actors are indicated in Fig. 2.1, which we have labelled as A1 and A2. However, as we know, systems of information technology also form key actors within institutional settings.

Structures

Structures are things within the environment of actors which undergo a certain form of transformation. A structure is brought into existence by particular actors by making differences within some substance evident in the environment. Not all structures are equal within information situations. Within this book, we focus upon structures explicitly produced and used to communicate things between two or more actors. This type of structure is known as a data structure. Records of all forms are data structures, as are lists, registers, ledgers and so on. Within Fig. 2.1, a data structure S1 is transformed by transformation T1, whereas transformation T2 is undertaken upon some undefined physical structure S2.

Messages

Data structures are transformed through acts of articulation by actors. Through the articulation, such as T1, of data structures, such as S1, messages can be conveyed as signals between one actor and another. This signalling of messages is the essence of communication. One actor creates or effects some articulation of a structure, and one or more other actors sense or read the changes made to the structure. Through this process, two or more actors commune: they arrive at a common understanding of something. In Fig. 2.1, M1 constitutes the message transmitted by actor A1 to actor A2 through the articulation of structure S1.

Actions

So, it should be evident that not one but three types of inter-related or coupled action are illustrated and labelled in Fig. 2.1. There is first the act which involves articulation of some data structure by some actor. Then there is the act of communing through this structure between one actor and another, of collectively agreeing as to what structures stand for; this is the essence of communication. And, finally, there is usually a responsive action on the part of the receiving or sensing actor, which may be to articulate some further structure within the environment.
This domain of action we refer to as coordination because most of the structures that we focus upon within this book (data structures) are transformed with the intent of coordinating the joint activity of multiple actors working within some environment.

Environment

Information situations typically occur in a repetitive manner within delimited settings which we refer to as institutional domains or domains for short. An institutional domain is an environment constructed or reproduced from patterns of actions performed repeatedly by actors working to achieve joint activity in the fulfilment of established goals. An institutional domain may be the whole of or a coherent part of some private, public or voluntary sector organisation. Or it may refer to something larger such as systems of government, social care or policing within the nation-state.

Sequence

There is a necessary sequencing of action in any information situation: certain actions always occur before other actions. There is also a necessary temporal delay or lag between various forms of action. Articulation must always occur before communication, and communication must always occur before coordination. There is also a necessary lag between an actor articulating some data structure, that data structure communicating something to some other actor and that actor coordinating their activity. The lag or delay between articulation, communication and coordination may be a matter of seconds, but it can also be a matter of hours, days and even weeks.

Example: Emergency Response

Now consider just one information situation from one particular institutional domain, that of medical emergency response. This is an institutional domain composed of many different information situations which we shall examine in some detail throughout this book. The information situation illustrated in Fig. 2.2 provides ‘flesh’ to the component elements from the more abstract Fig. 2.1. We must remember that this is an abstraction of only one situation extracted from a pattern of actual information situations that occur every hour of every day amongst working medical emergency organisations within the UK, which is used as our key example in this book. However, information situations such as this having similar characteristics occur between actors attempting to accomplish information in this manner throughout the world. The dotted arrows indicated upon Fig. 2.2 represent the distinct sequence of action evident in any information situation: a data structure must be articulated before it can be used as a message to some other actor and before this actor can take appropriate further coordinated action.

Within this information situation, one actor, an ambulance dispatcher, creates a data structure known as a dispatch message, which is entered into the incident system of emergency response. This dispatch message as a signal is transmitted to an ambulance station where it is read (sensed) a few seconds later by another actor, an ambulance driver. This ambulance driver effects certain action in response to this message, namely, to drive his ambulance to the incident as indicated in the message. This whole pattern is likely to be enacted in a few minutes, but in times of high demand, there may be an appreciable delay of hours between the receipt of the message and the action of driving to the incident. ◄

Example: Online Grocery

Or consider another information situation from a different institutional setting or domain, this time in the private sector. Figure 2.3 illustrates an information situation repeatedly exercised within a current area of commercial activity, that of supermarket retail over the Internet and Web, sometimes referred to as online grocery (Beynon-Davies 2017). Here, a customer creates an online grocery order on some digital commerce website. This order communicates to some grocery operator some time later a request to pack and deliver the indicated groceries to a specified location on a specified date. Again, at a further point in time, this communication triggers a grocery delivery to the customer by another actor, namely, a delivery driver. ◄

Fig. 2.2 An information situation from emergency response
[Figure labels: environment: Emergency response; A1: dispatcher; T1: create dispatch message; S1: dispatch message; M1: DIRECT[Go to incident X at location Y]; A2: ambulance driver; S2: emergency ambulance; T2: drive S2 to incident Y; stages Articulation, Communication and Coordination.]


Fig. 2.3 An information situation from online grocery
[Figure labels: environment: Online grocery; A1: customer; T1: create grocery order; S1: grocery order; M1: DIRECT[Deliver groceries S2 to household Y on date Z]; A2: delivery driver; S2: groceries; T2: deliver groceries S2; stages Articulation, Communication and Coordination.]
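The components of an information situation, and the required ordering of articulation, communication and coordination, can be made concrete with a small data structure. The following Python sketch is illustrative only and is not taken from the book: the class names (Actor, Act, InformationSituation), the method names and the timestamps are all assumptions chosen for the example. It records the emergency response situation of Fig. 2.2 and checks that its three acts occur in the necessary sequence, each separated by a lag.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Actor:
    label: str  # label used in the figures, e.g. "A1"
    role: str   # e.g. "dispatcher"


@dataclass(frozen=True)
class Act:
    kind: str         # "articulation", "communication" or "coordination"
    actor: Actor      # the actor performing this act
    description: str
    at: datetime      # when the act takes place (illustrative values only)


# The order in which the three kinds of action must occur.
SEQUENCE = ("articulation", "communication", "coordination")


@dataclass
class InformationSituation:
    domain: str      # the environment, e.g. "Emergency response"
    structure: str   # the data structure S1
    message: str     # the message M1 conveyed through S1
    acts: list[Act]

    def is_well_sequenced(self) -> bool:
        """True if there is exactly one act of each kind, in the required
        order, with a strictly positive lag between successive acts."""
        kinds = tuple(act.kind for act in self.acts)
        if kinds != SEQUENCE:
            return False
        times = [act.at for act in self.acts]
        return all(earlier < later for earlier, later in zip(times, times[1:]))

    def lags(self) -> list[timedelta]:
        """Delay between each act and the one that follows it."""
        return [b.at - a.at for a, b in zip(self.acts, self.acts[1:])]


dispatcher = Actor("A1", "dispatcher")
driver = Actor("A2", "ambulance driver")
start = datetime(2022, 1, 1, 9, 0, 0)  # arbitrary, hypothetical timestamp

dispatch = InformationSituation(
    domain="Emergency response",
    structure="dispatch message",
    message="DIRECT[Go to incident X at location Y]",
    acts=[
        Act("articulation", dispatcher, "create dispatch message", start),
        Act("communication", driver, "read dispatch message",
            start + timedelta(seconds=5)),
        Act("coordination", driver, "drive ambulance to incident",
            start + timedelta(minutes=3)),
    ],
)

print(dispatch.is_well_sequenced())
print(dispatch.lags())
```

The online grocery situation of Fig. 2.3 could be recorded in the same way by changing the actors, structure and message. Only the lags would differ, running to hours or days rather than seconds.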

Exercise: Driverless Cars

Now consider an information situation in the near future where driverless electric cars may be shared amongst a pool of passengers in a large urban area such as metropolitan London. In this future information situation, a carpool passenger places a car transport request to the shared car indicating pickup and drop-off locations. This request communicates not to a human actor but to the car itself, causing it to automatically schedule the journey into its movements and execute the trip in the directed fashion. Try to visualise the component elements of this information situation in the manner of Figs. 2.2 and 2.3. In other words, identify the actors involved, the data structure articulated, the message communicated and the coordinated action taken.

2.3 What Information Is and Is Not

Our model of an information situation may at first glance seem rather abstract, but it is useful for highlighting a number of different ways in which information is defined in various literatures. Let us first review these different ideas of what information is. We shall then show how these different conceptions of information relate to various parts of our model of information situations.

Information as Stuff

One particularly common conception of information is one in which information is portrayed as fundamental ‘stuff’ which helps any physical system maintain organisation (Stonier 1994). As such, information is believed to be an objective phenomenon, independent of and the same for all actors. This conception underlies the classic approach to information, evident in the theory of Shannon (1949). Within ‘information theory’, information lies in the signal which conveys the message and is associated with the degree of order (negentropy) evident in the signal.

Example

In this perspective, as far as our model of information situations is concerned, M1 as a message transmitted as a signal ‘contains’ information. The ordered set of differences comprising this signal is taken to be the very stuff of information. ◄

Information as Interpretation

Another particularly dominant perspective is to conceptualise information as the act of interpretation of some signal by some actor. In this sense, information is not inherently ‘contained’ in the message itself. Instead, it is seen to be created within an act of sense-making conducted upon the message by an individual actor (Boland 1987). In this guise, information is seen as a subjective phenomenon, bound to some actor. Here, information is associated with some notion of inner processing undertaken within the mind or psyche of actors, a process of information.

Example

In this perspective, information requires an actor such as A2 to assign some meaning to the set of differences evident in message M1. The consequence of this is that the same set of differences made in some substance may be interpreted as meaning different things to some other actor A3 than to actor A2. ◄

Information as Intentionality

More recently, information has been considered an inter-subjective phenomenon, reliant on the ‘negotiation’ of collective or shared intentionality (Searle 1983). As such, information is considered an inter-subjective accomplishment amongst groups or communities of actors. Here, information is related to the shared ways in which actors build an ‘aboutness’ between sensed structures evident in the environment and mental states.

Example

In our model of information situations, such collective intentionality involves the aboutness between structures in the environment, such as S1, and some internal state which causes the actor to emit a certain message, such as M1. In turn, message M1 becomes a state of the world which causes some mental state in all receiving actors, causing them to effect certain actions, such as T2. ◄

Information Does Not Exist

Finally, we should mention the most radical position, which proposes that information does not exist: it is a null concept. Stimulated by the work of Maturana and Varela (1987) and their idea of an autopoietic (self-producing or self-organising) system, this viewpoint maintains that information is merely a convenience imposed by observers upon situations of behavioural coordination through structural coupling. In this sense, we observe patterns of order in some situation, such as certain patterns of messages and actions. But such patterns merely correspond to invariances between the actions of certain actors in relation to the environment. We impose upon such patterning the convenient idea of information being ‘conveyed’ or ‘communicated’ as a useful way of accounting for the behavioural coordination which corresponds to such invariances.

Example

According to this view, the observer of some situation perceives that whenever actor A1 does something, such as transform the data structure S1, actor A2 does something in response, namely, transform the structure S2. The observer infers the presence of some information through the evident patterning of the behaviour of these actors. In other words, there is an observed invariance between the presence of S1 and the occurrence of T2. ◄

Information Arises Within Situations in Which All Elements Are Present

The key consequence we take from our theory of information situations is that an accurate account of information must encompass all four viewpoints, but in one entangled whole. Information is objective because it relies upon the materiality of signals. In other words, a signal is always made up of a set of physical differences made in some substance, and such differences can be objectively observed by all. Information is subjective because it is built from the interpretation of structures by actors. This means that the differences made to some structure may be interpreted differently by different actors. Information may be inter-subjective when it amounts to the outward expression of a collective intentionality which associates certain states of the environment with certain mental states. This means that information only becomes possible when actors collectively agree that certain things stand for certain other things. Finally, in terms of each of these conceptions taken independently, information may not exist. We take this to mean that information is not a substantive concept solely reliant upon any component part of some information situation. Instead, it is better to propose that information must have all the elements of the situation illustrated in Fig. 2.1 present to be deemed to exist. Information is always a phenomenon which emerges from the continuous exercise or accomplishment of some pattern of actors, messages and actions all working within some environment.

Exercise

The next time you see a news report which uses a phrase such as ‘the information tells us that’, try to step back and think how the term information is being used in this context.

2.4 The Stands for Relation


So, let us place our cards on the table. We make the central claim within this book that to understand how to conduct information modelling within effective practice, the modeller needs to understand how the constructs of this approach relate directly to information situations as we have described them. As we shall see, information modelling primarily focuses upon one crucial part of an information situation, namely, the messages generated and transmitted within information situations. Information modelling also focuses mainly upon the content of such messages. Finally, information modelling does not regard all messages made by actors as important. Instead, it tends to focus upon messages that attempt to get things done by actors within delimited institutional settings. Saying that information modelling focuses upon the content of messages does not mean that we can ignore other elements of information situations. Information situations always form the proper context for information modelling work, and a proper understanding of information situations by the modeller is critical to the effective construction of an information model. However, given such understanding, information modelling itself makes no attempt to cover and represent all the elements contained in information situations. To simplify somewhat, information modelling is particularly concerned with how messages are used to build intersubjective agreement (collective intentionality) about the meaning of things amongst a group of actors. We shall see in turn that an information model primarily concerns itself with a limited part of messages, which we refer to as the content of messages. The content of messages consists of the signs used to stand for things of importance within some institutional setting to actors within such a setting. An information model is an attempt to map the patterns of such content evident within some delimited domain of human and machine action.
Information, as we have seen, fundamentally relates to acts of communication between people and between people and machines. But information is always directed at achieving coordinated action. The informed person makes decisions about appropriate action in particular situations. An information model is an attempt to represent a limited but important part of the patterns of information situations evident in some delimited domain of institutional action. Typically, an information model builds a representation of what actors normally communicate about in pursuit of instrumental action. This model, as we shall see, is built in terms of certain constructs which enable us to identify and describe the things communicated about. To identify and describe such things, we utilise the constructs of classes, attributes and relationships between classes. Classes, attributes and relationships are key examples of signs. Signs are important because it is through the application of signs that we as actors make sense (impart meaning) about the world. But we don’t just make individual sense. Through signs, we accomplish collective or inter-subjective meaning—we make sense between ourselves by achieving a common understanding of the aboutness of things—how one thing is about another thing. Through such collective meaning,


we coordinate our joint activity. All this, of course, is the essence of information situations—the linkage between structures, messages, actions and actors. But let us become a little more analytic in our understanding of signs. According to the American philosopher Charles Sanders Peirce (Atkin 2016), a sign is (Α) some thing (Β) that stands to somebody (Γ) for some other thing. The three signs utilised here to segment out parts of the definition of a sign correspond to the first three letters of the Greek alphabet (alpha, beta and gamma). Example

Consider the simplest of signs—the pointing finger. The pointing finger is a classic example of a sign. Indeed, it is an embodied sign: a sign produced by manipulation or transformation of the human body. The pointing finger is a sign because it stands for something else. In this case, it directs our gaze to some other thing. A human smile is another common example of an embodied sign—the smile tells us something of the inner state of the actor making the smile. A smile is taken to stand for an inner state such as an emotion of happiness or joy. ◄ In the first example, a pointing finger is the thing that stands for the thing being pointed at to both the actor producing the gesture and to the person observing the gesture. In creating this collective stands for relation, we produce or accomplish meaning. Hence, to read written Greek, we need to agree on the meaning associated with the signs of the alphabet—what each sign stands for. The name for this collective building of such meaning amongst a community of actors is an ontology. Exercise Think of some other forms of embodied sign such as a clenched fist. If someone waves a clenched fist at you, what is this embodied sign meant typically to stand for?

2.5 Identifiers

Within information modelling, two of the most important types of signs are identifiers and descriptors. More commonly used terms for descriptors are properties and attributes. When signs refer to something, they are said to be identifiers for things. Alternatively, signs may describe something, in which case they are descriptors—properties or attributes of something. We use the term thing here in an entirely neutral way to stand for anything that can be referred to or described by actors within some institutional setting. Let us first examine the use of signs as identifiers (we shall look at descriptors, properties or attributes later). So, identifiers are signs which refer to something. More precisely, an identifier is anything which can be taken to refer to some other thing


across time and space to multiple actors. Referring is a critical function within communication which allows a sender to specify one and only one thing to which the sign within a message applies while also providing the means for a receiver to identify the thing from the sign/message. Identifiers are particularly useful within acts of communication because they can refer to some instance of a thing without the need to describe it. They can also refer to this instance across many different information situations. Example

For instance, personal names such as ‘John Smith’ are typical identifiers, while a definite description of this person might consist of the phrase ‘the man with red hair and a pronounced limp’. Red hair and a pronounced limp can be taken to be descriptors of the person. ◄ Example

Consider a sign important to many institutional settings—the passport number as a personal identifier. Each country in the world is able to create its own form for such an identifier, and each passport identifier or number refers to one person: [<passport number> REFERS TO <person>] To take just one example of the use of such an identifier, within the UK, a passport number currently consists of nine digits. For instance: [109999555 REFERS TO John Smith] ◄ Note that we cannot actually represent or record as a fact the relationship between a physical thing and a sign directly. We actually have to use other signs as proxies. The fact we have just listed in relation to a passport number actually relates two identifiers. One is a ‘natural’ identifier and consists of a personal name; one is a ‘surrogate’ identifier, created by a particular institution (in this case, the UK Passport Office on behalf of HM Government) to uniquely refer to a certain person. Both natural and surrogate identifiers can refer to some thing, but surrogate identifiers enforce the uniqueness of reference across information situations. Example

Hence, the surrogate identifier 109999555 will always refer to one and only one British citizen. The natural identifier John Smith is sufficient to refer to this person in many contexts. However, in certain situations, the referring function will break down, potentially because there is likely to be more than one person named John Smith in the UK. ◄


So, an identifier such as a passport number is a sign—an identifier which refers to an instance of a person. More generally, for Charles Sanders Peirce, any sign is a rule taking the form: [X stands for Y to Z in C] Some thing X stands for some other thing Y to some actor Z within some institutional context C. The terms X, Y, Z and C in this rule are placeholders, meaning that they refer to some as yet unspecified value of something. We can make this rule work for us by substituting specific terms into each of the placeholders. Example

So, the term 109999555 is an identifier (X) if it stands for a specific person John Smith (Y) to a specific actor or role such as a border control officer (Z) within some institutional context such as the domain of British citizenship (C). ◄ The implementation of any stands for relation is a convention. The relation between some thing and what it is taken to stand for relies purely upon the weight of precedent and is unlikely to emerge without the presence of such precedent—the idea that something occurs because it always has occurred. This means that most signs are not fixed but arbitrary in the sense that there is no inherent linkage between the terms X and Y to actors such as Z in the rule above. The stands for relation is merely accepted to be the case amongst some community of actors because it has always been the case within this institutional domain. Example

The fact that the spoken and written word Paul is taken to stand for me the author as a person is purely arbitrary in the sense that prior to my being named on a birth certificate, there was no formal identifier for me as a person or citizen of the nation-state into which I was born. I might have been identified with an entirely different name, such as Jack or Rhys. However, at the time of registering my birth, a convention was established through an act of communication known as a declaration—that from hereon, the constitutive rule X (Paul) shall be taken to stand for Y (me) to all actors (Z) in all institutional domains (C). ◄ But note that there is a certain magic happening with the use of signs such as identifiers within institutional settings. The assignment of identifiers to things serves not only to identify such things to the institution concerned; they bring these things into existence for the institution or more precisely to actors communicating within this institutional domain. Hence, the assignment of identifiers to things serves to help define an important part of the so-called ontology of the institution—its notion of what things exist or to put it another way what reality is. So rules of the form X


stands for Y to Z in C are best seen as constitutive rules in the sense that they serve to constitute (produce and reproduce) the ontology of some institutional domain—they create and recreate the very notion of what things are important within this domain to actors communing with each other (Searle 1995). Exercise You the reader will have a personal name, which is a natural identifier referring to you. But, try to think about all the other identifiers used to refer to you, and jot these down. For instance, if you are a student, what identifies you to the university? It is probably more likely to be a code rather than a personal name. What about identifiers used to refer to you by other organisations you interact with such as banks, supermarkets and so on?
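The rule [X stands for Y to Z in C] discussed in this section can be written down as a small data structure. The following is only a minimal sketch in Python: the class name StandsFor, the helper functions and the labels 'passport holder 1' and 'passport holder 2' are illustrative assumptions, and the second holder is a hypothetical person added to show how a natural identifier can fail to refer uniquely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandsFor:
    """One instance of the rule [X stands for Y to Z in C]."""
    x: str  # the sign
    y: str  # the thing the sign stands for
    z: str  # the actor or role for whom it does so
    c: str  # the institutional context

    def __str__(self) -> str:
        return f"[{self.x} stands for {self.y} to {self.z} in {self.c}]"


def referents(rules, x, c):
    """Collect everything the sign x stands for within context c."""
    return {rule.y for rule in rules if rule.x == x and rule.c == c}


def refers_uniquely(rules, x, c):
    """True when x picks out exactly one thing in context c."""
    return len(referents(rules, x, c)) == 1


UK = "British citizenship"
OFFICER = "border control officer"
rules = [
    StandsFor("109999555", "passport holder 1", OFFICER, UK),
    StandsFor("John Smith", "passport holder 1", OFFICER, UK),
    # Hypothetical second person sharing the same natural identifier.
    StandsFor("John Smith", "passport holder 2", OFFICER, UK),
]

print(rules[0])
print(refers_uniquely(rules, "109999555", UK))   # True
print(refers_uniquely(rules, "John Smith", UK))  # False
```

Under these assumptions, the surrogate identifier refers to one thing only, while the natural identifier does not, which mirrors the breakdown of the referring function described in the John Smith example.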

2.6 Descriptors

As we have mentioned, things are not only referred to by identifiers; they are described by descriptors, sometimes called designators. Information modellers tend to refer to descriptors as properties or attributes of some thing, so we shall adopt this naming practice from hereon. Example

A passport as an identity token does not just, of course, contain details of the passport identifier which refers to a particular person. The passport number as the main identifier is not the only sign used on a passport. Hence, when a particular passport as a data structure is issued, it serves to declare a whole series of what we shall call institutional facts about the person, such as:
[109999555 GIVEN NAME Joe]
[109999555 SURNAME Bloggs]
[109999555 DATE OF BIRTH 15/03/1957]
[109999555 SEX male]
[109999555 NATIONALITY British] ◄
The relations between signs in this example are all matters of description, designation or attribution. In other words, they all attribute particular values to an identified person. Attributing such values to an identified thing serves to describe that thing.


Exercise A census of all the persons resident is normally taken in countries such as the UK every 10 years. See if you can find out both how citizens are identified and what descriptors are normally attributed to persons upon the national census.
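The institutional facts declared by the passport in this section each take the form [identifier ATTRIBUTE value]. As a minimal sketch, they can be held as simple triples in Python and grouped under the identifier they describe. The function name describe and the choice of tuples and dictionaries are assumptions made for illustration only.

```python
from collections import defaultdict

# Institutional facts from the passport example: (identifier, attribute, value).
facts = [
    ("109999555", "GIVEN NAME", "Joe"),
    ("109999555", "SURNAME", "Bloggs"),
    ("109999555", "DATE OF BIRTH", "15/03/1957"),
    ("109999555", "SEX", "male"),
    ("109999555", "NATIONALITY", "British"),
]


def describe(facts):
    """Group attribute values under the identifier they describe."""
    described = defaultdict(dict)
    for identifier, attribute, value in facts:
        described[identifier][attribute] = value
    return dict(described)


passports = describe(facts)
print(passports["109999555"]["SURNAME"])  # Bloggs
```

The identifier does the referring, and each attribute value does the describing, so the two roles of signs remain visibly separate in the structure.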

2.7 Communicative Acts

One critical definition of the term to commune is that it involves the interchange of thoughts and feelings between actors. In terms of our model of information situations, an information model primarily concerns itself with what we refer to as the communication domain—the realm in which messages are used to commune between two or more actors. Example

Consider a simple example of an information situation in which an act of communication (a communicative act) takes place. Person A looks across at Person B, who is at the opposite end of a room. She holds up a hand and points a finger upwards, clenching her other fingers in a fist. What will B take this to mean? Will he take it perhaps as an insult, a command to provide one of something or a message that there is something stuck on the ceiling? ◄ In forming the shape of the pointing finger, person A is making a set of differences with a certain substance, namely, a certain part of her body. Person B encounters this data structure but must make a decision as to what he thinks is the most appropriate meaning to assign to this act performed by person A. When this decision is made, then it makes a difference to actor B in the sense that their future action will depend on this accomplishment. Perhaps, they will later meet at an agreed place at 1 o’clock. The most important element or component of an information situation as far as information modelling is concerned is the message. But there are actually two major parts to any message formed within an information situation—the content of the message and the intent of the message. The intent of a message establishes what the actor is trying to achieve with the message—the purpose of the message. In contrast, the content of the message consists of the things identified and described by the signs which make up the message. In the example of the pointing finger, the content of the message is the pointing finger as a sign which is taken to stand for a meeting at 1 o’clock. The intent of this message is what is referred to by the philosopher John Searle (1970) as a directive. Through this message, actor A is requesting or directing actor B to meet her at 1 o’clock.


Example

Consider the message in Fig. 2.2—DIRECT[Go to incident X at location Y]. The intent of the message here is to direct the actor receiving the message to do the things described in the message. The content of the message is composed of an emergency incident identified by X and located at a location identified by Y. ◄ Example

A communicative act from a manufacturing domain is illustrated in Fig. 2.4. Two critical actors within this domain are an inbound logistics controller and an inbound logistics operator. Many communicative acts are enacted by these actors within the daily business of their work. One such communicative act is illustrated here as a speech bubble. The inbound logistics controller is the sender of a message and the inbound logistics operator the receiver of this message. The elements within the square brackets consist of the content of this message—the things identified or described—in this case, a delivery and where it is to be placed. The keyword DIRECT refers to the intent or purpose of the message. Here, the inbound logistics controller is directing that the inbound logistics operator do something indicated by the message itself—namely, ensure that a delivery is checked after being moved to a manufacturing bay. ◄ So, information modelling is interested in acts of communication undertaken by actors within institutional domains, but it is not interested in all communication enacted within these settings. Information modelling is interested specifically in communication that gets things done. In other words, it is interested in communication that helps people, machines and other artefacts coordinate joint activity. Fig. 2.4 A communicative act

Inbound logistics controller (sender) to inbound logistics operator (receiver): DIRECT[Delivery X needs to be unloaded for checking to bay Z]


Example

You would not normally include within an information model a representation of the greetings between business actors, but you would possibly wish to represent aspects of communication relevant to the establishment and holding of business meetings. One particularly important piece of communication here will be the date and time at which the meeting is to be held as well as the place where it is to be held. Without this piece of communication, various business actors would not be able to coordinate their joint attendance at this meeting. ◄ The American philosopher John Searle (1970) calls such communication which gets things done speech acts. We shall refer to them as communicative acts, because as we know much communication within contemporary institutional domains is conducted not through human speech but via a range of different media, such as the creation of electronic records and the transmission of electronic documents, electronic mail, SMS messages and so on. Communicative acts that get things done are instrumental communicative acts. A communicative act is instrumental because it is designed by the sender of the message to influence the action of the receiver of the message. Examples

For example, within the domain of emergency response, if a caller asserts that a medical emergency has taken place, then another actor, a call-taker, is expected to take certain action, such as to alert yet another actor, a dispatcher, to dispatch an ambulance. Or, if a manager at a company offering technical courses instructs a booking clerk to prepare a schedule for a certain course, then both expect that this activity will be undertaken. ◄ John Searle (1970) maintains that there are five major ways in which people influence other people through communication—assertives, directives, commissives, declaratives and expressives. Each of these terms describes a different purpose for the message. Examples of these five different types of communicative act are provided in Fig. 2.5. These examples all relate to the case of medical emergency response. Let us look at each type of communicative act in turn. Assertives are communicative acts that explain how things are in a particular part of the domain being communicated about, such as reports of business activity. Such acts express the truth of the content of a message on the part of the sender of the message. For instance, within various business organisations, different assertives will be made on a regular basis using verbs such as report, confirm, deny, etc.

Fig. 2.5 Forms of communicative act
ASSERTIVE (call-taker to paramedic dispatcher): ASSERT[Medical emergency X is of form Y]
DIRECTIVE (call-taker to caller): DIRECT[Medical actions X, Y and Z need to be taken]
COMMISSIVE (call-taker to caller): COMMIT[An ambulance will be with you in X minutes]
DECLARATIVE (ambulance driver to dispatcher): DECLARE[Incident X is now closed]
EXPRESSIVE (control centre manager to call-taker): EXPRESS[I am happy/unhappy with your annual performance]


Example

Within emergency response, a call-taker might communicate to a dispatcher that a medical incident is of a certain degree of seriousness, in the sense of whether it is regarded as life-threatening or not. ◄ Directives are an attempt to influence receiver action through some message. Directives consist of any act of communication in which some direct response is required from the receiver of the message. Directives are communicative acts that express how one actor would like another actor to behave. They represent the senders’ attempt to get the receiver of a message to perform or take an action. Within business organisations, directives are evident in the use of certain verbs such as request, suggest, summon, recommend and prohibit. Example

So, within emergency response, a call-taker might issue certain advice about medical actions that need to be taken by the caller. ◄ Commissives commit a sender of some message to the future course of action detailed in the message. So actors may commit themselves or others to something happening in the future. Commissives are communicative acts that express how I as an actor intend to behave. Such communicative acts commit a speaker or sender to some future course of action. They are communicative acts that represent a speaker’s intention to perform an action at some time in the future. For instance, within business, promises, guarantees, acceptances and refusals are all examples of commissives. Example

Hence, within emergency response, a call-taker might promise a caller that an ambulance will be with them in a certain period of time. ◄ Declaratives are messages that change the state of some domain through the communication itself. The main difference between an assertive and a declarative is that when something is declared to be the case, it cannot be undone. For example, a judge may use words such as ‘I sentence you to. . .’ or your boss may utter the words ‘You are fired. . .’. Example

Within emergency response, an ambulance driver might declare that an incident is closed. This change of state means that the dispatcher releases the ambulance and its crew back into the pattern of action. ◄


People frequently motivate others through their expressions. Expressives are communicative acts that represent the sender’s psychological state, feelings or emotions towards some proposition or state of affairs. An evaluation of something or someone by somebody normally involves the use of expressives. Example

Within emergency response, a manager might express satisfaction or dissatisfaction with some person’s performance. ◄ Exercise Try to classify the following acts of communication as assertives, directives or commissives:
• An employee reports on business activity to a manager.
• A person’s presence is requested at a particular business meeting.
• A guarantee document guarantees some action in the case of some problem.
• A person sends an email denying his or her participation in a particular affair.
• A business plan suggests a course of action.
• An agreement ensures that joint actions will be taken between participating actors.
The importance of this distinction between intent and content is that different messages may have the same content but different intent. Example

For instance, assume within the domain of some manufacturing company that the content of a message is [Product X, Location Y]. The content of this message identifies a particular product and a certain production location. Now consider two different communicative acts that utilise the same identifiers for X and Y. ASSERT[Product X, Location Y] is an assertive message. It probably asserts the belief of a particular production worker or perhaps a production ICT system that a given product is currently placed at a given production location. DIRECT [Product X, Location Y] is a directive message. Here, a given actor, such as a production supervisor perhaps, is requesting another actor, such as a fork-lift truck driver, to move an indicated product to a designated production location. ◄
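The separation of intent from content in the example above can be sketched as a message type with two parts. This is a minimal illustration in Python, not a prescribed notation: the names Intent and Message are assumptions, and the five intent keywords simply follow the ASSERT, DIRECT, COMMIT, DECLARE and EXPRESS labels used in this chapter.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Intent(Enum):
    ASSERT = "assertive"
    DIRECT = "directive"
    COMMIT = "commissive"
    DECLARE = "declarative"
    EXPRESS = "expressive"


@dataclass(frozen=True)
class Message:
    intent: Intent            # what the sender is trying to achieve
    content: Tuple[str, ...]  # the signs identifying or describing things

    def __str__(self) -> str:
        return f"{self.intent.name}[{', '.join(self.content)}]"


content = ("Product X", "Location Y")
assertion = Message(Intent.ASSERT, content)
directive = Message(Intent.DIRECT, content)

print(assertion)   # ASSERT[Product X, Location Y]
print(directive)   # DIRECT[Product X, Location Y]
print(assertion.content == directive.content)  # True: same content
print(assertion == directive)                  # False: different intent
```

Because the two messages share one content tuple and differ only in intent, the sketch makes the point of the example concrete: an information model of the content alone would not distinguish them.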

2.8 Patterns of Information Situations

The point we wish to make here is that within any particular domain of institutional action, information situations do not of course exist in isolation—they form patterns. Consider Fig. 2.6 which represents an extract from a larger pattern of communicative acts relevant to the domain of emergency response. The sequence between the two communicative acts here is represented by a dotted arrow. The pattern begins when telephone operators take an emergency call. The caller’s area code or closest mobile phone cell is identified from the call, which is then routed to the ambulance control centre. At the control centre, a call-taker matches the call number with a physical address using a computerised map (or gazetteer) of the area covered by the service. These two communicative acts will be enacted at various times by many different callers and call-takers, but the essential features of these acts of communication will remain the same. Also, these acts form a clear sequence because the accomplishment of information in the first communicative act must always precede that of the second communicative act. In other words, the call-taker cannot look up the precise location of the emergency without first taking some details of this from the caller. Communicating where an emergency has taken place and involves a certain person is an assertion. The caller is communicating the belief that the content of his or her message is true. In contrast, when a call-taker uses the interface of the electronic gazetteer to enter the details of the emergency, she is requesting a response from this IT system. As such, this communicative act is a directive. As we shall see in a later chapter, there are many examples of communicative acts evident within a wider pattern of information situations that serves to constitute the domain of emergency response. For instance, dispatchers are regularly instructing an ambulance driver to ‘go to this location and attend this emergency incident’. 
Sometimes, this act will involve a radio message. Most often, it will involve an electronic message transmitted to an ambulance resource, received on some dashboard display and read by the ambulance driver. Or consider another example in which ambulance drivers are expected to assert to dispatchers when they have ‘arrived at the allocated incident’. This communicative act will consist merely of selecting an option on the ambulance’s dashboard display, which transmits a signal back to the incident IT system that in turn updates the display of the dispatcher. Finally, consider the case of paramedics communicating to a dispatcher detail not only of ‘the patient’s condition’ but also of ‘the treatment administered’. This is likely to comprise a complex and asynchronous dialogue conducted as a series of radio messages between paramedic and dispatcher.

Fig. 2.6 Two communicative acts from a wider pattern of communicative acts
Notification of emergency (caller to call-taker): ASSERT[A medical emergency has taken place at location X on person Y]
Identifying locations (call-taker and gazetteer): DIRECT[Find caller and emergency location]; ASSERT[Caller X and emergency location Y]
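The idea that communicative acts form a pattern with a fixed sequence can be sketched as a simple precedence check. The following Python is a minimal illustration under stated assumptions: the names Act, must_precede and follows_pattern are hypothetical, and only the two acts from the emergency response extract are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Act:
    name: str
    intent: str
    sender: str
    receiver: str
    content: str


notify = Act("Notification of emergency", "ASSERT", "caller", "call-taker",
             "A medical emergency has taken place at location X on person Y")
locate = Act("Identifying locations", "DIRECT", "call-taker", "gazetteer",
             "Find caller and emergency location")

# For each act, the acts that must already have been accomplished.
must_precede = {locate.name: {notify.name}}


def follows_pattern(trace, must_precede):
    """Check that every act in the trace occurs after its prerequisites."""
    seen = set()
    for name in trace:
        if not must_precede.get(name, set()) <= seen:
            return False
        seen.add(name)
    return True


print(follows_pattern([notify.name, locate.name], must_precede))  # True
print(follows_pattern([locate.name, notify.name], must_precede))  # False
```

The second trace fails because the call-taker cannot look up the location before the caller has notified the emergency, which is the ordering constraint stated in the text.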

2.9 Physical and Institutional (Social) Ontology

We have stated a number of times within this chapter that information modelling can be seen as an exercise in building a model of important aspects of some institutional ontology. We have also used the term ontology a couple of times within this chapter and the term institution many times. In doing this, we have relied upon your, the reader’s, conventional understanding of such terms. But, what more precisely do we mean by these terms and why is information modelling an attempt to model aspects of institutional ontology? The term ontology derives from the ancient Greek ontos for being and logos for study of. Ontology, as a branch of philosophy, is a theory of reality, being or what things are seen to exist. More recently, the term has been used within computer science and cognitive science to refer to a model for representing the world or more readily some specific domain within the world. Generally speaking, there are two types of things that make up or are seen to exist within any ontology—physical things and institutional things (Searle 2010). Physical things are things like people, mountains and rivers; institutional things are things like bills, payments and contracts. We take the position, first promoted by the philosopher Charles Sanders Peirce, that we cannot experience reality directly— we always experience reality through the mediating layer of signs. Hence, you will note that to refer to both physical things and institutional things, we have to use signs such as and . However, even though we need signs to both identify and describe both types of things, the main difference between physical things and institutional things is that physical things exist independent of the signs that actors use to refer to or describe them. In contrast, institutional things only exist because actors collectively agree that these things can be referred to and described in certain ways. Institutional things have no existence independent from the actors that signify them. 
Example

A physical thing such as a mountain will still exist even if actors choose not to refer to it as such or describe it. Hence, the specific physical structure in North Wales or the physical structure in the West of Scotland will continue to exist even if we did not refer to these things as Snowdon and Ben Nevis. But the piece of paper or plastic in your hand with numerous bits of graphic printed upon it will


only persist as a piece of paper or plastic until we collectively agree as a group of actors that this structure refers to a sum of money. When this collective agreement about what this structure stands for comes into play, then this structure becomes a ten pounds sterling banknote or a ten dollar bill. ◄ What actors deem to exist in their physical environment we might refer to as physical ontology. What actors deem to exist within the domain of their institution we might refer to as institutional or, more generally, social ontology. In a way, ontology provides to actors a way of making sense of facts about reality and through this to build or constitute reality. Given that there are two different types of things, then we would expect that two different types of facts arise from physical and institutional ontology—physical facts and institutional facts. Physical or brute facts are observer-independent. They exist independent of humans as observers of things. Brute facts are so-called because they are matters of ‘brute’ physics, chemistry and biology. Within a brute fact, the status of the thing (or things) referred to has an existence independent of institutions—even of the institution of language—although the expression of such facts relies upon systems of signs. Example

An example of a brute fact is that the sun is 93 million miles from the earth. In contrast, institutional facts are matters of culture and convention. They exist only within the context of human institutions, such as that the European Journal of Information Systems is considered a quality journal amongst the information systems academic community. ◄ Example

In terms of medical emergency response, brute facts constitute the ontology of physical things such as human beings and their medical conditions, the physical configurations of ambulances and medical equipment as well as the geographical layout of the area covered by the emergency service. Hence, it is a brute fact that an ambulance station can physically hold no more than five ambulances and that this ambulance station is situated 12.5 km from its nearest general hospital. ◄ Institutional facts are by their very nature dependent upon human institutions. They rely upon a declaration by certain actors that certain things are true. Institutional facts are matters of culture and convention and, as such, are observer-relative. Within an institutional fact, the status of the thing (or things) referred to depends upon a collective acceptance by the actors concerned that the thing has a certain function. This means that institutional facts exist only within the context of human institutions and are brought into existence through collective declaration of the conventional meaning of things by actors within such institutions. Institutional


facts are important because they serve to constitute the institutional domain itself as a social ontology.

Example

Hence, the following institutional facts might be established within the communicative action evident in a response to an emergency incident:
• Ambulance resource 423 has been dispatched to incident 120453. . .
• Ambulance resource 423 consists of two crew members D46 and P54 and equipment 24346 and 32895. . .
• Crew member P54 is named Jane Bloggs and is a paramedic. . .
All these facts rely upon collective acceptance of the meaning of key terms utilised in communicative acts by actors, such as resource, incident, crew member and P54. The meaning of such terms is not just a matter of reference and attribution, but is constituted through the effects they have upon action. ◄

Example
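The three institutional facts listed above can be written down as a small data structure. The following is a minimal sketch only: the class and field names (`CrewMember`, `AmbulanceResource`, `dispatched_to`) are assumptions made for illustration and are not taken from the text. The identifiers and descriptors are those given in the example, and anything the example does not state is left empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class CrewMember:
    identifier: str              # sign that refers to the person, e.g. "P54"
    name: Optional[str] = None   # descriptor; None where the example gives none
    role: Optional[str] = None   # descriptor; None where the example gives none


@dataclass
class AmbulanceResource:
    identifier: str
    crew: List[CrewMember] = field(default_factory=list)
    equipment: List[str] = field(default_factory=list)  # equipment identifiers
    dispatched_to: Optional[str] = None                 # incident identifier


# Fact 3: crew member P54 is named Jane Bloggs and is a paramedic.
p54 = CrewMember("P54", name="Jane Bloggs", role="paramedic")
# The example states nothing about D46 beyond the identifier.
d46 = CrewMember("D46")

# Fact 2: resource 423 consists of two crew members and two items of equipment.
resource = AmbulanceResource("423", crew=[d46, p54], equipment=["24346", "32895"])

# Fact 1: resource 423 has been dispatched to incident 120453.
resource.dispatched_to = "120453"

print(resource)
```

Note that the structure records only what has been declared. Whether "423" counts as a resource at all still depends on the collective acceptance described in the example.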

Consider the different institutional domain of manufacturing and a thing familiar within the institutional context of manufacturing, that of a stillage. Stillages are physical things and as such have an existence independent of the institution. In other words, they can be described in terms of brute facts, such as that a stillage is a steel box approximately 1 m in depth, height and width. These brute facts can be confirmed by any observer of such objects, making such facts observer-independent. But what is the function of a stillage? A stillage may be a physical structure, but these physical structures are assigned a status within the institution concerned. A stillage is used to store various stages of finished product (‘stock’) within the context of the manufacturing plant. We might even frame the constitutive rule in this case as being: [A stillage (X) stands for a unit of stock (Y) to a group of actors (Z) within the manufacturing plant (C)]. ◄

Both physical facts and institutional facts are built using signs. Signs, as the primary constructs of communication, rely upon a shared ontology amongst a group of actors: the context within which a group of symbols is used in continuous communication by a social group or groups. Hence, a shared ontology is a necessary pre-condition for joint communication and effectively frames or controls such communication.
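The constitutive rule framed in the stillage example has four slots: X, Y, Z and C. As a minimal sketch, and assuming a hypothetical `ConstitutiveRule` class that is not part of the text, the rule can be held as a record and rendered back into the bracketed form used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstitutiveRule:
    x: str  # the physical structure
    y: str  # the institutional status it stands for
    z: str  # the actors who collectively accept this status
    c: str  # the institutional context in which the rule holds

    def statement(self) -> str:
        # Render the rule in the "X stands for Y to Z within C" form.
        return (
            f"{self.x} (X) stands for {self.y} (Y) "
            f"to {self.z} (Z) within {self.c} (C)"
        )


stillage_rule = ConstitutiveRule(
    x="A stillage",
    y="a unit of stock",
    z="a group of actors",
    c="the manufacturing plant",
)

print(stillage_rule.statement())
```

The same record shape would fit the banknote case discussed earlier, with a piece of paper or plastic as X and a sum of money as Y.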


Example

Plant biology relies upon a common ontology first established by Carl Linnaeus in which plants are given names, typically in Latin. Hence, a daffodil is named in this taxonomy as narcissus. This name, along with descriptors of its key features, is then used by plant biologists around the world to identify and describe this species of plant. ◄

An ontology provides a shared vocabulary, which can be used to ‘model’ a domain of organisation in terms of the type of objects or concepts that exist, as well as their properties and relations. We can use the term ontology to denote a common set of representations used by a group of communicants by which they transfer both content and intent from one actor to another. Ontologies are not fixed; they are continuing and communicative accomplishments and in effect emerge within any system of communication. Ontologies are important because they help support practical action.

Example

Consider the case of medicine and its use of an ontology to get things done. When you visit a general practitioner (GP), she will use a number of ontologies to help her diagnose your illness, decide upon your treatment and record your prescription. For instance, a conversation about your symptoms will lead the GP to propose a number of possible illnesses you may be suffering from. Each such illness will have a generic name in a standardised, structured vocabulary of medical terms. These terms will have relationships with a range of possible medical treatments which will also have standard terminology. Medical treatments may involve prescribing certain drugs. Such drugs will be given a generic name within a formulary. This document lists not only the generic names but also possible proprietary names, the normal uses of the drug and likely side effects of drug use. ◄

Example

A key example of the way in which ontologies change is the continuing debate over gender identity. This is also a key example of the way in which ‘signs have politics’ (Beynon-Davies 2021b). Gender identity is typically defined as a personal and internal perception of oneself, which usually translates linguistically into the use of some gender category to classify someone. The consequence of this is that a person’s gender class may differ from their biological sex. One way of thinking about this is that classifying a person as having a certain sex is a brute fact. It is a biological fact whether one is born male or female. However, how a person chooses to define themselves in terms of gender is an institutional fact— reliant upon the collective agreement not only of the person but of a community of actors that this person should be referred to in certain ways. ◄


Exercise Consider a university domain. Name three things which everybody within the university would regard as physical things. Then name three things which are definitely institutional things.

2.10 Conclusion

We started this chapter with the claim that any modelling of information must start with an understanding of what information is and what it is not. This led us to make the case for the need to properly understand the component elements of information situations—situations in which information is accomplished. We made the claim that any situation in which information is clearly present must always consist of actors, (data) structures, messages and actions, all taking place within some domain of institutional action. The consequence of this is that any successful attempt at information modelling must begin with a close understanding and analysis of the information situations pertinent to the domain in focus. This domain may be an existing domain of communicative action, or it may be an entirely new domain of communicative action. In the next chapter, we provide an introduction to the constructs of information modelling and demonstrate clearly how these constructs relate to elements from our model of information situations discussed in this chapter.

2.11 Summary

• Information is always accomplished within information situations by actors.
• An information situation is made up of actors, structures, messages and action all taking place within some institutional domain.
• Information situations are situations in which information is accomplished by actors. Information emerges from the continuous exercise or accomplishment of some pattern of actors, messages and actions all working within some environment.
• The proper context for information modelling is the pattern of information situations relevant to some institutional domain. The modeller needs to have an effective understanding of the information situations of relevance to build effective models.
• Information modelling focuses primarily upon the messages transmitted between actors within information situations. Any message has both content and intent. An information model primarily concerns itself with the content of messages.


• The content of messages transmitted between actors consists of a series of signs used to stand for things of interest to actors within the institutional domain under consideration.
• Two types of signs are particularly important to document within an information model: identifiers and descriptors. Signs that refer to something are said to be identifiers for things. Alternatively, signs may describe something, in which case they are descriptors.
• However, to properly understand the content of messages and how this content should be unpacked in an information model, the information modeller must understand the way in which such messages are repeatedly used for instrumental action within the institution in question. This demands an understanding of the purpose or intent of messages created by actors.
• There are five major types of purpose associated with messages transmitted for instrumental communication. Assertives communicate the belief by the sender that the content of the message is true. Directives are an attempt to influence receiver action through some message. Commissives commit a sender of some message to the future course of action detailed in the message. Declaratives are messages that change the state of some domain of organisation through the communication itself. Expressives represent the sender’s state, feeling or emotion about something.
• Hence, to effectively compose an information model, the modeller must first analyse the existing pattern of communicative actions relevant to some institutional domain. Alternatively, the information modeller must design a new pattern of communicative actions for some proposed domain of activity.
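The distinction between the content and the intent of a message, together with the five types of purpose summarised above, can be sketched as follows. This is an illustrative assumption rather than a notation used in the book: the names `Intent`, `Message` and `content_only`, and the sender and receiver labels, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Intent(Enum):
    ASSERTIVE = "sender believes the content of the message is true"
    DIRECTIVE = "sender attempts to influence receiver action"
    COMMISSIVE = "sender commits to a future course of action"
    DECLARATIVE = "the communication itself changes the state of the domain"
    EXPRESSIVE = "sender conveys a state, feeling or emotion"


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    intent: Intent
    content: Dict[str, str]  # signs: identifiers and descriptors


def content_only(message: Message) -> Dict[str, str]:
    """Return what an information model is primarily concerned with."""
    return dict(message.content)


# A hypothetical directive: a dispatcher tells a crew to attend an incident.
msg = Message(
    sender="dispatcher",
    receiver="crew of resource 423",
    intent=Intent.DIRECTIVE,
    content={"resource": "423", "incident": "120453"},
)

print(msg.intent.name, content_only(msg))
```

The sketch keeps intent alongside content because, as the summary notes, the modeller needs the intent to understand how the content is used, even though the model itself documents the content.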

3 Why Model Information?

3.1 Introduction

Information modelling, as we have seen, is a technique employed primarily by the business analyst and the business designer. However, whereas most literature on information modelling covers the approach of building such models in fine detail, it does not adequately address the important prior activities of engagement with and investigation of institutional domains. It also does not adequately address the important question as to why we model in the first place and why information models are important. Within the current chapter, we attempt to address this gap. We first consider the idea of a model and how this construct relates to notions of institutional reality. This leads us to consider the deficiencies inherent in the way in which traditional approaches to information modelling regard the relationship between an information model and reality. We then consider the relationship between models and reality promoted in this book, which we believe offers a more sophisticated and accurate representation of this relationship. We shall show how our framing of an information model provides a better way of considering not only the true purpose of an information model but also how to approach the investigation of institutional domains. But first, let us start with a brief history of information modelling.

3.2 A Short History of Information Modelling

Information modelling has quite a long and established history within the discipline of business analysis and design. Interestingly, it originated not as a modelling technique but was developed in attempts to build more expressive architectures or data models for the database systems of the 1970s. It is for this reason that historically, when considered as an analysis and design technique, it was largely known as data modelling rather than information modelling.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Beynon-Davies, Information Modelling, https://doi.org/10.1007/978-3-030-98805-0_3


During this period, two prominent architectures for managing data known as the hierarchical and network data models dominated. However, in 1970, Ted Codd published his landmark paper on the relational data model (Codd 1970), which was set to influence the design of database systems for many decades hence. During the 1970s and 1980s, many alternative data models to the relational data model were published which became known as the semantic data models (SDMs), because they attempted to provide formalisms for building more expressive meaning (semantics) into database systems. One of the most cited of such SDMs was the entity-relationship model (sometimes referred to as the entity-relationship-attribute model) proposed by P. P. S. Chen in the mid-1970s (Chen 1976). Although the model was developed originally as an alternative to the relational data model, IT developers soon latched on to the usefulness of the graphic approach provided by Chen to express his data model and adapted it to the purposes of database design. A number of extensions to the approach were proposed over the years, particularly the inclusion of abstraction mechanisms such as generalisation and aggregation. This led many to propose the use of what became known as extended entity relationship modelling as the appropriate way of conducting what is known as conceptual modelling, as opposed to logical or physical modelling for database systems. In early 2000, the Object Management Group adopted many of the features of extended entity relationship modelling within its specification for class diagrams as part of the Unified Modelling Language (UML). Although various updates have been made to UML as an approach since that time, the conventions of class diagramming have remained consistent for well over a decade. As we shall see in Chap. 5, there are certain subtle differences between class modelling and extended entity relationship modelling, but the core of these techniques is essentially the same and covers all those constructs to be covered in Chap. 4.

We have made the explicit decision to refer to the technique in focus as information modelling rather than data modelling or database modelling within this book. This is a direct consequence of our discussion in Chap. 2 where we considered the distinction between and relationship between data and information. Information modelling is concerned with patterns of information situations evident within delimited domains of institutional action. Information modelling focuses primarily upon the messages transmitted between actors within information situations. An information model concerns itself with the content of such messages, and these messages consist of a series of signs used to stand for things of interest to actors within the institutional domain under consideration. Any sign is some thing that stands to somebody for some other thing. The thing which stands for some thing is typically some form of data structure. But information modelling is also interested in what things are being referred to or designated by data items, elements and structures. So information modelling must model more than just structures of data: it must model how such structures cause actors to accomplish certain things within institutional settings.

3.3 The Notion of a Model


In Chap. 2, we considered what information is and is not. Clearly, to understand the basis of information modelling properly, the other term we need to unpack here is that of a model. So what is a model? All models are abstractions. Abstraction is necessarily a process of filtering: of ignoring certain things that we feel are unimportant and including certain other things which we regard as important. But what is the status of the things we choose to ignore or include? One view of models is that they are objective constructs which abstract from certain agreed features of some reality and represent these features in some standard form.

Example

Hence, natural scientific theories, such as Darwin’s theory of evolution or Einstein’s theory of relativity, can be seen as models in this light. They all are attempts to describe and predict features of the material or physical world, which is the same for everyone. ◄

Another view is that models are subjective constructs, dependent upon a person’s position and how they view reality. According to social scientists, within the social world we all potentially may work with different models of reality.

Example

For instance, it is possible to argue that managers always work with models of their business. But managerial models are rarely explicit models. They are subjective models which managers use to make sense of their own situation of organisation. These models are tested in the realm of management decision-making and the outcomes resulting from such decision-making. ◄

The approach promoted in this book treats models as tools which are reliant upon signs and which are directed at achieving collective agreement amongst a community of actors in the fulfilment of joint actions. In this sense, models are both objective and subjective, or we should more properly say inter-objective and intersubjective. Human activity is clearly undertaken by human actors, motivated towards some purpose and mediated by tools in collaboration with others. In this perspective, human beings never interact with the ‘world’ directly. Instead, such interaction is always mediated through use of tools. Tools may be physical or technical tools such as hammers or computers as well as psychological or social tools which help people form activity themselves or with others, such as symbols or maps. Both physical tools and psychological tools are mediators which help change the structure of activity. Tools therefore shape the way in which humans both perceive and interact with ‘reality’ since they reflect the experiences of other people


who have tried to solve similar problems. Any model in this perspective is therefore a ‘tool’ for debating or negotiating about the nature of some reality: the aim being to achieve mutual understanding and joint action. The critical tool of concern in this book is that of a sign. Models are constructed using systems of signs, and for signs to work, there must be some collective agreement about what the signs are, what they mean and how such meanings shape activity. Within this book, we use the idea of a model as a way of negotiating collective belief as to either how things are in some domain or how we, as a collective or community of actors, might like things to be in this domain.

Exercise Consider a map such as the map of the London Underground or the Paris Metro. In what way is such a map a model? Who are the community of actors that use such a model and for what purpose?

Models need sign systems in the sense that models are created through signs and effectively act as an external communicative resource amongst a group of actors. This means that all such representations are models and all models are forms of representation. Natural languages, such as English and Welsh, are clearly the richest of sign systems with which we model or represent our ‘world’. For analysis and design of business organisation, more restricted and formalised sign systems are typically used. This is the reason that visualisation (Chap. 5) is much used as a means of presenting such models.

Any model can also be used in a number of different ways in relation to the dimension of time. Models can be built of realities in the past, present and future. An information model, for instance, can be developed as a model of current ontology: what people currently communicate about in terms of classes, attributes and relationships within some delimited institutional domain. This type of information model we refer to as an AS-IS information model.
Alternatively, we can develop an ontology of some future communicative pattern. Here, we are designing some new way of working with an associated understanding of the communicative practice that will be needed to support coordinated activity. Such a model we refer to as an AS-IF information model. Finally, we might use an information model to specify how communication will be structured and represented within data systems. This we refer to as a TO-BE information model. Models can also vary in terms of their level of abstraction. Within the database systems area, for instance, and as already mentioned, a distinction is frequently made between conceptual, logical and physical models. In this sense, an information model would normally be seen as a conceptual model since it documents, at a high level, the things of interest that actors within some domain need to communicate about. In contrast, a logical model translates these things into the constructs appropriate to the architecture of some data system. Finally, a physical model refers to the implementation of data structures within some actual data system. In Chap. 8, we


shall illustrate how to turn an information model (conceptual model) into the design for some relational database (logical model). This design can then be implemented within some relational database management system (physical model).
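The three levels of abstraction described above can be illustrated with a small sketch. It is not the translation procedure of Chap. 8, which is not reproduced here. The `Incident` class, its attributes, the type mapping and the helper names `to_relation` and `create_table` are all assumptions made for illustration, and SQLite merely stands in for "some relational database management system".

```python
import sqlite3
from dataclasses import dataclass, fields


# Conceptual level: a class of things that actors communicate about.
# (Hypothetical class and attributes, chosen only for illustration.)
@dataclass
class Incident:
    incident_no: int
    location: str
    category: str


# Logical level: translate the class into a relation (name plus typed columns).
SQL_TYPES = {int: "INTEGER", str: "TEXT"}


def to_relation(cls):
    columns = [(f.name, SQL_TYPES[f.type]) for f in fields(cls)]
    return cls.__name__.lower(), columns


# Physical level: implement the relation within an actual data system.
def create_table(conn, cls, key):
    table, columns = to_relation(cls)
    column_sql = ", ".join(
        f"{name} {sql_type}" + (" PRIMARY KEY" if name == key else "")
        for name, sql_type in columns
    )
    conn.execute(f"CREATE TABLE {table} ({column_sql})")
    return table


conn = sqlite3.connect(":memory:")
table = create_table(conn, Incident, key="incident_no")

# Insert one row and retrieve it again.
conn.execute(
    f"INSERT INTO {table} VALUES (?, ?, ?)",
    (120453, "(not stated)", "emergency"),
)
rows = conn.execute(f"SELECT incident_no, category FROM {table}").fetchall()
print(to_relation(Incident))
print(rows)
```

The conceptual class says nothing about keys or column types. Those decisions first appear at the logical and physical levels, which is one reason the levels are kept distinct.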

3.4 Information Models and Reality

Let us explain our view of information models more clearly by contrasting it with the conventional view of what information models are. To do this, we need to examine and expand upon the differing notions of what reality is. Most current practices of information modelling either explicitly or implicitly utilise a view of reality consistent with that evident in the work of the philosopher Mario Bunge. Bunge’s theory of reality proposes that the world is made of concrete things that possess properties and that properties can be conceived of as functions that map a thing onto some value. Yair Wand and Ron Weber (1990, 1995) utilised elements of Bunge’s conception of reality in their proposals for a way of conceptually modelling information systems. Wand and Weber (1990) built upon Bunge’s conception to propose that the things and their properties relevant to some domain may be modelled directly as constructs within an information model. They further proposed that classes can be modelled as things with common properties, whereas associations are suitable for modelling binding mutual properties shared between classes (Wand and Weber 1995). So, in terms of what has become known as the Bunge-Wand-Weber conception of ontology, things, whether physical or institutional, comprise an organised collection of things which are objective, meaning that they are the same for everyone. The practice of information modelling is then seen to be a process by which things are identified and represented as statements in some formal ‘language’, and these statements are taken to correspond to objective facts about some domain of reality. We have referred to this framing of information models and reality elsewhere as the conventional view of information modelling (Beynon-Davies 2018). Given our discussion in Chap. 2, this way of thinking about an information model and its relationship to reality might at first glance seem entirely sensible.
It is certainly evident in background assumptions made within many existing texts on information modelling. However, practitioners of information modelling often encounter substantial problems when they attempt to perform information modelling in practice (Bodart et al. 2001) using this view of reality. Novices in information modelling experience considerable difficulty in turning problem descriptions into the abstract representation of some information model (Batra and Davis 1992). The ‘quality’ of information models prepared by both novice and expert alike is frequently poor (Moody and Shanks 2003). Also, various stakeholders in the domains being modelled often have difficulty understanding information models. Even one of the founding fathers of the conventional worldview of information modelling—Ron Weber (Weber 2003)—has questioned the typical practices by which information modellers tend to model artefacts such as orders and invoices rather than the underlying phenomena on which such artefacts rely. More recently, issues have


been raised in relation either to the lack of theoretical foundation (Siau 2003) or to the meta-modelling assumptions underlying conventional information modelling (Eriksson et al. 2013). This echoes an emerging critique of the ontological foundations of information modelling. Allen and March (2006) argue that Bunge’s ontology is concerned with representing the world of material things that exist independent of human interpretation. It has little concern with the world of human intentions and meaning. This has led some to contrast the conventional view of information modelling with a worldview more sensitive to a view of institutional ontology based in communicative competence (Klein and Hirschheim 1984; Lyytinen 1985).

It is quite easy to demonstrate some of the weaknesses of the conventional view of information modelling based in the Bunge-Wand-Weber ontology by considering a simple problem in information modelling taken from the domain of medical emergency response, which we have considered already in Chap. 2.

Example

Suppose you are given the task of representing an emergency incident on an information model. The conventional worldview assumes that identifying and describing an emergency incident can be done through representing a series of objective facts, much in the same way that the physical existence of an emergency ambulance vehicle is an objective fact. However, the existence of an emergency incident is not an objective fact; it is a social or an institutional fact reliant upon acts of communication enacted by actors within the domain in question. A call made by a person reporting some happening to a call-taker within the control room of some ambulance service only becomes classified as an emergency incident through a process of triage which involves various actors interpreting the severity of the medical condition of the people reported about and communicating the appropriate classification to various actors such as paramedics, dispatchers and ambulance drivers. This means that an incident only becomes an emergency to the institution of emergency response, or more precisely to actors working within this institution, when it is classified as such by certain actors given the institutional authority to declare this status. ◄

Exercise Consider a higher education institution such as a university. Explain why the grades awarded to student assessments are not objective facts but institutional facts. Explain also which actors are involved in the production of these facts and where classification is applied in the process of grading.

Therefore, within this book, we promote an alternative view of information modelling which arises from our focus upon communicative competence. This means that we focus upon how actors within some domain communicate currently about the things of interest to them or wish to communicate in the future about


certain things. As we have seen, this view proposes that when actors engage in the process of identifying and describing things, they are engaging in social acts that help constitute an institutional reality which is inter-subjective. But institutional reality is always built upon a physical reality which is inter-objective. This view further suggests a framing of an information model as a specification of the structure of terms used within acts of communication relevant to some institutional domain. These terms, as signs, must refer to and describe both institutional things and physical things.

Exercise Within a university setting, name three things that are commonly communicated about between lecturers and students. Are these physical things or institutional things?

3.5 What Are Information Models for?

It should be evident from our brief account of the history of information modelling that the primary purpose of this technique throughout its application has been as an analysis and design technique which aids in the development of data systems of various forms. We use the term data system to refer to an organised collection of data structures and associated processes of articulation that may be used to operate upon such data structures (Chap. 8). The classic example of a data system is that of a relational database system. A relational database system uses the data structure of a table or tuple, operated upon by processes of insertion, deletion, update and retrieval. We shall show how to translate an information model into a schema for a relational database in Chap. 8.

But the concept of a data system is much wider than that of a database system. For instance, and as we shall see, there is a close relationship between the constructs of an information model and the constructs underlying the current infrastructure of the World Wide Web. In such a realm, information modelling is useful within exercises of metadata modelling, not only in terms of modelling the elements of XML schemas but more widely in proposals to build greater semantics into the World Wide Web. We shall examine this idea of a much wider context for information modelling in Chap. 9.

There is an even wider purpose to information modelling than the design of specific data systems or the metadata concerned with such systems. Because of its usefulness for modelling aspects of institutional ontology, information modelling has a much wider range of application within business, the public sector and the voluntary sector. In previous work (Beynon-Davies 2021a, b), we have made the case for thinking of organisations of all forms in the modern world as being ‘scaffolded’ through patterns of information situations.
The close coupling of data to action means that information modelling is useful in many organisational situations


for coming to a collective agreement about not only how things are but how we might want them to be. Let us consider just three areas where information modelling is important in this wider sense. Statistics is a branch of mathematics devoted to the collection, analysis, interpretation and presentation of masses of numerical data. Statistics is also now seen to be an important part of the infrastructure of data science. If you talk to any good statistician, you will establish the necessary truth that the analysis of data in any form must be based upon a firm understanding of the ways in which data structures are made. Any statistic is only as good as the data it is built upon, and indeed, it is impossible to interpret any aggregate measure generated through statistics properly without understanding the making of data structures which scaffold this analysis. This means, of course, that the whole notion of conventional statistics implicitly relies upon a form of information modelling. We would argue that the design of data sets used by statisticians as well as the investigatory techniques of questionnaires can be much improved through effective information modelling. The management of large-scale construction projects is now facilitated by so-called building information models. A building information model is a digital representation of the physical and functional characteristics of a building from inception through to design, construction and use. In a sense, such a building information model represents an agreed model of the things of interest to numerous actors such as architects, planners, builders and administrators. The building information model acts as a collective communicative resource important to all actors engaging with this artefact. Finally, we should mention that data as implemented in data systems is a critical resource not only to specific institutions but acts as important infrastructure across institutions. 
In this sense, information models are important to what is known as data administration, which attempts to develop and execute policies for data definition, control and protection. Data administration is the attempt to impose order upon the diverse data structures articulated currently across an institution while also planning the data required for future action. To achieve this, data administrators implement standards for the definition and storage of data. Administrators also create and monitor practices that define and control access to data resources. They ensure the integrity of the data resource and that it is secured from threats. This means implementing procedures to ensure that the organisation complies with any legislation concerning data privacy. Finally, data administrators encourage sharing of data across applications and promote the idea that data as a resource is independent of IT applications and its users.

3.6

Investigating the Ontology of Domains

Given that information modelling is an attempt to represent important elements of the communicative practice within some institutional domain, the question remains: how do we engage practically with such communicative practice? In
other words, how do we try to make sense of what people currently communicate about or how do we envision what actors will need to communicate about? Information modelling is a specialism within the wider discipline of practice known as business analysis or business design. The business analyst tries to make sense of existing domains of organisation, while the business designer envisions future domains of organisation (Beynon-Davies 2021a). Investigation within business analysis normally occurs in short periods of immersion within such domains, typically using some combination of investigation techniques. Various forms of representation are then constructed to communicate common understanding between the business analyst/designer and various actors with a stake in the domain under consideration. An information model is one such form of representation. There are a number of established ways of making sense of both existing and envisaged patterns of information situations within some domain of organisation. These ways include conversation, observation and participation. There is also the analysis of existing data structures such as records or documents. The investigation work of the business analysis is typically conducted as a systematic conversation between the business analyst and so-called stakeholders in some problem situation (Beynon-Davies 2021a). The focus of business analysis and design is typically upon some situation within institutional life which is regarded as in some way problematic—hence, the term problem situation. People who are interested in change to the situation are referred to as stakeholders in the problem situation. The purpose of systematic conversation is to build some common ground between the analyst and such stakeholders about the ‘shape’ of the problem situation. This common ground may either constitute an understanding of how things currently are or an understanding of how stakeholders would like things to be. 
As a form of investigation, the business analyst/designer and stakeholders can engage in conversations on an individual or a group basis. When conversations are led and controlled by the business analyst with specific, named individuals, they are referred to as interviews. Interviews are directed conversations, designed to achieve specified goals. When conversations are led and controlled by the business analyst with a representative group of stakeholders, they are likely to be referred to as a focus group, collaborative meeting or design workshop.

Interviews are clearly not everyday conversations; they are systematic and directed conversations organised typically around pairs of questions and answers. Having said this, the degree of formality can differ between interviews. Informal or unstructured interviews are those in which questions are formulated by the business analyst within the flow of the interview itself. Formal or structured interviews are those in which a structure or protocol is devised prior to the interview and used by the business analyst to drive the flow of conversation. The consequence of this is that the actual questions asked within an unstructured interview will differ from one interview to the next. In contrast, a structured interview will deliberately involve asking the same questions of different actors.

Unlike an interview which is one-to-one communication between the business analyst and a stakeholder, a focus group is a one-to-many communication between the business analyst and a range of stakeholders. A focus group is a discourse in
which a group of people are asked a series of questions about their perceptions, opinions, beliefs and attitudes towards something or some situation. These questions are asked in an open group setting where participants should be encouraged to talk freely with other group members and in doing so may formulate joint responses to the questions asked. The group nature of this form of investigation is seen to be important to generating consensus views of something or some situation. Interviews and focus groups are particularly good means of investigation where the objective is to develop some common basis of understanding about some problem situation. Meetings and workshops are vehicles particularly for decisionmaking in areas such as the prioritisation of issues to be addressed or requirements for a new information model. Within business analysis, workshops are typically vehicles for joint design of problem solutions. Workshops constitute sessions in which the business analyst and representatives of stakeholder groups get together in a structured situation to formulate thinking about either an existing domain of organisation or a new domain of organisation. One of the best ways to understand a set of practices is to engage in such practices yourself—this is what is meant by participation. There is nothing like practically attempting to ‘walk in someone else’s shoes’ for appreciating what is actually involved in doing some aspect of work. The key problem with participation as an investigative technique is that it is likely to take time. In contrast, observation usually involves being present in work settings but not directly participating in the pattern of action. Instead, the analyst will be involved in recording the detailed work behaviour of people and machines. One way to manage the observation of work is through shadowing, that is, following a particular worker around and observing all the tasks performed by this worker in the activity system in question. 
Another way of managing observation for business analysis systematically is to walk through a pattern of action with the people doing the job. Ideally, this should be done a number of times with different workgroups to tease out any differences in practices across business units.

Most of the investigation techniques discussed so far involve the business analyst engaging with a domain through its human actors. However, as a direct consequence of our theory of information situations, it is useful to think of artefacts as acting, at least in a limited sense. Therefore, it is particularly important that the business analyst engages with such artefacts to help make sense of either current or envisaged domains. Within Chap. 2, we hinted that data utilised within some domain of organisation helps constitute institutional facts about this domain. Such data comes in many different forms. For instance, documents are a valuable resource in most organisations. Such documents may consist of paper forms used in work, reports generated from ICT systems or design documents of various forms. Documents are particularly important for understanding the structure of data used in the support of work performance. Data structures act as an institution’s collective memory. Hence, sampling the records used in support of some domain of action and analysing such records is
particularly important for understanding what actors within the domain feel it is important to remember about. Records also act as key resources for communication between multiple actors, sometimes remote in time and space. Records are typically built for a particular defined purpose and are important to understand for the way in which they establish purpose and performance in some domain of organisation. In recent times, business data has become even more significant than in the past. The amount of business data represented in organisational ICT systems of various types has grown astronomically. Various technological approaches are now available to analyse large data sets which suggest patterns of action worthy of further investigation. Collectively, this approach has become known as ‘big data’ (Chap. 9). Technologies such as data warehousing, data mining and data analytics are being used by business analysts to determine hidden patterns—patterns which will probably need to be explored and detailed further using other methods of investigation such as observation, workshops and interviews.

3.7

Conversations for Action

In terms of information modelling, we are not interested in all conversations; we are interested in conversations for action. A conversation for action is one in which two or more actors accomplish information with the purpose of coordinating their activity. Therefore, one particularly important resource for the information modeller are the conversations for action that may be collected from the domain under investigation. A conversation is typically made up of a sequence of adjacent communicative acts (Clark 1996). In other words, one actor articulates an utterance, and another actor articulates an utterance in response; this leads to a further pair of communicative acts and so on. Pairs of communicative acts within a conversation tend to follow conventional patterns in which the articulation of a certain utterance generates a preferred response. One of the most typical patterns of adjacent communicative acts includes the question-answer pattern, such as ‘How many production units do you have?’ ‘Eight’. This is actually a type of communicative act which we called a directive in Chap. 2 followed by an assertive. Such a question-answer pattern forms the basis of structured conversations such as the interview. Another pair is the assertion-agreement pattern or assertion-disagreement pattern, such as ‘There is clearly a problem with stock flow’. ‘Yes, I think you are right’ or ‘There is clearly a problem with production scheduling’. ‘No, I think it has more to do with the way we manage stock’. This pattern is composed of an assertive followed by an assertive and is the basis of much group decision-making. There is also the summons-response pattern, such as ‘I’d like to talk to you about this issue on Tuesday’. ‘Yes, that should be fine’. This is a directive followed by a commitment and is the basis of communication used for control purposes within organisations.

Finally, there are two types of paired acts of communication that express the inner state of actors involved in the communication. There is the typical thanks-acknowledgement pattern, such as ‘Thank you for your contribution to this effort’. ‘No problem, I enjoyed it’, and the apology-acceptance pattern, such as ‘I am sorry I raised this issue so abruptly’. ‘Your apology is accepted’.

Example

Consider the conversation between a caller and a call-taker within the control room of an emergency response service. At the control centre, many different callers make calls to the call-takers working in shift patterns. These calls occur many times over a 24-hour period, 365 days a year. The call-takers are left unconstrained in holding a conversation with the caller, but there are certain important things that the call-taker must gather from the caller during the duration of the conversation. The conversation has an instrumental purpose: to accomplish sufficient information so that decisions can be made not only as to whether to dispatch an ambulance but where to and with what resources. Consider one instance of a call made to the control centre:

Call-taker: You are through to the ambulance service . . . how may I help you?
Caller: Please, can you send an ambulance . . . my mother has fallen out of bed and cannot get up off the floor.
Call-taker: I see; are you calling from the house your mother has fallen in?
Caller: Yes . . . she is groaning on the floor now . . . can you hurry please?
Call-taker: I first need to take some details . . . can you tell me the name of your mother and the address you are at?
Caller: Elsie Phillips and we are at 25, Halethorpe Road.
Call-taker: Do you know the postcode for the property?
Caller: No, sorry . . .
Call-taker: No problem, I can find it on my map . . . yes we have the address . . . can I have your name please?
Caller: I’m Joe Phillips and Elsie is my mother . . .
Call-taker: OK. Is your mother conscious Joe?
Caller: Yes.
Call-taker: Are you able to speak to her Joe?
Caller: Yes, she says her left hip is very painful and she cannot get up.
Call-taker: Can she tell you how long she has been on the floor Joe?
Caller: She says that she fell out of bed last night . . . so, she must have been lying on the floor for some hours . . .
Call-taker: OK, try not to move her Joe but make her as comfortable as possible by perhaps putting a cushion under her head and putting a blanket over her . . . an ambulance will get to you soon . . .
Caller: I will do . . . can you tell me how long the ambulance will be please?
Call-taker: It should be with you in a matter of minutes Joe . . . but I’ll keep you updated on this number . . .
Caller: Thank you . . . ◄

This conversation consists largely of adjacent pairs of questions and answers or directives and assertives. The call-taker directs the caller to assert certain things about the situation. Through this instrumental pattern of communication, the call-taker is trying to establish a number of institutional and physical facts about the possible emergency incident, namely, who is calling, where the incident has taken place, who is involved in the incident and what is their likely medical condition. This will enable the call-taker to hold a further conversation for action with a paramedic dispatcher who will decide upon the appropriate level of response to the incident and may get back in touch with the caller to engage in further conversation which will direct the caller to do certain things until the ambulance arrives.
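For readers who think in code, the idea of adjacent pairs can be sketched as follows. This is only an illustrative sketch and not part of the book's method: the names `ActType`, `Act`, `adjacent_pairs` and `question_answer_pairs` are hypothetical, and the labelling of each excerpted utterance as a directive, assertive or commitment is our own reading of the call above.

```python
from dataclasses import dataclass
from enum import Enum


class ActType(Enum):
    DIRECTIVE = "directive"    # e.g. a question or instruction
    ASSERTIVE = "assertive"    # e.g. an answer stating how things are
    COMMITMENT = "commitment"  # e.g. agreeing to do something


@dataclass(frozen=True)
class Act:
    speaker: str
    kind: ActType
    content: str


def adjacent_pairs(acts):
    """Pair each act with the next act when the speaker changes."""
    return [(a, b) for a, b in zip(acts, acts[1:]) if a.speaker != b.speaker]


def question_answer_pairs(acts):
    """Keep only directive-then-assertive pairs (the question-answer pattern)."""
    return [(a, b) for a, b in adjacent_pairs(acts)
            if a.kind is ActType.DIRECTIVE and b.kind is ActType.ASSERTIVE]


# Excerpts from the call; the act types are assumed labels.
call = [
    Act("call-taker", ActType.DIRECTIVE, "Can you tell me the address you are at?"),
    Act("caller", ActType.ASSERTIVE, "Elsie Phillips and we are at 25, Halethorpe Road."),
    Act("call-taker", ActType.DIRECTIVE, "Is your mother conscious?"),
    Act("caller", ActType.ASSERTIVE, "Yes."),
    Act("call-taker", ActType.DIRECTIVE, "Try not to move her."),
    Act("caller", ActType.COMMITMENT, "I will do."),
]

qa = question_answer_pairs(call)
for question, answer in qa:
    print(f"{question.content} -> {answer.content}")
```

The final directive-commitment pair is deliberately left out of `qa`, which mirrors the distinction drawn in Sect. 3.7 between the question-answer and summons-response patterns.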

3.8

Visualising Patterns of Information Situations

Within this chapter, we have made much use of the idea of a pattern of information situations, first introduced in Chap. 2. A pattern is any regular set of differences (Bateson 1972) which is reproduced across more than one situation. The idea of pattern is central to many disciplines. For instance, the American architect Christopher Alexander (1964) proposed that architectural design is based on a number of archetypal patterns which encapsulate fundamental principles of building design. This idea has had much influence within other disciplines such as software engineering where design patterns are proposed as general solutions to programming problems. Hay even produced a set of patterns for common information models found within business (Hay 1996).

In terms of any pattern, such as a pattern of information situations, it is important to make the distinction between a pattern in principle and a pattern in practice. A pattern in principle is the ideal or schematic form of a pattern and consists of roles undertaking commonplace action. A pattern in practice consists of definitive actions undertaken by specific actors in specific places and at specific times. The term scenario is often used to refer to some representation of a pattern in practice, consisting of actual and observed patterns of action performed by particular actors within some domain. From the analysis of various scenarios, the business analyst may devise a pattern in principle which abstracts the common features of actors and actions. This distinction between patterns in principle and patterns in practice applies to overall patterns of information situations as well as to individual patterns of articulation, communication and coordination.

Example

The conversation we have seen between a caller and a call-taker is a pattern of communication in practice. By studying recorded conversations as patterns of
communication in practice, it becomes evident that certain common features are evident in all conversations made between persons enacting the roles of callers and call-takers within the domain of emergency response. These common features can be used to form a pattern of communication in principle. ◄ But how should we represent such patterns of communication in principle? In previous work, we have found comics useful as a means of visualising patterns of information situations (Beynon-Davies 2021a) in general or patterns of communication in particular. This is for a number of reasons. Comics are highly visual, and we know that ‘a picture paints a thousand words’. We deliberately use only a few constructs within a pattern comic to help us think differently about organisation. Comics are deliberately freeform in nature—you can add to the core constructs of comics with ease. Most people, with little prior training, find it reasonably straightforward to ‘read’ a comic. We use comics to focus upon patterns of action by actors. In other words, we put actors and action at the centre of our representations of institutional domains. Finally, such comics can be used not only to document common understanding about what people do or think they do; they can be used to document ways of improving some domain. We refer to these visualisations as pattern comics or business pattern comics. The technique is deliberately informal and open-ended but is typically used to visualise some system of action. To do this, we need ways of describing actors taking action within a defined chronology of events, and for this purpose, the set of elements illustrated in Fig. 3.1 is useful. A typical comic is made up of a series of panels, with each panel consisting of one or more cells. The finite set of descriptive states for the domain in question is visualised as a finite collection of comic cells, each cell typically describing one state of action within the overall business pattern. 
The sequencing of cells is represented as dotted arrows linking cells. Therefore, each cell is generally used to represent a snapshot of action (event) within an overall plot, and a linked series of such cells is used to narrate the storyline. Human actors are represented by stickpersons or named mannequins within comic cells. Machine actors such as ICT systems or artefacts such as data structures are represented by appropriate icons. When actors are represented, speech bubbles (to indicate external communication) and thought bubbles (to indicate internal communication) are attached to pictured characters, particularly within patterns of communicative action. Captions are also attached in a more free-form way to cells and are used to convey additional message content over and above that conveyed by visualisation.

Pattern comics are primarily used as an analysis tool, as a way of making sense of what is going on currently within some domain. In this mode, we have tended to use comics as a way of representing observed action. The comic is drawn and then used as a focus for discussion with representative actors from the pattern of action being analysed. This serves to validate and verify observations. It also serves to establish common ground about ways of doing amongst a community of actors. The documentation of routine work in practice as one or more pattern comics is then used as a resource for the production of a representation of the routine in principle. This comprises an abstraction of observed action. Again, this representation of the routine in principle can be validated and verified with participating actors. Through this process, it becomes possible to use such comics as a high-level representation of the situation AS-IS within some domain of organisation.

Fig. 3.1 Elements of a pattern comic. The figure labels an example comic (panels, cells, roles, an artefact, captions, a decision point and connector symbols) with the following notes:
• A comic is made up of one or more comic panels.
• Each panel is made up of one or more cells.
• Cells can represent acts of articulation, communication or coordination.
• Cells are generally used to represent action performed by identifiable actors.
• Actors can be humans, machines or artefacts.
• Speech bubbles are used to represent external dialogue.
• Thought bubbles are used to represent internal dialogue.
• Captions are sometimes used to provide a third person accounting of the context for the action in a cell.
• The chronology of the narrative is established through sequencing of cells.
• Decision points can be used to indicate choices undertaken by actors to change the flow of action.
• Connector symbols are used to indicate both the start and end of a particular pattern and to connect between patterns.
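The elements of a pattern comic described above (panels, cells, roles, bubbles, captions and the arrows that sequence cells) can also be written down as a small data structure. The following is a minimal sketch under our own assumptions: the class and field names (`Bubble`, `Cell`, `Panel`, `Comic`, `next_cells`, `storyline`) are hypothetical and the technique itself remains a deliberately informal drawing convention.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bubble:
    role: str
    text: str
    kind: str = "speech"  # "speech" (external) or "thought" (internal)


@dataclass
class Cell:
    label: str
    act: str  # "articulation", "communication" or "coordination"
    roles: List[str] = field(default_factory=list)
    bubbles: List[Bubble] = field(default_factory=list)
    caption: Optional[str] = None
    next_cells: List[str] = field(default_factory=list)  # arrows to later cells


@dataclass
class Panel:
    cells: List[Cell]


@dataclass
class Comic:
    panels: List[Panel]

    def storyline(self, start: str) -> List[str]:
        """Follow the first arrow out of each cell and return the labels visited."""
        cells = {c.label: c for p in self.panels for c in p.cells}
        order: List[str] = []
        current: Optional[str] = start
        while current is not None and current not in order:
            order.append(current)
            arrows = cells[current].next_cells
            current = arrows[0] if arrows else None
        return order


# Two cells drawn from the emergency call example (labels are illustrative).
ask = Cell("ask address", "communication", roles=["call-taker", "caller"],
           bubbles=[Bubble("call-taker", "Can you tell me the address you are at?")],
           next_cells=["give address"])
give = Cell("give address", "communication", roles=["caller", "call-taker"],
            bubbles=[Bubble("caller", "We are at 25, Halethorpe Road.")])
comic = Comic(panels=[Panel(cells=[ask, give])])
order = comic.storyline("ask address")
print(order)
```

A decision point would appear here as a cell with more than one entry in `next_cells`; this sketch follows only the first arrow.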

3.9

Documenting a Pattern of Information Situations

We suggested in Chap. 2 that the primary purpose of any investigation of an institutional domain for the purposes of information modelling is to understand and document patterns of information situations that serve to constitute the domain in question. Let us demonstrate what this means in terms of a domain in which a piece of information technology is utilised in caring for the elderly in their own homes. We shall show how we derive a pattern in principle of information situations for this domain and then use this to develop a set of institutional facts which identifies and describes the key things of interest to actors within this domain. These facts can then be used to compose an information model and build a visualisation of this model. We shall merely introduce the process of composition here, which we cover in much more detail in Chap. 6. Elderly persons falling in their own homes is one of the most common reasons they get admitted to hospital. The use of personal alarms with associated telecare systems is increasingly important to the care of the elderly in their own homes. A personal alarm is a small device with a single button worn on the wrist or around the neck of the person at all times. When the button is pressed, a monitoring system is alerted, and help is sent. To investigate such a domain, the investigator might interview some of the key stakeholders within the domain, such as service providers, carers and the elderly people themselves. The investigator might also visit the home of a number of elderly persons and see the operation of personal alarms in action. She might also participate as a call handler at the control room of the service provider. A focus group of carers and elderly persons might gather much useful insight into the needs of these stakeholders. Finally, a workshop might help stakeholders within the service providers to determine areas where the system of personal alarms might be improved in the future. 
Fig. 3.2 A pattern of information situations for personal alarms. The comic numbers its cells 1 to 21 and involves the roles elderly person, carer, service provider, installer, call handler, service IT system, contact, ambulance control, ambulance crew, paramedic and A&E staff. Decision points cover commitment to visit, visit confirmed, end of contacts and any health issues. The communicative acts shown are:
• DIRECT[I would like to order a personal alarm for person X]
• DIRECT[I would like contacts X, Y and Z added to my list]
• ASSERT[An incident has occurred on person X at location Y]
• DIRECT[A visit needs to be made to person X at location Y]
• COMMIT[I shall/am unable to visit person X]
• DIRECT[Can you confirm you have visited person X]
• ASSERT[I have visited person X]
• ASSERT[A possible medical emergency has occurred with person X at location Y]

Figure 3.2 documents the pattern of information situations relevant to this domain based upon a close analysis of how this technology is used in practice. This pattern consists of an assemblage of acts of articulation, communication and coordination undertaken by multiple actors. The sequence of actions in narrative form is as follows (numbers relate to those on the figure). An elderly person makes a telephone call to the service provider (1). This act of articulation communicates to the service provider a directive giving the details of the person required to be registered (2). The next step is that the service provider holds a follow-up call with the elderly person/carer (3). This call is used to discover the list of contacts to be contacted in the case of an alarm being raised (4). These acts of communication trigger an act of coordinated action (5), namely, that an installer installs the fixed position receiver for the alarm
within the home of the elderly person and trains the elderly person/carer in the use of the alarm. The elderly person wears the alarm, and the button is pressed at some point (6). This triggers a signal which is received by a call handler in a service control centre (7) and asserts to this person that an incident has occurred (8). The call handler then contacts, by fixed or mobile telephone call, the first contact on the nominated list of the person (9). This call requests the contact to commit to visiting the elderly person and check his or her status (10). If no response is obtained from the contact or the contact is unable to commit to checking on the elderly person (11), then a further call is made to the next contact on the list. If the contact visits the elderly person (12), the call handler makes another call (13) to confirm this (14) with the contact. If the visit is confirmed, then the pattern ends. If no such confirmation is obtained, then a call is made to emergency response (16). Such a call is also made in the case of the call handler running out of contacts to contact (17). The emergency call asserts that a possible medical emergency has occurred for the elderly person (18). Ambulance crew then attend the person (19). If the elderly person has any health issues (20), then the patient is taken to the nearest general hospital for assessment (21). Otherwise, the pattern comes to an end. The key actors in this pattern are therefore elderly persons, their carers, the service providers and people to be contacted in an emergency, which may potentially be members of the emergency response service such as ambulance drivers and paramedics. We should not also forget that the ICT system of the service provider is a key actor in this pattern. Clearly, most of the indicated actions upon Fig. 3.2 are communicative acts (Chap. 2). Communication occurs to enable installation of equipment—personal alarms and receivers. 
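The decision logic in the narrative for cells 9 to 17 can be made concrete with a short sketch. It is an illustration only: the function name `handle_alarm`, its parameters and the contact names are hypothetical, and the real pattern involves telephone conversations rather than set lookups.

```python
def handle_alarm(contacts, commits, confirms):
    """Return the outcome of one alarm incident.

    contacts: ordered list of nominated contacts for the elderly person
    commits:  contacts who answer and commit to visiting
    confirms: contacts whose visit is later confirmed
    """
    for contact in contacts:                 # cells 9-11: work down the list
        if contact not in commits:
            continue                         # no response or unable to visit
        if contact in confirms:              # cells 12-14: visit made and confirmed
            return ("visit confirmed", contact)
        return ("emergency call", contact)   # cell 16: no confirmation obtained
    return ("emergency call", None)          # cell 17: list of contacts exhausted


# Hypothetical contacts for one registered person.
print(handle_alarm(["Ann", "Bob"], commits={"Bob"}, confirms={"Bob"}))
print(handle_alarm(["Ann", "Bob"], commits={"Ann"}, confirms=set()))
print(handle_alarm(["Ann"], commits=set(), confirms=set()))
```

Each `return` corresponds to one way in which the flow leaves the contact loop: a confirmed visit ends the pattern, while the other two exits lead to the call to emergency response.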
Communication is important to ensure coordination of actors in response to an emergency. The diamonds upon this pattern comic are used to indicate decision points that change the flow of action through the pattern. What is also documented within this pattern is the way in which communication happens between actors. This consists mainly of fixed line telephone or mobile calls as well as electronic signals transmitted to the service provider IT system. This pattern of information situations provides for the investigator a solid basis for constructing an information model which documents what is communicated about in the current situation. This is clearly an AS-IS information model. We shall examine in some detail the constructs of an information model in the next chapter, and then we shall consider how to compose an information model in Chap. 6 from a close understanding of information situations. This involves the investigator unpacking the content of communicative acts and translating the things of interest into classes, relationships and attributes. Here, we introduce in overview both the constructs and the key principles of this composition process. For instance, consider the communicative act within cell 8 upon Fig. 3.2 which asserts that [An incident has occurred to person X at location Y]. This communicative act in principle abstracts from a range of communicative acts in practice. From this communicative act in principle, we can infer a number of things of interest, namely, an incident, a registered person and a location. These are what we refer to as information classes.

We can also infer a number of associated institutional facts from this communicative act, namely, that persons and locations must be uniquely identifiable to the service provider’s IT system and the call handler that uses it. We can probably say that an incident is timestamped in the sense that the signal sent from a personal alarm will be registered at a particular time and date. We can also assume that a person is associated with a designated location and that an incident occurs at a certain location to a registered person. This means we can write the following list of institutional facts in a base form of notation known as binary relations.

[P101 REFERS TO ]
[L101 REFERS TO ]
[P101 ISA Registered person]
[L101 ISA Registered location]
[Registered person located-at Registered location]
[Registered location locates Registered person]
[Incident involves Registered person]
[Registered person involved-in Incident]
[Incident HASA time]
[Incident HASA date]

The first two facts here define two identifiers used in this domain and indicate that they refer to instances of things or objects within this domain. The third and fourth facts classify what these identifiers identify: a registered person (a person with a personal alarm) and a registered location. The fifth to eighth facts establish relationships between a location, an incident and a person registered with the service provider. Finally, the last two facts detail the fact that an incident class can be described in terms of the date and time it is recorded. Using such institutional facts, we have a starting basis for forming elements of the information model. We can add to this understanding by examining each communicative act and expanding upon institutional facts felt important for supporting coordinated action within this domain.

Exercise

Take one other of the communicative acts detailed on Fig. 3.2, and try to generate further institutional facts from it.
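The binary relations listed above lend themselves to a simple representation as (subject, verb, object) triples. The sketch below is illustrative only and makes its own assumptions: the function name `summarise` is hypothetical, and the rule that ISA yields a class, HASA yields an attribute and any other verb yields a relationship is our simplified reading of how facts feed an information model. The two REFERS TO facts are omitted because their referents are not given in the list.

```python
facts = [
    ("P101", "ISA", "Registered person"),
    ("L101", "ISA", "Registered location"),
    ("Registered person", "located-at", "Registered location"),
    ("Registered location", "locates", "Registered person"),
    ("Incident", "involves", "Registered person"),
    ("Registered person", "involved-in", "Incident"),
    ("Incident", "HASA", "time"),
    ("Incident", "HASA", "date"),
]


def summarise(facts):
    """Sort binary relations into classes, attributes and relationships."""
    classes, attributes, relationships = set(), {}, []
    for subject, verb, obj in facts:
        if verb == "ISA":            # classification: the object names a class
            classes.add(obj)
        elif verb == "HASA":         # attribution: the object names an attribute
            classes.add(subject)
            attributes.setdefault(subject, []).append(obj)
        else:                        # association between two classes
            classes.update((subject, obj))
            relationships.append((subject, verb, obj))
    return classes, attributes, relationships


classes, attributes, relationships = summarise(facts)
print(sorted(classes))
print(attributes)
print(len(relationships))
```

Adding the facts generated in the exercise to `facts` and rerunning `summarise` shows how each further communicative act extends the set of classes, attributes and relationships.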

3.10

Conclusion

Within the current chapter, we have examined why models are important and how such models relate to reality. We have proposed that information models are a special type of model focused on an understanding of the communicative competence of actors within some domain of institutional action. Such understanding may
be obtained by engaging in many different forms of investigation, such as interviews, observation and participation. In Chap. 4, we move from issues of investigation to a discussion of the key constructs used in the representation of information models. We set our coverage of these constructs firmly within the theory of information situations which was introduced in Chap. 2. Business analysis and design tend to utilise visualisation as a means of communicating understanding, not only between business analysts and technical personnel but also to the stakeholders in the domain in question. This is why in Chap. 5 we describe how to turn an understanding of the institutional facts pertinent to some domain into an information model diagram.

3.11

Summary

• Information modelling has been used as a practical business analysis and design technique for at least 40 years.
• Information models are clearly models. A model is a way of negotiating collective belief as to either how things are in some domain or how we, as a collective, might like things to be in this domain.
• Most current practices of information modelling either explicitly or implicitly utilise a view of reality as being made of concrete things that possess properties. Information modelling is then seen to be a process by which things are represented as statements in some formal ‘language’ and that these statements correspond to objective facts about some domain of reality.
• We promote an alternative worldview of information modelling which focuses upon how actors within some domain communicate about the things of interest to them. According to this worldview, an information model is a specification of the structure of terms used within acts of communication relevant to some institutional domain. These terms may refer to and describe both institutional things and physical things.
• Information modelling has been used primarily as an analysis and design technique which aids in the development of data systems of various forms. But information models have a much larger range of application.
• Information modelling relies upon a prior investigation of some institutional domain. Such investigation may comprise interviews with key actors, participation in institutional activity, observation of such activity or analysis of data structures used to support such activity.
• We suggest that the primary purpose of any investigation of an institutional domain for the purposes of information modelling is to understand and document the patterns of information situations that serve to constitute the domain in question. From such patterns, an understanding of the institutional facts important to actors can be gleaned, and an information model can be composed.

4

Information Modelling from First Principles

4.1

Introduction

In this chapter, we shall cover most of the constructs relevant to contemporary information modelling. But we shall build an account of these constructs from first principles, using the theory of information situations established in Chap. 2. We shall portray the use of these constructs in combination as a way of modelling important aspects of what we shall refer to as institutional ontology. We start with the notion of an object referred to through an identifier. This leads us to consider the process of classification, which involves grouping objects that share common characteristics into an information class. Information classes are defined in terms of attributes held to be common amongst a group of objects, but they are also defined in terms of their relationships of association with other classes. Such relationships of association are further defined in terms of certain constraints, known as cardinality and optionality. We then look at two important processes of further abstraction sometimes considered important to modelling institutional ontology with classes—that of generalisation and aggregation. Generalisation can be considered the process of extracting from one or more information classes the description of a more general class. Generalisation is used to build a class hierarchy of super- and sub-classes. Aggregation is an abstraction in which a relationship between objects is considered a higher-level object. An aggregation relationship relates a whole to its parts.

4.2

Objects and Identifiers

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Beynon-Davies, Information Modelling, https://doi.org/10.1007/978-3-030-98805-0_4

We used the term ‘thing’ many times in both Chaps. 2 and 3, when we were discussing the issue of ontology. The common English term ‘thing’ is typically used in a very general way to represent any unit of existence or more precisely something a set of actors within some institutional domain takes to exist. Information modellers would refer to such things as objects and think of the abstraction of such objects as classes of object or object classes. So, objects are the component units of some ontology, that is, some set of beliefs common to a set of actors about what reality is. An object is an instance of some thing of interest to two or more actors within some domain. Objects, as we have seen, may be physical things such as customers, products, houses and cars; all these objects have a material form. But objects may also be events such as a house sale, a customer order, a customer payment or a car service; these events are all timestamped happenings. Finally, objects may be purely institutional things, such as orders, sales, contracts and deeds. Institutional objects may take a material existence such as a paper form or an electronic record, but the objects themselves do not depend on their material form as such. Instead, they rely upon a collective acceptance amongst a group of actors that these objects are deemed to exist.

A fundamental characteristic of any object is that it must be distinguishable from other objects by a community of actors. Actors within some institutional domain must be able to sever some physical or conceptual space and say what is a certain object and what is not that object. In this sense, we might define an object as some aspect of a domain which can be distinguished from other aspects of the domain: something that makes a difference to actors. To differentiate one object from another, we typically assign an identifier to the object (Chap. 2), and to effectively discriminate objects, each identifier ideally should be unique within the domain in question.

Example

So, assume the domain of interest is a manufacturing plant. We might have a list of identifiers for objects of interest to this domain as follows:

[5342]
[6634]
[9982]
...

As identifiers, these signs refer to some distinct object (physical, institutional or an event), as we discussed in Chap. 3:

[5342 REFERS TO ]
[6634 REFERS TO ]
[9982 REFERS TO ]
... ◄

Through assigning an identifier to some object in this manner as a form of reference, we bring that object into existence for some institutional ontology. Such rules of reference serve to constitute major aspects of the ontology for the actors communicating within this domain.

Example

Take the example of a large and dispersed institution such as the UK National Health Service. The objects of particular interest to actors working within this institutional domain are patients. Such patients are typically referred to by a common surrogate identifier, known as an NHS number. An NHS number is a 10-digit number such as 485 777 3456 and serves to uniquely refer to one and only one patient of this institution throughout their life. ◄ You will note in this example that we substituted the term stands for with refers to. As you will see, within information modelling, we formalise or specialise a number of distinct stands for relations in this manner. In the next section, for instance, we shall examine the ISA relation which is critical to understanding processes of classification and instantiation. Exercise Conduct a small investigation of a domain known to you. Determine what identifiers are used on a regular basis. How many of such identifiers are surrogate identifiers—artificial identifiers created purely to uniquely refer to objects? If they are not surrogate identifiers, do the identifiers allow actors to readily discriminate between the objects referred to or are there any problems of unique identification?

4.3

Classification and Instantiation

Bowker and Leigh-Star (1999) within their important exploration of the use of categories within standards established for professional practice make the important point that classification helps order human interaction. For them, ‘to classify is human’. Classification schemes form an important part of the data infrastructure underlying much human activity, and, as such, they are frequently invisible or tacit to actors performing such activity. They are ready-at-hand parts of the way in which actors approach objects. It should therefore come as no surprise to find that classification as the process of assigning classes to phenomena is a critical part of information modelling. So, let us look at what classification means in terms of institutional ontology.


Exercise Conduct a brief investigation of a classification scheme such as the British National Formulary or the Dewey Decimal System. What are these classification schemes used for and by whom? If you remember from an earlier section, a constitutive rule is a rule which implements the very notion of a sign and always has the form: [X stands for Y to Z in C] where (X) is some thing which stands for some other thing (Y) in some institutional context (C) to some actor or group of actors (Z). Identifiers are the types of sign we introduced in Chap. 2 and which we have dealt with so far in this chapter. But identifiers of course do not describe. For this, constitutive rules need to work within a process which Charles Sanders Peirce refers to as infinite semiosis—semiosis being the process of sign-use. This process is theoretically infinite because one sign may stand for another sign, which in turn stands for another sign, and so on. In other words: [A stands for B; B stands for C; C stands for D. . .] The process of infinite semiosis is particularly evident in the way in which actors use signs to abstract. The idea of classification or instantiation is a key example of abstraction and involves the definition of a series of common properties applicable to a group of objects. As a constitutive rule, classification can be expressed as: [X ISA Y to Z in C] The relation ISA (Brachman 1983) here may be taken as yet another special type of stands for relation. Within this rule, X is normally a placeholder for some identifier of an object, while Y is a class or category to which the thing identified by X applies. C denotes the institutional context or domain in which this particular classification rule holds for some actor Z. Example

So, we might instantiate (assign some instances to) the identifiers previously listed within the domain of a manufacturing plant in the following manner: [5342 ISA Product] [6634 ISA Product] [9982 ISA Product] ... ◄


The Y term in the constitutive rule we have just seen is an object class, or class for short, and forms an abstraction of a group of instances or objects. This means that there are normally many objects that correspond to an object class, as is the case with our example of the class product. If there are not many instances of something within the domain in question, it is probably not worth actors making and using the abstraction of an object class within communication. Generally, a class or more accurately an information class may be defined as some ‘thing’ which actors within some institutional domain recognise as important and communicate about currently or wish to communicate about on a regular basis. Other terms used to refer to such categories for things of interest are entity and entity type. To communicate about such things, the group of actors must be able to distinguish instances of some class from instances of some other class. Therefore, a class is an abstraction from the complexities of some domain. When we speak of a class, we normally speak of some aspect of the domain which can be distinguished from other aspects of the domain, again, something that makes a difference. But now we are working at a higher level of abstraction than an object. An object class is a sign which stands for an object, or more likely a set of such objects, and serves to categorise or classify such objects.

Example

Take, for instance, a university as an institutional domain. Universities need to communicate about a number of things to help in the activities of teaching and learning. These things include students, lecturers, courses and modules. In this example, all these things would be valid information classes. ◄ Note that different institutional domains will have different things of interest depending upon the perspectives of actors within the domain. Hence, they will need to have a different set of information classes to communicate about. Example

Therefore, a university will be interested in students, lecturers, courses and modules as information classes. An insurance company will be interested in a totally different set of classes such as customers, policies and claims. A manufacturing company will be interested in deliveries, products, jobs and dispatches. ◄

An information class, of course, is also a clear example of a sign. When actually writing of a product such as a galvanised steel lintel or a person such as a lecturer or an event such as a business visit, we are inherently using signs as classes. As we indicated, to speak or write, or generally to communicate, about some object, we need an identifier for the specific thing we are referring to. In such terms, an identifier, as we have seen, is one of the most important of signs.

Example

Paul Beynon-Davies is a natural identifier for me: it singles out or refers to me as an object of interest. Lecturer, consultant, academic and author are all signs for classes which apply to me: they are designators for certain concepts that encapsulate a certain space of objects. Or alternatively, Module might be a class, whereas Information modelling might be an instance or object of the Module class. ◄ What we use such classes for is to chunk up the world so that we can communicate about it. Classes are categories that enable actors to discriminate between things and describe such things. Hence, when we write: [9982 ISA Product] We are defining an object (identified as 9982) as being a member of the class product. In one direction, from object to class (9982 to product), we are classifying an object as being an instance of a class: this is the process of classification. In the other direction (product to 9982), we are engaging in instantiation: instantiating (making an instance of) a given class, by listing an object that is encompassed by or covered by the class. Example

Within the university domain, Lecturer or Professor may be a class, whereas Paul Beynon-Davies is an instance of this class. Paul Beynon-Davies is an object. Or alternatively Module might be a class, whereas Business Analysis might be an instance or object of the Module class. In the case of a manufacturing organisation, L1200 will be a specific instance of a steel product (class). In the case of the emergency ambulance service, Jane Smith might be an instance of the class patient. ◄ Exercise Gather a small range of actual communications in a domain familiar to you. For instance, collect a range of similar emails sent you in some domain of organisation. In terms of this collection, think about what is regularly being communicated about. What is being classified or categorised in such repetitive communication? What instances can be abstracted into what information classes?
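The two directions of the ISA relation can be sketched in a few lines of Python. This is only an illustration, assuming that institutional facts are held as (subject, relation, object) triples; the function names classify and instantiate are our own and are not part of any established notation.

```python
# ISA facts copied from the manufacturing example in this section.
ISA_FACTS = [
    ("5342", "ISA", "Product"),
    ("6634", "ISA", "Product"),
    ("9982", "ISA", "Product"),
]


def classify(facts, identifier):
    """Classification: go from an object identifier to its class or classes."""
    return sorted(obj for subj, rel, obj in facts
                  if rel == "ISA" and subj == identifier)


def instantiate(facts, class_name):
    """Instantiation: go from a class to the identifiers of its objects."""
    return sorted(subj for subj, rel, obj in facts
                  if rel == "ISA" and obj == class_name)


classes_of_9982 = classify(ISA_FACTS, "9982")
products = instantiate(ISA_FACTS, "Product")
```

Reading the same list of facts in either direction is what allows one small set of triples to serve both classification and instantiation.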

4.4

Attribution

It is clear from the previous discussion that an object class is an abstraction of the common features of a group of objects. Such features are defined in terms of relationships between the class and its properties or attributes. This is the process of attribution, which involves the use of signs to describe objects. A class is also defined in terms of its relationships with other classes. This is the process of association, which refers to how we tend to communicate about certain things through signs always in relation to other signs denoting other things. Let us first concentrate on the process of attribution.

Example

In a university domain, we normally define a class such as Module or Student because we wish to communicate about such things and eventually to record some data about the occurrences of these things. To do this, we use the properties or attributes of a class. For instance, students have names, addresses and telephone numbers; modules have titles and credit points. ◄ A class is given shape through its properties or attributes. When we define the properties of some class, we engage in a process of attribution. Attribution is the process of defining a class in terms of its properties or attributes. Example

Consider the institutional domain of manufacturing again. Within this domain, manufacturing products are key things of interest that are communicated about on a regular basis. Take one class of manufacturing product, perhaps steel lintels. This product is a class defined by its attributes or properties such as product length and product weight. ◄ The constitutive rule for attribution might be written as: [X HASA Y to Z in C] where X is a class and Y is an attribute of the class within the institutional domain C. HASA is thus another special type of stands for relation which allows us to define a class in terms of a listing of its attributes. Example

For example: [Product HASA Product Length] [Product HASA Product Weight] ◄


This way of defining a class through its attributes, properties or features is referred to as an intensional definition of the class. Classes are by their very nature interesting things because information classes are normally used to define logical groupings of data, otherwise known as a data structure. So, as we shall see in Chap. 8, the classes defined upon some information model will normally turn into the data structures used by some data system. Hence, one rule of thumb or heuristic to apply in identifying suitable classes appropriate for a given institutional domain is the following: If you need to store and access data about many properties or attributes of some thing, then that thing is likely to be an information class.
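As a small illustration of an intensional definition, the HASA facts for the product class can be read back as the list of attributes that define the class. The sketch below assumes the same (subject, relation, object) triple layout as the bracketed facts; the function name intension is hypothetical.

```python
# HASA facts copied from the product example in this section.
HASA_FACTS = [
    ("Product", "HASA", "Product Length"),
    ("Product", "HASA", "Product Weight"),
]


def intension(facts, class_name):
    """Return the attributes that define a class, in the order they were stated."""
    return [obj for subj, rel, obj in facts
            if rel == "HASA" and subj == class_name]


product_attributes = intension(HASA_FACTS, "Product")
```

A class with a long attribute list under this reading is, by the rule of thumb above, a likely candidate for an information class.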

4.5

Valuing an Object and Forming an Object Class

The opposite of classification, as we have seen, is instantiation. Classification involves determining the group of properties common to a set of objects. Instantiation means defining an individual object by assigning values to the properties of a class. Implicitly when discussing instantiation, we are starting to make a connection here between the objects and classes on an information model and the data structures of a data system. Any data structure can be seen to be made up of a number of data elements, and each data element is made up of a number of data items. A datum, a unit of data, is used to represent a fact relevant to some institutional domain. Typically, a datum is formed by making a data item correspond to the attribute of some class and assigning some value to this data item. This means that a data element is typically used to collect together a set of cognate attributes of some class and through so doing builds an instantiation of the class—it represents an object of the defined object class. Example

So, we can build a data element or object for the product class by listing its attributes—product weight and product length—and assigning a value to these attributes, such as: [5342 Product Length 10] [5342 Product Weight 20] ... ◄ This means that the entire listing of objects as data elements serves to form a complete data structure. And this data structure serves to represent, through a complete listing of objects, the object class. This way of defining an object class in terms of its objects is said to be an extensional definition of an object class.


Example

Hence, we might provide a complete extensional definition for our product class by building a list such as: [5342 Product Length 10] [5342 Product Weight 20] [6634 Product Length 20] [6634 Product Weight 40] [9982 Product Length 60] [9982 Product Weight 60] ... ◄ Another way of putting this is that values assigned to attributes are used to distinguish one instance or object of a class from another. Example

To distinguish one instance of a student from another, we give them a different name, address and so on. Or more likely to distinguish between objects with the least effort, we assign a unique identifier to each student—typically a surrogate identifier. ◄
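The step from value facts to a data structure can be sketched as follows. The layout is an assumption made for illustration: each fact is an (identifier, attribute, value) triple, each data element becomes a dictionary, and the function name build_data_structure is hypothetical.

```python
# Value facts copied from the extensional definition of the product class.
VALUE_FACTS = [
    ("5342", "Product Length", 10),
    ("5342", "Product Weight", 20),
    ("6634", "Product Length", 20),
    ("6634", "Product Weight", 40),
    ("9982", "Product Length", 60),
    ("9982", "Product Weight", 60),
]


def build_data_structure(value_facts):
    """Group value facts by identifier: one data element (a dict) per object."""
    structure = {}
    for identifier, attribute, value in value_facts:
        structure.setdefault(identifier, {})[attribute] = value
    return structure


product_structure = build_data_structure(VALUE_FACTS)
```

Each key of product_structure is an identifier and each value is one data element, so the whole dictionary plays the part of the extensional definition of the class.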

4.6

Association

However, classes are not only defined by their attributes but also in terms of their associations with other classes. An association is typically a defined relationship between two distinct object classes—this is said to be a binary relationship. Example

In analysing an institutional domain, we might express the fact that a customer places a sales order or a supplier handles a purchase order. Or alternatively, within a university domain, we might wish to represent the fact that students enrol on modules and that lecturers or professors teach modules. In these phrases, customer, supplier, sales order, purchase order, student, module and lecturer/professor are information classes. Places, handles, enrols and teach(es) are signs we might use for relationships of association between these classes. ◄ Example

Let us examine an example we have seen before from our manufacturing domain. Within this domain, there are likely to be associations between a stillage (a container for a set of product) and a manufacturing location. We first need to define stillage and location as classes with objects; we have of course already defined the product class. We refer to stillage objects through a stillage code and location objects through a production location code. For example:

[26641 ISA Stillage]
[26643 ISA Stillage]
[24536 ISA Stillage]
...
[PL0102 ISA Location]
[PL0103 ISA Location]
[PL0104 ISA Location]
... ◄

We then need to build a series of associations between these three classes. This means associating the product class with the stillage class and the stillage class with a location class.

Example

For instance: [Stillage CONTAINS Product] [Stillage LOCATED AT Location] [Stillage MOVE TO Location] ◄ The terms CONTAINS, LOCATED AT and MOVE TO within these binary relations here are signs we use to refer to linkages between objects in the class stillage and objects in the class product, as well as objects in the class stillage and objects in the class location. In other words, we can define relationships of association by extension through building lists of named pairs of object identifiers. Example

Hence, we might have a contains list such as:

[26641 CONTAINS 5342]
[26643 CONTAINS 6634]
[24536 CONTAINS 9982]
...

Next, we might build a stock location list, such as:

[26641 LOCATED AT PL0102]
[26643 LOCATED AT PL0102]
[24536 LOCATED AT PL0102]
...

or a stock movement list:

[26641 MOVE TO PL0103]
[26643 MOVE TO PL0103]
[24536 MOVE TO PL0104]
... ◄

It should be noted that since there are two classes involved in an association relationship, we can call a relationship by different names depending on the direction of naming. Another way of thinking about this is that each class plays a distinct role within any association and that each role can be given a different name.

Example

The association between a stillage and a product is named as CONTAINS if we make Stillage the first term in the triple of some institutional fact. The first term, which is known as the subject in a triple, defines the class whose role is being played. [Stillage CONTAINS Product] If we make Product the first term, then the name of the relationship will need to subtly change to read or be communicated properly. In doing this, we are naming the role being played by Product in this relationship: [Product CONTAINED IN Stillage] ◄ Hence, any one association relationship can have two potential names, each name being a role of the class playing out in the relationship. It is also noteworthy that two classes, such as stillage and location, can be associated together by more than one association relationship. The classes will play different roles in each of the association relationships involved. Example

The classes House and Person can be related by ownership and/or by occupation. Hence, we might express this in the following manner: [House OWNED BY Person] [Person OWNS House] [House OCCUPIED BY Person] [Person OCCUPIES House] ◄
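The idea that one association carries two role names, one for each direction of reading, can be sketched by pairing the names and swapping subject and object. The pairing table and the function name invert are assumptions made for this illustration; the role names themselves come from the examples above.

```python
# Role names paired by direction of reading, taken from the examples in the text.
ROLE_PAIRS = {
    "CONTAINS": "CONTAINED IN",
    "OWNS": "OWNED BY",
    "OCCUPIES": "OCCUPIED BY",
}

# Make the lookup work in both directions.
INVERSE = {**ROLE_PAIRS, **{v: k for k, v in ROLE_PAIRS.items()}}


def invert(fact):
    """Re-express a fact from the point of view of the other class or object."""
    subject, role, obj = fact
    return (obj, INVERSE[role], subject)


class_level = invert(("Stillage", "CONTAINS", "Product"))
object_level = invert(("26641", "CONTAINS", "5342"))
```

The same inversion applies to facts about classes and to facts about individual objects, and inverting twice returns the original fact.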


This helps explain why in constructing institutional ontology, we need a layer of abstraction over and above the layer of institutional facts. The abstraction layer provides context to the institutional facts and represents the collective understanding or acceptance of the facts by actors within the domain. This is the essence of institutional ontology and the reason we try to make the important parts of this explicit through an information model. Example

For instance, within our manufacturing domain, it is impossible for actors to be informed by the fact [26641 LOCATED AT PL0102] without a collective understanding that 26641 refers to a stillage, PL0102 refers to a production location and the term LOCATED AT stands for an association between the two objects. ◄ Not every class upon an information model will be related to every other class. In theory, having identified a set of say 6 classes, up to 15 association relationships could exist between these classes. In practice, it will usually be quite obvious that many classes are quite unrelated. Furthermore, the goal of information modelling is to document only direct relationships of association: that is, association relationships between two classes, with no intervening class. Example

Direct relationships exist between the classes Parent and Child and between Child and School. The relationship between Parent and School is indirect; it exists only by virtue of the Child class. ◄ Exercise Each trailer arrives from a customer and might be loaded with a number of different types of steel product. Each batch of such products is therefore labelled with a unique order number. As a whole, each trailer is given its own delivery advice note detailing all associated batches on the trailer. Identify classes and relationships of association from this short snippet of an interview conducted with a manufacturing organisation.
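The figure of 15 quoted earlier in this section for a set of 6 classes is the number of distinct pairs of classes, n(n-1)/2. A quick check is sketched below, using six placeholder class names that are purely hypothetical.

```python
from itertools import combinations

# Six placeholder class names; any six distinct names would do.
classes = ["A", "B", "C", "D", "E", "F"]

# Every unordered pair of distinct classes is a candidate for a direct association.
candidate_pairs = list(combinations(classes, 2))
pair_count = len(candidate_pairs)

# The same figure from the closed form n(n-1)/2.
n = len(classes)
closed_form = n * (n - 1) // 2
```

Note that this counts pairs of classes rather than named relationships. Since two classes may be linked by more than one association, as with House and Person, the number of relationships could in principle be higher.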

4.7

Constraints upon Association

To each relationship of association, we can add two types of business rule or constraint, which express for the modeller how a given institutional domain works currently or should work with its associated information classes. One type of rule is known as a cardinality rule, while the other type of rule is known as an optionality rule. Cardinality establishes how many instances of one class are related to how many instances of another class. Any association relationship may be typed as either a one-to-one (1:1), one-to-many (1:M) or many-to-many (M:N) relationship. If we state that the relationship is one to one, then one instance of a class is always associated with one instance of the other class. Specifying a relationship as one to many means that one instance of a class is associated with more than one instance of the other class. If we state that the relationship is many to many, then many instances of one class are associated with many instances of another class.

Example

In terms of the cardinality of the places relationship between customer and sales order, we ask ourselves the question: how many sales orders can be placed by one customer and how many customers appear on a particular sales order? If the answer to any of these questions is many, we say that the cardinality of that class in the relationship is many; if not, it is one. Hence, in the case of customer places sales order, customer is likely to have a cardinality of one and sales order a cardinality of many. ◄ The concept of cardinality can best be understood by using an occurrence or instance diagram. These diagrams are based on mathematical visualisations known as Venn diagrams and illustrate how occurrences or instances of information classes inter-relate. The circles or ovals on the diagrams are meant to represent sets of instances. Each information class therefore is represented as a set of instances, and each instance/object is given a unique identifier. The relationship of association between two information classes also comprises a set (drawn as circles or ovals with dotted lines on the diagram) and includes the set of associations drawn between instances of both classes. Example

Consider Fig. 4.1 which illustrates the cardinality between two classes appropriate to a university setting. Three instances of a lecturer class are identified as well as three instances of an academic department or school. The line drawn between ‘Computer Science’ and ‘234’ indicates that the lecturer with the identifier 234 is employed by the Computer Science department of this university. Note that the cardinality of the association relationship in Fig. 4.1 is one to many (1:M). The department ‘Computer Science’, for instance, has two lecturers or professors associated with it. Hence, we can express the facts of this case as:

[Lecturer EMPLOYED BY Department]
[Department EMPLOYS Lecturer]
[EMPLOYED BY Cardinality one]
[EMPLOYS Cardinality many]

Fig. 4.1 Instance diagram: one-to-many relationship (the Lecturer set contains 234, 237 and 123; the Department set contains Computer Science, Biology and Business; the relationship set is labelled EMPLOYS)

In contrast, in Fig. 4.2, Lecturer to Student is a many-to-many (M:N) relationship. Lecturer 237, for example, teaches students 34698 and 37798. This is expressed as:

[Lecturer TEACHES Student]
[Student TAUGHT BY Lecturer]
[TEACHES Cardinality many]
[TAUGHT BY Cardinality many] ◄

In defining the cardinality of an association relationship, we are actually making two assertions about the domain we are modelling. In essence, the information modeller is selecting between four possible options for cardinality that might apply to any one association relationship.

Example

In terms of any two information classes, there are at least four ways in which cardinality might be expressed. So, in terms of the situation between Lecturer/ Professor and Module, as far as teaching is concerned, we might choose between one of the four cardinality rules.

Fig. 4.2 Instance diagram: many-to-many relationship (the Lecturer set contains 234, 237 and 123; the Student set contains 34698, 37798, 34888 and 24988; the relationship set is labelled TEACHES and TAUGHT BY)

(1:1) A lecturer may teach at most one module and a module is taught by at most one lecturer: [TEACHES cardinality one]; [TAUGHT-BY cardinality one].
(1:M) A lecturer may teach many modules but a particular module is taught by at most one lecturer: [TEACHES cardinality many]; [TAUGHT-BY cardinality one].
(M:1) A lecturer teaches at most one module but a particular module may be taught by many lecturers: [TEACHES cardinality one]; [TAUGHT-BY cardinality many].
(M:N) A lecturer may teach many modules and a module may be taught by many lecturers: [TEACHES cardinality many]; [TAUGHT-BY cardinality many]. ◄

In contrast to cardinality, optionality establishes whether all instances of a class must participate in a relationship or not. Hence, each class participating in a relationship is either mandatory or optional in that relationship. If a class is mandatory in the relationship, then all instances of that class must participate in the relationship. If a class is optional in a relationship, then at least one instance of the class need not participate in the relationship.

Example

Hence, in the case of customer places sales order, the optionality is mandatory both for customer and sales order in the places relationship. This means that we make two further assertions about the business situation:

[Customer PLACES Sales order]
[Sales order PLACED BY Customer]

A customer must place at least one sales order to constitute being a customer of the company: [PLACES optionality mandatory].
A sales order must always be associated with an existing customer: [PLACED BY optionality mandatory]. ◄

Optionality is also best illuminated through an instance diagram.

Example

The class Lecturer has mandatory participation in the relationship illustrated in Fig. 4.1, while Department has optional participation. Biology, for instance, is not associated with any lecturers or professors currently. In Fig. 4.2, the participation of Lecturer in the teaches relationship is optional, as there is at least one lecturer not teaching any students; Student is mandatory, indicating that all students have to be taught by some lecturer.

[EMPLOYED BY optionality Mandatory]
[EMPLOYS optionality Optional]
[TEACHES optionality Optional]
[TAUGHT-BY optionality Mandatory] ◄
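The way cardinality and optionality are read off an instance diagram can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the link data are assumed values loosely modelled on Fig. 4.2 (the text itself states only that lecturer 237 teaches students 34698 and 37798).

```python
from collections import defaultdict

def cardinality(links, direction):
    """Return 'one' or 'many' for a relationship read in one direction.

    links: iterable of (source, target) pairs.
    direction: 0 reads source -> target, 1 reads target -> source.
    """
    partners = defaultdict(set)
    for pair in links:
        partners[pair[direction]].add(pair[1 - direction])
    # 'many' as soon as any instance is linked to more than one partner
    return "many" if any(len(p) > 1 for p in partners.values()) else "one"

def optionality(instances, links, direction):
    """Return 'mandatory' if every instance participates, else 'optional'."""
    participating = {pair[direction] for pair in links}
    return "mandatory" if set(instances) <= participating else "optional"

# Assumed instance data (hypothetical apart from lecturer 237's two students)
lecturers = [123, 234, 237]
students = [24988, 34698, 34888, 37798]
teaches = [(237, 34698), (237, 37798), (234, 34698), (234, 34888), (234, 24988)]

print("TEACHES cardinality:", cardinality(teaches, 0))
print("TAUGHT BY cardinality:", cardinality(teaches, 1))
print("TEACHES optionality:", optionality(lecturers, teaches, 0))
print("TAUGHT BY optionality:", optionality(students, teaches, 1))
```

Note that a set of instances can only suggest a rule. Observing that every lecturer currently teaches one module does not establish that the domain forbids teaching several, so the modeller still has to confirm each assertion with domain actors.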

4.8 Generalisation and Specialisation

Much information modelling can be conducted solely with the constructs of classes, attributes and relationships of association. The original technique of entity-relationship or entity-relationship-attribute diagramming can be undertaken just with these constructs. However, over the last couple of decades, it has become important to add two other relationships of abstraction to an information model, where it is deemed necessary.


So far, we have only moved one step up in the process of semiosis by classifying some object or attributing properties to an object or associating one class with another class. We next consider moving further up the hierarchy of semiosis through the process of generalisation. Generalisation is normally used in tandem with classification to build an abstraction hierarchy. Classification, as we have seen, involves grouping objects that share common characteristics (attributes and relationships) into an information class. The main difference between classification and generalisation is that while classification relates an object class with its objects, generalisation relates an object class with another object class, and this second object class is at a higher level of abstraction. In this manner, generalisation can be considered as the process of extracting from one or more information classes the description of a more general class. The special constitutive rule for generalisation here is expressed as:

[X AKO Y to Z in C]

where X is an object class, described as the sub-class, and Y is its super-class, meaning it is a more general or abstract class than X. The AKO (short for a kind of) relationship represents a generalisation relationship or its opposite, specialisation. In one direction, from sub-class to super-class, we are generalising from one level of abstraction to another.

Example

Hence, when we state that:

[Lintel AKO Product]
[Crash barrier AKO Product]
...

we are expressing two sub-classes of the product class, namely, lintel and crash barrier. ◄

In the other direction, from super-class to sub-class, we are reducing the level of abstraction or specialising a class.

Example

Within the institutional domain of financial trading, Stock and Share might be seen as sub-classes or specialisations of a Security class. Likewise, Debenture and VariableStock might be considered sub-classes or specialisations of Stock. ◄

Generalisation, through hierarchies, can be used to provide a more economical representation of some institutional domain than would be available by merely using the construct of an information class. The important point here is that sub-classes inherit the properties and relationships of their super-class. The analogy being made is between the transfer of traits through genes amongst organisms and the transfer of properties down through a hierarchy of classes through specialisation.

Example

In terms of our example manufacturing domain, we know that a product class can be specialised as a lintel or a crash barrier. Hence, declaring a lintel to be a kind of product means that we can assume that it has a weight and length and also that it is stored in a stillage at a production location and moved between production locations within the manufacturing plant. ◄

Generalisation is particularly important to many professional practices that involve the standardised naming of things. For instance, it is critical to taxonomy, the science of identifying and naming species of organism. Taxonomy is an important sub-discipline of biology where the taxonomic scheme of biological organisms is organised hierarchically in terms of domain, kingdom, phylum, class, order, family, genus and species. This amounts to a formalised hierarchy of signs and allows biologists across the world to communicate effectively. Most libraries also use taxonomy for organising the storage and retrieval of publications. For instance, the Dewey Decimal scheme, much used in libraries worldwide, organises publications into ten main classes. Each main class is then expanded into ten divisions, and each division is then expanded into ten sections.

Many applications of the concept of generalisation do not fall into neat hierarchies. In such cases, we speak of a generalisation lattice. In other words, a given object class may be a sub-class of more than one super-class.

Example

Within the stock market, a MarketMaker class could be said to be a sub-class of both an Investor class and a FinancialIntermediary class. ◄
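Inheritance through AKO relationships, including the lattice case where a class has more than one super-class, can be illustrated with a short Python sketch. It is not part of the original text. The weight and length attributes come from the manufacturing example above; the attributes given to Investor and FinancialIntermediary, and the function names, are assumptions made purely for illustration.

```python
# AKO facts: sub-class -> set of direct super-classes (a lattice allows several)
AKO = {
    "Lintel": {"Product"},
    "Crash barrier": {"Product"},
    "MarketMaker": {"Investor", "FinancialIntermediary"},
}

# Attributes declared directly on a class
ATTRIBUTES = {
    "Product": {"weight", "length"},
    "Investor": {"portfolioValue"},        # hypothetical attribute
    "FinancialIntermediary": {"licenceNo"},  # hypothetical attribute
}

def super_classes(cls):
    """All direct and indirect super-classes of cls."""
    found = set()
    stack = list(AKO.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(AKO.get(parent, ()))
    return found

def inherited_attributes(cls):
    """Attributes of cls, including those inherited from its super-classes."""
    attrs = set(ATTRIBUTES.get(cls, ()))
    for parent in super_classes(cls):
        attrs |= ATTRIBUTES.get(parent, set())
    return attrs

print(sorted(inherited_attributes("Lintel")))
print(sorted(inherited_attributes("MarketMaker")))
```

Declaring attributes once on Product, rather than repeating them on every sub-class, is the economy of representation that generalisation provides.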

4.9 Generalisation Hierarchies and Lattices

In formal terms, generalisation relationships are transitive, irreflexive and anti-symmetric:

• Transitive. If A is a kind of B and B is a kind of C, then A is a kind of C.
• Irreflexive. A is not a kind of A.
• Anti-symmetric. If A is a kind of B, then B is not a kind of A.

Example

Transitive: If programmers are computing staff and computing staff are employees, then programmers are employees.
Irreflexive: Employees are not a kind of employee.
Anti-symmetric: If computing staff are a kind of employee, then employees are not a kind of computing staff. ◄

Since classification relationships define links between objects and object classes, it does not make sense to talk of the transitivity or symmetry of these relationships.

In terms of generalisation hierarchies, it is sometimes useful to make a distinction between partial and covering sub-classes. In terms of some information class, if its sub-classes are partial, then other sub-classes can be included for the super-class. If sub-classes are covering, then no further sub-classes are permitted.

Example

If we regard Broker and MarketMaker as partial sub-classes of FinancialIntermediary, then other sub-classes are possible. If these sub-classes are covering, then Brokers and MarketMakers would be the only types of FinancialIntermediary permitted on the stock market. In terms of our manufacturing example, it is unlikely that lintel and crash barrier are the only types of product produced by the company. Hence, this generalisation relationship would be described as partial. ◄

Disjoint sub-classes do not overlap. However, we can conceive of situations where the concepts referred to by information classes do overlap. If all sub-classes in an information model are disjoint, we have a strict hierarchy of classes. If some are overlapping, we have a lattice structure.

Example

Share and Stock are disjoint sub-classes of Security. A Security cannot be both a share and a stock. Broker and MarketMaker are two overlapping sub-classes of financial intermediary since market makers can act as brokers. ◄
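The formal properties of this section lend themselves to simple checks. The sketch below is a hedged illustration rather than a method proposed in the text: it computes the transitive closure of a set of AKO pairs, tests that the closure is irreflexive and anti-symmetric, and tests disjointness and covering against sets of instances. Reading disjoint and covering at the level of instances is an assumption of the sketch, and the identifiers F1 to F3 are hypothetical.

```python
def closure(ako):
    """Transitive closure of a set of (sub, super) AKO pairs."""
    result = set(ako)
    changed = True
    while changed:
        changed = False
        for a, b in list(result):
            for c, d in list(result):
                if b == c and (a, d) not in result:
                    result.add((a, d))
                    changed = True
    return result

def is_valid_generalisation(ako):
    """True if the closure is irreflexive and anti-symmetric (no cycles)."""
    full = closure(ako)
    irreflexive = all(a != b for a, b in full)
    anti_symmetric = all((b, a) not in full for a, b in full)
    return irreflexive and anti_symmetric

def is_disjoint(sub_extents):
    """True if no object belongs to more than one sub-class."""
    seen = set()
    for members in sub_extents.values():
        if seen & members:
            return False
        seen |= members
    return True

def is_covering(super_extent, sub_extents):
    """True if every object of the super-class belongs to some sub-class."""
    return super_extent <= set().union(*sub_extents.values())

ako = {("Programmer", "Computing staff"), ("Computing staff", "Employee")}
print(("Programmer", "Employee") in closure(ako))
print(is_valid_generalisation(ako | {("Employee", "Programmer")}))

# Hypothetical instances: F2 acts as both broker and market maker
intermediaries = {"Broker": {"F1", "F2"}, "MarketMaker": {"F2", "F3"}}
print(is_disjoint(intermediaries))
print(is_covering({"F1", "F2", "F3"}, intermediaries))
```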

4.10 Aggregation and Decomposition

We can build a substantial part of some ontology with classification, attribution, association and generalisation. However, there is one more constitutive rule that can be useful in certain circumstances for building institutional ontology—this is aggregation or its opposite decomposition. The constitutive rule here is:

[X PART OF Y to Z in C]

in which X is a class which is part of a wider whole class Y in some domain C. An aggregation relationship occurs between a whole and its parts and is an abstraction in which a relationship between objects is considered a higher-level object. This makes it possible to focus on the aggregate while suppressing lower-level detail.

Example

For example, in terms of the financial domain, we might define a financial portfolio class that aggregates together all the financial products making up a given customer’s interaction with the financial company. In such terms, a financial portfolio class can be considered an aggregate of securities, insurance policies and savings accounts. Likewise, a country can be considered an aggregate of regions which are aggregates of counties which are aggregates of districts and so on. In the case of the health service, a patient history can be considered as a collection or an aggregate of diagnoses, prescriptions and treatments. ◄

Hence, aggregation relationships compose an object out of an assembly or aggregation of other objects. When we state that:

[Railway station PART OF railway]
[Railway line PART OF railway]

we are declaring that railways are composed of an aggregation of railway stations and railway lines. The opposite of aggregation is decomposition, that is, the process of decomposing an object class into its constituent parts.

But given that we can build aggregation as well as generalisation hierarchies, what is the difference between the two? It is possible to distinguish between aggregation and generalisation in the following way. If two classes are defined in terms of a generalisation relationship, then both sub-class and super-class effectively refer to the same physical or institutional thing, the same group of objects. The super-class is merely a higher-level abstraction of the thing than its sub-class, and both instances of the sub-class and super-class will be referred to by the same identifier. In contrast, within an aggregation relationship, the aggregate, the whole, is different from any of its parts. The aggregate is merely a useful container for collecting together a set of cognate classes, instances of which will all have different identifiers.

Example

A lintel is the same thing as a product, and a stock is the same thing as a security. However, a financial portfolio is different from an insurance policy, and a country is different from a county. A railway is different from a railway line, and a patient history is different from a patient treatment. ◄
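Decomposition, the opposite of aggregation, amounts to walking PART OF facts from a whole down to its parts. The following minimal Python sketch uses the railway and country examples from this section; the function name and the indented output format are assumptions chosen for illustration.

```python
# PART OF facts as (part, whole) pairs, taken from the examples in the text
PART_OF = [
    ("Railway station", "Railway"),
    ("Railway line", "Railway"),
    ("Region", "Country"),
    ("County", "Region"),
    ("District", "County"),
]

def decompose(whole, depth=0):
    """Return the part hierarchy below `whole` as a list of indented lines."""
    lines = ["  " * depth + whole]
    for part, parent in PART_OF:
        if parent == whole:
            lines.extend(decompose(part, depth + 1))
    return lines

print("\n".join(decompose("Country")))
print("\n".join(decompose("Railway")))
```

Unlike an AKO hierarchy, nothing is inherited along these links: a district does not acquire the properties of a country, and each whole and part keeps its own identifier.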

4.11 Institutional Ontology as a Sign Lattice

So, let us review where we have got to. In the previous chapter, we made the case for thinking of information modelling as an attempt to build a partial model of institutional ontology, a model focused upon the things identified and described by a group of institutional actors within communicative practice. A lattice of signs helps provide a concrete way of thinking about the notion of institutional ontology and is constructed from objects and classes as well as relationships of attribution, association, generalisation and aggregation. Objects and classes are signs we use not only to identify and describe things of interest within some institutional domain; they also prescribe what can exist within this domain to institutional actors. When we declare a lintel to be a kind of product and that is a product, we not only identify a product as being of a certain type; we expect through inheritance for it to be described in terms of its length and weight. But, as we indicated in a previous section, identifying and describing a product in this way brings this thing into existence for the domain. As far as institutional ontology is concerned, a thing does not exist until it can be signified or named by actors within the domain. But information classes do not exist in isolation. They exist in a complex lattice consisting of other related signs. The way in which a certain sign has the potential to inform actors is down to its relationships with other signs within the lattice structure. Hence, as we have seen, a stillage only makes sense as a container of product which can be stored at production locations and moved between such locations. The lattice also establishes that products may be lintels or crash barriers and both of these classes are part of the wider aggregate of a product line. We shall look in some detail at how to visualise an information model in Chap. 5 and consider a number of different conventions for doing this. 
Here, we just provide a stepping point from the discussion of the current chapter to ways of visualising information models. One simple way of visualising the sign lattice appropriate to some delimited institutional domain is to simply position appropriate terms for classes upon the page and link these terms together with lines or arrows labelled with the appropriate construct—HASA, AKO and PART OF—and appropriate labels for relationships of association, such as CONTAINS, LOCATED-AT and MOVE-TO. Figure 4.3 provides an example of such a simple visualisation for aspects of the institutional ontology of the manufacturing domain we have discussed in the current chapter. This form of representation is similar to something known as a semantic net and has proven popular not only in areas of artificial intelligence such as machine learning but also in attempts to develop an architecture for the so-called semantic web. In Chap. 5, we shall cover other more involved and standard conventions for

Fig. 4.3 A sign lattice (diagram not reproduced; it links the classes Product line, Product, Lintel, Crash barrier, Stillage and Location, the attributes Product weight and Product length, and instances such as 24536, 26641, 26643, 9982, PL0102, PL0103 and PL0104 through relationships including ISA and LOCATED AT)

visualising the sign lattice appropriate to some institutional ontology—such as the form of visualisation in Fig. 4.4. This form of visualisation is particularly directed at the design of data systems (Chap. 8). Although we have taken great pains within this chapter to unravel the ways in which institutional ontology is built from first principles, we should remember that actors taking action within domains, such as manufacturing, emergency response and higher education, do not think and act with such a formal notion of ontology. Instead, they acquire the elements of such ontology through socialisation into the domain and utilise such ontology as an accepted and unexamined part of their surround-world—their ready-at-hand appreciation of the significance of objects. This means that a domain actor’s ontological understanding is very much entangled with their use of signs to identify and describe things and through this process of semiosis to act in terms of such things. As we have seen in Chap. 3, we arrive at a sign lattice, such as the simple one displayed in Fig. 4.3, by investigating and representing patterns of communication

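Fig. 4.4 An information model (diagram not reproduced; it shows the classes Location, Stillage, Product line and Product, with Product carrying the attributes productCode, productWeight and productLength and having the sub-classes Lintel and Crash barrier marked (disjoint, complete), connected by relationships labelled LOCATES, RECEIVES, LOCATED AT, MOVE TO, CONTAINS and CONTAINED IN)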

appropriate to some institutional domain. This understanding is then used to unpack the content of communication and to compose an information model from a close understanding of the purpose of such content. We shall examine some of the ways of composing an information model in Chap. 6.

If our aim is to build an information model of some existing domain, then we can build an information model from the bottom-up or the top-down. Through intensive investigation of some domain, we can traverse information situations from the scaffolding of existing data structures through to communicative acts and the coordinated activities that rely on such practices. Or, we can reverse the investigation of information situations by starting with what people do in the domain and then by close study of communicative acts come to an understanding of what people identify and describe. If the information model is being built for an entirely new domain of action, then we first need to design the patterns of information situations appropriate to some new area of work. Once this is achieved, then we have a concrete basis for building an information model.

4.12 Conclusion

In this chapter, we have spent some time considering how the core constructs of information modelling—objects, classes, attributes and relationships—relate to our model of information situations discussed in Chap. 2. We have started to portray information modelling as an attempt to build a partial model of institutional ontology, a model focused upon the things identified and described by a group of institutional actors within communicative practice. In the next chapter, we consider the relationship between such a model and reality in greater detail and consider how we begin to investigate the basis for such an information model. In doing this, we shall highlight the key differences between the approach to information modelling promoted in this book and traditional approaches to information modelling, alluded to in the introductory chapter.

4.13 Summary

• An object is some thing a set of actors within some institutional domain takes to exist.
• When we group a set of similar objects together and provide a category for such a group, we classify such objects. Objects are then said to instantiate the class. An information class is a sign which stands for an object, or more likely a set of such objects, and serves to categorise or classify such objects.
• One information class can be associated with another information class. Association relationships are characterised by two sets of rules: cardinality rules and optionality rules.
• Cardinality defines how many instances of one class are related to how many instances of another class.
• Optionality establishes whether all instances of a class must participate in a relationship or not.
• An information class is characterised by a number of properties or attributes. One or more attributes of the class are chosen to be identifiers for the class.
• A class may be a sub-class of another class, in which case it is related through a relationship of generalisation.
• A class may be part of a container class, in which case it is related through a relationship of aggregation.
• A lattice of object classes provides a concrete way of thinking about the notion of institutional ontology and is constructed from objects and classes as well as relationships of attribution, association, generalisation and aggregation.

5 Visualising an Information Model

5.1 Introduction

In Chap. 4, we considered information modelling purely in terms of its major constructs. These core constructs consist of classes, attributes and relationships of association. Additional constructs include relationships of generalisation and aggregation. The modeller can use these constructs to form a representation of important aspects of communicative competence relevant to some institutional domain. Within Chap. 4, we considered a canonical form for the representation of an information model in which a series of constitutive rules written as binary relations is used as the means for capturing the essence of some institutional ontology. However, as we indicated in Chap. 3, information modelling originally developed as a diagramming technique meant to aid the work of analysts and designers of data systems of various forms, particularly relational database systems. Diagramming is used because, rather than trying to understand and capture what is going on or what people would like to happen in words, business analysts tend to use pictures of various forms. Visualisation is not only used to build a collective record of some experience; it can also be used to facilitate creative thinking or to improve analysis of some problem.

5.2 Why Visualise?

As we have seen in previous chapters, it is perfectly possible to build an information model as a written definition of the physical and institutional facts appropriate to some domain. However, information modellers, just like business analysts in general (Beynon-Davies 2021a), prefer to build visualisations of information models. We use the term visualisation here not to refer to the process of forming a mental image of something but to the process of building a diagram or graphical representation of something.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Beynon-Davies, Information Modelling, https://doi.org/10.1007/978-3-030-98805-0_5


Analysts like to visualise problems and designers like to visualise solutions because there are some key advantages to visualisation, bound up with adages such as ‘a picture is worth a thousand words’ (Tufte 1990). This adage means that a diagram of something can frequently cover more of some problem situation than a written description. Generally, more can be represented on one visual than in many pages of writing. Visualisations also appear to be especially good at representing the complexity of some situation, particularly situations in which there are many interactions between things. Visualisations also tend to be more engaging than written text. This makes it easier to communicate common understandings of things amongst actors with diverse backgrounds and perspectives. Since visuals are by their very nature highly visible, this helps encourage group working. Finally, visualisations tend to be easier to change than written descriptions or specifications. This makes them extremely suitable for building prototypes as models of possible solutions to problems. It is for these reasons that in this chapter we look at this issue of visualisation in some detail and show how to diagram an information model using one of many possible visual notations.

Exercise

To gain a feeling for some of the advantages of visualisation, try this exercise. Suppose you have to explain to somebody the complex route to be taken from point A where you are currently situated to point B, a location many miles distant. Try to do this in two ways. In the first way, you verbally describe the route to be taken. In the second way, you draw them a rough route map on a sheet of paper. Which of these two ways of representing a problem situation is likely to be the most effective and why?

5.3 Notations for an Information Model Diagram

As we explained in Chap. 3, ways of conducting information modelling have remained relatively stable for over three decades. To isolate and express the accepted elements of an information model separate from issues of visualisation, we have implicitly used at a number of points what are known as binary relations, which were originally proposed in the work of Richard Frost (1982, 1983). We have used binary relations as what is known as a canonical form for expressing an information model. A canonical form is a basic or standard form for representing something, but which can be translated easily into other forms. A binary relation, as we have seen, can be considered a triple of items, in which the first item is termed the subject, the second the relation and the third the object. The theory of binary relations is useful because it can be shown that many representational formalisms, familiar within information modelling, can be reconstructed from these simple, atomic forms (Frost 1983).
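The idea of a binary relation as a triple of subject, relation and object can be made concrete with a small Python sketch. The facts are taken from Chap. 4; the Fact type and the match function are hypothetical names introduced only for this illustration and do not reproduce any notation from Frost's work.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    """A binary relation held as a (subject, relation, object) triple."""
    subject: str
    relation: str
    obj: str

facts = [
    Fact("Lintel", "AKO", "Product"),
    Fact("Crash barrier", "AKO", "Product"),
    Fact("Railway station", "PART OF", "Railway"),
    Fact("Lecturer", "TEACHES", "Student"),
    Fact("TEACHES", "Cardinality", "many"),
]

def match(facts, subject=None, relation=None, obj=None):
    """Return the facts that agree with every component that is not None."""
    return [
        f for f in facts
        if (subject is None or f.subject == subject)
        and (relation is None or f.relation == relation)
        and (obj is None or f.obj == obj)
    ]

# Which classes are a kind of Product?
for fact in match(facts, relation="AKO", obj="Product"):
    print(fact.subject)
```

Because every construct, whether attribution, association, generalisation or aggregation, reduces to the same three-part shape, a single query function is enough to interrogate the whole model, which is one sense in which the form is canonical.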

Fig. 5.1 Variation in information modelling notation (diagram not reproduced; it shows the Course offers / offered-on Module relationship drawn in several notations, with annotations including 1 and M, 1:1 and 0:M, and the UML multiplicities 1..1 and 1..*)

Information models are usually mapped out as diagrams, but there is unfortunately no standard notation for diagramming information models. A number of notations used in practice are illustrated in Fig. 5.1. Each of the diagrams in this figure specifies the same subset of an ontology associated with an academic institution, which we might represent as a series of institutional facts in the following manner:

[Course OFFERS Module] - A course offers modules.
[Module OFFERED-ON Course] - A module is offered on courses.
[OFFERED-ON cardinality One] - A module is offered on at most one course.
[OFFERED-ON optionality Optional] - A module does not need to be offered on any course.
[OFFERS cardinality Many] - A course offers a number of modules.
[OFFERS optionality Mandatory] - A course must offer at least one module.

In the following sections, we use one of these notations to consider issues of visualisation in more detail. The resulting visualisation can be easily translated into a visualisation using the other notations.

The first notation illustrated in Fig. 5.1 is the one used within this book, primarily because it is the one closest to the original notation proposed by Chen and it is the easiest to draw quickly on paper. The last notation in Fig. 5.1 is that proposed by the Unified Modelling Language (UML) for class diagramming. But to reiterate, the reader should not worry about the use of any one notation as each of these notations can be readily translated into any of the other notations.
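As an illustration of how readily one notation translates into another, the sketch below converts a cardinality and optionality pair into a UML-style multiplicity string. The mapping is an assumption based on one common reading (optional gives a lower bound of 0, mandatory a lower bound of 1; one gives an upper bound of 1, many an upper bound of *), and it is not claimed to reproduce exactly how Fig. 5.1 is drawn. The function name is hypothetical.

```python
def multiplicity(cardinality, optionality):
    """Translate a cardinality/optionality pair into 'lower..upper' form.

    Assumed mapping: optional -> lower bound 0, mandatory -> lower bound 1;
    one -> upper bound 1, many -> upper bound * (unbounded).
    """
    lower = "1" if optionality.lower() == "mandatory" else "0"
    upper = "*" if cardinality.lower() == "many" else "1"
    return f"{lower}..{upper}"

# The institutional facts listed for Course and Module
print("OFFERS:", multiplicity("Many", "Mandatory"))
print("OFFERED-ON:", multiplicity("One", "Optional"))
```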

5.4 Visualising Classes

An information class is typically represented upon an information model diagram by a rectangular box in which is written a meaningful name for the class. Note that it is conventional to denote an information class with a singular noun. This is because, as we discussed in Chap. 4, a class represents a category of something. There is only one example of a category, but a category is used to cover many instances of objects. One way to think of this graphic is that the class is setting a boundary around objects relevant to this class. It is dividing up the world into those things included inside the box and hence instances of the class and those things outside the box which represent all other things within the institutional domain in question. Example

We speak of an order and not of orders, a patient and not of patients. Figure 5.2 provides some more examples of information classes from different organisational domains. ◄ Example

Draw the likely information classes from the following description: The stock market is a market for the purchase and sale of securities. Securities come in two major forms: stocks and shares. A stock, sometimes known as a gilt-edged security or gilt, is a security with an associated interest rate. The most important type of stock is the government bond. Shares are a type of security which pay no interest, but pay a dividend to shareholders at regular intervals. Shares are normally issued by companies to raise capital. ◄

5.5 Visualising Relationships of Association

An association relationship between classes is represented by drawing a line between the relevant boxes on the diagram. In many notations, labels are placed on the relationship lines and are typically used as a way of resolving ambiguity. It must be acknowledged that it is frequently difficult to think of meaningful labels in this manner for relationships and sometimes including labels for relationships is cumbersome to represent on a diagram. Most relationships are best represented by verbs. Verbs however usually imply some direction. Hence, the relationship between person and grade might be read as

Fig. 5.2 Example information classes: Module, Lecturer, Product, Patient, Incident, Location, Customer, Supplier, Student, Order, Payment, Sale

person is graded by grade in one direction and grade grades person in the opposite direction. Example

Figure 5.3 illustrates a number of relationships, some labelled and some unlabelled, between information classes. ◄ Example

Identify further classes and relationships of association from the following description: Persons or institutions which deal in securities on the stock market are known as financial intermediaries. There are two main types of financial intermediary: brokers and market makers. Securities are bought from certain registered market makers by investors. A purchase of a security is known as a deal. ◄ Within the notation proposed by the Unified Modelling Language (UML), modellers are encouraged to assign role names to classes involved in a relationship of association, as illustrated in Fig. 5.4. As a name, a role must clearly be a noun which adds a certain confusion to the situation and departs from standard practice in information modelling. We have adopted a compromise position in this book, in which two labels may be included upon a relationship line but these labels are standard verbs. Where the size of the diagram is particularly large, then these labels are often omitted.

Fig. 5.3 Sample relationships of association (diagram not reproduced; it shows Incident involves Patient, Incident located-at Location with the inverse site-of, Lecturer teaches Module, Student enrolled Course, Customer places Order, and an unlabelled relationship between Product and Order)

Fig. 5.4 An example of the use of role names (diagram not reproduced; it shows Order and Product joined by a relationship carrying the role names product order and ordered product)

5.6 Visualising Attributes

Upon an information model, attributes may be represented by adding their names to the appropriate class box. Attributes are enclosed within the class box itself to represent the way in which they add detail to the description of an information class. However, when an information model becomes full with a large number of information classes, the attributes associated with particular classes are likely to be left off an information model diagram. Instead, they will be included within an accompanying document to the diagram.

Example

Figure 5.5 provides some examples of attributes appropriate to a number of different information classes. The chosen identifiers for each class are underlined. ◄

Exercise

Draw the relevant classes with their attributes from the following description: Each market maker will define the state of each type of share it holds in terms of two prices: the offer price and the bid price. The offer price is the price at which a market maker is willing to sell a share: the price at which an investor will buy. The bid price is the price a market maker is willing to pay for a share: the price at which an investor can sell to him. The difference between the two prices is known as the market makers’ ‘spread’. Different market makers will quote different spreads on shares depending on the state of their book.

Fig. 5.5 Example attributes

Incident: incidentNo, incidentDescription, incidentCategory, incidentStatus
Lecturer: employeeNo, lecturerName, lecturerStatus
Patient: patientNo, patientName, patientAge, patientCondition, patientMedicalHistory
Module: moduleCode, moduleName, credits

5.7 Visualising Constraints upon Association

The cardinality and optionality characteristics of a given relationship of association amount to constraints upon the behaviour of the two classes involved in this association. There are a number of competing notational devices available for portraying the cardinality of an association relationship. A popular and convenient way to represent cardinality is by drawing a crow’s foot on the many end of an association relationship. A crow’s foot is so-called because it looks like the foot of a bird, such as a crow. We assume that the default participation of a class in an association relationship is mandatory. If the participation is optional, we add a circle (an ‘O’ for optional) alongside the relevant class. Hence, if no ‘O’ is present, we assume that the optionality of a class is mandatory. If we want to be certain of our definition, we can use a strike symbol (a line drawn perpendicular through the relationship line) to indicate mandatory status. Example

Figure 5.6 provides a number of examples of relationships with the cardinality and optionality of classes defined for these relationships. ◄

Exercise Draw the cardinality and optionality appropriate for the following domain description: To conduct a deal, an investor issues a broker with an order specification. An investor may place many orders with a broker but need not place any orders with a particular broker. A broker may handle many orders but need not handle deals with certain investors.

UML treats the two concepts of cardinality and optionality, somewhat confusingly, through the single idea of multiplicity. An example of this visual notation is illustrated in Fig. 5.7. A specification of multiplicity is placed at each end of an association relationship and consists of a lower bound and an upper bound separated by two dots. The lower bound is the minimum number of objects of a class that may take part in the relationship, while the upper bound is the maximum number. An asterisk is used as an upper bound to indicate an unspecified number of many instances. Hence, the multiplicity of a given class could be expressed as 0..3, meaning that there cannot be more than three associated objects in this relationship but there may be none. In Fig. 5.7, the multiplicity 1..1 indicates that there is one and only one object of this class associated with the other class in this relationship, whereas the multiplicity 1..* indicates that there is at least one but possibly many objects of this class associated within the relationship.
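The reading of multiplicity given above can be made concrete with a short sketch. It is only an illustration in Python, not part of UML or of any modelling tool: the class name Multiplicity and its methods parse, allows, optional and many are our own assumptions. The sketch shows how a single lower..upper pair carries both optionality (a lower bound of zero) and cardinality (an upper bound greater than one).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Multiplicity:
    """A multiplicity such as 0..3 or 1..*; an upper bound of None stands for '*'."""

    lower: int
    upper: Optional[int]

    @classmethod
    def parse(cls, text: str) -> "Multiplicity":
        """Parse a string of the form 'lower..upper', e.g. '0..3' or '1..*'."""
        lower_text, upper_text = text.split("..")
        upper = None if upper_text == "*" else int(upper_text)
        return cls(int(lower_text), upper)

    def allows(self, count: int) -> bool:
        """True if `count` associated objects satisfies both bounds."""
        if count < self.lower:
            return False
        return self.upper is None or count <= self.upper

    @property
    def optional(self) -> bool:
        """Participation is optional when the lower bound is zero."""
        return self.lower == 0

    @property
    def many(self) -> bool:
        """This is a 'many' end when more than one object is permitted."""
        return self.upper is None or self.upper > 1


at_most_three = Multiplicity.parse("0..3")
one_or_more = Multiplicity.parse("1..*")
exactly_one = Multiplicity.parse("1..1")
```

Under these assumptions, at_most_three.allows(4) is false while at_most_three.optional is true, which corresponds to drawing both a crow’s foot and a circle at that end of the relationship in the notation of this section.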


Fig. 5.6 Example relationships [diagram: associations between Incident and Patient, Incident and Location, Lecturer and Module (teaches), Student and Course (enrolled), Customer and Order (places), and Order and Product]

Fig. 5.7 Multiplicity upon an association relationship [diagram: an association between Order and Product, with the multiplicities 1..1 and 1..* placed at its ends]

5.8 Visualising Generalisation

A generalisation relationship is indicated on an information model diagram by a line drawn between sub-class and super-class with a triangle placed at the head of the line next to the super-class. This is the UML notation for generalisation. Whether the sub-classes may share instances is indicated by the label disjoint or overlapping, expressed in brackets and placed next to the triangle. Whether the sub-classes cover every instance of the super-class is indicated in a similar way by the keyword complete or incomplete (partial generalisation).


Example

Figure 5.8 illustrates two examples of the diagramming of generalisation relationships. The first diagram represents the facts that both stock and share are sub-classes of a financial security. Stock and share are also disjoint and complete sub-classes, meaning that a security must be either a stock or a share. The second diagram represents the facts that a broker and a market maker are sub-classes of a financial intermediary. These sub-classes are overlapping and incomplete, meaning that a broker can also be a market maker and vice versa, and that there are also other types of financial intermediary besides a broker and a market maker. ◄

Fig. 5.8 Generalisation [diagram: Security (disjoint, complete) with sub-classes Stock and Share; FinancialIntermediary (overlapping, incomplete) with sub-classes Broker and MarketMaker]
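The two pairs of labels on a generalisation can be read as rules about which sub-classes a single instance of the super-class may belong to. The following Python sketch illustrates that reading for the two hierarchies of Fig. 5.8. The class Generalisation and its method permits are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Generalisation:
    """A super-class, its sub-classes and the two constraint labels."""

    superclass: str
    subclasses: FrozenSet[str]
    disjoint: bool  # False stands for 'overlapping'
    complete: bool  # False stands for 'incomplete'

    def permits(self, memberships: Set[str]) -> bool:
        """Check the set of sub-classes that one instance belongs to."""
        if not memberships <= self.subclasses:
            return False  # names a sub-class that is not in this hierarchy
        if self.disjoint and len(memberships) > 1:
            return False  # disjoint: at most one sub-class per instance
        if self.complete and not memberships:
            return False  # complete: every instance needs some sub-class
        return True


security = Generalisation(
    "Security", frozenset({"Stock", "Share"}), disjoint=True, complete=True
)
intermediary = Generalisation(
    "FinancialIntermediary",
    frozenset({"Broker", "MarketMaker"}),
    disjoint=False,
    complete=False,
)
```

Here security.permits({"Stock", "Share"}) is false because the hierarchy is disjoint, whereas intermediary.permits({"Broker", "MarketMaker"}) is true because a broker can also be a market maker.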


Exercise Draw a generalisation hierarchy for the following case: Stocks can have variable interest rates or fixed interest rates. Fixed interest rate stock is sometimes called debenture or loan capital. Similarly, shares can offer fixed or variable dividends. Fixed dividend shares are sometimes known as preference capital; variable dividend shares are known as equity capital.

5.9 Visualising Aggregation

Graphically, we may depict aggregation as a series of lines or a forked line between the whole and its parts. A diamond is also placed next to the aggregate class. Example

Figure 5.9 illustrates a sample aggregation relationship, indicating that a financial portfolio class is made up of a collection of other classes. ◄

Exercise A patient record is a classic example of an aggregate consisting of personal details, health conditions, treatments, medicines, allergies and past reactions to medicines, scans, X-ray results and lifestyle information such as whether the patient drinks or smokes. Try to draw a visualisation of this aggregate.

Fig. 5.9 Aggregation [diagram: FinancialPortfolio as the aggregate class, with the parts Stock, Share, InsurancePolicy and SavingsAccount]

5.10 Institutional Facts to an Information Model Diagram

In the current chapter and in Chap. 4, we have introduced rather informally the idea that it is relatively straightforward to move from a set of institutional facts established for some domain to an information model diagram. Let us demonstrate this process more completely here in terms of an extended example. Suppose that we have established through some form of investigation (Chap. 3) the following institutional facts held important to a certain manufacturing domain. The first set of facts establishes the information classes held important by actors working within this domain. It is convenient to list them as a set of binary relations as follows:

[Delivery advice ISA Object Class]
[Dispatch advice ISA Object Class]
[Customer ISA Object Class]
[Delivery item ISA Object Class]
[Dispatch item ISA Object Class]
[Product item ISA Object Class]
[Job ISA Object Class]
[Production run ISA Object Class]
[Production schedule ISA Object Class]

In a sense, the class Object Class is a meta-class here: a class which is normally implicit rather than explicit upon an information model diagram. We next need to establish which class is associated with which other class.
A possible set of association relationships is listed as follows, with each relationship named twice to indicate the direction of the relationship or the role played by each class in it:

[Delivery advice DETAILS Delivery item]
[Delivery item DETAILED-UPON Delivery advice]
[Dispatch advice LISTS Dispatch item]
[Dispatch item LISTED-UPON Dispatch advice]
[Customer CREATES Delivery advice]
[Delivery advice CREATED-BY Customer]
[Customer RECEIVES Dispatch advice]
[Dispatch advice RECEIVED-BY Customer]
[Product item APPEARS-DELIVERY Delivery item]
[Product item APPEARS-DISPATCH Dispatch item]
[Dispatch item NAMES-DISPATCH Product item]
[Delivery item NAMES-DELIVERY Product item]
[Delivery item PROCESSED-AS Job]
[Job PROCESSES Delivery item]
[Dispatch item COMPLETED-AS Job]
[Job COMPLETES Dispatch item]
[Product item MANUFACTURES-AS Job]
[Job MANUFACTURES Product item]
[Production run HANDLES Job]
[Job HANDLED-BY Production run]
[Production schedule DETAILS-RUN Production run]
[Production run DETAILED-ON Production schedule]

The cardinality of each relationship can then be established by indicating whether a given class has one or many instances involved in the detailed relationship. Hence, we might specify the cardinality in the following manner:

[DETAILS cardinality One]
[DETAILED-UPON cardinality Many]
[LISTS cardinality One]
[LISTED-UPON cardinality Many]
[CREATES cardinality One]
[CREATED-BY cardinality Many]
[RECEIVES cardinality One]
[RECEIVED-BY cardinality Many]
[APPEARS-DELIVERY cardinality One]
[APPEARS-DISPATCH cardinality One]
[NAMES-DISPATCH cardinality Many]
[NAMES-DELIVERY cardinality Many]
[PROCESSED-AS cardinality One]
[PROCESSES cardinality Many]
[COMPLETED-AS cardinality One]
[COMPLETES cardinality One]
[MANUFACTURES-AS cardinality One]
[MANUFACTURES cardinality One]
[HANDLES cardinality One]
[HANDLED-BY cardinality Many]
[DETAILS-RUN cardinality One]
[DETAILED-ON cardinality Many]

Note that each relationship label should be unique within any one information model to enable us to unambiguously assign constraints upon the relationship. Also, we have not defined each relationship as an association explicitly, such as [HANDLES ISA Association]. Instead, we have assumed, as in the case of an Object Class, that the Association class is a meta-class used implicitly to classify each relationship of association. Finally, the optionality of each relationship also needs to be listed in the following manner:

[DETAILS optionality Mandatory]
[DETAILED-UPON optionality Mandatory]
[LISTS optionality Mandatory]
[LISTED-UPON optionality Mandatory]
[CREATES optionality Mandatory]
[CREATED-BY optionality Mandatory]
[RECEIVES optionality Mandatory]
[RECEIVED-BY optionality Mandatory]
[APPEARS-DELIVERY optionality Optional]
[APPEARS-DISPATCH optionality Optional]
[NAMES-DISPATCH optionality Mandatory]
[NAMES-DELIVERY optionality Mandatory]
[PROCESSED-AS optionality Mandatory]
[PROCESSES optionality Mandatory]
[COMPLETED-AS optionality Mandatory]
[COMPLETES optionality Mandatory]
[MANUFACTURES-AS optionality Optional]
[MANUFACTURES optionality Mandatory]
[HANDLES optionality Mandatory]
[HANDLED-BY optionality Mandatory]
[DETAILS-RUN optionality Mandatory]
[DETAILED-ON optionality Mandatory]

Given that we have established the relevant institutional facts for the domain, it is a relatively simple process to produce a diagram from them using the conventions discussed in this chapter. First, produce the labelled boxes for each information class. Second, draw and label the lines to represent associations between information classes. Third, add cardinality to the relationships. Fourth, add optionality to the relationships. A completed information model diagram which corresponds to the facts established for this domain is illustrated in Fig. 5.10. Note that we have not listed any attributes for the information classes upon this diagram. This might form part of a more detailed investigation and diagramming effort. Clearly, this diagram does not indicate any generalisation or aggregation. Adding such relationships would be entirely possible by, for instance, making a generalisation hierarchy out of advices, such that a Delivery advice and Dispatch advice are considered sub-classes of an Advice super-class. However, as a general rule, we suggest that the information modeller should only include such relationships of abstraction where they are deemed necessary to illuminate aspects of communication within the domain under investigation.

Example

Consider the case where this manufacturing company deals with two types of customer which it refers to as a major and minor customer. Major customers make repeat orders with the company and produce their own delivery advices; minor customers make irregular orders with the company and do not produce their own delivery advices. In this situation, we might choose to model these institutional facts as a generalisation hierarchy and visualise this situation as in Fig. 5.11:

[Major customer AKO Customer]
[Minor customer AKO Customer] ◄

Fig. 5.10 An information model diagram for a manufacturing domain [diagram: the classes Dispatch advice, Delivery advice, Customer, Dispatch item, Delivery item, Product item, Job, Production run and Production schedule, joined by the labelled associations listed in the facts above]

Fig. 5.11 Generalisation in the manufacturing domain [diagram: Customer as super-class with the sub-classes Major customer and Minor customer]
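The four diagramming steps described in this section can also be sketched in code. The Python below is a minimal illustration under our own assumptions: facts are held as (subject, relation, object) triples, the function name compose is hypothetical, and the 'diagram' is simply a list of text lines, one per class box and one per labelled relationship line. It uses only a small subset of the manufacturing facts and makes no claim about which end of a line a crow’s foot or circle would be drawn at.

```python
from typing import Dict, List, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object), read as [subject RELATION object]

# A subset of the manufacturing facts listed in this section.
FACTS: List[Fact] = [
    ("Customer", "ISA", "Object Class"),
    ("Delivery advice", "ISA", "Object Class"),
    ("Customer", "CREATES", "Delivery advice"),
    ("Delivery advice", "CREATED-BY", "Customer"),
    ("CREATES", "cardinality", "One"),
    ("CREATED-BY", "cardinality", "Many"),
    ("CREATES", "optionality", "Mandatory"),
    ("CREATED-BY", "optionality", "Mandatory"),
]

# Relations that describe the model itself rather than associate two classes.
META_RELATIONS = {"ISA", "cardinality", "optionality"}


def compose(facts: List[Fact]) -> List[str]:
    """Turn binary relations into one text line per box and per labelled line."""
    # Step 1: a labelled box for each information class.
    classes = [s for s, r, o in facts if r == "ISA" and o == "Object Class"]
    # Step 2: a labelled line for each association between declared classes.
    associations = [(s, r, o) for s, r, o in facts if r not in META_RELATIONS]
    for source, label, target in associations:
        if source not in classes or target not in classes:
            raise ValueError(f"{label} refers to an undeclared class")
    # Steps 3 and 4: look up the constraints recorded against each label.
    cardinality: Dict[str, str] = {s: o for s, r, o in facts if r == "cardinality"}
    optionality: Dict[str, str] = {s: o for s, r, o in facts if r == "optionality"}
    lines = [f"[{name}]" for name in classes]
    for source, label, target in associations:
        lines.append(
            f"[{source}] --{label} ({cardinality[label]}, {optionality[label]})--> [{target}]"
        )
    return lines


diagram = compose(FACTS)
```

Because each relationship label is unique within the model, the label alone is enough to look up its cardinality and optionality, which is why the sketch can keep these constraints in simple dictionaries keyed by label.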

5.11 Conclusion

Although the constructs of an information model are relatively standard, there are many different ways of visualising an information model. Within this chapter, we have considered one of the many ways of visualising an information model through a diagram. Visualisations are important for a number of reasons, such as encouraging group working through being highly visible and compact. We have begun to suggest in this chapter that a way of building an information model diagram is from an established set of physical and institutional facts appropriate to the domain in question. This we refer to as the process of composing an information model. To compose something, we make or form some representation from its perceived constituent elements. In the next chapter, we consider this approach of composition in much more detail.

5.12 Summary

• Three basic constructs are used in information modelling as a business analysis technique: classes, relationships and attributes. Relationships can be relationships of association, generalisation or aggregation.
• An information class is typically represented upon a diagram as a labelled box.
• A relationship of association is typically indicated upon a diagram as a line drawn between related classes. Cardinality constraints are typically indicated by notating the many end of the relationship with some graphic such as a crow’s foot. The optional end of some relationship is indicated by some other notation upon the relationship line such as a circle.
• Attributes can be represented as labels nested within the class box upon the diagram. For large information models, the attributes are typically left off the diagram.
• A generalisation relationship is indicated upon an information model diagram by a line drawn between sub-class and super-class with a triangle placed at the head of the line next to the super-class.
• We may depict aggregation as a series of lines or a forked line between the whole and its parts. A diamond is also placed next to the aggregate class.
• From a set of institutional facts established for some domain, it is relatively straightforward to compose an information model diagram.

6 Composing an Information Model from Institutional Facts

6.1 Introduction

The key question faced by any information modeller is, where do I start? As we have indicated in previous chapters, this is a question not addressed adequately by conventional literature. The main reason for this is that such literature does not work with any established theoretical understanding of the ‘material’ that information modelling deals with (Chap. 2), as well as what information modelling is attempting to do with this ‘material’ (Chap. 3). Both issues arise from the fact that conventional approaches to information modelling work with a narrow and unproductive conception of the institutional domains with which information models engage. As we have seen, within the current book, we have promoted the idea of considering an information model as a model of important aspects of institutional ontology. Institutional ontology is what actors within some domain deem to exist, how they communicate about such things and how they use such communication to coordinate joint activity. Institutional ontology provides to such actors a way of making sense of both physical and institutional facts about reality and through this to construct and reconstruct this reality. This is why we have proposed that an information model must necessarily be focused as a model upon the patterns of instrumental communication relevant to some domain of institutional action. This way of thinking about both the content and the purpose of information modelling allows us to develop a clear way of composing an information model which does justice to some institutional ontology under investigation. Within the current chapter, we shall demonstrate how to build information models from an analysis of the instrumental communicative practices within some domain. The approach can also be readily adapted to designing an information model for some new domain of action. The steps of this approach, which are described in more detail within this chapter, are as follows:

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Beynon-Davies, Information Modelling, https://doi.org/10.1007/978-3-030-98805-0_6


• Develop a model of the pattern of information situations under consideration. This may be a model of some existing or as-is pattern or a model of some possible or as-if pattern.
• Unpack the content of the messages used within this communicative pattern into the constructs of an information model: classes, attributes and relationships.
• Generate a set of binary relations which adequately reflects the content of communicative acts as a series of abstractions of relevant physical and institutional facts.
• Form the set of physical and institutional facts into a complete information model and produce a visualisation of this model using one of the many possible notations.
• Check the validity and consistency of this model with actors or stakeholders within the domain in question.
• Revise the information model if necessary.

6.2 A Pattern of Information Situations

Within Chap. 2, we described the proper context for an information model as being some domain of institutional action. More precisely, an information model cannot be built without some detailed understanding of the set of communicative practices undertaken by actors within the domain under consideration currently or likely to be undertaken in some new domain of institutional action. One way of building a communicative pattern for an existing situation is to observe and collect together a series of actual communicative practices within the domain under consideration. Hence, in the case of medical emergency response, one might collect samples of emergency calls, dialogue within the control centre and communications between dispatchers and ambulance resources. Such communicative practice might then be analysed using one or more established approaches for dealing with extended stretches of conversation such as content analysis or discourse analysis. However, in line with the general way in which business analysts typically approach requirements elicitation, one would normally expect a communicative pattern to be built from a series of extended and unstructured interviews or workshops with domain stakeholders. We discussed such investigation techniques in Chap. 3. Elicitation of whatever form is a way-station to making sense of the communicative situation under investigation. To do this, we need some understanding of not only what is communicated and by whom but the sequencing of communication and how this relates to coordination of activity within the domain. The representation of communicative acts within some communicative pattern, as well as the chronology of such acts, is necessarily a narrative abstraction of the communicative situation—a story of who communicates about what with whom and in what sequence. 
Each speech bubble placed upon the visualisation of a communication pattern is an abstraction of a range of actual communications we might find while investigating some domain, such as an ambulance control centre and its associated ambulance crews.

Example

Hence, the content (a medical emergency has taken place at location X on person Y) is an abstraction of the communicative practices or utterances between callers and call-takers within ambulance control. ◄

Consider the emergency ambulance case as a pattern of information situations first discussed in Chap. 2. This pattern can be written as a narrative in something like the following manner.

1. The lifecycle begins when telephone operators take an emergency call. The caller’s area code or closest mobile phone cell is identified from the call, which is then routed to the ambulance control centre.
2. At the control centre, a call-taker matches the call number with a physical address using a computerised map (or gazetteer) of the area covered by the service. The call-taker asks a pre-established series of questions of the caller(s), prompted by a set of rules embedded in the incident system.
3. Most ambulance services in the UK now institute a process of ‘triage’ to enable prioritisation of response to incidents. Calls are classified as category A (life-threatening), category B (serious but not life-threatening) or category C (does not require emergency response). On this basis, further decisions are made about the dispatch of resources to such incidents, taking account of two national targets set for response times to category A and B calls. Within the UK, ambulance services are required to reach 75% of category A calls within 8 min and 95% of category B calls within 19 min. For category C calls, patients are referred to other healthcare providers or transferred to a paramedic who will offer medical advice.
4. Assuming a call is categorised as either A or B, an emergency incident is declared and the location entered in an incident management system by the call-taker. A dispatcher will start to listen in to the call at this point.
5. The task of the dispatcher is to assess the most appropriate resource to send to the incident using a screen indicating a plan designed to maximise the efficient use of resources (known as the system status management or SSM plan), a screen listing the status of all resources and a screen which plots the current location of such resources against a computerised map. The SSM plan is an attempt to dynamically deploy resources around the area covered by an ambulance service according to demand patterns established for day and time, geographical area and clinical urgency.
6. Using this technology and her knowledge of the local area, the dispatcher selects and assigns a resource to the emergency incident. The dispatcher uses a radio message to inform the crew about the location of the incident (including a map grid reference) and reported details of the patient’s condition.
7. While the dispatcher is conducting this task, the call-taker will be giving pre-arrival advice to the caller. In certain extreme cases, the call-taker will remain in continuous communication with the caller until the ambulance arrives at incident.
8. Following receipt of an incident alert from the control room, and once mobile, a member of ambulance crew presses a button on their communication set to indicate departure. Crews are guided by satellite navigation to the incident location, supplemented with radio communication from the control room.
9. Upon arrival, a member of crew presses an arrive button on the communication set. A paramedic then administers any immediate treatment required at the scene and communicates the medical condition of the patient back to ambulance control.
10. The dispatcher will enter details of the patient condition and the treatment administered into the incident system. If the patient condition is sufficiently serious, the dispatcher will request of a general patient admissions system to suggest possible hospitals to admit the patient based upon the patient condition, the location of the incident and the location of hospitals. If the patient condition is deemed non-serious, then the ambulance resource makes itself available for further allocation.
11. In the case of further treatment being required, the dispatcher will select an admitting hospital and communicate the patient condition and likely time of arrival to the emergency department of this hospital. The admitting hospital is indicated to the ambulance crew by the dispatcher. When the patient is deemed ready, she is moved into the ambulance and prepared for departure. A crew member then presses a leave scene button.
12. Upon arrival at the general hospital, an at hospital button is pressed.
13. As soon as a cubicle is available in the emergency department, the patient is admitted.
14. Finally, the crew presses a clear button which declares that they are available to be allocated as a resource again.

From a narrative such as the one provided here, it is possible to identify a range of communicative acts appropriate to this institutional domain. Such communicative acts are essential to the coordination of the activity of numerous actors such as callers, call-takers, dispatchers, ambulance drivers and paramedics. The investigator will need to abstract the detail of these communicative acts from a close understanding of the communicative practice taking place in the control room, the ambulance, the incident site and the accident and emergency department of the general hospital. This will probably not only involve interviews with key stakeholders but close observation and possible participation in such settings. As introduced in Chap. 3, we have found it useful in other work to visualise this pattern (Beynon-Davies 2021a) as illustrated in Fig. 6.1. Each element of the narrative is numbered to correspond to an appropriate element of the visualisation in this figure.
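Some elements of such a narrative are already close to rules. Step 3, for instance, states which categories of call lead to a dispatch and the response-time targets that apply. The Python sketch below expresses that step as we read it. The names TARGETS, requires_dispatch and meets_target are hypothetical, and treating 'within 8 min' as 'less than or equal to 8 minutes' is our assumption rather than something the narrative specifies.

```python
from typing import Dict, List, Tuple

# Targets quoted in step 3: category -> (minutes, required share of calls).
TARGETS: Dict[str, Tuple[int, float]] = {"A": (8, 0.75), "B": (19, 0.95)}
CATEGORIES = {"A", "B", "C"}


def requires_dispatch(category: str) -> bool:
    """Category A and B calls lead to a declared incident; category C calls do not."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown call category: {category}")
    return category in TARGETS


def meets_target(category: str, response_minutes: List[float]) -> bool:
    """Check a batch of response times against the target for category A or B."""
    if not response_minutes:
        raise ValueError("no response times supplied")
    limit, required_share = TARGETS[category]
    # Assumption: 'within N min' is read as 'at most N minutes'.
    within = sum(1 for minutes in response_minutes if minutes <= limit)
    return within / len(response_minutes) >= required_share


category_a_ok = meets_target("A", [5, 7, 8, 12])    # 3 of 4 calls within 8 min
category_b_ok = meets_target("B", [10, 18, 25, 12])  # 3 of 4 calls within 19 min
```

With the made-up response times shown, the category A batch meets its 75% target while the category B batch falls short of 95%.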

6.3 Unpacking the Content of Messages

Assuming that a communicative pattern can be built from our analysis of some existing situation or from a design of some new situation, then the next step is to take each act of communication in turn and unpack the content of messages. This cumulative content can then be converted into the constructs of an information model, namely, classes, attributes and relationships. Example

For example, take the first communicative act from the pattern illustrated in Fig. 6.1 and presented in greater focus within Fig. 6.2.

Fig. 6.1 Emergency response as a system of communication [diagram: the 14 numbered communicative acts of the narrative, from Notification of emergency (1) to Declare incident closed (14), each shown with its sending and receiving actors and a message such as ASSERT[A medical emergency incident has taken place at location X on person Y]; decision points on the call category (A or B, or C) and on hospital admission (yes or no) branch the pattern]

Fig. 6.2 Unpacking a communicative act [diagram: the communicative act Notification of emergency, with the intent ASSERT, the content [A medical emergency has taken place at location X on person Y] and the actors Caller and Call taker]

The content of this communicative act is:

[A medical emergency has taken place at location X on person Y]

This content is an abstraction of the key elements taken from an analysis of the range of emergency calls taken by ambulance control. As such, it identifies the key things or objects of interest that serve to trigger further actions by actors such as call-takers, dispatchers and ambulance crew. This content can be unpacked as a series of institutional facts using the constructs of classes, attributes and relationships as described in Chap. 4. Hence, the facts which are immediately apparent from this content include:

[Medical emergency OCCURS-AT Location]
[Medical emergency INVOLVES Person]
[Location SITE-OF Medical emergency]
[Person INVOLVED-IN Medical emergency] ◄

Within this specification of the communicative act, Medical emergency, Location and Person are likely information classes. OCCURS-AT and INVOLVES are two relationships of association with their reverse names or roles being SITE-OF and INVOLVED-IN. The relationship of association [Medical emergency INVOLVES Person] establishes the context of the relationship between the named person and the specific medical emergency. The association relationship [Medical emergency OCCURS-AT Location] specifies the relationship between the particular medical emergency and an established map location. But the communicative act within this pattern forms only the starting point for an analysis of the wider information situation within which it occurs. Probably in structured conversation with key stakeholders, a number of other institutional facts will become apparent which serve to provide greater depth to the model of information. For a start, it is likely that we need to add to our information model some reference to both actors involved in the communicative act, Caller and Call-taker. We will probably also wish to record data relating to the medium by which the communication occurred. In other words, in this case, we need to include a class Emergency call. In total, this adds the following institutional facts to our specification:

[Caller MAKES Emergency call]
[Emergency call MADE-BY Caller]
[Call-taker HANDLES Emergency call]
[Emergency call HANDLED-BY Call-taker]

This adds three further classes and two further relationships to our information model. But, knowing that we have a set of information classes means that we also know that there will be a range of identifiers needed as attributes of these classes. Hence:

[Medical emergency REFERENCE incidentNo]
[Person REFERENCE personNo]
[Location REFERENCE locationRef]
[Emergency call REFERENCE callNo]
[Caller REFERENCE callerID]
[Call-taker REFERENCE calltakerID]

As we have seen in Chap. 4, the relation REFERENCE is a special form of relation which relates a class to an identifier. From further investigation, we might further infer that we would wish to record other attributes of certain classes, such as that an emergency call has a start time and end time and that a person of concern has a name, sex and age.
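The practice of naming each association twice, once in each direction, lends itself to a small sketch. The Python below is an illustration only: the class Association and the helper render are hypothetical names, and the two associations are those unpacked from the first communicative act above. Recording the forward and reverse names together means that the second binary relation of each pair can be generated rather than written by hand.

```python
from dataclasses import dataclass
from typing import List, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)


@dataclass(frozen=True)
class Association:
    """One association relationship together with its reverse name or role."""

    source: str
    forward: str
    target: str
    reverse: str

    def facts(self) -> List[Fact]:
        """Return the pair of binary relations, one for each direction."""
        return [
            (self.source, self.forward, self.target),
            (self.target, self.reverse, self.source),
        ]


def render(fact: Fact) -> str:
    """Write a fact in the bracketed notation used in this book."""
    subject, relation, obj = fact
    return f"[{subject} {relation} {obj}]"


unpacked = [
    Association("Medical emergency", "OCCURS-AT", "Location", "SITE-OF"),
    Association("Medical emergency", "INVOLVES", "Person", "INVOLVED-IN"),
]
lines = [render(fact) for association in unpacked for fact in association.facts()]
```

The list lines then holds the same four institutional facts as the example, although grouped by association rather than in the order printed above.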
This makes the total set of institutional facts analysed so far as follows:

[Medical emergency OCCURS-AT Location]
[Medical emergency INVOLVES Person of concern]
[Location SITE-OF Medical emergency]
[Person of concern INVOLVED-IN Medical emergency]
[Caller MAKES Emergency call]
[Emergency call MADE-BY Caller]
[Call-taker HANDLES Emergency call]
[Emergency call HANDLED-BY Call-taker]
[Emergency call DESCRIBES Medical emergency]
[Medical emergency DESCRIBED-BY Emergency call]
[Emergency call CALLED-FROM Location]
[Location BASE-OF Emergency call]
[Medical emergency REFERENCE incidentNo]
[Person of concern REFERENCE personNo]
[Location REFERENCE locationRef]
[Emergency call REFERENCE callNo]
[Caller REFERENCE callerID]
[Call-taker REFERENCE calltakerID]
[Person of concern HASA name]
[Person of concern HASA sex]
[Person of concern HASA age]
[Emergency call HASA start time]
[Emergency call HASA end time]

Using the procedure for translating a series of institutional facts into a visualisation discussed in Chap. 5, we can produce a first-pass information model as illustrated in Fig. 6.3. Note that in making this visualisation, we have assumed an appropriate cardinality and optionality for each of the relationships of association on our model. These assumptions will need to be confirmed in structured conversation with key stakeholders and if necessary revised.

Fig. 6.3 A first-pass information model of emergency response as a visualisation [diagram: the classes Caller, Emergency call, Call taker, Medical emergency, Person of concern and Location, joined by the associations makes/made by, handles/handled by, describes/described by, called from/base of, occurs at/site of and involves/involved in]
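A fact set of this size is easy to get slightly wrong, so it can help to group the facts by class and check them before drawing anything. The following Python sketch is one way of doing so, under our own assumptions: the names ClassSpec, class_specs and missing_identifiers are hypothetical, and only a subset of the emergency response facts is used. REFERENCE facts supply a class's identifier, HASA facts supply its other attributes, and any other relation is treated as an association between two classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)


@dataclass
class ClassSpec:
    """The identifier and other attributes collected for one information class."""

    name: str
    identifier: Optional[str] = None
    attributes: List[str] = field(default_factory=list)


def class_specs(facts: List[Fact]) -> Dict[str, ClassSpec]:
    """Group REFERENCE and HASA facts under the class they describe."""
    specs: Dict[str, ClassSpec] = {}
    for subject, relation, obj in facts:
        spec = specs.setdefault(subject, ClassSpec(subject))
        if relation == "REFERENCE":
            spec.identifier = obj
        elif relation == "HASA":
            spec.attributes.append(obj)
        else:
            # Any other relation is an association, so its object is also a class.
            specs.setdefault(obj, ClassSpec(obj))
    return specs


def missing_identifiers(specs: Dict[str, ClassSpec]) -> List[str]:
    """Names of classes for which no REFERENCE fact has been recorded."""
    return sorted(name for name, spec in specs.items() if spec.identifier is None)


# A subset of the emergency response facts, chosen for illustration.
facts: List[Fact] = [
    ("Caller", "MAKES", "Emergency call"),
    ("Emergency call", "MADE-BY", "Caller"),
    ("Emergency call", "DESCRIBES", "Medical emergency"),
    ("Emergency call", "REFERENCE", "callNo"),
    ("Caller", "REFERENCE", "callerID"),
    ("Emergency call", "HASA", "start time"),
    ("Emergency call", "HASA", "end time"),
]
specs = class_specs(facts)
unidentified = missing_identifiers(specs)
```

In this subset, Medical emergency appears as the object of an association but has no REFERENCE fact, so it is reported in unidentified. A check of this kind is a prompt for further conversation with stakeholders rather than a substitute for it.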

Table 6.1 Actors, acts and terms within the communicative pattern of emergency response (columns 1 to 14 of the pattern; content listed by row)

Actors: Caller, Call-taker, Paramedic dispatcher, Dispatcher, Ambulance driver, Paramedic, Admitting nurse, Gazetteer, Incident system, Resource system, Admissions system

Communicative acts (named): Emergency call, Classify emergency, Declare incident, Direct type of resource, Leaving for incident, Commit to respond, Departing to incident, Arrived at incident, Declare condition, Leaving for hospital, Arrived at hospital, Handover patient, Incident closed

Communicative acts (content):
ASSERT[Arrived at incident X]
DECLARE[Patient X is admitted to hospital Y]
DIRECT[Take patient X to hospital Y]
ASSERT[Patient X has condition Y and received treatment Z]
DIRECT[Medical actions X need to be taken before ambulance arrives]
DIRECT[Go to location X and attend incident Y]
DIRECT[Resource type X needs to be sent to incident Y]
DECLARE[Incident X is now closed and resource Y is available for dispatch]
ASSERT[This is patient X with condition Y]
ASSERT[Arrived at hospital X]
ASSERT[Leaving for hospital X]
ASSERT[Possible hospitals]
ASSERT[Departing to incident X]
COMMIT[An ambulance will respond within X minutes]
COMMIT[We are leaving to respond to incident X]
ASSERT[Available resources at locations]
DECLARE[Incident X has occurred at location Y on patient Y]
ASSERT[Emergency X is of this form]
DECLARE[Patient has condition X and received treatment Y]
DIRECT[These actions need to be taken]
ASSERT[Caller X and emergency location Y]
DIRECT[Emergency is category A/B/C]
ASSERT[A medical emergency has taken place at location X on person Y]
DIRECT[Find caller and emergency location]
?[Hospital admission]
?[Call category]

Terms: Medical emergency, Emergency incident, Emergency category, Category A, Category B, Category C, Person, Caller, Call-taker, Patient, Patient condition, Treatment administered, Take medical actions, Assert condition, Location, Gazetteer, Incident system, Resource system, Admissions system, SSM plan, Ambulance resource, Resource type, Resource available, Response time, Direct incident, Departing to incident, Dispatcher, Paramedic dispatcher, Paramedic, Ambulance driver, Hospital, Admitting hospital, Admitting nurse, Patient admission


Exercise Generate the institutional facts that define the cardinality and optionality of information classes in the various relationships of association detailed in Fig. 6.3.

6.4 Generating Institutional Facts

The process described in the previous section needs to be undertaken for each of the information situations represented in the communicative pattern under consideration. In this manner, the modeller will generate a complete set of terms used within and about the communicative acts represented upon a communicative pattern. Table 6.1 extracts such terms and presents them alongside the actors and acts in which such terms appear in Fig. 6.2. As you can see, most of the terms consist of things identified and described, such as emergency incident, patient, ambulance resource and location. But the table also includes appropriate terms for communicative actors that participate in communicative acts: dispatcher, caller, paramedic and incident system. Finally, the modeller needs to include terms for certain critical communicative acts themselves that are likely to be referred to and described: emergency call, arrived at incident or admit patient.

As we have seen, to form binary relations as representations of the facts appropriate to this institutional domain, we need to make decisions as to whether the terms identified in Table 6.1 are used to identify or describe things of interest in the communicative context under consideration. Most of the terms present within Table 6.1 would serve to classify and thus to identify things of interest. Thus, we can infer the presence of institutional identifiers for classes such as:

[Patient REFERENCE nhsNo]
[Medical emergency REFERENCE emergencyNo]

Classifying terms also relate to terms which serve primarily to designate or describe. Hence, patient condition serves to describe the diagnosed medical condition of an identified patient, while treatment administered designates a course of medical intervention undertaken upon a patient.
These are represented as relations of attribution, such as:

[Patient HASA Medical condition]
[Patient HASA Medical treatment]

Finally, the modeller needs to decide whether things referred to by classifying terms co-occur within the communicative pattern under consideration. When there is evidence of terms relating to other classifying terms, then we have instances of association. These are represented by free-ranging predicates, such as:

[Call-taker HANDLES Emergency call]
[Emergency call CALLED FROM Location]
[Emergency incident INVOLVES Patient]

This means that we make sense of the communicative pattern of medical emergency response in terms of the list of binary relations represented in Table 6.2. Note that the terms forming classes, attributes and associations are defined here on first occurrence in the pattern and not repeated in the table. Of course, we need to perform this translation for each communicative act visualised on a communicative pattern and collate the various classes, attributes and relationships together to form a complete information model which adequately describes the communicative practice in this domain. Such a model is illustrated in Fig. 6.4. Note that we have left off the labels for relationships in this diagram, merely because of issues of space.
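The collation step, in which relations are gathered act by act and recorded only on first occurrence, might be sketched as follows. The step numbers, the sample relations chosen for each step and the names `column_of` and `collate` are assumptions made for illustration; they are not taken from Table 6.2.

```python
# Hypothetical binary relations unpacked from three communicative acts,
# keyed by the act's step number in the communicative pattern.
RELATIONS_BY_ACT = {
    1: [
        ("Emergency call", "REFERENCE", "callNo"),
        ("Caller", "MAKES", "Emergency call"),
        ("Call-taker", "HANDLES", "Emergency call"),
    ],
    4: [
        ("Emergency incident", "REFERENCE", "incidentID"),
        ("Call-taker", "HANDLES", "Emergency call"),   # already seen at step 1
        ("Emergency incident", "INVOLVES", "Patient"),
    ],
    9: [
        ("Patient", "HASA", "patientCondition"),
        ("Emergency incident", "INVOLVES", "Patient"),  # already seen at step 4
    ],
}


def column_of(relation):
    """Decide which column a relation belongs to, based on its predicate."""
    predicate = relation[1]
    if predicate == "REFERENCE":
        return "Class"
    if predicate == "HASA":
        return "Attribute"
    return "Association"


def collate(relations_by_act):
    """Collect relations in step order, keeping each one on first occurrence only."""
    seen = set()
    table = {"Class": [], "Attribute": [], "Association": []}
    for step in sorted(relations_by_act):
        for relation in relations_by_act[step]:
            if relation in seen:
                continue
            seen.add(relation)
            table[column_of(relation)].append((step, relation))
    return table


table = collate(RELATIONS_BY_ACT)
for column, rows in table.items():
    print(column, rows)
```

Keeping the step number beside each relation preserves the link back to the communicative act in which the term first appeared.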

6.5 Validating an Information Model

Clearly, one information model may be better than another in modelling the institutional reality under consideration. Traditionally, the quality of an information model would be judged in terms of features or facets such as accuracy, completeness, simplicity and elegance. Accuracy is considered in terms of how closely the model represents the reality. Completeness is considered in terms of whether or not the model completely covers the reality being considered. Simplicity refers to the use of the minimum number of constructs required to model the domain. Finally, elegance refers to the degree to which an information model is easily understood both by business and technical actors. We would argue that the notion of information model quality is very much bound up with the issue of validity. It is important to re-confirm the validity of an information model with domain actors, because the ‘quality’ of any information model can only be established through acts of sense-making. In other words, rather than thinking of an information model in terms of ‘accuracy’ or ‘completeness’, the modeller needs to ask—does my representation do justice to the patterns of communicative action in the domain? In this sense, an information model is necessarily a pragmatic construct—it focuses upon and is always oriented towards action. At first glance, the information model illustrated in Fig. 6.4 may appear overcomplex, because it attempts to encapsulate all, not part of, the communicative context illustrated in Fig. 6.1. The classes and associations on such an information model might appear to represent undisputed things of interest for actors within this domain. In practice, classes such as patient, medical emergency and emergency incident act as signs which help ‘scaffold’ this institutional order. What constitutes or should constitute a patient and what constitutes a true emergency and thus a valid

Table 6.2 Binary relations pertinent to the terms in Table 6.1

Class
[Emergency call REFERENCE callNo]
[Person REFERENCE name]
[Medical emergency REFERENCE emergencyNo]
[Caller REFERENCE callerNo]
[Call-taker REFERENCE handlerID]
[Location REFERENCE locationID]
[Gazetteer REFERENCE versionID]
[Paramedic dispatcher REFERENCE practitionerID]
[Category C response REFERENCE responseID]
[Classify emergency REFERENCE eventID]
[Take medical action REFERENCE eventID]
[Emergency incident REFERENCE incidentID]
[Incident system REFERENCE versionID]
[Patient REFERENCE nhsNo]
[Declare incident REFERENCE eventID]
[Ambulance resource REFERENCE resourceID]
[Resource system REFERENCE versionID]
[SSM plan REFERENCE versionID]
[Direct resource REFERENCE eventID]
[Availability REFERENCE eventID]
[Dispatcher REFERENCE dispatcherID]
[Ambulance driver REFERENCE driverID]
[Leaving for incident REFERENCE eventID]
[Direct incident REFERENCE eventID]
[Commit to respond REFERENCE eventID]
[Departing to incident REFERENCE eventID]
[Arrived at incident REFERENCE eventID]
[Assert condition and treatment REFERENCE eventID]
[Hospital REFERENCE hospitalID]
[Declare condition and treatment REFERENCE eventID]
[Leaving for hospital REFERENCE eventID]
[Direct hospital REFERENCE eventID]
[Arrived at hospital REFERENCE eventID]
[Admitting nurse REFERENCE nurseID]
[Patient handover REFERENCE eventID]
[Patient admission REFERENCE eventID]
[Incident closed REFERENCE eventID]
[Resource available REFERENCE eventID]

Attribute
[Medical emergency HASA emergencyDescription]
[Person HASA personDescription]
[Caller HASA callerDescription]
[Call-taker HASA calltakerDescription]
[Location HASA locationDescription]
[Location HASA coordinate X]
[Location HASA coordinate Y]
[Medical emergency HASA emergencyCategory]
[Classify emergency HASA eventDateTime]
[Take medical action HASA eventDateTime]
[Patient HASA DateOfBirth]
[Patient HASA Sex]
[Patient HASA Name]
[Declare incident HASA eventDateTime]
[Emergency incident HASA startTime]
[Ambulance resource HASA resourceType]
[Direct resource HASA eventDateTime]
[Availability HASA eventDateTime]
[Leaving for incident HASA eventDateTime]
[Direct incident HASA eventDateTime]
[Commit to respond HASA responseTime]
[Commit to respond HASA eventDateTime]
[Departing to incident HASA eventDateTime]
[Arrived at incident HASA eventDateTime]
[Assert condition and treatment HASA eventDateTime]
[Patient HASA patientCondition]
[Patient HASA treatmentAdministered]
[Declare condition and treatment HASA eventDateTime]
[Leaving for hospital HASA eventDateTime]
[Direct hospital HASA eventDateTime]
[Arrived at hospital HASA eventDateTime]
[Patient handover HASA eventDateTime]
[Patient admission HASA eventDateTime]
[Incident closed HASA eventDateTime]
[Resource available HASA eventDateTime]

Association
[Caller MAKES Emergency call]
[Person INVOLVED IN Medical emergency]
[Call-taker HANDLES Emergency call]
[Emergency call ABOUT Medical emergency]
[Emergency call CALLED FROM Location]
[Medical emergency OCCURS AT Location]
[Location IDENTIFIED IN Gazetteer]
[Paramedic dispatcher CLASSIFY EMERGENCY Medical emergency]
[Medical emergency BECOMES Category C response]
[Medical emergency BECOMES Emergency incident]
[Call-taker ISSUES Category C response]
[Call-taker TAKE MEDICAL ACTION Caller]
[Emergency incident OCCURS AT Location]
[Call-taker DECLARE INCIDENT Emergency incident]
[Emergency incident INVOLVES Patient]
[Person BECOMES Patient]
[Emergency incident RECORDED IN Incident system]
[Ambulance resource CURRENTLY AT Location]
[Paramedic dispatcher DIRECT RESOURCE Ambulance resource]
[Resource system AVAILABILITY Ambulance resource]
[Ambulance resource SSM PLAN Location]
[Ambulance driver LEAVING FOR INCIDENT Dispatcher]
[Dispatcher DIRECT INCIDENT Ambulance driver]
[Call-taker COMMIT TO RESPOND Caller]
[Ambulance driver DEPARTING TO INCIDENT Dispatcher]
[Ambulance driver ARRIVED AT INCIDENT Dispatcher]
[Paramedic dispatcher ASSERT CONDITION AND TREATMENT Dispatcher]
[Dispatcher DECLARE CONDITION AND TREATMENT Incident system]
[Ambulance driver LEAVING FOR HOSPITAL Dispatcher]
[Dispatcher DIRECT HOSPITAL Ambulance driver]
[Ambulance driver ARRIVED AT HOSPITAL Dispatcher]
[Paramedic PATIENT HANDOVER Admitting nurse]
[Admitting nurse WORKS AT Hospital]
[Admitting nurse PATIENT ADMISSION Patient]
[Hospital PATIENT ADMISSION Patient]
[Patient admission RECORDED IN Admissions system]
[Ambulance driver INCIDENT CLOSED Dispatcher]
[Dispatcher RESOURCE AVAILABLE Availability]

emergency incident is a continuous source of sense-making for participating actors within the domain of medical emergency response. As we have already mentioned, for instance, an emergency call only becomes a medical emergency and consequently an emergency incident through the ways in which actors such as paramedic dispatchers triage events. An emergency call only becomes the institutional fact of an emergency incident if it is deemed sufficiently ‘serious’ to warrant dispatch of an ambulance. One of the main practical advantages of this approach to composing an information model is that such a model displays greater flexibility to accommodate change to institutional action. For example, in 2005, the UK government recommended that targets set for responding to emergency calls should be measured consistently across the UK. It suggested that the clock should start ticking when an emergency call is connected to the control centre and not when the call-taker declared an emergency incident, which is what most UK ambulance services had been measuring. Following adoption of this subtle recommendation, UK ambulance services spent years re-configuring their IT systems, because on average the difference between connecting a call and identifying an incident is as much as 1 min. The information model in Fig. 6.4 distinguishes between an emergency call and an emergency incident and thus easily accommodates this change to the measurement of performance. Indeed, as suggested in Table 6.2, the information model has the potential to log every critical state communicated about within the lifecycle of an incident. More interestingly, statistics collected on the practice of emergency response reveal that while 30% of calls are categorised as life-threatening by call-takers and ambulance dispatchers, only 10% of such incidents turn out to be life-threatening in nature. 
Also, 77% of all emergency calls result in a journey to a local hospital, but only 40% of these patients are eventually admitted for treatment. There are complex reasons for this situation. Nevertheless, various ambulance units have attempted to make changes to such breakdowns in practice, to meet the implicit intentions expressed in such measurement. For instance, some have begun to re-configure their IT systems to collect a patient summary containing not only important medical data about the patient but also a history of interaction with the ambulance service. This inherently amounts to a re-configuration of the notion of what patient and incident mean to emergency response. It is hoped that records based upon such re-configuration will not only allow call-takers to refine the process of triaging patients and incidents but also better signal to an ambulance crew what to expect at incidents and consequently how better to perform. It should be evident on the information model in Fig. 6.4 that a clear distinction is drawn between a person described within the context of an emergency call and an eventual patient responded to. This might enable a more nuanced history of interaction with a service, which might better inform changes to practice.

[Figure 6.4: a class diagram whose key distinguishes classes, associations, cardinality (one, many) and optionality (optional, mandatory). Classes shown include Caller, Call-taker, Paramedic dispatcher, Person, Emergency call, Medical emergency, Emergency incident, Category C response, Patient, Location, Gazetteer, Ambulance resource, Resource system, Incident system, Admissions system, SSM plan, Availability, Paramedic, Ambulance driver, Dispatcher, Admitting nurse and Hospital, together with event classes such as Take medical action, Commit to respond, Declare incident, Classify emergency, Direct resource, Direct incident, Leaving for incident, Departing to incident, Arrival at incident, Direct hospital, Leaving for hospital, Arrived at hospital, Patient handover, Patient admission, Assert patient condition and treatment, Declare condition and treatment, Resource available and Incident closed. Labelled associations: Involved in, Becomes, Category A/B becomes, Category C becomes, Involves, Currently at.]

Fig. 6.4 An information model for emergency response
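The earlier point about response-time targets can be illustrated with a short sketch. Because the model keeps an emergency call and an emergency incident as separate things, each with its own timestamp, the clock can be started from either. The timestamps below are invented purely for illustration, and the names `EVENT_LOG` and `minutes_between` are hypothetical; the sketch also assumes that the call's start time is the moment it is connected.

```python
from datetime import datetime

# Invented event log for a single incident: (event name, eventDateTime).
EVENT_LOG = [
    ("Emergency call", datetime(2005, 6, 1, 10, 0, 0)),       # call connected
    ("Declare incident", datetime(2005, 6, 1, 10, 1, 0)),     # incident declared
    ("Arrived at incident", datetime(2005, 6, 1, 10, 9, 0)),  # ambulance on scene
]


def minutes_between(event_log, start_event, end_event):
    """Return the elapsed minutes between two named events in the log."""
    times = dict(event_log)
    return (times[end_event] - times[start_event]).total_seconds() / 60


# Clock started when the call is connected (the recommended measure).
from_call = minutes_between(EVENT_LOG, "Emergency call", "Arrived at incident")
# Clock started when the incident is declared (the older measure).
from_incident = minutes_between(EVENT_LOG, "Declare incident", "Arrived at incident")
print(from_call, from_incident)
```

Switching between the two measures changes only the name of the starting event, not the structure of the recorded data.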


The example of emergency response used here is based on analysing an existing situation and producing an information model of this domain. But information modelling is equally relevant to the design of new domains of institutional activity. The process of composition then becomes one of envisaging some possible or as-if pattern of information situations in sufficient detail that it becomes possible to unpack the content of the messages used within the new communicative pattern and translate these into the constructs of an information model.

6.6 Revising Information Models

There is an important consequence of the view of information models promoted in this book. If information models are attempts to model institutional ontology and institutional ontology relies upon patterns of information situations, then information models never stand still. Any domain of institutional action will continuously experience changes to the way in which data structures are articulated, messages are communicated and activity is coordinated. Because an information model must continually reflect changes to institutional action, it is therefore essential that an information model is continuously revisited and revised where necessary.

Example

Consider the issue of marriage as an institutional fact. Recently, in the UK and in a number of other countries, the definition of what constitutes marriage has changed. Prior to this change, marriage could only occur between two persons of different sex, one person being male and the other being female. Hence, we might have modelled marriage as an association between a male and a female. Now, same-sex marriage is acknowledged in law. This requires us to change our model to perhaps something like a unary relationship on a person class, where a person is a generalisation (a super-class) of a male or a female. ◄

Exercise Draw an information model for the original definition of marriage and the new institutional definition of marriage.

Another way of putting this is that information models are important to data administration within organisations of all forms. Data administration is generally seen to be that function concerned with the management, planning and documentation of the data resource of some organisation and as such is seen to be important to the effective control, security, integrity and sharing of data both within some organisation and between organisations. The key driver towards data administration has been that data, like capital, personnel, etc., should be treated as a manageable resource. In other words, data are seen as a critical commodity in an organisation’s attempt not only to operate effectively but also to adapt to its changing environment. However, there are key problems with managing data as ‘commodity’. For instance, a number of units, departments or services within the organisation might collect data on similar things of interest but in radically different ways. Some units may collect data but have no clear idea why they collect this data. Certain other organisational actors may believe that there are notable gaps in the data collected by the organisation. Where data are collected, they may be inconsistent, untimely or irrelevant. This means that users of such data may feel it is too unreliable to be useful or may receive data too late for it to be useful. More worryingly, decision-makers within the organisation may receive conflicting data from different sources within the organisation.

Example

Consider why data is such a critical resource for a university in terms of its operations such as teaching. Without data, such as what students it has, what students are taking which modules and what grades have been achieved by students, a university is unable to operate effectively in teaching, grading and awarding students. But what if different university departments or schools maintain their own distinctive collections of data about students with their own distinctive definitions for such data structures? What if data is frequently missing, incomplete or out of date? Also, what if there is incomplete knowledge amongst both administrative and academic staff as to what data is collected and where it is kept? In such situations, various staff within the university may spend a substantial amount of their time resolving problems with such data. Such situations demonstrate the key need for some systematic way of managing the data resource on an organisation-wide rather than a unit-wide basis. ◄

Data administration is hence an attempt to develop some order from the potential chaos in which data structures are articulated across an institution. Data administration also involves planning the data required for future action. Hence, data administration concerns itself with a number of themes associated with data definition and use. In terms of data definition, administrators implement standards for the definition of data and attempt to control the media for the recording and communication of such definitions. Administrators also implement data control practices that define and police access to data resources. They also attempt to ensure the integrity of data and that it is secured from threats. This also means implementing procedures to ensure that the organisation complies with any legislation concerning data privacy. Finally, data administrators encourage sharing of data across applications and promote the idea that data as a resource is independent of IT applications and their users.

To achieve such goals, traditionally, data administration is conducted through the development of data dictionaries, which attempt to encapsulate the metadata of the organisation, that is, data about data (more about metadata in Chap. 9). Alternatively, data administrators may seek to develop corporate or enterprise information models.


Prefixing the term information modelling with the word corporate would tend to suggest an elevation of this practice to the level of the whole organisation. A corporate information model should form a map of the institutional ontology of the whole or a substantial part of an organisation. This differs from an application information model, which is produced to support a specific organisational function or IT development project. Data administrators then attempt to use these ‘maps’ to control the design and use of the data resource of the organisation, typically by enforcing levels of standardisation of data structures across organisational units in order to ensure better data integration and consequent data sharing both within and between organisational units.

Information models are an important tool in the armoury of the data administrator. An information model, as we have seen in previous chapters, provides understanding of the things of interest to organisational actors. Such things or objects have to be identified in a consistent manner. A data administrator should ensure that identifiers for objects like products, people, invoices and orders are designed to have three key features. First, an identifier, as a matter of definition, should be uniquely associated with one and only one object, and every object, such as an instance of a product, should have one and only one identifier. Second, an identifier should be assigned immediately on creation of an object. Third, an identifier should not contain any details about the object it identifies; it should serve the sole function of identifying an object. The reason for this is that so-called non-mnemonic identifiers maintain stability over time.

Example

Consider the case where a company uses a three-digit code to identify its products. The first digit is used to indicate the warehouse where the product is stored. Now suppose the company decides to change its warehousing practice and moves all products of a particular type from one warehouse to another. This will necessitate changing all the product codes for the products moved. ◄

A natural consequence of this discussion is that it is important to administer and control the assignment and use of identifiers in organisations. It is also usually not good practice to rely on identifiers supplied by external agencies.

Example

Suppose an organisation uses the delivery advice number from its supplier to identify different deliveries. If the supplier inadvertently sends two separate deliveries with the same advice number, then the organisation’s internal information systems are likely to suffer. ◄
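The identifier guidance above can be sketched in code. This is a minimal illustration under assumptions: the class names `Product` and `ProductRegistry` and the warehouse names are hypothetical, and a simple counter stands in for whatever identifier scheme an organisation actually administers. The identifier is assigned internally at creation, is unique, and carries no detail about the product, so moving a product between warehouses leaves it unchanged.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Product:
    product_id: int   # opaque, non-mnemonic identifier
    name: str
    warehouse: str    # descriptive detail held as an ordinary attribute


class ProductRegistry:
    """Assigns identifiers internally, at the moment a product is created."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._products = {}

    def create(self, name, warehouse):
        product = Product(next(self._next_id), name, warehouse)
        self._products[product.product_id] = product
        return product

    def move(self, product_id, new_warehouse):
        # Only the attribute changes; the identifier stays stable.
        self._products[product_id].warehouse = new_warehouse


registry = ProductRegistry()
kettle = registry.create("Kettle", "North warehouse")
toaster = registry.create("Toaster", "North warehouse")
original_id = kettle.product_id
registry.move(kettle.product_id, "South warehouse")
print(kettle)
```

Had the warehouse been encoded in the first digit of the code, as in the three-digit example, the move would have forced a new identifier.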

6.7 Conclusion

As we mentioned in Chap. 3, learning the principles of a visualisation technique is not the same as applying it. This is the reason that in this chapter we have covered in some detail the key things involved in the composition of an information model. We have particularly emphasised how a good understanding of the context or pattern of information situations relevant to some institutional domain is a necessary prerequisite for information modelling. We have also emphasised that an information model, by its very nature, is a continuously moving beast and as such is critically important to the administration of the data infrastructure within and between organisations. In the next chapter, we consider a number of issues associated with the practice of building information models, such as when to model something as a class, attribute or relationship.

6.8 Summary

• To compose an information model, we first need a model of the pattern of information situations under consideration. This may be a model of some existing or as-is pattern or a model of some possible or as-if pattern.
• We then need to unpack the content of the messages used within this communicative pattern into the constructs of an information model: classes, attributes and relationships.
• This enables us to generate a set of binary relations which adequately reflects the content as a series of institutional facts.
• The institutional facts are then used to produce a visualisation of this model using one of the many possible notations.
• The validity and consistency of the information model are checked with actors within the domain in question.
• The adequacy of the information model in reflecting communicative practice is continuously examined, and the information model is revised if necessary.
• Because it must continually reflect changes to institutional action, an information model is continuously revisited and revised where necessary. This makes information models important to data administration within organisations of all forms.

7 Practical Issues in Information Modelling

7.1 Introduction

The coverage of information modelling in previous chapters has largely focused upon the theoretical aspects of applying this technique. In contrast, the current chapter examines a number of practical issues associated with the conduct of information modelling and how these may be resolved. We first consider the issue of interpretive flexibility—the fact that the modeller may choose to model the same thing as a class, attribute or relationship depending upon the institutional context under consideration. The same flexibility applies in the case of using generalisation and aggregation within information modelling. Then we consider the distinction between strong and weak classes and notions of ternary and recursive relationships of association. This leads to a discussion of how to include time within an information model and the important problem of connection traps and how to avoid them.

7.2 Class, Attribute or Relationship

In Chap. 3, we spent some time making the case for thinking about an information model in a specific way. Within this book, we have promoted the idea of a model as a way of negotiating collective belief as to either how things are in some domain or how we, as a collective or community of actors, might like things to be in this domain. A key question faced by any information modeller is how do you know that something should be modelled as a class, attribute or relationship? When trying to model some domain, it is difficult to know where to start and what constructs to use. In other words, what things are or should be of interest and what constructs should be used to model such things? The modeller hence usually has a degree of interpretive flexibility in deciding which construct is most appropriate.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Beynon-Davies, Information Modelling, https://doi.org/10.1007/978-3-030-98805-0_7


Example

Suppose you are given the task of building an information model to represent the social convention of marriage. The common difficulty experienced here is whether to represent marriage as a class, a relationship between classes or an attribute of a class. Marriage might be represented merely as marital status attributed to a person class. Alternatively, marriage might be seen as an association between two persons of whatever gender. ◄

The answer to the first question of what things are or should be of interest relies upon a close understanding of the communicative practice appropriate to the domain in question. In other words, the modeller must develop a close appreciation of what actors either currently communicate about or wish to communicate about. The answer to the second question of what constructs should be used to model such things must depend upon an appreciation of how certain signs are used by certain actors within the domain in question and for what purpose. In other words, the answer to such a question depends upon the institutional context within which signs such as marriage are being used by actors. This means that what helps guide the construction of an information model must be a clear understanding of the communicative pattern relevant for the domain in question.

Example

For the institutional reality which is the domain of emergency response, marriage is probably only important as a means of signalling to control, ambulance and hospital staff that there is an important next of kin to communicate with. Hence, modelling marriage as an attribute (marital status) of a patient might be sufficient for the patterns of action evident in this domain. For some tax authority, marriage is of interest because it may affect the tax position of certain citizens; hence, it is probably appropriate to model this situation as a relationship, perhaps a recursive relationship with a citizen class. Finally, for a marriage registry, the event of marriage itself is the significant thing of interest. Hence, it is appropriate to represent marriage within this institutional context as a class in itself, with its own attributes and relationships. ◄

What we are actually saying here is that for any sign to be meaningful, it must have practical consequences. This is actually a maxim of the philosophy of pragmatism first promulgated by the American polymath Charles Sanders Peirce (Bacon 2012). For Peirce, to understand the nature of signs, we must always apply the pragmatic test of whether the making of a set of differences to some substance makes a further difference in turn to some actor (Bateson 1972). In other words, we must judge any potential sign in terms of its consequences, whether it has any practical bearing on some situation, in terms of how actors act in such situations.

Example

For example, we can only judge how to model a sign such as marriage in terms of what it communicates to actors within some domain. We also need to judge the result of such ‘communication’, in turn, in terms of differences made to the consequent behaviour of actors, perhaps changes to their work activity. Within emergency response, the attribute marriage or marital status directs ambulance and hospital staff to communicate with a nominated next of kin. Within some tax authority, the relationship of marriage will direct staff and systems to modify the taxation of particular citizens. Within a marriage registry, the class of marriage needs to be recorded in great detail to coordinate the work of staff before, during and after this life event. ◄
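The three modelling choices discussed above can be sketched side by side. This is only an illustration: the class names, attribute names and sample values below are assumptions introduced for the sketch, not part of any model given in the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Emergency response: marriage as an attribute (marital status) of a patient.
@dataclass
class Patient:
    name: str
    marital_status: str  # hypothetical values such as "married" or "single"


# Tax authority: marriage as a recursive relationship on the Citizen class.
# eq=False keeps comparisons identity-based, since spouses refer to each other.
@dataclass(eq=False)
class Citizen:
    name: str
    spouse: Optional["Citizen"] = None


def marry(a: Citizen, b: Citizen) -> None:
    """Set both ends of the recursive relationship."""
    a.spouse, b.spouse = b, a


# Marriage registry: marriage as a class with its own attributes.
@dataclass
class Marriage:
    partner_one: str
    partner_two: str
    ceremony_date: date
    venue: str


patient = Patient("Ann", "married")
ann, bob = Citizen("Ann"), Citizen("Bob")
marry(ann, bob)
record = Marriage("Ann", "Bob", date(2022, 5, 14), "Town Hall")
```

Which of the three shapes is appropriate depends, as argued above, on what the actors in the domain need to do with the sign.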

7.3 Repeating Attributes

A related problem is that frequently we may wish to communicate many things about some apparent class. In practice, this means that we may have to model many properties or attributes of this thing. In such cases, it is normally fruitful to examine whether the class is actually one class or is better represented as a series of related classes. Some proponents of information modelling argue that you should try to keep the number of attributes associated with any class within the bounds of 7 plus or minus 2 attributes, a cognitive limit established for human short-term memory. One heuristic or rule of thumb you can apply to achieve this is to look for attributes which repeat in terms of any one instance of a class. What we are really looking for here are attributes which have what we shall refer to in Chap. 8 as a non-functional dependency with the identifier for a class. When this occurs, we need to fragment out these repeating attributes into a separate class.

Example

Suppose you are working within human relations or human resources and you spend much of your time communicating about employees. The tendency here would be to build one class Employee and assign all the attributes communicated about employees to this one class. But suppose you find that this makes Employee a class with 40 attributes. This might direct you to consider whether all the attributes are directly attributable to an employee or whether they are best modelled as related classes of a person. For instance, an employee may hold down a number of distinct roles having defined durations of employment with the company. Each duration needs to be recorded in terms of a role number and role name along with its start date and end date. This means that for any one employee identified by a staffNo, there are likely to be a number of role numbers, role names, start dates and end dates. Rather than placing these attributes within one Employee class, it is probably best to put these attributes in a separate class such as an Employment class and associate them back to the Employee class. ◄
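A minimal sketch of this split is given below. The attribute names follow the example (staffNo, role number, role name, start date, end date), while the Python spellings and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Employment:
    """The repeating attributes, moved into a class of their own."""
    role_no: int
    role_name: str
    start_date: date
    end_date: Optional[date] = None  # None while the role is still held


@dataclass
class Employee:
    staff_no: int
    name: str
    # One employee is associated with many Employment instances.
    employments: List[Employment] = field(default_factory=list)


emp = Employee(staff_no=101, name="J. Evans")
emp.employments.append(Employment(1, "Analyst", date(2018, 1, 8), date(2020, 6, 30)))
emp.employments.append(Employment(2, "Senior analyst", date(2020, 7, 1)))

# Roles with no end date are the ones currently held.
current_roles = [e.role_name for e in emp.employments if e.end_date is None]
```

Each Employee instance now carries only the attributes that take a single value per employee, and the repeating group lives in the associated class.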

7.4 One-to-One Relationships

Within Chap. 4, we defined relationships of association largely as one-to-many or many-to-many relationships. However, it is possible to model such associations as one-to-one relationships. It must be said that one-to-one relationships of association are not used very frequently within information modelling, but there are two situations in which they sometimes prove useful. The first of these is when the thing referred to by a class undergoes a change of state during its lifecycle.

Example

Remember that within the domain of emergency response, a medical emergency reported to the ambulance control centre has to be classified in terms of its medical severity. If the medical emergency is classified as one of the first two categories (A and B), then this medical emergency becomes an emergency incident to this institution and triggers a set of consequent actions, such as sending an ambulance to the incident location. In such a case, we need to record this change of state, and we do so by relating the medical emergency class to an emergency incident class through a one-to-one relationship as in Fig. 7.1. The same goes for the transition from a person reported to a call-taker to a patient of the ambulance service. ◄

The second case where one-to-one relationships may prove useful is when we wish to communicate about many things attributed to a class. Generally, we might use a one-to-one relationship to name collections of attributes that apply to the same thing.

Fig. 7.1 One-to-one relationships [Diagram: Medical emergency becomes Emergency incident; Person becomes Patient; Clothing customer paired with Food customer]


Example

Suppose a company runs both a high-street clothing retail operation and a supermarket chain. The company offers a loyalty card to its customers and wishes to reward customers that utilise both arms of their business. However, because the number of attributes they wish to record about both types of customer exceeds 30 attributes, they decide to have a clothing customer class and a food customer class on their information model (Fig. 7.1) related by a one-to-one relationship. ◄
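The change-of-state use of a one-to-one relationship described earlier in this section can be sketched as follows. The class names mirror Fig. 7.1; the attributes, the `classify` function and the numbering of incidents are hypothetical details added for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmergencyIncident:
    incident_no: int
    location: str


@dataclass
class MedicalEmergency:
    emergency_no: int
    category: str   # severity category, for example "A", "B" or "C"
    location: str
    # One-to-one: a medical emergency becomes at most one emergency incident.
    incident: Optional[EmergencyIncident] = None


def classify(emergency: MedicalEmergency, next_incident_no: int) -> Optional[EmergencyIncident]:
    """Create the linked incident only for categories A and B."""
    if emergency.category in ("A", "B") and emergency.incident is None:
        emergency.incident = EmergencyIncident(next_incident_no, emergency.location)
    return emergency.incident
```

Here the optional `incident` attribute records whether the change of state has happened, and there can never be more than one incident per emergency.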

7.5 When to Generalise and Aggregate

As we have seen in Chap. 6, information models can be built largely without using any of the relationships of abstraction we discussed in Chap. 3. So the key question is frequently: when should you use generalisation and aggregation within an information model? Generalisation is actually another useful strategy to use within modelling when we wish to communicate about many properties of some thing. Frequently, we may find that many of such properties only apply to specific instances of a class rather than to all instances of a class. The set of properties that applies to the specific set of instances should then be used to form a sub-class of the wider class.

Example

Suppose we have a Product class and we find that salesmen wish to communicate about many properties of the products they sell such as product code, product description, price, length, weight and so on. However, on closer examination, we find that certain properties such as load bearing capacity are used within communication by salesmen and customers only for certain instances of manufactured steel products, namely lintels. In this case, it makes sense to split off these properties into a separate sub-class of the super-class. ◄

Aggregation is a form of abstraction that relates some whole to its parts. Unlike generalisation however, the parts are distinct objects from the whole. Within information models, aggregation is particularly used for handling problems of representing component-assembly issues, member-collection issues or place-area issues (Winston et al. 1987). In component-assembly problems, an object is divided into its components in terms of some organised pattern or structure. Each component part has a distinct function which can be separated from the whole.


Example

A handle is part of a cup. Wheels are part of cars. Chapters are parts of books. ◄

In member-collection problems, each part in this type of relationship is not the same as the whole and does not play a specific function in terms of the whole. The parts can be clearly separated out from the whole. Membership in the collection is determined purely on the basis of spatial proximity or social connection.

Example

A tree is part of a forest. A juror is part of a jury. The ship is part of the fleet. ◄

Place-area aggregates are used to relate areas to places or locations within them. Places are not parts by virtue of any functional contribution to the whole. Every location shares common features with all other areas. But no location can be separated from the area of which it is a part.

Example

The Everglades are part of Florida. An oasis is part of a desert. ◄
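The two relationships of abstraction discussed in this section can be contrasted in a short sketch. The Product and lintel case and the juror and jury case come from the examples above; the attribute names, units and sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


# Generalisation: Lintel is a sub-class of Product and adds a property
# that only applies to some products.
@dataclass
class Product:
    product_code: str
    description: str
    price: float


@dataclass
class Lintel(Product):
    load_bearing_capacity_kg: float = 0.0  # hypothetical unit


# Aggregation (member-collection): a jury is a whole made up of jurors,
# and each juror remains a distinct object from the jury.
@dataclass
class Juror:
    name: str


@dataclass
class Jury:
    members: List[Juror] = field(default_factory=list)


lintel = Lintel("L100", "Steel lintel", 45.0, load_bearing_capacity_kg=2000.0)
bolt = Product("B200", "Bolt", 0.5)
jury = Jury([Juror("A"), Juror("B")])
```

A lintel is a product, so it inherits every Product attribute; a juror is not a jury, so the jury merely refers to its members.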

7.6 Strong and Weak Classes

Information classes are not created equal: within any information model, there are likely to be strong classes and weak classes. An information class is said to be a strong class if the existence of its instances or objects does not depend on the prior existence of the instances of some other class. In contrast, the objects of a weak class depend on the existence of the instances of some other class within the domain considered.

Example

In a university domain, the classes Module and Student are both strong classes because the existence of a given module and student does not depend on the prior existence of any other thing. However, the class Assessment is likely to be a weak class since its instances depend on the prior existence of instances of both the Student class and the Module class. ◄

Identifying weak information classes is important because instances of such classes cannot be uniquely identified by attributes of the class itself. They must acquire some identifying properties from the strong classes with which they are associated.

Example

A student will be identified by a studentNo and a module by moduleCode. An assessment class may be identified by a compound identifier made up of a studentNo and moduleCode.

[moduleCode REFERS TO Module]
[studentNo REFERS TO Student]
[moduleCode AND studentNo REFERS TO Assessment] ◄

From the example, it should be evident that weak classes are particularly relevant to many-to-many relationships of association. Some approaches to information modelling recommend decomposing any many-to-many relationships within an information model into two one-to-many relationships. The relationship is then promoted to becoming a link class. A link class is a weak class because it cross-refers between instances of one class and the instances of another class.

Example

A Student will normally be assessed a number of times on a given Module, and a Module will assess many students. In Fig. 7.2, we introduce an assessment link class to connect students with modules. Note that the many ends of the relationships always appear at the link class. Student and Module are both strong classes in this relationship. Assessment is a weak class because it relies on the prior existence of both Student and Module. ◄
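The dependence of a link class on its two strong classes can be sketched with a compound identifier. The `mark` attribute, the `record_assessment` function and the sample identifiers are hypothetical; the compound key of studentNo and moduleCode follows the earlier example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Student:
    student_no: int


@dataclass(frozen=True)
class Module:
    module_code: str


@dataclass
class Assessment:
    student_no: int
    module_code: str
    mark: int  # hypothetical non-identifying attribute


students = {1001: Student(1001)}
modules = {"IM101": Module("IM101")}
# The link class is keyed by the identifiers of both strong classes.
assessments: Dict[Tuple[int, str], Assessment] = {}


def record_assessment(student_no: int, module_code: str, mark: int) -> Assessment:
    """Refuse to create an assessment unless its student and module exist."""
    if student_no not in students or module_code not in modules:
        raise KeyError("student and module must exist first")
    key = (student_no, module_code)
    assessments[key] = Assessment(student_no, module_code, mark)
    return assessments[key]
```

Keying on the pair alone allows only one assessment per student per module; a model that records several assessments for the same pair would need a further identifying attribute.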

7.7 Recursive and Ternary Relationships

In conventional information model diagrams, the relationships are all binary, that is, we diagram two information classes and a relationship or a set of relationships between these information classes. It is possible however for association relationships to be unary. In other words, a relationship may involve only a single information class. Unary relationships are frequently described as being recursive in that they relate classes of the same type.

Fig. 7.2 Decomposing a many-to-many relationship [Diagram: Module (moduleCode) and Student (studentNo) related directly, then linked through Assessment (studentNo, moduleCode) via the relationships assesses and assessed by]

Example

Figure 7.3 details a recursive relationship called prerequisites or prerequisite for which applies to the class Module. A module may have a number of prerequisite modules; a given module may also be a prerequisite for numerous other modules. This makes the cardinality of both ends of this relationship many. A module does not need a prerequisite and a prerequisite does not need to have any postrequisite modules. This makes both ends of the relationship optional. ◄

Fig. 7.3 Unary relationship [Diagram: Module related to itself through prerequisite for and prerequisite of]

As well as relating a class to itself, we may also find uncommon situations in which three or more classes are related together, a so-called ternary relationship. Because of their complexity, ternary relationships are only used when they cannot be decomposed into a series of binary relationships.

Fig. 7.4 Ternary relationship [Diagram: Skill-used relating Employee, Project and Skill]

Example

The relationship skill used in Fig. 7.4 associates the classes Employee, Skill and Project. The ternary relationship is necessary because, if we attempt to fragment skill used into separate binary relationships, we lose valuable information. Hence, if we had a relationship between Employee and Project, between Project and Skill and between Skill and Employee, we would not be able to determine on which project a particular person utilised a particular skill. ◄
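The loss of information can be checked on a small set of instances. In the sketch below, the employee, skill and project names are invented sample data; the three binary relationships are derived from the ternary one and then recombined.

```python
# Each triple records (employee, skill, project) for the skill used relationship.
skill_used = {
    ("Ann", "SQL", "Payroll"),
    ("Ann", "Python", "Billing"),
    ("Bob", "SQL", "Billing"),
}

# The three binary relationships obtained by fragmenting the ternary one.
employee_project = {(e, p) for e, s, p in skill_used}
project_skill = {(p, s) for e, s, p in skill_used}
employee_skill = {(e, s) for e, s, p in skill_used}

# Recombine: every triple that is consistent with all three binary relationships.
rejoined = {
    (e, s, p)
    for e, p in employee_project
    for p2, s in project_skill
    if p2 == p and (e, s) in employee_skill
}

# Triples the binary relationships cannot rule out but which were never recorded.
spurious = rejoined - skill_used
```

With this data the recombination admits the triple (Ann, SQL, Billing): Ann worked on Billing, SQL was used on Billing and Ann has used SQL, yet she never used SQL on Billing.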

7.8 Modelling Time

In most institutional settings, actors are interested in things which we might generally call events. Events are happenings and as such occur at a particular time and date, which means that the classes representing them must be timestamped in some way. Hence, in many situations, some way of handling both past and future time as well as present time must be utilised within the analysis and design of communicative acts.

Example

Consider the case where staff at a university want to communicate about the courses the university runs as well as details of the students who enrol on such courses. If we are only interested in current enrolment, then the information model in Fig. 7.5a is sufficient.

Fig. 7.5 Information model with time [Diagram: (a) Course (courseCode) related directly to Student (studentNo); (b) Course and Student linked through Enrolment (studentNo, courseCode, enrolmentDate) via the relationships enrols and enrolled on]

However, a more realistic situation is approached when the university decides it wishes to build institutional facts about past enrolments for management decision-making. Staff now wish to record which students have enrolled in which courses over a period of say 5 years. This means that we now make the relationship between course and student a many-to-many relationship as in Fig. 7.5b. This structure however is equally capable of handling future events. Suppose the university wishes to extend the use of its records to schedule future courses. The only modification we need to make to our information model is to make course and student optional in the appropriate relationships. In other words, we wish to allow course details to be recorded prior to places being filled. Likewise, we wish to record details of students prior to their enrolment on a particular course. ◄
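The Enrolment class of Fig. 7.5b can be sketched as a timestamped link between students and courses. The course codes, dates and the two query functions below are assumptions added for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Enrolment:
    student_no: int
    course_code: str
    enrolment_date: date


# BIS3 is scheduled but has no enrolments yet, which the optional
# participation of Course in the relationship allows.
courses = {"BIS1", "BIS2", "BIS3"}

enrolments = [
    Enrolment(1001, "BIS1", date(2019, 9, 23)),
    Enrolment(1001, "BIS2", date(2020, 9, 21)),
    Enrolment(1002, "BIS1", date(2020, 9, 21)),
]


def courses_taken(student_no: int) -> List[str]:
    """All courses a student has enrolled on, past or present."""
    return sorted(e.course_code for e in enrolments if e.student_no == student_no)


def students_enrolled(course_code: str, year: int) -> List[int]:
    """Students who enrolled on a course in a given calendar year."""
    return sorted(
        e.student_no
        for e in enrolments
        if e.course_code == course_code and e.enrolment_date.year == year
    )
```

Because each enrolment carries its own date, the same structure answers questions about past years and tolerates courses that have been scheduled but not yet filled.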

7.9 Connection Traps

One key advantage of visualising an information model is that it becomes easier to identify a number of potential problems with the navigation around such models. Figure 7.6a, b illustrates a number of potential pitfalls in information modelling. These pitfalls are known as connection traps (Howe 1981) because they may trick the modeller into making invalid assumptions about the connection between information classes.

Fig. 7.6 Connection traps. (a) Fan trap, (b) chasm trap [Diagrams: information models and instance diagrams for Faculty, Department and Staff]

The first type of connection trap to consider is known as a fan trap because it may occur when two one-to-many (1:M) relationships fan out from the same information class.

Example

In the information model on Fig. 7.6a, the following assertions are made about some academic domain:

• A faculty has many departments.
• Every department belongs to at most one faculty.
• A department has many staff.
• A member of staff belongs to at most one department.

The business analyst assumed that this way of building an information model was sufficient to tell him which staff belong to which department. As we see from the associated instance diagram however, this assumption is incorrect. Although we can tell from the information model which staff belong to which faculty and which department belongs to which faculty, the link between staff and departments is ambiguous. Figure 7.6b illustrates a representation for the same information model which overcomes the fan trap. The instance diagram seems to indicate that the query of which staff work for which departments is clearly answerable from this revised model. ◄

The second kind of connection trap is known as a chasm trap because it suggests that a relationship exists between all instances of two information classes when this is not the case.

Example

The revised information model in Fig. 7.6b may be subject to a further problem. What if we have within our university staff who are employed by faculties rather than by departments? In other words, the participation of staff in this relationship is optional. Our information model would give us an incorrect answer as it assumes that all staff must be employed by departments. To avoid this misinterpretation, we have to introduce an additional relationship into our diagram between staff and faculty (see Fig. 7.7) and clearly indicate that both relationships are optional. Note that what defines a fan trap or a chasm trap is determined by the business rules applicable to a particular domain. Figure 7.6b, for instance, would be perfectly reasonable as a representation of some domains where the business rules prohibit faculty staff. ◄
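Both traps can be reproduced with a few instances. The faculty, department and staff names below are invented sample data, and the dictionaries are only one hypothetical way of holding the instance diagrams.

```python
# Fan trap: Department and Staff each link only to Faculty.
dept_faculty = {"Maths": "Science", "Physics": "Science"}
staff_faculty = {"Jones": "Science", "Patel": "Science"}


def possible_departments(staff):
    """Every department reachable through the staff member's faculty."""
    faculty = staff_faculty[staff]
    return sorted(d for d, f in dept_faculty.items() if f == faculty)


# Revised model: Staff link to Department, and Department links to Faculty.
staff_dept = {"Jones": "Maths", "Patel": "Physics"}


def faculty_of(staff):
    """Navigate Staff -> Department -> Faculty; None if the path is broken."""
    dept = staff_dept.get(staff)
    return dept_faculty[dept] if dept is not None else None


# Chasm trap: Lee is employed by a faculty directly and has no department,
# so the revised model cannot connect Lee to any faculty.
lee_faculty = faculty_of("Lee")
```

In the first model, asking for the department of Jones returns both departments of the faculty; in the revised model the answer is unambiguous, but a member of staff without a department falls into the chasm unless a direct Staff to Faculty relationship is added.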

Fig. 7.7 Faculty staff information model [Diagram: information model and instance diagram for Faculty, Department and Staff, with Staff related to both Department and Faculty]

7.10 Information Model Patterns

We proposed in Chap. 2 that what serves to scaffold the very idea of an institutional domain is the pattern of information situations present within this domain. But the idea of a pattern has another useful facet: the idea that a pattern in whole or in part may be applicable to more than one institutional domain. In other words, it is more than likely that we might observe common ways of doing things (of articulating, communicating and coordinating things) between separate domains. From such commonalities, we might develop a pattern which applies across domains.

This idea is very much the foundation of benchmarking and reuse. A benchmark was originally a mark cut in a wall or pillar of some building and was used as a reference point to take measurements. In business terms, a benchmark now typically refers to some way of doing things which is regarded as in some way exemplary. In such a sense, benchmarking refers to the idea of comparing some pattern of activity, information or data within one’s own domain with that from some other domain which is perceived to engage in best practice.

Example

For instance, the UK is divided into a number of local authority areas for administrative purposes. Local authorities are tasked with delivering a number of public sector services to citizens living within the boundaries of the local authority. Such services include waste collection, social services and primary and secondary education. One would expect that there are commonalities between the way in which one authority delivers services and the ways in which other local authorities operate. One would therefore expect that the communication required to support service delivery as well as the data structures needed to support communication would have common patterns across a range of local authorities. ◄

One key consequence of the idea of common patterns is that we should be able to reuse certain patterns within processes of analysis and design. In other words, we should be able to use a pattern derived from observing commonalities as a template to establish an appropriate pattern for activity, information and data in some other domain. In the 1990s, David Hay (1996) proposed a range of information model patterns (he called them data model patterns) for a number of different institutional domains such as inventory, work orders, contracts, accounts, laboratory testing, material requirements planning, process manufacturing and documents management. Michael Blaha (2010) has published a more recent incarnation of this idea in which he refers to information model patterns as archetypes. This should come as no surprise in the sense that business practices in areas such as trade between one organisation and another are relatively standardised across countries. This is just another way of saying that information modelling is useful in clarifying conventions associated with information situations across domains.


Example

Take the idea of a sales order which of course underpins economic exchange in capitalist economies. It involves communication about a number of things important to establishing the contractual basis of economic exchange. A sales order takes place between two parties: the seller of the goods and the buyer of the goods. A sales order also typically details a product sold and the quantity of such product. This description details the key elements of a core or generic pattern of the following form:

[Buyer CREATES Sales order]
[Seller RECEIVES Sales order]
[Sales order CONTAINS Order line]
[Order line DETAILS Product]
[Order line HASA qty]

We would expect an information model based on some elaboration of this ontology to apply across a range of retail situations. ◄

Example

Take another example of an information model pattern, pertaining to institutional facts associated with a contract (Blaha 2010). A contract is an agreement between two or more actors, typically for the supply of products. There are many different types of contract associated with different forms of product, such as foodstuffs, securities or even services. An abstract pattern for the common core elements of a contract is illustrated in Fig. 7.8. ◄
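The sales order pattern given earlier in this section can be sketched as a small set of classes, one per element of the pattern. The `unit_price` attribute, the `total` method and the sample values are assumptions added to make the sketch usable; the pattern itself specifies only the classes, their relationships and qty.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    product_code: str
    unit_price: float  # hypothetical attribute, not part of the pattern


@dataclass
class OrderLine:
    product: Product   # [Order line DETAILS Product]
    qty: int           # [Order line HASA qty]


@dataclass
class SalesOrder:
    buyer: str         # [Buyer CREATES Sales order]
    seller: str        # [Seller RECEIVES Sales order]
    lines: List[OrderLine] = field(default_factory=list)  # [Sales order CONTAINS Order line]

    def total(self) -> float:
        """Sum of quantity times unit price over all order lines."""
        return sum(line.qty * line.product.unit_price for line in self.lines)


order = SalesOrder(buyer="Acme Ltd", seller="Steel Co")
order.lines.append(OrderLine(Product("L100", 45.0), 2))
order.lines.append(OrderLine(Product("B200", 1.5), 10))
```

The same skeleton could be reused in another retail domain by elaborating the attributes of each class while leaving the relationships unchanged.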

7.11 Conclusion

Just like any modelling technique, to become proficient in information modelling, one must practise it in many different settings. Within the current chapter, we have considered a number of issues which impinge upon the proper practice of information modelling. Such issues include which construct to use and when, how to handle unary and ternary relationships and how to avoid connection traps. We have also proposed the idea of an information model pattern that can be promoted as a benchmarking tool or reused within design work. The next chapter focuses upon the primary objective of information modelling, which is to design a data system of some form.

Fig. 7.8 A contract pattern as an information model [Diagram: Contract type, Actor, Contract, Contract item and Product]

7.12 Summary

• A key question faced by any information modeller is how do you know that something should be modelled as a class, attribute or relationship? The answer to this question depends upon an appreciation of how certain signs are used by certain actors within the domain in question and for what purpose.
• When we wish to communicate many things about some apparent class, it is often fruitful to examine whether the class is actually one class or is better represented as a series of related classes.
• One-to-one relationships are useful in situations when the thing referred to by a class undergoes a change of state during its lifecycle. They are also useful when we wish to communicate about many things attributed to a class.
• Generalisation is actually another useful strategy to use within modelling when we wish to communicate about many properties of some thing. Aggregation is particularly used for handling problems of representing component-assembly issues, member-collection issues or place-area issues.
• It is useful to identify strong and weak classes upon an information model. An information class is said to be a strong class if the existence of its instances does not depend on the prior existence of the instances of some other class. In contrast, the instances of a weak class depend on the existence of instances of some other class within the domain considered.
• On rare occasions, it may be important to include a unary or a ternary relationship on an information model.
• When the business analyst needs to include time within an information model, it inherently means using many-to-many relationships between the classes which are time-dependent.
• It is sometimes useful to decompose all many-to-many relationships on an information model into two one-to-many relationships.
• The business analyst must ensure that her information model does not suffer from connection traps such as fan traps and chasm traps. Instance diagrams are a useful visual aid for spotting such traps.

8 Information Modelling and Data Systems

8.1 Introduction

Information modelling is typically directed at the design of some data system. We use the term data system, rather than database system, to indicate here that information modelling has a wider range of application than is originally assumed. We first review the nature of data and contrast this with our conception of information provided in earlier chapters. This leads us to define more precisely the concept of a data structure which lies at the heart of some data system. The architecture of some data system, as we shall see, is defined in terms of some data model. This leads us to consider one of the most popular contemporary data models, that of the relational data model, which we introduce in a somewhat unusual manner through the commonplace data structure of a list. The design of some relational database, which is referred to as a schema, is best understood through a visualisation technique known as dependency diagramming. This technique offers a straightforward route for conducting a process important to the design of a relational schema known as normalisation.

8.2 Data and Information

Our theory of information is the bedrock for our consideration of not only information modelling but the data systems that this activity is generally directed at. This is because the notion of an information situation helps clarify the important distinction between data and information. As we determined in Chap. 2, information is any difference that makes a difference to some actor. More accurately, in terms of our theory of information situations, information is any set of differences which makes further differences in the psyche (mind) of some actor or group of actors. The crucial distinction here is that data are differences made in some substance or medium by some actor or actors. Information, in contrast, is an accomplishment made by actors through their encounter with data.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Beynon-Davies, Information Modelling, https://doi.org/10.1007/978-3-030-98805-0_8


Example

Consider human speech. Speech is physically a pressure wave, or more precisely a series of pressure waves, that travels through a solid, liquid or gas. In this form, differences correspond to the peaks and troughs of the soundwave, and this variation corresponds to what is known as the modulation of the signal. ◄

Certain differences made in some substance can be coded to form symbols. In Chap. 2, we defined a sign as some thing that stands to somebody for some other thing. Or, in terms of a constitutive rule:

[X stands for Y to Z in C]

In this definition, X is the symbol which stands for some thing (Y) to some actor (Z) within some institutional context (C). The important point about this definition is that the stands for relationship between a symbol and what it refers to is typically an arbitrary one. It relies merely upon precedent set amongst a community of actors: that collectively a community of actors accept that some thing will be taken to stand for or count as something else.

There are a number of consequences arising from the arbitrary nature of symbols. First, a certain set of differences may hold significance for one actor but not for another. In other words, the stands for relation is not the same for everyone.

Example

Consider the spoken word croeso (pronounced croyso). The units of speech making up the spoken word are known as phonemes and comprise the smallest significant set of differences made in the spoken word. These differences made by speaking the word croeso only hold significance for someone communicating within the institutional context of the Welsh language. To make sense of these differences to English speakers, we have to make another stands for relation, namely, that croeso stands for welcome. ◄

Second, the same set of differences may inform two different actors differently. In other words, different actors may accomplish a different stands for relation with the same symbol.

Example

Consider ‘54’ as a set of differences, a sequence of graphical elements. Through convention established in our society, the graphic ‘5’ and the graphic ‘4’ are taken as digits. But what does the sequence of these two digits stand for? To one actor, perhaps within the institutional setting of a university, it stands for a number or more particularly a grade awarded to a student for an assessment. To another actor, perhaps working within a manufacturing setting, the sequence stands for a product code and identifies a particular type of widget to this actor. ◄

Third, a set of differences only becomes significant to a group of actors through convention. Clearly, this is inherent in our definition of data as a set of differences created with the purpose of communicating something. This means that actors must collectively agree that a certain set of differences when used repeatedly will always stand for the same thing for them. Without this collective agreement, the set of differences would not form a symbol.

Example

Without conventions, communication would be impossible. The group of such conventions of communication is sometimes confusingly referred to as a code. For example, Morse code was invented by Samuel Morse to facilitate communication over the newly invented telegraph system. The code establishes the conventions that certain sequences of dots and dashes making up the Morse code stand for the letters of the alphabet. Once telegraph operators accepted this group of conventions internationally, then remote communication across the world over the telegraph became possible. ◄

So, Fig. 8.1 illustrates the component elements of any sign within which a symbol achieves significance. First, there is the symbol: the set of differences made in some substance. Second, there is the object: the thing referred to or described. Third, there is the concept: the differences made to some actor through engagement with data. Fourth and finally, there is always the actor at the centre of the process of sign-use or semiosis. All four of these elements must be present for any sign to exist.

This conception of the sign allows us to precisely distinguish between data and information. Data is the set of differences made in some substance (the symbol). Information is accomplished by some actor in making the connection between the thing referred to (the object) and what that thing is taken to mean (the concept).

So, data is formed from differences made within a substance. Symbols are coded from such differences and formed into larger entities known as data structures, that is, structures for data. Such data structures may act as messages conveyed as signals, or they may also be used to record details of things, building collective memory of some things.
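The constitutive rule [X stands for Y to Z in C] can be sketched as a lookup in which the same symbol yields different things in different institutional contexts. The dictionary layout, the context labels and the `interpret` function are assumptions made for illustration; the entries restate the ‘54’ and croeso examples above.

```python
# (symbol X, institutional context C) -> what the symbol stands for (Y)
stands_for = {
    ("54", "university"): "grade awarded for an assessment",
    ("54", "manufacturing"): "product code for a type of widget",
    ("croeso", "Welsh language"): "welcome",
}


def interpret(symbol, context):
    """Return what a symbol stands for in a context, or None if no convention exists."""
    return stands_for.get((symbol, context))
```

The symbol on its own is only data; a result is obtained only when a convention exists for the context in which some actor encounters it.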

8.3 Data Structures

Within our theory of information situations introduced in Chap. 2, a key part is played by structures sensed and effected by actors while engaging in their surrounding environment. We indicated there that the information modeller inherently focuses upon certain types of structure that are used to communicate things between two or more actors. This type of structure is known as a data structure, and we shall use this term to refer to a conventional set of differences used to communicate things between a group of actors.

Fig. 8.1 The component elements of a sign [Diagram: Actor at the centre, linked to Symbol, Object and Concept]

Within computer science, a data structure is a term which is used broadly to refer to some systematic form for organising data (Tsichritzis and Lochovsky 1982). This concept is clearly central to the interests of all the information disciplines (information science, information management, information systems, computer science). Much of the infrastructure of information and communication technology, for instance, is clearly taken up with the mechanics of data structures, particularly as it pertains to applications within business and government. However, although much research and development continues to be devoted to finding better ways of storing, retrieving and manipulating data structures, this concept is only rarely examined critically within the information disciplines. By this we mean that the data structure is treated largely as a technological artefact, helping to support but somewhat isolated from considerations of institutional order. As such, data structures appear to form part of the accepted and unexplored background to the conduct of investigation and explanation in these disciplines. Although data structures are not really thought about much, the data formed in structures are critically important both to organisations and to individuals, in the sense that much organisational and individual action is reliant upon data structures.

Example

As a citizen of a modern nation-state, your biography is not only recorded but lived through data structures. Your birth is marked with a birth certificate, which enrols you as a citizen of the state. You pass various education exams and are issued with certificates which qualify you to do certain things. You learn to drive and apply for a driver’s licence. You purchase a car and apply for a vehicle licence. You undertake gainful employment and get recorded in employment, national insurance and taxation records, which require you to do certain things like pay income tax. You decide to travel but must prove your citizenship by applying for a passport. You perhaps get married and are issued with a marriage certificate and may have children, who are issued with their own birth certificates. All these data structures change your institutional status and in turn your rights and responsibilities. You may at some point fall seriously ill and need to access data structures such as your NHS record or national insurance record to access healthcare and welfare benefits. When you retire, crucial records held about your public and private pension schemes will determine the income available to you. Finally, your death triggers a death certificate, which is used by your descendants to resolve issues of probate (inheritance). ◄

Data structures may be paper forms, letters, documents and memos. They may be electronic tables in a database or electronic documents held on some data server or even emails, texts and social media messages. However, all data structures have a common core of features. In very general terms, any data structure can be seen to be a hierarchical construct. A data structure is made up of a series of data elements which in turn are made up of a series of data items.

For instance, in a physical filing cabinet, a drawer of the cabinet might form the data structure, a hanging section placed in the drawer might be the data element, and an individual paper form placed in a section might be the data item. Or in an electronic database, a table would be the data structure, a row of the table would be a data element, and an individual attribute of a row would comprise a data item.

Example

At its most basic, a list corresponds to a set of elements: an assembly of distinct ‘things’, considered as a thing in its own right. The list as a data structure consists of a set of list items. Each of these elements can take many different forms, but let us take a form we are familiar with—that of a binary relation. As we saw in Chap. 4, a binary relation consists of a triple of data items, in which the first data item is termed the subject, the second the relation and the third an object. So, consider a simple passenger list for an airline taking the following form:

[109999555 REFERS TO John Smith]
[105599544 REFERS TO Anwar Prakash]
[103399565 REFERS TO Zu Cheng]
....

This list consists of a series of binary relations, which we are already familiar with. The first data item in each list item is the subject and, in this case, constitutes a UK passport number. The last data item is a natural identifier for a person: a personal name. Both data items are related or predicated through the REFERS TO relation. This predicate effectively implements what we called identification: it associates a given surrogate identifier with some natural identifier for the person. ◄

The sense that a data structure is a particular way of organising data is clearly an abstraction, a set of principles for both storing and accessing data. In certain literature, this abstraction is sometimes referred to as an abstract data type. But data structures such as lists are clearly instantiated, that is, given form. In this sense, a specific instance of a list, such as a product list, passenger list or picking list, is also a data structure. In the concrete, a data structure is used to represent things and through such representation to help constitute institutional order. Within the discussion that follows, we shall utilise the term data structure to refer both to an abstraction and to an instantiation.

Exercise

Take the data structure of a birth certificate. Determine what the data elements and data items of this data structure correspond to. Then think about the performativity of a birth certificate: what does a birth certificate actually do in institutional terms?
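The passenger list in this section can be sketched in code to make the hierarchy of data structure, data element and data item concrete. The Python fragment below is only an illustration under stated assumptions: the names BinaryRelation, passenger_list, render and identify are introduced here for the sketch and are not notation used elsewhere in the book.

```python
from typing import NamedTuple, Optional


class BinaryRelation(NamedTuple):
    """One list item (a data element) made up of three data items."""
    subject: str   # surrogate identifier, here a passport number
    relation: str  # the predicate that links subject and object
    obj: str       # natural identifier, here a personal name


# The data structure: a list whose elements are binary relations.
passenger_list = [
    BinaryRelation("109999555", "REFERS TO", "John Smith"),
    BinaryRelation("105599544", "REFERS TO", "Anwar Prakash"),
    BinaryRelation("103399565", "REFERS TO", "Zu Cheng"),
]


def render(item: BinaryRelation) -> str:
    """Write a list item in the square-bracket form used in the text."""
    return f"[{item.subject} {item.relation} {item.obj}]"


def identify(passengers: list, passport_no: str) -> Optional[str]:
    """Return the name a passport number refers to, or None if it is not listed."""
    for item in passengers:
        if item.subject == passport_no and item.relation == "REFERS TO":
            return item.obj
    return None
```

Here the list plays the role of the data structure, each BinaryRelation is a data element, and each field of the triple is a data item. Calling identify with a listed passport number performs the act of identification described above.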

8.4 The Ontological Status of Data Structures

Within this book, we want to establish the key principle that information modelling is important because we should never take data structures for granted. Ontology, as we have seen in Chap. 4, is a theory of reality, of being or of what things are seen to exist. Within this book, much as we have done previously with the concept of information and the practice of information modelling, we want to reverse the conventional ontology associated with data structures. In other words, we want to question the conventional ways of thinking about why data structures exist. As we intimated in Chap. 3, conventionally, and as conceived in the dominant literature, a data structure is viewed purely as a technological artefact. In this view, data structures, their elements or their items are taken to represent propositions about things in some institutional reality. The institutional reality is also assumed to be observer-independent, meaning that it is the same for all actors.

Example

Hence, in a manufacturing setting, a picking item, which relates a given identifier for a shipping item with a given identifier for a truck, serves as a proposition about these things to workers within the institutional reality of a supply chain. ◄

Within formal logic, data elements or data items as propositions may take only one of two values, namely, true or false. We either assert the truth of a given proposition by writing a data element or data item to the data structure, or we retract a given proposition by deleting the corresponding data element or data item from the data structure. This implies that the state of a data structure at any given time consists of true statements about the real-world domain it represents, such as the loading of shipping items onto transportation. This so-called correspondence view of truth implies that there is a necessary separation between institutional reality and data structures. It also implies that a data item as an externalisation or representation is taken to correspond to some real-world thing or more likely a set of things important to actors within some institutional reality.

As a consequence of our theory of information situations, we argue that we need to reverse this conventional ontology associated with data structures. Data structures are not separate from institutional reality; they are very much entangled within it. In fact, data structures are constitutive of such realities in the sense that they ‘scaffold’ action and interaction between actors working within and between institutions. Data structures are not only forms of structure; they serve to inform institutional actors and often prescribe or proscribe action on the part of such actors. Data structures are important to the instituting of facts about things and through this process are critical to the production and reproduction of institutional order (Beynon-Davies 2016).

Example

Consider a list of passport numbers again. A member of the UK Passports Office can use this list to declare British citizens. In doing so, such actors are inherently using the identifiers in this list to instantiate an information class in the following manner:

[109999555 ISA British citizen]
[105599544 ISA British citizen]
[103399565 ISA British citizen]
...

Passports and passport identifiers were originally designed to enable the declaration of citizenship and as such to enrol persons into the institutional domain of international travel. But such tokens and identifiers are now used in many other situations relating not only to government and its agencies but to interaction with private sector institutions. For instance, a member of the UK Borders Agency can use a list item from the list above to authenticate a person. In other words, a fact from this list asserts that the individual is who they say they are. On this basis, a person with a passport number is permitted to travel to nominated countries. ◄

So, data structures are not only informative, they are also performative: they get things done. This we have referred to elsewhere as the performativity of data structures.

Example

The performativity of lists is evident in a number of English terms: blacklisting, shortlisting, whitelisting and even watchlisting. These terms all refer to commonplace activities driven by the data structure of the list. When shortlisting a group of people for an interview or for a prize, the data structure, the shortlist, serves to initiate action such as calling someone to interview. Blacklists may be used as data structures shared between financial institutions to prevent persons who have reneged on their debts from obtaining credit. The whitelist has been used particularly by trades unions to refer to people held by the union to be suitable for employment within their protected trade. Finally, a watchlist is a list of persons or things that some institution deems should be watched for possible action in the future. A whitelist, a shortlist and a watchlist are clearly data structures which prescribe what should happen to those persons or things identified on the list. A blacklist, in contrast, proscribes persons or things identified on the list from certain happenings. ◄
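The two ideas discussed in this section, asserting or retracting propositions and using a list to prescribe or proscribe action, can be placed side by side in a small sketch. This is an illustration only: FactStore, may_travel and may_obtain_credit are hypothetical names chosen for the example, and the rules they encode are simplified assumptions rather than actual institutional procedures.

```python
class FactStore:
    """A data structure whose state is the set of currently asserted propositions."""

    def __init__(self):
        self._facts = set()

    def assert_fact(self, subject, relation, obj):
        # Writing a data element asserts the proposition.
        self._facts.add((subject, relation, obj))

    def retract_fact(self, subject, relation, obj):
        # Deleting the data element retracts the proposition.
        self._facts.discard((subject, relation, obj))

    def holds(self, subject, relation, obj):
        return (subject, relation, obj) in self._facts


def may_travel(store, passport_no):
    """Performative use: the ISA fact licenses an action (assumed rule)."""
    return store.holds(passport_no, "ISA", "British citizen")


def may_obtain_credit(blacklist, person_id):
    """A blacklist proscribes: anyone on the list is refused (assumed rule)."""
    return person_id not in blacklist


store = FactStore()
store.assert_fact("109999555", "ISA", "British citizen")
```

The same stored fact is informative when queried with holds and performative when a function such as may_travel uses it to permit or refuse an action.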

8.5 Data Models

So, data structures, elements and items are used to represent instances of things of interest to a particular institutional domain. As we have seen in previous chapters, data structures are particularly used to represent what the philosopher John Searle refers to as institutional facts about some domain. Institutional facts are matters of convention which serve to define or constitute the institution in which they are used. Any model of data must be an abstraction from such instances of institutional facts. In the case of data structures, the business analyst needs to abstract from actual instances of records, lists, etc. certain commonalities of structure. We refer to such an abstraction as a data model. A data model primarily describes structures of data and how these relate together. As we have seen, data structures consist of data elements which in turn consist of data items. Data items act as placeholders or ‘containers’ that take or are assigned values. Valued items, elements or structures are used to represent things of interest to some domain: to build institutional facts which actually serve to constitute the domain of interest.

Example

Take a historical example. With the rise of the modern office in the nineteenth century, the technology of filing cabinets, paper folders and paper forms was invented and used to organise data. Within this data system, a typical record consists of a series of data items or fields which serve to represent an instance of something of interest to the organisation. For instance, a business organisation might create a typical record, consisting of a paper form, for each of its customers, with fields such as customer name, customer address and customer telephone number. Such a form might then be placed in a suspension folder for easy access. Records (such as paper forms), in turn, are typically collected into the data structure of a file—consisting perhaps of a filing cabinet and representing some association between these data elements. For instance, a customer file assembles a series of customer records. Various different customer files might be created, with a specific criterion used to decide which record goes in which file: customers located in different areas of the country or handled by different account managers, for instance. ◄ This data model of files, records and fields served administrative organisations effectively for over 200 years. More recently, another model for managing data as electronic records has gained dominance. This is generally known as the relational data model.

8.6 The Relational Data Model

It is useful to approach the relational data model through the idea of a list, a data structure which we have already met. At its most basic, a list corresponds to a set of elements: a collection of distinct objects, considered as an object in its own right. There are two ways of describing or specifying the members of a set. One way is by intensional definition, using a rule such as A is the set of colours of the French flag. The second way is by extension, that is, listing each member of the set. An extensional definition within mathematics is denoted by enclosing the list of members in curly brackets such as A = {blue, white, red}. Most lists used for modern institutional purposes are actually ordered sets known as sequences or tuples, implying that both the elements of the list and the position of the elements in a list are significant. Hence, the tuple (blue, white, red) is different from the tuple (red, white, blue). Within institutional contexts, simple lists consist of an ordered collection of data items, typically, as we have seen, identifiers for persons, things or events. More complex lists consist of ordered collections of data elements (such as records) or even data structures (such as files).
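The distinction between intensional and extensional definition, and between sets and tuples, can be checked directly in a language that has both structures built in. The following Python lines are a minimal sketch; the flag_colours dictionary is illustrative data assumed for the example.

```python
# Illustrative data only: which colours appear on which flag.
flag_colours = {
    "France": ("blue", "white", "red"),
    "Italy": ("green", "white", "red"),
}
all_colours = {c for colours in flag_colours.values() for c in colours}

# Intensional definition: membership is decided by a rule.
a_by_rule = {c for c in all_colours if c in flag_colours["France"]}

# Extensional definition: membership is decided by listing each member.
a_by_listing = {"blue", "white", "red"}

# Both definitions pick out the same set.
same_set = a_by_rule == a_by_listing

# Position is irrelevant in a set but significant in a tuple.
order_matters_for_sets = {"blue", "white", "red"} != {"red", "white", "blue"}
order_matters_for_tuples = ("blue", "white", "red") != ("red", "white", "blue")
```

The last two comparisons show why institutional lists are better thought of as tuples: reordering the members leaves a set unchanged but produces a different tuple.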

Dispatch advice: Goronwy Galvanising

Advice No.: 101     Date: 22/01/1988     Customer name: Blackwalls

Order No.   Description   Product code   Item length   Order Qty   Batch weight   Returned Qty   Returned weight
13/1193G    Lintels       UL150          1500          20          145            20             150
44/2404G    Lintels       UL1500         15000         20          1450           20             1460
70/2517P    Lintels       UL135          1350          20          130            20             135
23/2474P    Lintels       UL120          1200          16          80             14             82

Driver:               Received by:

Fig. 8.2 A dispatch advice

Exercise

Consider another data structure of interest to modern organisation: the electronic mail or email. Analyse an email as a data structure. What constitutes its elements and what are its likely data items?

The mathematician Ted Codd (1970) had the key insight of mapping aspects of this theory of sets, particularly the idea of tuples, onto that of files, records and fields. Codd proposed mapping the data structure of a file onto that of a mathematical relation, being a set of tuples. This data structure fundamentally underlies the data management systems used within mainstream digital computing systems. The relational data model uses the data structure of the table, which consists of multiple data elements or rows, each of which in turn consists of a series of data items or columns. Consider a typical business form such as the dispatch advice in Fig. 8.2 which might be used by a manufacturing organisation. The data on this form can be stored in a relational database using the two data structures illustrated in Fig. 8.3.

Exercise

Primary keys act as identifiers. How does the idea of a primary key relate to the notion of an object identifier such as a person identifier?

The table named Dispatch notes in Fig. 8.3 consists of four data items (dispatchNo, dispatchDate, customerCode and instructions) and three data elements corresponding to three rows in the table, one for each dispatch note that arrives with a dispatch of steel product to a customer.

Dispatch notes

Dispatch No.   Dispatch date   Customer code   Instructions
101            22/01/2012      BLW
102            25/02/2012      TCO
103            10/03/2012      BLW

Dispatch items

Dispatch No.   Sales order No.   Customer product code   Dispatch quantity
101            13/1193G          UL150                   20
101            44/2404G          UL1500                  20
101            70/2517P          UL135                   20
101            23/2474P          UL120                   14

Fig. 8.3 A simple relational database

Each row in a table is identified by values in one or more columns of the table, called the table’s primary key. To act properly as an identifier, the values of a primary key must be unique and not null. In other words, we must have a value for each element of the primary key (rather than a null value), and each value must be unique in terms of other values of the primary key. For instance, in the Dispatch notes table, the Dispatch No. data item is the only item having both these properties. It is therefore the most suitable candidate for a primary key for this table. Values in columns may also act as links to data contained in other tables. Such columns are called foreign keys. A value for a foreign key must either be the value of some primary key elsewhere in the database or be null. When the value for a foreign key is null, the requisite rows are not related. In Fig. 8.3, the primary key of the Dispatch items table is actually composed of two data items: Dispatch No. and Sales Order No. Both these data items individually are in fact foreign keys to two other tables in the database. Dispatch No. acts as a foreign key back to the primary key of the Dispatch notes table. Sales Order No. acts as a foreign key back to a Sales orders table. The values of these two foreign keys can never be null because we always must know which dispatch note and sales order a particular dispatch item relates to. This leads to some clarity in the difference between a data model and an information model. A data model describes patterns of structure amongst the data utilised in some domain. An information model is an attempt to develop some representation of the collective accomplishment a group of actors make with data. In other words, an information model attempts to document what data structures are used for—what things they communicate to actors within the domain. 
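The rules for primary and foreign keys described above can be tried out with the sqlite3 module from the Python standard library. The sketch below is a hedged illustration, not a definitive design: the snake_case table and column names are assumptions adapted from Fig. 8.3, the column types are guesses, and the Sales orders table is left out, so sales_order_no is not declared as a foreign key here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE dispatch_notes (
    dispatch_no   INTEGER NOT NULL PRIMARY KEY,
    dispatch_date TEXT,
    customer_code TEXT,
    instructions  TEXT
);
CREATE TABLE dispatch_items (
    dispatch_no           INTEGER NOT NULL REFERENCES dispatch_notes(dispatch_no),
    sales_order_no        TEXT NOT NULL,  -- assumed: Sales orders table omitted
    customer_product_code TEXT,
    dispatch_quantity     INTEGER,
    PRIMARY KEY (dispatch_no, sales_order_no)
);
""")
conn.execute("INSERT INTO dispatch_notes VALUES (101, '22/01/2012', 'BLW', NULL)")
conn.execute("INSERT INTO dispatch_items VALUES (101, '13/1193G', 'UL150', 20)")


def accepted(sql):
    """Return True if the row is stored, False if a key rule rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A primary key value must be unique.
duplicate_key = accepted("INSERT INTO dispatch_notes VALUES (101, '25/02/2012', 'TCO', NULL)")
# A foreign key value must match an existing primary key value.
dangling_reference = accepted("INSERT INTO dispatch_items VALUES (999, '44/2404G', 'UL1500', 20)")
# A key that forms part of the primary key can never be null.
null_key = accepted("INSERT INTO dispatch_items VALUES (NULL, '44/2404G', 'UL1500', 20)")
```

Each of the three rejected rows breaks one of the rules stated in the text: uniqueness of the primary key, the match between a foreign key and some existing primary key, and the requirement that key values are not null.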
We should also distinguish between a data model, such as the relational data model, and a data schema or data system model, such as a relational schema. The relational data model is an architecture for data, whereas a relational schema specifies the data structures that will be used for some specific application of a data system. There is a convenient shorthand for expressing the data structures in a schema for a relational database, sometimes known as the bracketing notation. The shorthand is as follows:

TableName(attribute1, attribute2, attribute3, . . ., attributeN)

A relational data structure begins with a table name that is unique within the schema. Attributes of the table are then listed between rounded brackets and delimited with commas. Each attribute is given a unique name within the data structure. The attributes making up the primary key of the table are underlined, and any foreign keys are double-underlined. Foreign keys are normally given the same name as their linked primary key, and we should specify in the schema whether or not the foreign key can be null.

Example

The schema for the database illustrated in Fig. 8.3 can be written in the bracketing notation as follows:

Dispatch notes(dispatchNo, dispatchDate, customerCode, instructions)
Dispatch items(dispatchNo, salesOrderNo, customerProductCode, dispatchQuantity) ◄
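Because underlining is easily lost in plain text, it can help to hold a schema as data and generate the bracketing notation from it. The following sketch assumes a hypothetical Table class and uses the tags [PK] and [FK, ...] in place of single and double underlining; neither the class nor the tags are part of the notation described in this section.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    attributes: list
    primary_key: list
    # Maps each foreign key attribute to True if it may be null.
    foreign_keys: dict = field(default_factory=dict)

    def __post_init__(self):
        # Each attribute must have a unique name within the data structure.
        if len(set(self.attributes)) != len(self.attributes):
            raise ValueError("attribute names must be unique within a table")
        for key in list(self.primary_key) + list(self.foreign_keys):
            if key not in self.attributes:
                raise ValueError(f"{key} is not an attribute of {self.name}")

    def bracketing(self) -> str:
        """Render the table in bracketing notation with textual key tags."""
        parts = []
        for attribute in self.attributes:
            label = attribute
            if attribute in self.primary_key:
                label += " [PK]"
            if attribute in self.foreign_keys:
                nullable = self.foreign_keys[attribute]
                label += " [FK, null allowed]" if nullable else " [FK, not null]"
            parts.append(label)
        return f"{self.name}({', '.join(parts)})"


dispatch_items = Table(
    name="Dispatch items",
    attributes=["dispatchNo", "salesOrderNo", "customerProductCode", "dispatchQuantity"],
    primary_key=["dispatchNo", "salesOrderNo"],
    foreign_keys={"dispatchNo": False, "salesOrderNo": False},
)
```

Calling dispatch_items.bracketing() yields the Dispatch items line of the schema with its key attributes tagged, which makes the primary and foreign keys visible even where underlining cannot be shown.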

8.7 Normalisation

Certain schemas appropriate to some domain are better than others. In terms of the relational data model, the concept of normalisation is applied to indicate whether a data schema is satisfactory. Normalisation refers to the process of designing a relational data schema which is free from so-called maintenance anomalies (Kent 1983). We would say that such anomalies are associated with the articulation of relational data structures, and we can demonstrate the problems with maintenance or articulation anomalies using a simple example.

Example

Suppose we are given the brief of designing a database to maintain information about students, modules and lecturers in a university. An analysis of the documentation used by the administrative staff gives us the following sample dataset with which to work. If we pool all the data together in one table as in Table 8.1, a number of problems, sometimes called file maintenance anomalies, would arise in maintaining this dataset.

Table 8.1 An unnormalised dataset

Modules
moduleName                    staffNo   staffName   studentNo   student   assGrade   assType
Relational database systems   234       Davies T    34698       Smith S   B3         cwk1
Relational database systems   234       Davies T    34698       Smith S   B1         cwk2
Relational database systems   234       Davies T    37798       Jones S   B2         cwk1
Relational database systems   234       Davies T    34888       Patel P   B1         cwk1
Relational database systems   234       Davies T    34888       Patel P   B3         cwk2
Business analysis             234       Davies T    34698       Smith S   B2         cwk1
Business analysis             234       Davies T    34698       Smith S   B3         cwk2
Information modelling         345       Evans R     34668       Smith J   A1         exam

Now consider three issues with this data structure. What if we wish to delete student 34668? The result is that we lose some valuable information. We lose information about Information modelling and its associated lecturer. This is called a deletion side effect. What if we change the lecturer of Information modelling to V Konstantinou? We need to update not only the staffName but also the staffNo for this module. This is called an update side effect. What if we admit a new student on to a module, say studentNo 38989? We cannot enter a student record until a student has had at least one assessment. This is known as an insertion side effect. The size of our sample dataset is small. One can imagine the seriousness of the maintenance anomalies mentioned above multiplying as the size of the file grows. The above structure is therefore clearly not a good one for the data of this enterprise. ◄

Normalisation is a formal process whose aim is to eliminate such maintenance or articulation anomalies. Classic normalisation is described as a process of non-loss decomposition. The decomposition approach starts with one (universal) relation, the unnormalised dataset. Maintenance anomalies such as insertion, deletion and update anomalies are gradually removed by fragmenting the one data structure into a series of data structures in which all data items within any one data structure are dependent solely upon the primary key. Non-loss decomposition is therefore a design process guaranteed to produce a dataset free from maintenance problems. It does however suffer from a number of disadvantages, particularly as a practicable database design technique. Firstly, it requires all of the dataset to be in place before the process can begin. Secondly, for any reasonably large dataset, the process is extremely time-consuming, difficult to apply and prone to human error.
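The deletion side effect and the idea of non-loss decomposition can both be demonstrated on the rows of Table 8.1. The sketch below is one possible decomposition, offered as an assumption: the dependencies it relies on (staffNo determines staffName, moduleName determines staffNo, studentNo determines student, and the combination of moduleName, studentNo and assType determines assGrade) are read off the sample data rather than stated in the text.

```python
# Rows of Table 8.1:
# (moduleName, staffNo, staffName, studentNo, student, assGrade, assType)
rows = [
    ("Relational database systems", 234, "Davies T", 34698, "Smith S", "B3", "cwk1"),
    ("Relational database systems", 234, "Davies T", 34698, "Smith S", "B1", "cwk2"),
    ("Relational database systems", 234, "Davies T", 37798, "Jones S", "B2", "cwk1"),
    ("Relational database systems", 234, "Davies T", 34888, "Patel P", "B1", "cwk1"),
    ("Relational database systems", 234, "Davies T", 34888, "Patel P", "B3", "cwk2"),
    ("Business analysis", 234, "Davies T", 34698, "Smith S", "B2", "cwk1"),
    ("Business analysis", 234, "Davies T", 34698, "Smith S", "B3", "cwk2"),
    ("Information modelling", 345, "Evans R", 34668, "Smith J", "A1", "exam"),
]

# Deletion side effect: removing student 34668 from the single table
# also removes the only row that mentions Information modelling.
remaining = [r for r in rows if r[3] != 34668]
lost_modules = {r[0] for r in rows} - {r[0] for r in remaining}

# One possible decomposition (dependencies assumed from the sample data).
lecturers = {r[1]: r[2] for r in rows}                  # staffNo -> staffName
modules = {r[0]: r[1] for r in rows}                    # moduleName -> staffNo
students = {r[3]: r[4] for r in rows}                   # studentNo -> student
assessments = {(r[0], r[3], r[6]): r[5] for r in rows}  # compound key -> assGrade

# Non-loss check: joining the four structures rebuilds the original rows.
rebuilt = sorted(
    (m, modules[m], lecturers[modules[m]], s, students[s], grade, kind)
    for (m, s, kind), grade in assessments.items()
)

# Deleting the same student from the decomposed structures keeps the module.
del students[34668]
assessments = {k: v for k, v in assessments.items() if k[1] != 34668}
module_survives = "Information modelling" in modules
```

In the single table the deletion loses the module and its lecturer, whereas in the decomposed structures the module record survives because it no longer depends on any student having been assessed.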
In a further section within this chapter, we shall consider a visualisation technique known as determinancy or dependency diagramming. The main advantage of the determinancy diagramming technique is that it provides a mechanism for designing a database incrementally from the bottom-up. One does not need a complete dataset in hand to begin the process of design. The data analyst can begin their work with a small collection of central data items. Around this core, data items can be continuously added until all the dependencies are fully documented.

8.8 Turning an Information Model into a Relational Schema

Some texts on information modelling embed relational theory directly into their discussion of information modelling. This is a big mistake. An information model can be developed independently of any particular type of data system. Indeed, as we mentioned in Chap. 3, an information model has wider uses. For instance, an information model can be developed purely as a means of understanding the important things an organisation either communicates about currently or wishes to communicate about in future. This is an important part of organisation design. Nevertheless, it is certainly true to say that most business analysts develop information models with a database in mind, particularly a relational database. Therefore, we describe here how to turn an information model into a design for a relational database. You should make sure you understand what a relational database should look like before attempting this section. Consider an extended version of an information model relevant to some university domain, as presented in Fig. 8.4. The process of translating such an information model into a relational schema is relatively straightforward and involves the following core steps. First, turn each strong class on an information model into a table. Second, make the identifier for each class the primary key of the table. Third, create compound primary keys for weak classes. Fourth, implement one-to-many relationships as foreign keys. Finally, implement optionality in terms of whether a foreign key is specified as being null or not. There are also a number of other rules of translation for situations such as one-to-one, ternary and recursive relationships, as well as design options for relationships of generalisation and aggregation. Let us consider these steps in more detail.

Fig. 8.4 University information model (classes shown: Lecturer, Course, Module, Prerequisite, Assessment, Student, Enrolment; relationship labels: prerequisite-of, prerequisite-for)

Step 1: Tables

For each class on our information model, we form a table. In other words, for each thing uniquely identified, we need a table. It is good practice to take each information class name and make it a plural to distinguish table names from class names.

Example

Consider the information classes in Fig. 8.4. Here, we form seven tables which we shall call courses, modules, lecturers, students, assessments, enrolments and prerequisites. We can therefore start the bracketing notation for this schema in the following manner:

Courses(. . .)
Lecturers(. . .)
Modules(. . .)
Students(. . .)
Assessments(. . .)
Enrolments(. . .)
Prerequisites(. . .) ◄



Step 2: Primary Keys (Strong Classes)

The identifying attribute of a strong information class becomes the primary key of the table. Weak classes may either be identified by the assigned identifying attribute of the class or a compound of the identifiers of the linked strong classes.

Example

Identifiers are not indicated in Fig. 8.4, but we can assume the presence of the following identifiers for strong classes: courseName, moduleCode, lecturerCode and studentCode. The modified schema becomes:

Courses(courseName, . . .)
Lecturers(lecturerCode, . . .)
Modules(moduleCode, . . .)
Students(studentCode, . . .)
Assessments(. . .)
Enrolments(. . .)
Prerequisites(. . .) ◄

Step 3: Primary Keys (Weak Classes)

In a strong class, the existence of its instances does not depend on the prior existence of the instances of some other class (Chap. 7). The instances of a weak class depend on the existence of instances of some other class within the domain considered. Link classes which decompose a many-to-many relationship into two one-to-many relationships are a classic example of a weak class. A primary key for a weak class may be formed from a compound of the identifiers for the strong classes linked by the weak class. Alternatively, the modeller can assign a surrogate identifier for the class.

Example

Assessment is clearly a link class in Fig. 8.4. Assuming that moduleCode is the identifier for the module class and studentCode is the identifier for the student class, then an instance of an assessment can be identified by a compound identifier of moduleCode and studentCode. Alternatively, we can assume the presence of surrogate identifiers for the three weak classes in Fig. 8.4, such as an assessmentNo, enrolmentNo and prerequisiteNo. But we still need to implement the relationship by keeping the identifiers from the linked strong classes (moduleCode and studentCode) as foreign keys. The prerequisite class is something of a special case as it is a link class to a single class. In this case, we need to provide a link via two module codes: the module code of the module that is the prerequisite and the module code of the module for which it is a prerequisite. The modified schema is now:

Courses(courseName, . . .)
Lecturers(lecturerCode, . . .)
Modules(moduleCode, . . .)
Students(studentCode, . . .)
Assessments(assessmentNo, moduleCode, studentCode, . . .)
Enrolments(enrolmentNo, moduleCode, studentCode, . . .)
Prerequisites(prerequisiteNo, prerequisiteofModuleCode, prerequisiteforModuleCode, . . .) ◄



Step 4: Attributes

All other attributes of an information class become non-key attributes of the table.

Example

We have not indicated any attributes in Fig. 8.4, but we can assume the presence of such attributes through prior investigation of the domain. For instance, courseDescription might form a non-key attribute of the courses table, and moduleLevel might be a non-key attribute of the modules table. The schema then looks like:

Courses(courseName, courseDescription, . . .)
Lecturers(lecturerCode, lecturerName, lecturerStatus, . . .)
Modules(moduleCode, moduleDescription, moduleLevel, . . .)
Students(studentCode, studentName, studentAddress, studentMobileNo, . . .)
Assessments(assessmentNo, moduleCode, studentCode, assessmentDescription, assessmentGrade, . . .)
Enrolments(enrolmentNo, moduleCode, studentCode, enrolmentDate, . . .)
Prerequisites(prerequisiteNo, prerequisiteofModuleCode, prerequisiteforModuleCode, prerequisiteStatus, . . .) ◄

Step 5: Relationships

For each one-to-many relationship within an information model, we need to implement the relationship in some way within the relational schema. To do this, we post the primary key of the table representing the one end of the relationship into the table representing the many end of the relationship. This then becomes a foreign key in the receiving table.

Example

Hence, for the includes relationship between course and module, we post courseName into the modules table. We also post lecturerCode into the modules table and courseName into the students table. The foreign keys for the link classes assessment, enrolment and prerequisite have already been assigned. The schema now looks like:

Courses(courseName, courseDescription, . . .)
Lecturers(lecturerCode, lecturerName, lecturerStatus, . . .)
Modules(moduleCode, moduleDescription, moduleLevel, courseName, lecturerCode, . . .)
Students(studentCode, studentName, studentAddress, studentMobileNo, courseName, . . .)
Assessments(assessmentNo, moduleCode, studentCode, assessmentDescription, assessmentGrade, . . .)
Enrolments(enrolmentNo, moduleCode, studentCode, enrolmentDate, . . .)
Prerequisites(prerequisiteNo, prerequisiteofModuleCode, prerequisiteforModuleCode, prerequisiteStatus, . . .) ◄



Step 6: Optionality

Optionality on the many end of a relationship tells us whether the foreign key representing the relationship can be null or not. If the many end is mandatory, the foreign key cannot be null. If the many end is optional, the foreign key can be null.

Example

In Fig. 8.4, the many end of the includes relationship is mandatory. This means that courseName cannot be null in the modules table. The same applies to all other relationships of association in the figure. The schema then becomes:

Courses(courseName, courseDescription, . . .)
Lecturers(lecturerCode, lecturerName, lecturerStatus, . . .)
Modules(moduleCode, moduleDescription, moduleLevel, courseName (Not null), lecturerCode (Not null), . . .)
Students(studentCode, studentName, studentAddress, studentMobileNo, courseName (Not null), . . .)
Assessments(assessmentNo, moduleCode (Not null), studentCode (Not null), assessmentDescription, assessmentGrade, . . .)
Enrolments(enrolmentNo, moduleCode (Not null), studentCode (Not null), enrolmentDate, . . .)
Prerequisites(prerequisiteNo, prerequisiteofModuleCode (Not null), prerequisiteforModuleCode (Not null), prerequisiteStatus, . . .) ◄
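Steps 1 to 6 can be followed through to an executable schema. The sketch below uses the sqlite3 module from the Python standard library and covers only part of the university schema: the enrolments and prerequisites tables are omitted for brevity, the snake_case names and column types are assumptions, and every sample value is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE courses (
    course_name        TEXT NOT NULL PRIMARY KEY,
    course_description TEXT
);
CREATE TABLE lecturers (
    lecturer_code   TEXT NOT NULL PRIMARY KEY,
    lecturer_name   TEXT,
    lecturer_status TEXT
);
CREATE TABLE modules (
    module_code        TEXT NOT NULL PRIMARY KEY,
    module_description TEXT,
    module_level       INTEGER,
    course_name        TEXT NOT NULL REFERENCES courses(course_name),
    lecturer_code      TEXT NOT NULL REFERENCES lecturers(lecturer_code)
);
CREATE TABLE students (
    student_code TEXT NOT NULL PRIMARY KEY,
    student_name TEXT,
    course_name  TEXT NOT NULL REFERENCES courses(course_name)
);
CREATE TABLE assessments (
    assessment_no    INTEGER PRIMARY KEY,
    module_code      TEXT NOT NULL REFERENCES modules(module_code),
    student_code     TEXT NOT NULL REFERENCES students(student_code),
    assessment_grade TEXT
);
""")
# Sample rows; all values are invented for illustration.
conn.execute("INSERT INTO courses VALUES ('Business Computing', NULL)")
conn.execute("INSERT INTO lecturers VALUES ('L234', 'Davies T', NULL)")
conn.execute(
    "INSERT INTO modules VALUES ('M1', 'Information modelling', 5, 'Business Computing', 'L234')"
)


def accepted(sql):
    """Return True if the row is stored, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Mandatory many end: a module must name its course (Not null).
module_without_course = accepted(
    "INSERT INTO modules VALUES ('M2', 'Business analysis', 5, NULL, 'L234')"
)
# A link class row must refer to existing instances of the strong classes.
assessment_for_unknown_student = accepted(
    "INSERT INTO assessments VALUES (1, 'M1', 'S999', 'B2')"
)
```

Each strong class has become a table keyed on its identifier, the link class assessments carries a surrogate key plus two foreign keys, and the Not null annotations from Step 6 appear as NOT NULL constraints on the posted foreign keys.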



Step 7: Many-to-Many Relationships of Association

Note that we have not considered many-to-many or one-to-one relationships in the process described above. If there are any many-to-many relationships still left in our model, we need to break down each of these into one-to-many relationships to form weak or link classes.

Step 8: One-to-One Relationships

One-to-one relationships can normally be handled as a single table. In other words, we take both classes in a 1:1 relationship and place their attributes in one table structure. However, if the number of attributes is excessively large, or we wish to distinguish clearly between classes, then it is possible to create a table for each class and relate them together by using the same primary key.


Example

For instance, if we have a medical emergency class and an emergency incident class, we might use the following structure to handle this one-to-one relationship:

Medical emergencies(incidentCode, . . .)
Emergency incidents(incidentCode, . . .)



Step 9: Ternary Relationships

Ternary relationships are fortunately rare in information models. A ternary relationship would mean creating one table for the ternary relationship with a compound key composed of the identifiers of all the participating classes.

Example

If we have the classes employee, skill and project related together through a ternary relationship, then we would need to use the following structure within a relational schema:

Employees(employeeCode, . . .)
Skills(skillCode, . . .)
Projects(projectCode, . . .)
EmployeeSkillProjects(employeeCode, skillCode, projectCode, . . .)
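As a rough sketch of what the compound key buys us, the fragment below builds the four tables with Python's built-in sqlite3 module. The column types, the sample codes (E1, S1, P1 and so on) and the helper name add_use are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employees (employeeCode TEXT PRIMARY KEY);
    CREATE TABLE Skills    (skillCode    TEXT PRIMARY KEY);
    CREATE TABLE Projects  (projectCode  TEXT PRIMARY KEY);

    -- One row per (employee, skill, project) combination.
    CREATE TABLE EmployeeSkillProjects (
        employeeCode TEXT NOT NULL REFERENCES Employees(employeeCode),
        skillCode    TEXT NOT NULL REFERENCES Skills(skillCode),
        projectCode  TEXT NOT NULL REFERENCES Projects(projectCode),
        PRIMARY KEY (employeeCode, skillCode, projectCode)
    );

    INSERT INTO Employees VALUES ('E1');
    INSERT INTO Skills    VALUES ('S1'), ('S2');
    INSERT INTO Projects  VALUES ('P1');
    INSERT INTO EmployeeSkillProjects VALUES ('E1', 'S1', 'P1'), ('E1', 'S2', 'P1');
""")


def add_use(employee, skill, project):
    """Record that an employee applies a skill on a project; return False if rejected."""
    try:
        conn.execute(
            "INSERT INTO EmployeeSkillProjects VALUES (?, ?, ?)",
            (employee, skill, project),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# The same combination cannot be stored twice because of the compound key.
duplicate_accepted = add_use("E1", "S1", "P1")
```

The same employee can appear with several skills on one project, but each full combination of the three identifiers is stored at most once.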



Step 10: Recursive Relationships

A one-to-many recursive relationship is accommodated in one table with a foreign key that refers back to the primary key of the same table. A many-to-many recursive relationship would need to be broken down into two one-to-many relationships in the normal way. The link entity will then act as a set of cross-reference instances between the instances of the original entity.

Example

Suppose we have a situation in which we wish to communicate about a supervision hierarchy within some organisation—in other words, who supervises who. This could be represented as a one-to-many recursive relationship as in Fig. 8.5 in which an employee supervises many other employees. The structure to accommodate this in a relational schema would be as follows:


Fig. 8.5 Unary one-to-one relationship (class shown: Employee)

Employees(employeeCode, supervisorCode, . . .)

Here, the supervisorCode is effectively an employeeCode. ◄

Step 11: Handling Generalisation

Clearly, the relational model has no concept of a generalisation hierarchy. Generalisation therefore has to be represented by appropriate design of a relational schema. There are a number of choices for accommodating a super-class-sub-class relationship in a relational schema. The first is to create one table and include all the attributes of the super-class plus all the attributes of each sub-class in the table. The second is to create one table for each sub-class and include all the attributes for the super-class in each sub-class table. The third and most flexible is to create one table for each sub-class and one table for the super-class.

Example

Consider the situation in which a security class on the stock market has two sub-classes: stock and share. The various options for handling this generalisation relationship are expressed in the bracketing notation below, one option per line:

Securities(securityName, securityType, issuePrice, dividend, interestRate)
Stocks(securityName, issuePrice, interestRate); Shares(securityName, issuePrice, dividend)
Securities(securityName, issuePrice); Stocks(securityName, interestRate); Shares(securityName, dividend)
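The third option can be sketched as follows, again with Python's built-in sqlite3 module. The column types and the two sample securities are assumptions; the point is only that a sub-class table shares its primary key with the super-class table, so a full description of a stock is recovered by a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Super-class table holds the attributes common to all securities.
    CREATE TABLE Securities (securityName TEXT PRIMARY KEY, issuePrice REAL);

    -- Each sub-class table reuses the super-class key and adds its own attribute.
    CREATE TABLE Stocks (
        securityName TEXT PRIMARY KEY REFERENCES Securities(securityName),
        interestRate REAL
    );
    CREATE TABLE Shares (
        securityName TEXT PRIMARY KEY REFERENCES Securities(securityName),
        dividend REAL
    );

    INSERT INTO Securities VALUES ('Alpha stock', 100.0), ('Beta share', 2.5);
    INSERT INTO Stocks VALUES ('Alpha stock', 4.0);
    INSERT INTO Shares VALUES ('Beta share', 0.1);
""")

# Reassemble the full description of each stock by joining sub-class to super-class.
stocks = conn.execute("""
    SELECT s.securityName, s.issuePrice, k.interestRate
    FROM Securities AS s
    JOIN Stocks AS k ON k.securityName = s.securityName
""").fetchall()
```

The price of this flexibility is the join; the first two options avoid it at the cost of null columns or repeated super-class attributes.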




Step 12: Handling Aggregation

Clearly, the relational model has no concept of an aggregate except in the sense of aggregating a group of related attributes together in a relation. In a relational schema, aggregation can be simulated by creating a separate table for each of the component classes and a table for the aggregate class. A foreign key can then be posted from the aggregate class table into each of the component tables.

Example

Suppose we have a financial portfolio class which collects together a share class, a stock class and an investment trust class. The structure for this would look as follows:

FinancialPortfolio(portfolioCode, . . .)
Share(securityName, portfolioCode, . . .)
Stock(securityName, portfolioCode, . . .)
InvestmentTrust(trustName, portfolioCode, . . .) ◄

8.9 Visualising Data Structures

In our discussion in the previous section, we have assumed the presence of an existing information model for some domain of organisation. This information model then suggests to the business analyst how to structure a data model for the said domain. This can be said to be a top-down way of producing some schema for a data system. But what if no such information model exists? What if we want to start from the bottom-up, perhaps by analysing documentation or other artefacts relevant to the domain? In such a situation, how do we go about designing data structures to support communicative acts in some new domain? One useful approach is to diagram the dependency or determinancy between data items of significance to such a domain (Fagin 1979). As we shall see, producing a schema from a suitably composed determinancy diagram guarantees that such a schema is in a fully normalised state.

A data structure such as a table, as we have seen, is a collection of data elements such as rows. Each data element is made up of a series of data items. In the relational data model, such data items correspond to columns or attributes of the table. A data item is the atomic unit within a data model because it can take one and only one value at any one time. In a sense then, a data structure is a logical collection of data items. What guides the designer in assembling a group of data items into a data structure is a map of the dependencies between such data items.

8.10 Identifiers and Candidate Keys

In Chap. 2, we first introduced the idea of an identifier and defined it as any thing which can be taken to refer to some other thing across time and space to multiple actors. In terms of data structures, identifiers of objects turn into determinant data items. Two data items, A and B, are said to be in a determinant or dependent relationship if certain values of data item B always appear with certain values of data item A. Determinancy/dependency also implies some direction in the relationship. If data item A is the determinant data item and B the dependent data item, then the direction of the determinancy is from A to B, and not vice versa. Data item B is said to be dependent on data item A if for every value of A, there is one, unambiguous value for B. In such a relationship, data item A is referred to as the determinant data item, while data item B is referred to as the dependent data item. Dependency or determinancy is based on the idea of a mathematical function. A function is a directed mapping that associates each element of one set with exactly one element of another set; several elements of the first set may map to the same element of the second.

Example

For example, in a human resources database, staffNo and staffName are in a determinant relationship. StaffNo is the determinant and staffName is the dependent data item. This is because for every staffNo, there is only one associated value of staffName. For example, 345 may be associated with the value J.Smith. This does not mean to say that we cannot have more than one member of staff named J.Smith in our organisation. It simply means that each J.Smith will have a different staffNo. Hence, although there is a determinancy from staffNo to staffName, the same is not true in the opposite direction: staffName does not determine staffNo. Staying with personnel information, staffNo will probably determine departmentName. For every member of staff, there is only one associated department which applies. A member of staff cannot belong to more than one department at any one time within this domain (Fig. 8.6). ◄

It should be recognised that an object may be identified in more than one way. In terms of data structures, this means that a data item or group of data items may have more than one determinant. According to the relational data model, a candidate key is any data item or group of data items that can act in the capacity of a primary key for a table. Candidate keys are effectively competing determinants. Consider the diagram in Fig. 8.7. Here both staffNo and national insurance number (NINo) determine staffName independently. In this case, staffNo and NINo are candidate keys. As we shall see, a relational data structure can only have one primary key so we choose one of these candidate keys to be the actual key of the table and make the other a dependent data item in the table.
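The definition of determinancy can be tested against sample data. The sketch below is an illustration only: the function name determines, the sample rows and the NINo values are assumptions. Note also that a sample can refute a dependency but cannot prove that it holds for the whole domain.

```python
def determines(rows, determinant, dependent):
    """Return True if, in these rows, each value of `determinant`
    appears with exactly one value of `dependent`."""
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical staff records: two different people share the name J.Smith.
staff = [
    {"staffNo": 345, "NINo": "AA000001A", "staffName": "J.Smith"},
    {"staffNo": 346, "NINo": "AA000002A", "staffName": "J.Smith"},
    {"staffNo": 347, "NINo": "AA000003A", "staffName": "P.Jones"},
]

staffno_to_name = determines(staff, "staffNo", "staffName")  # holds in this sample
name_to_staffno = determines(staff, "staffName", "staffNo")  # fails: J.Smith has two staffNos
nino_to_name = determines(staff, "NINo", "staffName")        # NINo is a competing determinant
```

In this sample both staffNo and NINo determine staffName, which is what makes them candidate keys, while staffName determines neither.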

Fig. 8.6 Dependencies between data items (data items shown: staffNo, staffName, orderNo, orderDate, studentNo, courseCode)

Fig. 8.7 Candidate keys (data items shown: staffNo, NINo, staffName)

8.11 Determinancy Diagrams

A diagram which documents the determinancy or dependency between data items is referred to as a determinancy or dependency diagram. Data items are drawn on a determinancy diagram as labelled ovals, circles or bubbles. Dependency is represented between two data items by drawing a single-headed arrow from the determinant data item to the dependent data item. Figure 8.6 illustrates a number of dependent relationships between data items.

Fig. 8.8 A simple determinancy diagram (data items shown: staffNo, staffName, deptName, location)

Exercise

As well as the ability to manage dispatch advices as data structures, the manufacturing company also uses a structure known as a job sheet to help coordinate production activity. An example of such a job sheet is given below.

Job sheet
Job No: 2046
Order no.: 13/1193G
Description: Lintels
Product code: L150
Item length: 1500
Order qty: 200
Batch weight: 145
Galvanised: Y
Count discrepancy:
Non-conforming black:
Non-conforming white:
Non-conforming no change:
Dispatch no.:
Dispatch date:
Qty returned:
Weight returned:

If you can map the dependent relationships between data items as a diagram, it is relatively straightforward to devise efficient, normalised data structures to contain such data items. Let us illustrate this process with a very simple determinancy diagram, as in Fig. 8.8. Inspecting this diagram, we should see that we need two data structures to build the data model for this domain. In other words, the number of determinants on the diagram (ovals with arrows emerging from them) indicates the number of tables needed. The determinant becomes the primary key of the table. All immediate dependent attributes for a particular determinant become non-key attributes of the table.

Fig. 8.9 Non-functional determinancy (data item pairs shown: moduleName and staffNo; staffNo and staffLanguage)

Hence, in the determinancy diagram in Fig. 8.8, we have two determinants, staffNo and deptName. This means that one table is formed with the structure (staffNo, staffName, deptName); the other table is formed with the structure (deptName, location). Note that by following these simple steps, we have also created a foreign key (deptName) which connects the data stored in one table with that in the other.

There is another type of determinancy or dependency important to this technique, known as non-functional determinancy or dependency. Not all dependencies can be modelled in terms of functions. It is for this reason that we need to introduce the idea of a non-functional dependency. Data item B is said to be non-functionally dependent on data item A if for every value of data item A, there is a delimited set of values for data item B. The mapping is no longer functional because it is one to many (Fagin 1977).

Example

Let us assume that the university maintains a list of languages relevant to the organisation. The university wishes to record which members of staff have which language skills. Clearly, the relationship between staffNos and languages is not a functional determinancy. Many staff members may just have one language, but some will have two or more languages. Also, each language, particularly in the case of European languages such as English, French and German, is likely to be spoken by more than one staff member. Therefore, staffNo and staffLanguage are in a non-functional or multi-valued determinancy. In other words, for every staffNo, we can identify a delimited set of language codes which apply to that staff member. ◄

Multi-valued or non-functional dependency is indicated by drawing a double-headed arrow from the determinant to the dependent data item. Figure 8.9 represents two non-functional relationships as determinancy diagrams.

Fig. 8.10 Transitive determinancy (data items shown: moduleName, deptName, location)

Dependencies between any two data items may be diagrammed as A to B or B to A, but not both. Frequently, we may find that what is a single-valued dependency in one direction is a multi-valued dependency in the opposite direction. For example, take staffNo and departmentName. In the direction staffNo to departmentName, it is a functional dependency. In the direction departmentName to staffNo, it is a multi-valued dependency. In such situations, we always choose the direction of the functional dependency. This makes the eventual relational schema a lot simpler. It reduces, as we shall see, the number of compound keys required. If, however, a functional or non-functional dependency exists in both directions, then we choose either.

Example

For example, staffNo might functionally determine telephone extensionNo, and extensionNo in turn functionally determines staffNo. We hence may choose either staffNo or extensionNo as our determinant. Our choice is however likely to be influenced by how many other dependencies arise from the data item. More data items are likely to be dependent upon staffNo than upon extensionNo, so staffNo is likely to do more work for us in this context. ◄

Figure 8.10 documents a transitive dependency. A functional dependency exists from staffNo to departmentName, from departmentName to location and from staffNo to location. Any situation in which A determines B, B determines C and A also determines C can usually be simplified into the chain A to B and B to C. Identifying these so-called transitive determinancies can frequently simplify complex determinancy diagrams and indeed is an important part of the process of normalisation.

Frequently, one data item is insufficient to fully determine the values of some other data item. However, the combination of two or more data items gives us a dependent relationship. In such situations, we call the group of determinants a compound determinant. A compound determinant is drawn as an enclosing bubble around two or more data item bubbles. Hence, in Fig. 8.11, we need studentNo, moduleName and assessmentType to functionally determine assessmentGrade. The functional dependency is drawn from the outermost bubble.

Fig. 8.11 Compound determinancy (data items shown: moduleName, studentNo, assessmentType, assessmentGrade)

We can compose a relational data model from a diagram with functional determinancies by applying the rule: Every functional determinant becomes the primary key of a table. All immediate dependent data items become non-key attributes of the table. We handle non-functional dependencies by applying a further rule: Every non-functional determinant becomes part of the primary key of a table. That is, we make up a compound key from the determinant and dependent data items in a non-functional relationship.

Example

A determinancy diagram for the dataset presented in Table 8.1 is presented in Fig. 8.12. ◄

Exercise

Compose a relational data model from the determinancy diagram in Fig. 8.12.

We should note two things about determinancy diagramming. First, it can be shown that the data model arising from determinancy diagramming is inherently fully normalised. This means that it is not subject to the maintenance anomalies discussed earlier. Second, what should be evident from this bottom-up exercise in data modelling is that data structures, at least as far as the relational data model is concerned, implement information classes, data elements implement information objects and data items value the attributes of objects and their classes.

Example

In terms of the determinancy diagram in Fig. 8.12, four classes are evident: student, staff, module and assessment. This is because there are four candidate keys evident in the diagram and each of these forms the identifier for a class. From this diagram, it is also evident that the staff class has two attributes: the identifier staffNo and the dependent attribute staffName. ◄
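The two composition rules given earlier for functional and non-functional determinants can be sketched as a small function. This is an illustration under stated assumptions rather than a definitive algorithm: the function name tables_from_diagram and the dictionary-based encoding of a diagram are invented here, and the input is the simple diagram of Fig. 8.8 extended with the staffLanguage dependency from the languages example.

```python
def tables_from_diagram(functional, multivalued=()):
    """Derive table structures from a determinancy diagram.

    functional:  dict mapping a determinant (tuple of data items) to the list
                 of data items that depend on it directly.
    multivalued: iterable of (determinant, dependent) pairs for
                 non-functional dependencies.
    """
    tables = []
    # Rule 1: every functional determinant becomes the primary key of a table;
    # its immediate dependents become non-key attributes.
    for determinant, dependents in functional.items():
        tables.append({"key": tuple(determinant), "non_key": tuple(dependents)})
    # Rule 2: a non-functional determinant and its dependent together form
    # a compound key.
    for determinant, dependent in multivalued:
        tables.append({"key": tuple(determinant) + (dependent,), "non_key": ()})
    return tables


# Fig. 8.8: staffNo determines staffName and deptName; deptName determines location.
fig_8_8 = {
    ("staffNo",): ["staffName", "deptName"],
    ("deptName",): ["location"],
}
schema = tables_from_diagram(fig_8_8, multivalued=[(("staffNo",), "staffLanguage")])
```

For this input the sketch yields the two tables described for Fig. 8.8, (staffNo, staffName, deptName) and (deptName, location), plus an all-key table (staffNo, staffLanguage). A compound determinant such as that in Fig. 8.11 would be encoded as a tuple of several data items.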

Fig. 8.12 Determinancy diagram for the academic example (data items shown: moduleName, studentName, staffNo, studentNo, staffName, assessmentType, assessmentGrade)

8.12 Conclusion

Within this chapter, we have considered the concept of a data system and discussed how a data model documents the architecture of such a system. A data model must be distinguished from a data schema. The relational data model is an architecture for data, whereas a relational schema specifies the data structures that will be used for some specific application of a data system and how they relate. Normalisation refers to the process of designing a relational data schema which is free from so-called file maintenance anomalies. We considered how we might develop a fully normalised data model from the bottom-up—from an analysis of the dependencies between data items. Such an approach is normally only practicable for a relatively small dataset. It is for this reason that we have considered the approach of dependency or determinancy diagramming, which offers a more straightforward and practical route to the development of normalised schema. However, in practice, most designers of data systems would develop first an information model and translate this model into a schema by the process we have discussed in this chapter. Dependency diagramming might then be used to check or validate critical aspects of the schema design.

8.13 Summary

• Data are differences made in some substance or medium. Information is an accomplishment made by actors through their encounter with data.
• A conventional set of differences used by a group of actors is known as a symbol. The symbol is the first element of a sign. The second component is the object or referent, the thing referred to or described by the symbol. Then, there is the concept, the differences made to some actor by engagement with the symbol. Finally, there is the actor at the centre of the process of sign-use or semiosis.
• Symbols are normally bundled together in wider data structures. A data structure is a systematic form for organising data. A data structure is made up of a series of data elements which in turn are made up of a series of data items.
• Data structures are not only forms of structure; they serve to inform institutional actors and often prescribe or proscribe action on the part of such actors. Data structures are important to the instituting of facts about things and through this process are critical to the production and reproduction of institutional order.
• A data model is an abstraction of the data structures relevant to some domain. A relational data model details how tables, primary keys and foreign keys can be used to store and access institutional facts pertinent to some domain.
• Certain schemas appropriate to some domain are better than others. In terms of the relational data model, the concept of normalisation is applied to indicate whether a data schema is satisfactory. Normalisation refers to the process of designing a relational data schema which is free from so-called maintenance anomalies.
• Dependency or determinancy diagramming offers a way of building a data model from an analysis of data items and their dependencies. A data model produced in this manner can be shown to be fully normalised.

9 Information Modelling in Context

9.1 Introduction

Within this chapter, we want to consider the context of information modelling in a number of different senses. First, we shall consider how information modelling fits within the larger practice of business analysis and design. Second, we shall consider how information modelling has relevance not only to modelling data but also to the modelling of metadata. This leads us to discuss the way in which information modelling is relevant within the design of Web infrastructure. Third and finally, we consider how an understanding of information modelling is important to building a more nuanced approach to big data as well as to the more overarching and emerging discipline of data science. As we have seen, the business analysis and design technique of information modelling has been around for many decades and has been somewhat unusual amongst such techniques in changing very little during its lifetime. We have treated information modelling in something of a non-standard way as a process of documenting the important elements of communicative practices, either as they are currently within some institutional domain or as stakeholders would like such practices to be. As such, information modelling contributes in an important way to the effective management of patterns of information situations within domains. Normally however, and as we have seen in Chap. 8, information modelling is considered to have a much narrower purpose, directed at the analysis and design of data systems. The most commonplace type of data system experienced in the modern world is the electronic database. However, there are many other types of data system which benefit from information modelling, such as those underlying the infrastructure of the World Wide Web. The centrality of data to everyday life has led to an explosion of data and to a range of practices for manipulating mountains of contemporary data. Collectively, such practices are generally labelled as big data. 
Big data has been big news for some time, not only in the popular press but also within academia. This is true not only of the disciplines of computer science, information science and information systems but also within business and management. We shall examine the nature of big data itself using the theoretical framework of information situations. We want to critique the often-purported notion that the ‘data’ in big data is in many ways distinct from the ‘data’ in small data. This leads us to call for a more socially nuanced account of data within the development of data science. Information modelling, conducted in the manner proposed in this book, has much to offer this nuanced framing of data science.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Beynon-Davies, Information Modelling, https://doi.org/10.1007/978-3-030-98805-0_9

9.2 The Place of Information Modelling Within Business Analysis and Design

We have spent some time within this book discussing the idea of patterns of information situations and making the case that such patterns form the proper context for information modelling. Information modelling has also been portrayed as an important activity within the larger practice of business analysis and design. Business analysis and business design in turn work with a notion of science which is distinct from natural and social science. We have promoted elsewhere a view of business analysis and design as an act of sense-making—of making sense of some problematic situation, within some institutional domain (Beynon-Davies 2021a). We make such sense by applying the interrelated ideas of systems, patterns and signs. Domains are considered as systems of significant patterns of action undertaken by networks of both human and technical actors. We make sense using these tools for thought to understand how the domain currently works or is accomplished. We make sense also to identify problems with current situations and suggest coherent visions of changing how things work. Traditionally, the business analyst/designer is seen as a role mediating between the developers of ICT systems and the users of such systems. Not surprisingly, the key focus of the business analyst in this guise has been the analysis and design of technical systems such as ICT systems. However, we have suggested (Chap. 2) that the business analyst is also concerned with two other forms of systems and patterns—systems or patterns of communication (information) and systems or patterns of coordinated activity. Therefore, in previous work (Beynon-Davies 2021a), we have portrayed the role of the business analyst/designer as a facilitator of sense-making. The key task of the business designer is to make sense of the entangled systems and patterns that constitute institutional action within some domain and as an aid to this process to develop models of such systems. 
The notion of science is traditionally directed at understanding what is. The natural or physical sciences such as physics, biology and chemistry attempt to understand what constitutes the physical world. They seek to explore and explain phenomena such as gravitational waves, the structure of viruses and the shape of inorganic molecules. The social and psychological sciences such as economics, sociology, psychology and politics attempt to understand what the psychological and social worlds consist of. They seek to explore and explain phenomena such as inflationary cycles, the depth of inequality in society and the development of human personality.

In the 1960s, Herbert Simon (1996), a founding father of both management science and computer science, proposed another type of science, which he originally referred to as the science of the artificial. Many now refer to this science as design science. Simon described this science in the following terms: ‘engineering, medicine, business, architecture and painting are concerned not with the necessary but with the contingent—not how things are but how they might be—in short, with design’. So, design science is concerned not with what is but with what might be.

Business analysis and design is clearly an important endeavour within the broader notion of design science. Business analysts and designers are concerned with the contingent nature of organisation: the idea that the actions undertaken by actors within domains are conventions and as such are arbitrary and open to change. This means that the business analyst needs to utilise design theory in the production of design artefacts. Design science is implemented through design theory. Design theory tells the designer how to analyse and design certain things. Such theories provide explicit prescriptions as to how to understand, design and develop an artefact. Design artefacts are the output of applying design theories in practice.

For the business analyst and designer, the primary design artefact of concern is the business model. Business analysis and design can be seen as a coherent set of activities with the purpose of developing a business model. The concept of a business model has become popular in recent times as a way of thinking about business change, particularly when such change incorporates some technological innovation. We have used the term business model in a different sense (Beynon-Davies 2017, 2021a) which encompasses the traditional usage of this term, but which broadens its coverage.
Within our approach, we conceive of the core of a business model as a model of three types of system that interact, entangle or couple to produce and reproduce socio-technical organisation. These include an activity system (a system of coordination) model, an information system (a system of instrumental communication) model and a data system (a system of articulation) model. An information model is an important part of an information systems model and in turn critical to the analysis and design of a business model. An information model is a design artefact produced through the application of design theory. The design theory, first introduced in Chap. 2, presupposes that acts of instrumental communication can be described in terms of both their intent and content. The intent refers to the intended purpose of some communication. The content refers to the signs used to convey the message of some communication. The design theory further presupposes that such content can be unpacked in terms of classes, attributes and relationships and that using these constructs the design artefact of an information model can be produced.

9.3 Data and Metadata

As part of a business model, an information model has clearly a wider purpose and range of application than that proposed in traditional literature. Traditionally, as we have seen within Chap. 3, information modelling is thought of in narrow terms as applicable to the analysis and design of structured data and the systems that articulate such data, particularly relational database systems. We maintain that information modelling has a much wider purpose, and to demonstrate this, we consider here its use in the analysis and design of the metadata systems underlying the World Wide Web. Within Chap. 4, our coverage of objects, classes, attribution, association, generalisation and aggregation was designed to expose the component elements needed to form a model of some institutional ontology. But these notions are useful also in understanding, analysing and indeed designing another important layer within contemporary data infrastructure: that of metadata. Metadata is a form of abstraction which attempts to provide greater context to human or machine actors in their engagement with data.

Example

One of the most straightforward examples of metadata is that of an index. There are many different structures for an index, but let us examine one of the simplest, that used at the end of this book. An index comprises the data structure of a list in which the data elements are list items, and such list items are composed of at least two data items. The first data item within a list item is a keyword and the second data item a page number. This example of metadata (data about data) is used to provide context to some text. It provides to the reader of the text some understanding of where certain topics classified through use of a keyword are discussed within the text. ◄

Although the prefix meta means after or beyond in the original Greek, it is conventionally used in everyday English to mean about. Metadata might be defined more precisely as data which informs one or more actors about some aspects of underlying data and as such can be considered a form of abstraction. There are several different types of metadata of which the three most important are structural, descriptive and administrative.

Structural metadata is data which defines the structure of some other data, that is, how the underlying data is organised. The schema for some data system, such as a relational schema, is a key example of structural metadata since it defines the data structures appropriate to some domain, the attributes of such data structures and the relationships between such data structures.


Example

In Chap. 4, we defined the class Stillage and the class Location and confirmed that a stillage can be located at many different locations over time and likewise that a location locates many stillages at any one time. This suggests the following institutional facts:

[Location LOCATES Stillage]
[Stillage LOCATED AT Location]
[LOCATES cardinality many]
[LOCATED AT cardinality many]
[LOCATES optionality optional]
[LOCATED AT optionality mandatory]

These institutional facts are used as part of the information model discussed in Chap. 4. Following the approach proposed in Chap. 8, we can generate a skeleton relational schema from such facts as follows:

Locations(locationCode, . . .)
Stillages(stillageCode, . . .)
Positions(stillageCode(Not null), positionCode(Not null), . . .)

This schema is an example of structural metadata. ◄

An information model in its entirety forms structural metadata because it is typically developed as an abstraction of some institutional ontology and enables a mapping to be made from the constructs of an information model to a schema of data structures. Within a traditional database system, such structural metadata is held separately from the data structures which it describes. Within the data infrastructure of the WWW, at least traditionally, such metadata is embedded with the data structures it describes.

Metadata was traditionally used in the card catalogues of libraries. Such catalogues contain metadata about books in the library such as author, title, subject classification, etc. This metadata is best thought of as descriptive in the sense of describing certain attributes of underlying data which might be useful in attempts to retrieve such data.

Example

Within the academic world, a journal article is normally identified by a composite of its attributes such as journal name, author(s), date of publication, article title, volume number, issue number and page numbers. This particular combination of data items is often cumbersome to use in searches for articles and is frequently error-prone, typically because of incorrect representation of such details within references. This particular approach to identification of articles is also becoming
obsolete as many online-only journals have moved away from the practice of publishing in delineated volumes and issues. For such reasons, an approach to uniquely identifying publications or their parts through a digital object identifier (DOI) has been developed internationally. A DOI is a character string used to uniquely identify a digital object, such as an electronic document. The DOI system is implemented through a federation of registration agencies coordinated by the International DOI Foundation. Organisations such as journal publishers pay to become registrants within the DOI system, which enables them to assign DOIs for their electronic documents. A DOI is divided into a prefix and a suffix, separated by a slash. The prefix identifies the organisation registering the identifier known as the registrant, while the suffix is chosen by the registrant to uniquely identify a specific digital object. For example, within the DOI 10.1000/182, the prefix is 10.1000, and the suffix is 182. In terms of the prefix, 10 refers to the particular DOI registry, while 1000 identifies the particular registrant: in this case, the International DOI Foundation itself. The suffix 182 identifies a single digital object—the latest version of the DOI Handbook. One key advantage of a DOI is that it can be used to identify a complete journal, an individual article in the journal or a single figure in the particular article. Another key advantage is that within the DOI system, a clear separation is made between an identifier for a particular object and its so-called metadata, such as the location where the object can be accessed. This means that while the DOI for a document remains ‘persistent’ for its lifetime, its metadata, such as its location, may change a number of times. DOIs plus their associated metadata are deposited by a registrant in the international DOI registry. The metadata, such as the document’s location, are updated, whenever this changes. 
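The anatomy of a DOI described above can be sketched in Python as follows. The function name split_doi and the dictionary standing in for the registry are hypothetical, and the two locations are placeholder URLs rather than real addresses; the sketch only illustrates that the identifier stays fixed while its metadata changes.

```python
def split_doi(doi: str) -> dict:
    """Split a DOI into the parts named in the text."""
    prefix, suffix = doi.split("/", 1)           # prefix and suffix are separated by a slash
    registry, registrant = prefix.split(".", 1)  # e.g. "10" and "1000"
    return {
        "prefix": prefix,
        "suffix": suffix,
        "registry": registry,
        "registrant": registrant,
    }


parts = split_doi("10.1000/182")

# A stand-in for the registry: the DOI is the persistent key, while the
# metadata held against it (here only a location) may be updated over time.
registry_metadata = {"10.1000/182": {"location": "https://example.org/old/handbook"}}
registry_metadata["10.1000/182"]["location"] = "https://example.org/new/handbook"

print(parts)
print(registry_metadata)
```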
◄ Descriptive metadata is not restricted to textual data. Descriptive metadata for a music file might include the artist’s name, the album and the year it was released. A digital image may include metadata that describes how large the picture is, the colour depth, the image resolution, when the image was created, the shutter speed and other data. But metadata may also be used for administrative purposes in the sense of providing data about the lifecycle of an underlying data structure or set of data structures, as well as the rights and responsibilities of certain actors in relation to the articulation of such data structures. In previous work (Beynon-Davies 2021b), we have suggested that because of the importance of data structures to scaffolding institutional order, it is important at the metadata level to clearly assign positive and negative powers to actors in relation to the articulation of data structures. This is a way of thinking about the issue of data control in terms of the theory of information situations. In this guise, data control can be seen as the need to create and maintain an architecture of such powers in relation to acts of articulation associated with the life of data structures. If we take the position that whoever controls the data controls the institution, then the critical importance of data control becomes apparent. The issue of data control is best expressed as defining certain roles for actors which have certain articulation rights over certain data structures. Such an architecture of data roles granted certain data rights should provide answers to the following minimal set of questions associated with the life of a data structure:

• Who is able to create a data structure/data element/data item about something?
• Who is able to delete a data structure/data element/data item about something?
• Who is able to update a data structure/data element/data item about something?
• Who is able to read a data structure/data element/data item about something?
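One way of picturing how such an architecture answers the four questions is as a mapping from data roles to the acts of articulation each role is permitted. The following Python sketch is only an illustration: the role names and the rights assigned to them are hypothetical.

```python
# Hypothetical data roles and the acts of articulation each is permitted.
ROLE_RIGHTS = {
    "data_administrator": frozenset({"create", "read", "update", "delete"}),
    "order_clerk": frozenset({"create", "read", "update"}),
    "auditor": frozenset({"read"}),
}


def can(role: str, act: str) -> bool:
    """Return True if the role holds the right to perform the act."""
    return act in ROLE_RIGHTS.get(role, frozenset())


def who_can(act: str) -> list:
    """Answer 'who is able to ...?' for one act of articulation."""
    return sorted(role for role, rights in ROLE_RIGHTS.items() if act in rights)


for act in ("create", "delete", "update", "read"):
    print(act, "->", who_can(act))
```

Any act not listed against a role is refused by default, so an unknown role holds no rights at all.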

The assignment of data rights effectively declares the powers of a data role. Such powers come in two forms: positive and negative. Positive data rights are defined for data roles in terms of the articulation of data structures and effectively act as data permissions to perform certain acts of articulation, such as update, read and delete rights. In contrast, data responsibilities assign negative powers in the sense of prohibiting the defined data role from doing certain things with data structures. Information modelling is important in the modelling of administrative metadata. An information model should give the analyst not only the structure of data systems but also how these data systems are used, and as such it becomes an essential tool within data administration (Chap. 6). In recent times, an information model has also become necessary in the wider regulation of such data systems. Data management is a function within some organisation concerned with the management of data structures throughout their life. Such management will involve data control, data protection, data security and data retention and disposal. Data control, as we have seen, means establishing policies and procedures concerning who is able to articulate the data structure throughout its lifecycle. Data protection involves ensuring that adequate protections are in place so that data is used only for declared purposes. Data security means ensuring that data cannot be articulated by unauthorised actors. Data retention and disposal involves maintaining clear policies for the deletion or retention of data structures after their useful life has come to an end, including the archiving of data.

Example

Data protection is an important way of ensuring data privacy, particularly in relation to personal data—data held about the individual by institutions. For such reasons, many countries in the world have put in place legal powers to ensure that institutional actors such as companies register not only the structure of the data they hold about individuals but also the intended purpose of such personal data. ◄

9.4 The World Wide Web and Metadata

Structural and descriptive metadata is particularly important to contemporary technological infrastructure such as the World Wide Web (WWW), or Web for short. Descriptive metadata is an inherent part of the markup in HyperText Markup Language (HTML). Marking up refers to the process of putting a set of embedded tags within documents to indicate how the content is to be presented on devices such as personal computers and tablets. In the 1960s, work began on developing a generalised markup language for describing the presentation of electronic documents. This work became established in a standard known as the Standard Generalized Markup Language (SGML). SGML is in fact a meta-language: a language for defining other languages. Tim Berners-Lee, the inventor of the Web, used SGML to define a specific language for Web documents which he named HyperText Markup Language (HTML). HTML has gone through a number of versions since it was first introduced in 1991. As suggested above, an HTML document contains both content and tags. The document content consists of what is displayed on the computer screen. The tags constitute codes that tell the browser how to format and present the content on the screen. The general form of this relationship between tags and content is expressed as:

<tagname>content</tagname>

The tagname is taken from a set of keywords established in the version of HTML. Tags are enclosed in angle brackets. Certain tag names and the grammar with which they are used convey specific meanings to Web browsers. For instance, in the tag <P align="right">, P is the tagname and acts as an abbreviation for the word paragraph, so this tag is designed to be placed at the start of a chunk or paragraph of text. The word align is a property which can be assigned several values from a limited list. One of these is right, which specifies that the paragraph in question should be right justified on screen. An end tag </P> is placed at the end of the chunk of text. The descriptive metadata discussed above is clearly fixed in the sense that the naming and functionality of HTML tags are specified and updated by the World Wide Web Consortium. But some time into its history, HTML introduced a <meta> tag. This tag is used to describe to machine actors, such as search engine Web crawlers, the document in which it is included.

Example

Suppose I wanted to include the authorship of a Web page as metadata. This might be done by placing the following meta tag in the header section of a Web document:

<meta name="author" content="Paul Beynon-Davies">

In a sense, what we are doing here is describing a document in terms of name/value pairs. We are effectively establishing the following institutional facts about the content of this document. Some of these facts are of course implicit rather than explicit in the architecture of the Web:
[Web Document AKO Object]
[URI REFERS-TO Web Document]
[Document HASA author]
[www.peebedee.com/information modelling ISA Document]
[www.peebedee.com/information author Paul Beynon-Davies] ◄
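To see how a machine actor might recover such name/value pairs, consider the following minimal sketch using Python’s standard html.parser module. The class name MetaCollector and the sample page are assumptions made for illustration; a real Web crawler would do considerably more.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from the meta tags of an HTML document."""

    def __init__(self):
        super().__init__()
        self.pairs = {}

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lower case.
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.pairs[attributes["name"]] = attributes["content"]


# A hypothetical page carrying an author meta tag in its header section.
page = """<html>
<head>
<title>Information modelling</title>
<meta name="author" content="Paul Beynon-Davies">
</head>
<body><p>Some content.</p></body>
</html>"""

collector = MetaCollector()
collector.feed(page)
print(collector.pairs)
```

The resulting dictionary is the set of name/value pairs that the document declares about itself.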

9.5 Information Modelling and XML

Metadata is critical not only to accessing and describing Web documents but also to the sharing of data between institutions. Data structures such as sales orders, delivery notes, invoices and payment advices are now coded as electronic messages, and to enable their effective sharing, three conditions must be satisfied: the message comprising the data structure must have a defined format; the receiver and sender of the message must agree on its format; and the message must be able to be sent and read by electronic devices. Two of these conditions should look familiar to the reader as they are taken from our framing of information situations, first introduced in Chap. 2. But note that all of these conditions refer merely to the technology of the messages transmitted. They tell us nothing of the other important aspects of information situations that must be realised, such as the need for a common ontology to enable communication between dispersed actors. One of the most important consequences of our model of information situations is that for the effective transmission of any message, a common ontology must be shared between the sending actor and the receiving actors. Therefore, for the effective transmission of electronic documentation between machine actors, a formally implemented ontology must be devised and utilised between communicating institutions and the devices they utilise. Historically, the development of such ontology as it pertained to electronic documentation was based on a standard known as electronic data interchange (EDI). More recently, ontologies are defined using the Web-based technology of Extensible Markup Language (XML). One of the main advantages of HTML is its simplicity, enabling it to be utilised effectively by a wide user community across the world.
However, users working within defined communities of institutional practice normally want to define their own tags, primarily to use such tags for the definition of data structures through structural metadata. This structural metadata can then be used by machine actors to enable the sharing of these data structures between institutions. The World Wide
Web Consortium developed Extensible Markup Language (XML) in 1998 to meet these needs. The term extensible within XML refers to the ability of users to define an extended range of new markup tags. Like HTML, XML is a restricted descendant of SGML. Whereas HTML is used to define how the data in a document is to be displayed, XML can be used to define the syntax (structure) as well as some of the restricted semantics (meaning) of a data structure. However, there is a significant difference between how XML manages the specification of data and the way in which relational databases do this. XML data structures are self-describing; relational data structures are not. An XML data structure contains not only data but also metadata tags that define the structure of the data. A single document can have different types of data. With the relational model, the content of the data is defined by its column definition. All data in a column must have the same type of data. An XML document consists of a set of elements and attributes. Elements or tags are the most common form of markup. The first element in an XML document must be a root element. The document must have only one root element, but this element may contain a number of other elements. Example

Let us use a simple example to illustrate the power of XML as a metadata language. Suppose your company is a coffee wholesaler. You might wish to create XML documents for the exchange of shipping information to your customers. An appropriate root element might therefore be the tag . An element begins with a start tag and ends with an end tag. The start tag in our document for the root element would be . The corresponding end tag would be . Note that tags are case-sensitive in XML. Hence, is a different tag from . Elements can be empty; in which case they can be abbreviated to . Elements must also be properly nested as sub-elements within a superior element. So, this XML element might be used to define a coffee product:

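<ProductDetails>
<ItemName>Kenya Special</ItemName>
<CountryOfOrigin>Kenya</CountryOfOrigin>
<WholeSaleCost>20.00</WholeSaleCost>
<Stock>4000</Stock>
</ProductDetails>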

Here we have a ProductDetails element with several sub-elements. Definitions for these sub-elements, such as ItemName, CountryOfOrigin, WholeSaleCost and Stock, are properly nested within ProductDetails.
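The self-describing nature of such an element can be sketched with Python’s standard xml.etree.ElementTree module. The markup below is assembled from the element names and values given in the example, so its exact layout is an assumption; the point is that the tags travel with the data and can be read back by a machine actor.

```python
import xml.etree.ElementTree as ET

# The coffee product, with tags built from the element names used in the text.
document = """<ProductDetails>
  <ItemName>Kenya Special</ItemName>
  <CountryOfOrigin>Kenya</CountryOfOrigin>
  <WholeSaleCost>20.00</WholeSaleCost>
  <Stock>4000</Stock>
</ProductDetails>"""

root = ET.fromstring(document)

# Each sub-element carries its own name (metadata) alongside its value (data).
record = {child.tag: child.text for child in root}
print(root.tag, record)

# Tag names are case-sensitive, so a differently cased name finds nothing.
print(root.find("itemname") is None)
```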

In terms of our previous notion of a data structure, this would constitute a data element of a product’s data structure. Attributes, or what we previously called data items, are name-value pairs that contain descriptive information about an element. The attribute is placed inside the start tag for the element and consists of an attribute name, an equals (‘=’) sign and the value for the attribute placed within quotes. In the coffee wholesaler example, the tag contains the attribute ID and the value ‘1234’. ◄ Traditionally, the structure or syntax of an XML document has been defined in terms of a document type definition or DTD. More recently, the trend has been to use XML schema to define the structure of an XML document. This means that an XML schema can be defined separately from the data structures to which it applies: a similar principle to that applied to database systems. Such a schema, which is stored in a file with the extension .xsd, provides the names of all elements, which elements can appear in combination and what attributes are available for each type of element. It can also be used to specify certain rules on data elements, such as whether an element is a piece of text or a number, whether an element has a default value, and the minimum and maximum acceptable values. Clearly, data structures implemented in XML constitute a data system in the same way that a relational database constitutes a data system. Hence, information modelling is as relevant to the design of XML schema as it is to relational schema.
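The principle of holding rules about a data structure separately from the data itself can be illustrated with a small Python sketch. This is not XML schema and does not read .xsd files: the RULES table and the check function are a hand-written stand-in, assumed here only to show element names, types and minimum values being declared apart from the documents they govern.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules held apart from the data: element name -> (type, minimum).
RULES = {
    "ItemName": (str, None),
    "CountryOfOrigin": (str, None),
    "WholeSaleCost": (float, 0.0),
    "Stock": (int, 0),
}


def check(element, rules=RULES):
    """Return a list of problems found when testing an element against the rules."""
    problems = []
    for name, (kind, minimum) in rules.items():
        child = element.find(name)
        if child is None or child.text is None:
            problems.append(f"{name}: missing")
            continue
        try:
            value = kind(child.text)
        except ValueError:
            problems.append(f"{name}: not a valid {kind.__name__}")
            continue
        if minimum is not None and value < minimum:
            problems.append(f"{name}: below minimum {minimum}")
    return problems


good = ET.fromstring(
    "<ProductDetails><ItemName>Kenya Special</ItemName>"
    "<CountryOfOrigin>Kenya</CountryOfOrigin>"
    "<WholeSaleCost>20.00</WholeSaleCost><Stock>4000</Stock></ProductDetails>"
)
bad = ET.fromstring(
    "<ProductDetails><ItemName>Kenya Special</ItemName>"
    "<WholeSaleCost>20.00</WholeSaleCost><Stock>-5</Stock></ProductDetails>"
)

print(check(good))
print(check(bad))
```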

9.6 The Semantic Web

In Chap. 4, we discussed the notion of a sign lattice as a way of understanding and representing the notion of an institutional ontology. We stated there that the visualisation of a sign lattice we produced has many similarities with the idea of a semantic network or net. A semantic net was originally proposed as a way of representing knowledge, as a way of laying out the concepts appropriate to some domain and the relations between such concepts (Sowa 1984). As a visualisation, a semantic net consists of a directed graph, where the nodes or vertices of the graph represent concepts and the edges or arcs of the graph represent semantic relations between concepts. In the late 1990s, Tim Berners-Lee expressed a vision for something known as the semantic web. The semantic web can be seen as an attempt to build ideas of semantic nets into the architecture of the World Wide Web. As an extension to the Web, the semantic web is sometimes referred to as Web 3.0. The traditional Web, of course, consists of content connected via a multitude of hyperlinks implemented through embedded tags in documents. The important property of the Web is its universality, achieved through the power of such links to relate any content to any other content. However, this universality imposes a key limitation in that links do not contain any meaning over and above their ability to associate content items.

Generally speaking, some limited semantics is built into the representations of the World Wide Web using XML, the Resource Description Framework (RDF) and the notion of an ontology described in Chap. 3. RDF codes the semantics of links as sets of triples, very similar in form to our use of binary relations as a canonical form for information models introduced in Chap. 3. These triples relate URIs using typed links which convey their relationship. Ontologies consist of taxonomies of objects and relations plus sets of inference rules. Software agents that traverse the Web will need to share ontologies to enable them to perform tasks such as searching databases in multiple formats.

Example

Suppose you have two Web pages stored on two separate Web servers. Currently, you can associate these two pages by placing a hyperlink on one page which refers to the URL of the other Web page. However, the only meaning currently represented in this link is one of ‘see also’. When a user clicks on this link and is taken to the associated page, she must work out for herself why the two pages have been associated. Now assume that you can create a link with an RDF triple of the form:
[subject predicate object]
Such as:
[http://www.peebeedee.com/hr/employee1 http://www.peebeedee.com/hr/roles/manages http://www.peebeedee.com/hr/employee3]
The RDF triple is made up of three uniform resource identifiers (URIs). In this example, two employees are related together with a manages relation, supplying the meaning that one employee is the manager of another. ◄ The important point is that using such triples, the meaning as to why two pieces of content have been linked together is much more easily established, not only by humans but also by machines. However, this can only be achieved successfully if the terms used within such triples are established effectively amongst a community of users. Clearly documenting institutional ontology as an information model is an important step to building greater semantics into Web content. Also, engaging more directly with the pragmatics of institutional ontology (what the classes of an information model are used for) through information models might help to alleviate some claimed problems with the semantic web.
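How a machine actor might use typed links of this kind can be sketched in plain Python, with each triple held as a tuple of three URI strings. The first triple is the one from the example; the second triple and the helper function managed_by are hypothetical additions, and a production system would use a dedicated RDF library rather than a set of tuples.

```python
BASE = "http://www.peebeedee.com/hr/"
MANAGES = BASE + "roles/manages"

# Each triple is (subject, predicate, object), all three being URIs.
triples = {
    (BASE + "employee1", MANAGES, BASE + "employee3"),
    (BASE + "employee1", MANAGES, BASE + "employee2"),  # hypothetical extra fact
}


def managed_by(manager: str) -> list:
    """Return the URIs that the given manager is linked to by the manages relation."""
    return sorted(o for s, p, o in triples if s == manager and p == MANAGES)


print(managed_by(BASE + "employee1"))
```

Because the predicate names the relation, the program can answer why two resources are linked, which an untyped hyperlink cannot convey.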

9.7 Big Data

One obvious question to ask is whether information modelling is as relevant to big data as it is to small data. Big data is a difficult term to pin down because it is typically defined not only in terms of features such as data volume but also features such as data variety and data velocity. Data volume refers to the fact that the size of datasets used within big data applications is very large, frequently constituting terabytes, petabytes and sometimes even exabytes of data. Data variety refers to the fact that the sources of data within such applications are diverse, frequently meaning that big data applications must handle multiple structures for data. Data velocity refers to the speed with which data is articulated (created, updated and read). With big data applications, velocity has rapidly increased, leading to data being sensed, recorded and analysed almost in real time. These features of data volume, data variety and data velocity largely concern the mechanics of data. Modern technologies allow larger amounts of data of various different structures to be stored and articulated rapidly. Larger here can mean data elements identifying all things within a target population or describing through attributes a vast number of features of some thing.

Example

All customers of a company might have data elements created which identify them, while in modern human genomics, one person’s entire genetic code might be sequenced and stored. ◄ However, several other features are also sometimes seen as important to big data such as data resolution, data relatability, data flexibility, data indexicality and data exhaustivity (Kitchin 2014). These are features that rely upon the intentionality associated with everyday institutional actors taking action. As such, these features, as we shall see, are subject very much to matters of social or institutional ontology. Consider data resolution, which refers to the objective of big data applications being as detailed as possible. The issue of data resolution is often made operational in data systems in terms of the notion of data granularity, which refers to the level of detail at which data is recorded within data structures. The greater the granularity of data, the deeper the level of detail about things held to be of interest. But granularity is normally very much a social construct in most data applications. As we made clear in Chap. 3, it is never possible to represent and store everything because representation (the act of creating some sign which stands for something else) is necessarily a process of abstraction. The act of making some sign involves decisions made by institutional actors (people or machines) about what is appropriate and often feasible to represent as well as how to represent it.

Example

For instance, airlines may store details of your seating position on journeys but not your choice of meal. Clinicians may choose to classify your condition as type 1 diabetes but not to record aspects of your lifestyle. Parcel delivery companies are likely to identify delivery points based upon data at the level of postcode or ZIP code and not in terms of GPS coordinates. ◄ We should not be surprised at this given our understanding of the theory of information situations (Chap. 2). All these examples demonstrate the linkage between data and coordinated action through the mediating accomplishment of information. Institutional actors normally create institutional facts at a sufficient level of detail to enable coordinated action. Next, what about data relatability? This refers to the objective that the data in one dataset should be relatable to that in other datasets acting as sources to big data applications. As we established in Chap. 4, to relate one data structure to another relies typically upon some institutional framing of not only what things can be referred to but what it is appropriate to classify, associate, generalise or aggregate. Example

For instance, it is normally deemed acceptable to associate a data structure containing details of a person’s occupation with another data structure describing a person’s health. This is why we would create a class for an Employee and associate it with another class, perhaps named simply Health, communicating details of a person’s health attributes. However, it would not be appropriate to relate the occupational record to a data structure storing the eating habits of occupants of a zoological garden, even though records from such different institutions might contain common data such as personal names. In other words, we might have a person named Eric in the occupational record and a panda named Eric in the zoological record, but these records clearly instantiate different ontologies. ◄ In conventional data management, the issue of data relatability is resolved or realised within an information model. As we have seen in Chap. 8, such an information model can be resolved into the data structures important to some institutional domain as well as the relationships between such structures. Such an information model, as we made clear in Chap. 4, is reliant upon an understanding of the communicative practices appropriate to some institutional domain, which is, of course, very much a matter of social or institutional ontology. Data flexibility refers to the objective that both the structure and size of data should be able to be expanded easily and rapidly. However, ideas about how it might be appropriate to extend data both in structure and size (volume) are very much reliant upon aspects of social ontology. In terms of size, flexibility is clearly a function of data exhaustivity—how much data is needed to cover the entire
population of ‘things’ that data is meant to represent. In terms of structure, flexibility is a function of granularity, how many data items are needed to describe a ‘thing’ completely in terms of its properties or attributes. Therefore, both exhaustivity and flexibility rely upon an understanding of social ontology. Constraints on flexibility and exhaustivity are often not a technological matter but imposed by social institutions. Example

Within societies worldwide, there is ongoing debate about things which are important not only to refer to but also to prohibit from being referred to. For instance, many societies have introduced data protection legislation designed to restrict certain data being represented about the individual. There is also an ongoing debate about not only what data should be attributed to the individual but also whether it is appropriate to represent data about the activities of individuals within commercially owned data infrastructure. Shoshana Zuboff (2019) is one of the most important critics and has questioned the way in which tech companies effectively conduct surveillance on their customers and harvest data about them for commercial gain. ◄ Data indexicality refers to the objective that data is uniquely identifiable. Clearly, as we have seen in Chap. 2, indexicality relies upon the social notion of identification: the deemed relationship between some identifier and what it indexes or refers to. Things or objects that are regarded as distinct or unique from other things (identity) are fundamental to notions of existence. The idea of indexicality is actually one of the three fundamental modalities of a sign identified by Charles Sanders Peirce, the other two being symbolism and iconicity. A sign is an index if the presence of the thing standing for some object co-occurs with the object either spatially or temporally but its meaning is reliant upon the context in which the sign is used.

Example

For instance, words such as I, here and this are typically indexical, relying upon who is using them and in what setting. Hence, what is indexical within any institutional domain is critically dependent on social ontology. ◄ Finally, data exhaustivity refers to the objective that whole populations of data, rather than samples of data, should be collected, stored and analysed. This notion of data exhaustivity relies upon the idea of what is considered a population and its proper extent.

Example

Take the population of all British citizens. Clearly the population here relies upon the social construct of citizenship: who is deemed a member of some nation-state, such as the UK. But citizenship is reliant upon a sophisticated scaffolding of data structures such as passports or identity cards. ◄ So, there is a circularity of referencing here in ideas of data exhaustivity. A population is defined in terms of the differences drawn amongst a group of objects, which in turn serves to define the data required to exhaustively cover the population. However, and more critically, if we take any data to be an act of creation by people or machines, then all data, because of its basis as a form of representation, is inherently ‘sampled’. As Kitchin (2014) states, ‘. . .though Big data may seek to be exhaustive, capturing a whole domain and providing full resolution, it is both a representation and a sample, shaped by the technology and the platform used, the data ontology employed and the regulatory environment, and it is subject to sampling bias. Indeed, all data provide oligoptic views of the world: views from certain standpoints, using particular tools, rather than an all-seeing, infallible God’s eye view’. Proponents and practitioners of big data frequently argue that the applications they build do not rely upon the notion of structure. To explain this, a distinction is frequently made within the technological literature between structured and unstructured data. Examples cited of structured data are records stored in a relational database consisting of defined data elements with defined data items. Examples cited of unstructured data are documents, emails, images and videos. This distinction is somewhat artificial because all data, to be data, must have structure. From our theory of information situations, we should understand that structure, in being formed as data, amounts to a consistently applied set of differences made in some substance.
All data needs data structures for representation, storage and analysis. And all data must be structured in playing its part in the accomplishment of sign-use by institutional actors, whether such actors be people or machines. Example

So, an electronic document such as an XML document is likely to have a schema which defines its structure, and images have to be captured and stored using defined formats such as jpg or gif. ◄ What technologists are really doing here is making a distinction between levels of analysis that have to be performed upon data to generate institutional facts. Within so-called structured data, a great deal of the ontology of the data is embedded within the way data is organised and, as we have seen within this chapter, may even be defined systematically in some metadata, such as a schema. In contrast, within so-called unstructured data, work (sometimes significant work) must be performed on the data to re-engineer aspects of ontology.

Example

Let us work through an example here to help explain this point. Take a simple data profile of a person: this can be represented as a record with data items name, age, marital status, address, landline telephone number and mobile phone number. To ask questions of this data, such as how old the person is or whether he is married, you need merely retrieve the value held in the data item age or the data item marital status. The way in which this data is organised and implemented in some schema reflects its overarching ontology. Now imagine that this same data is represented as a piece of text, perhaps implemented as an email or an electronic text. This might consist of: [John Doe is fifty-five years old. He lives on 24 Crawshay Street, Canton, Cardiff. John is married to Eileen and has four children. His land-line number is 029294563 and his mobile phone number is 0777643256] To ask the same questions of this piece of text (what is John’s age and marital status), a great deal of prior analysis must be done on the text to derive implicit rather than explicit ontology. The program used to implement the questions must work out such things as that the phrase fifty-five years old refers to someone’s age and that to live on somewhere indicates someone’s place of residence. ◄ So, the distinction between structured and unstructured data is not really about structure but about the level of existing ontology defined and the consequent level of analysis needed to resolve issues of both content and intent. A further argument follows from that made on the structure of data. Big data sometimes promotes the idea that data can be treated without any consideration of its context. Our consideration of the features of big data, such as data resolution, data relatability, data flexibility, data indexicality and data exhaustivity, suggests otherwise.
We want to propose that all data in its very nature comes with context and that an understanding of context through an information model is essential to good data handling not only in terms of small data but also in terms of big data.
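The contrast drawn in the John Doe example can be made concrete with a short sketch. The Python below is illustrative only: the field names, the helper functions and the small number-word table are assumptions introduced here, not part of any particular system. It answers the age and marital status questions first from a record and then from the free text.

```python
import re

# Structured form: the ontology is explicit in the field names.
record = {"name": "John Doe", "age": 55, "marital_status": "married"}

# Unstructured form: the same facts as free text (taken from the example).
text = (
    "John Doe is fifty-five years old. He lives on 24 Crawshay Street, "
    "Canton, Cardiff. John is married to Eileen and has four children."
)

# Hypothetical lookup table, just large enough for two-word ages.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def age_from_text(passage):
    """Recover an age by recognising the phrase '<number words> years old'."""
    match = re.search(r"\bis ([a-z]+(?:-[a-z]+)?) years old", passage)
    if match is None:
        return None
    parts = match.group(1).split("-")
    if any(part not in NUMBER_WORDS for part in parts):
        return None
    return sum(NUMBER_WORDS[part] for part in parts)


def marital_status_from_text(passage):
    """Recover marital status by recognising the phrase 'is married'."""
    return "married" if re.search(r"\bis married\b", passage) else None


# With the record, each question is a single lookup.
print(record["age"], record["marital_status"])
# With the text, the program must first re-engineer the ontology.
print(age_from_text(text), marital_status_from_text(text))
```

The two extraction rules are deliberately brittle. They encode by hand exactly the ontology (what counts as an age, what counts as a marital status) that the record carries in its structure, and they would fail on a differently worded sentence.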

9.8 The Notion of a Data Science

Big data is sometimes framed as an important element of the wider endeavour of data science. The term data science was originally coined in the late 2000s and, although now much used, is quite a difficult term to pin down. In 2009, Hal Varian, Google’s chief economist at the time, described some of the skills required of the data scientist: ‘The ability to take data—to be able to understand it, to process it, to extract value from it, to visualise it, to communicate it . . .’ (McKinsey 2009). Rather curiously, these skills seem equally applicable to the information modeller.


There is now some core consensus that data science focuses upon developing methods and technologies for engaging with the large amounts of data created in modern society and with the correspondingly large datasets that need to be managed and analysed. Data science is also portrayed as an interdisciplinary area which is attempting to develop a unified view of data appropriate to the diverse areas of statistics, data analysis, machine learning, big data, mathematics and computer science. Yet another viewpoint characterises data science in terms of a lifecycle model with interlinked activities including capturing data, processing data, maintaining data, analysing data and communicating data. The key problem is that, as we have seen in Chap. 3, most of the disciplines that make up current notions of data science appear to work with a limited worldview of the nature of data, which we referred to as the conventional ontology of data structures. This is evident in the claims made for important elements of data science such as big data applications. We hope we have made the case that this viewpoint tends to straitjacket approaches developed for engaging with data and in some cases leads us astray in designing good ways of making data and drawing inferences from it. We think it is important to establish a programme to define the prospects for a data science which is informed not by the conventional ontology of data structures but by a social ontology of data structures. Any valid data science must recognise the way in which data structures scaffold institutional action. Data structures are made by actors within acts of articulation. Data is not neutral or objective but valued. It is directed at achieving communication, and such communication is entangled with the rights and responsibilities conferred on actors through data structures.
Consequently, we would suggest that a socially informed understanding of information modelling, as promoted in this book, is a necessary prerequisite for any good data scientist. Data structures are instantiations of institutional ontology. Their use as messages within information situations is reliant upon a shared understanding of such ontology by a community of actors. An understanding of such ontology must therefore be developed before such data can be processed and utilised within algorithmic decision-making. For this purpose, we believe that information modelling conducted in the manner discussed in this book has an important part to play.

9.9 Conclusion

Within this chapter, we have tried to demonstrate that the context of use for information modelling is much wider than one would conventionally think. Information modelling is clearly an important business analysis and design technique that has been used successfully for many decades in the design of conventional data systems such as relational databases. But information modelling is as important to the design of metadata as it is to data and as such is important to the analysis and design of HTML, XML and semantic web schemas. We have also proposed that information modelling helps make sense not only of the potential of big data and data science but also of why certain of the claims made by these recent and emerging areas of computing practice demand a much more nuanced conception of data and its uses, such as the one promoted in this book.

We have attempted in this book to provide a new rendering of information modelling which we think provides a more productive way of approaching this important business analysis and design technique. Being centred on an attempt to better understand institutional ontology, information modelling offers a way not only of better designing our data infrastructures but also of better understanding some of the problems that our contemporary data infrastructures pose for us. Information modelling is a useful tool with which to engage with the problems of contemporary data infrastructure. But just like any tool, it is only as good as the use to which it is put and the competence and integrity of its user. We hope that this book has suggested a coherent pathway for you, the user, to better use this tool in practice.

9.10 Summary

• Information modelling is an important practice within business analysis and design.
• Business analysis and design is a set of practices concerned with the contingent nature of institutional action and as such is very much a proponent of design science.
• Within this book, we have proposed a view of information modelling as a practice which utilises the design theory of information situations to produce the design artefacts of information models.
• Information modelling is important to the analysis and design of metadata as well as data. As such, it is important to the design of HTML, XML and semantic web schemas.
• Information modelling is as relevant to big data as it is to small data. This is because it provides a means of understanding and unravelling issues of institutional ontology on which data is based.
• Information modelling as the exploration of communicative competence offers a more nuanced way of understanding data within the realm of data science.

10 Exercises

10.1 Introduction

This chapter provides a set of exercises, additional to those found within the chapters of the book, to help ground understanding of information modelling. Solutions to each of these exercises are provided in the next chapter. Each section begins with a summary of a construct from information modelling, followed by one or more exercises which are designed to test understanding and application of the construct in question.

10.2 Information Classes

A class or more accurately an information class may be defined as some ‘thing’ which an organisation recognises as important and communicates about on a regular basis.

Exercise 10.2.1 Identify likely information classes from the following description of a criminal courts domain.

Each judge has a list of outstanding cases over which he will preside. Only one judge presides per case. For each case, one prosecuting counsel is appointed to represent the Department of Public Prosecutions. Cases are scheduled at one Crown Court for an estimated duration from a given start date. A case can try more than one crime. Each crime can have one or more defendants. Each defendant can have one or more defending barristers. If a crime has multiple defendants, each defendant can have one or more defence counsel defending. Defendants may have more than one outstanding case against them.

Write these out first as institutional facts and then draw them as classes.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Beynon-Davies, Information Modelling, https://doi.org/10.1007/978-3-030-98805-0_10


Exercise 10.2.2 Identify information classes in the following insurance domain.

A policy holder may have a number of policies with the insurance company. Each policy is given a policy number and relates to a single policy holder. The company has a range of insurance products and may put together a range of products to form a policy. Examples of motor products are third party, fire, theft, accident damage, windscreen cover, etc. Brokers sell policies for commission, and any one policy may have commission payable to more than one broker. Claims are made against policies. A claim relates to only one policy, and each claim is classified according to one of six claim types. The company’s products are grouped by business area, i.e. life, motor, marine, etc. Any particular product belongs to only one business area. The company holds information on clubs and associations, for promotions and selective mailings. A holder may belong to a number of different associations. In order to limit risk, the company may place all or part of a policy with re-insurers. All or part of a single policy may be placed with a number of different re-insurers.

Write these out first as institutional facts and then draw them as classes.

10.3 Classification

Classification involves grouping objects that share common characteristics into an information class. The opposite process to classification is said to be instantiation.

Exercise 10.3.1 ‘Paul Beynon-Davies is a customer’ is one example of classification. Provide another example of classification from the domain of criminal court cases described in Exercise 10.2.1.

Exercise 10.3.2 Provide an example of classification from the domain of insurance described in Exercise 10.2.2.

Exercise 10.3.3 Biologists use classification to describe animals. Provide one example of biological classification in terms of likely classes.

10.4 Relationships

A relationship is some association between information classes.

Exercise 10.4.1 Identify possible relationships in the criminal courts domain described in Exercise 10.2.1.

Exercise 10.4.2 Identify relationships of association from the insurance domain described in Exercise 10.2.2.

10.5 Attributes

A class is given shape through its properties or attributes.

Exercise 10.5.1 Identify attributes of a criminal court case as described in Exercise 10.2.1.

Exercise 10.5.2 Identify attributes of an insurance policy as described in Exercise 10.2.2.

10.6 Identifiers

An identifier is anything which can be taken to refer to some other thing across time and space to multiple actors.

Exercise 10.6.1 Where are surrogate identifiers relevant in the domain of criminal court cases described in Exercise 10.2.1?


Exercise 10.6.2 List some surrogate identifiers that might be appropriate to the insurance domain described in Exercise 10.2.2.

10.7 Constraints on Relationships of Association

To each relationship of association, we can add two types of business rule or constraint, which express for us how a given organisation works with its associated information classes. One type of rule is known as a cardinality rule, while the other type of rule is known as an optionality rule.

Exercise 10.7.1 In the insurance domain described in Exercise 10.2.2, a policy holder is related to a policy. What is the likely cardinality of this relationship?

Exercise 10.7.2 A policy holder holds a policy and a policy is held by a policy holder. What is the likely optionality associated with each class in this holds relationship from the domain described in Exercise 10.2.2?

Exercise 10.7.3 Produce a possible instance diagram for the holds/held by relationship to confirm cardinality and optionality. You will need to generate some likely identifiers for objects.

Exercise 10.7.4 A lecturer may teach many modules but a particular module is taught by at most one lecturer. What is the cardinality of this relationship?


Exercise 10.7.5 A customer must place at least one sales order to constitute being a customer of the company. What is the optionality associated with the customer class in the sales relationship?

10.8 Generalisation

Generalisation is the process of extracting common features from a group of object classes and suppressing the detailed differences between object classes. In practice, generalisation allows us to declare certain object classes as sub-classes of other object classes.

Exercise 10.8.1 The information classes person and employee might be considered as related through generalisation. Identify which is the super-class and which is the sub-class in this relationship, and explain why.

Exercise 10.8.2 A postgraduate student is a specialisation of a student—true or false?

Exercise 10.8.3 A given information class may be a sub-class of more than one super-class. Provide one example of such a situation.

10.9 Generalisation Hierarchies

Generalisation relationships are used to form generalisation hierarchies.

Exercise 10.9.1 Programmer, employee and computing staff are three information classes that form a generalisation hierarchy. Which is the most general class and which the least general?


Exercise 10.9.2 Broker and MarketMaker are partial sub-classes of FinancialIntermediary. This means that other sub-classes are possible. True or false?

Exercise 10.9.3 Share and Stock are disjoint sub-classes of Security. This means that a Security can be both a share and a stock at the same time—true or false?

10.10 Aggregation

An aggregation relationship occurs between a whole and its parts. An aggregation is an abstraction in which a relationship between objects is considered a higher-level object.

Exercise 10.10.1 In what way might an automobile be considered as an aggregate? Draw a tentative model of this aggregate.

Exercise 10.10.2 An ambulance resource is made up of an ambulance, its crew members and requisite equipment. Draw this as an aggregate.

10.11 Visual Notation

Information models are usually mapped out as diagrams.

Exercise 10.11.1 Classes are represented as labelled boxes. True or false?

Exercise 10.11.2 A class may only have one defining attribute. True or false?


Exercise 10.11.3 All classes must be related to all other classes on an information model through defined relationships. True or false?

Exercise 10.11.4 An Employee uses a CompanyCar; a given CompanyCar will be used by a number of different employees. Draw the information model and include the appropriate cardinality on the diagram.

Exercise 10.11.5 Some employees do not use any company car; all company cars are used by at least one employee. Add the appropriate optionality to the information model.

Exercise 10.11.6 Draw an information model to represent the following generalisation problem: Lions and Tigers are big cats; BigCats are Mammals; Mammals are Animals.

10.12 Strong and Weak Classes

An information class is said to be a strong class if the existence of its instances does not depend on the prior existence of the instances of some other class. In contrast, a weak class depends on the existence of some other class within the domain considered.

Exercise 10.12.1 Voter, house and registration are three classes relevant to a voting register. Which is the weak class of these three and why?

10.13 Recursive and Ternary Relationships

In conventional information model diagrams, the relationships are all binary, that is, we diagram two information classes and a relationship or a set of relationships between these information classes. It is possible however for association relationships to be unary. In other words, a relationship may involve only a single information class. Unary relationships are frequently described as being recursive in that they relate classes of the same type.

Exercise 10.13.1 An employee of some company is a manager of other employees. These employees are in turn managers of other employees. Represent this managerial hierarchy as a unary relationship.

Exercise 10.13.2 Produce an information model that represents the following domain describing a horse breeding register: A racing horse is identified by a unique name. The date of birth of the horse and its sex are also recorded. Each horse has a father and mother. The system must be able to produce a genealogy for each horse.

A ternary relationship is one which associates three or more information classes together. Ternary relationships are only used when they cannot be decomposed into a series of binary relationships.

Exercise 10.13.3 Consider the table below which stores data about automobile agents, automobile companies and automobiles:

Outlets
Agent    Company     Automobile
Jones    Ford        Car
Jones    Vauxhall    Van
Smith    Ford        Van
Smith    Vauxhall    Car

If agents represent companies, companies make products and agents sell products, then we might want to record which agent sells which product for which company. To do this, we need the structure above. We cannot decompose the structure because although agent Jones sells cars made by Ford and vans made by Vauxhall, he does not sell Ford vans or Vauxhall cars. Draw an information model for this ternary relationship.
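The claim that the Outlets structure cannot be decomposed can be checked directly. The following Python sketch, offered as an illustration rather than as part of the exercise, projects the table onto its three possible pairs of columns and then recombines them; the variable names are assumptions made here. Any triple that appears after recombination but not in the original table is spurious.

```python
# The Outlets table from Exercise 10.13.3 as (agent, company, automobile).
outlets = {
    ("Jones", "Ford", "Car"),
    ("Jones", "Vauxhall", "Van"),
    ("Smith", "Ford", "Van"),
    ("Smith", "Vauxhall", "Car"),
}

# Three binary projections: the candidate binary relationships.
agent_company = {(a, c) for a, c, _ in outlets}
company_product = {(c, p) for _, c, p in outlets}
agent_product = {(a, p) for a, _, p in outlets}

# Recombine: keep every triple that is consistent with all three projections.
rejoined = {
    (a, c, p)
    for a, c in agent_company
    for c2, p in company_product
    if c == c2 and (a, p) in agent_product
}

# Triples implied by the binary relationships but absent from the table.
spurious = rejoined - outlets

for row in sorted(spurious):
    print(row)
```

The recombined set contains triples such as Jones selling Ford vans, which the original table rules out. The three binary relationships together therefore carry less information than the single ternary one.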


10.14 Composing an Information Model

Composing an information model means applying the constructs of information modelling to representing the ontology of some institutional domain.

Exercise 10.14.1 Draw an information model for the following domain. You will need to make some assumptions about elements of the information model such as the cardinality and optionality of relationships.

A university decides to offer a series of short courses to industry and commerce. It is particularly interested in offering courses in information technology of some 3–4 days’ duration. It decides to set up a commercial arm USC (University Short Courses) to develop, market and administer such courses. Each short course is developed and maintained by one member of university staff, known as the course manager. A course may however be presented by a number of different lecturers, besides the course manager, depending on the popularity of the course. USC holds course presentations both at a specially prepared site on the university campus and at commercial and industrial sites. The former are described as being scheduled presentations, while the latter are described as being on-site presentations. Students on scheduled courses are likely to come from a number of different organisations. Students on on-site courses obviously come from one organisation. A number of companies use USC courses as part of their in-house training schedule.

USC wants to build a database system to help it to perform the administration of short courses more efficiently. Besides conventional insertion, update and deletion of data, a range of other functions must be performed by the intended system:

• When a person telephones to register for a particular session, USC staff need to check the number of persons already registered. The number of people registered for a presentation should not be greater than the course limit.
• At any time prior to a particular presentation, USC staff want to query the database to find out how many people have registered for the presentation.
• On a regular basis, the system needs to notify USC staff of those presentations that have fewer than four students registered for them. Four students is a break-even point for costing a presentation. If fewer than four persons register, then the presentation will be postponed and run at a later date.
• Two weeks before a presentation, staff need to ring each student to confirm attendance.
• One week before each presentation, staff need to check that the presentation fee has been paid.
• An attendance list then needs to be printed for each presentation.
• USC needs to keep track of which lecturers are qualified to teach which courses.
• Lecturers have to be assigned to scheduled courses at least 6 months ahead of a presentation.
• The manager of USC wants to produce statistics on the average number of students per course, to give him some idea of the popularity of each course. He is also particularly interested in finding out whether courses sell better at particular times of year.
• USC also wants to maintain an up-to-date mailing list so that it can send regular brochures of new and updated courses to potential customers.

Exercise 10.14.2 Produce an information model to handle the structure of a supermarket till receipt or a bill received from a company such as a utility company. You will need to make some assumptions about elements of the information model such as the cardinality and optionality of relationships.

Exercise 10.14.3 Draw an information model for the following domain. You will need to make some assumptions about elements of the information model such as the cardinality and optionality of relationships. Easyhire is a nationwide car hire company. This company maintains two types of business unit: depots and hire-points. Depots are places where hire cars are held and maintained. Hire-points are places where customers hire cars. Any one hire-point has access to many depots. Each depot can supply many cars to many different hire-points. Customers pick up cars from depots and return cars to depots. Customers have to pick up cars from a specified depot but may return cars to any depot of their choice.


Exercise 10.14.4 Draw an information model for the following domain. You will need to make some assumptions about elements of the information model such as the cardinality and optionality of relationships. Cinemaland owns a number of small cinemas in the UK. Each cinema has a unique name and is described in terms of its seating capacity, number of employees, location and manager. Cinemas show a number of films over a season, and the company needs to know not only which cinemas are showing which films currently but also which films will show where over the next year. A venue is a showing of a given film at a given cinema. Venues have a start and end date. The company also wishes to know how many people have attended each venue and the takings associated with a particular venue.

Exercise 10.14.5 Draw an information model for the following domain. You will need to make some assumptions about elements of the information model such as the cardinality and optionality of relationships. An information technology system is required to manage details of infant immunisation within a health region. Every infant patient within the region is required to have a general course of vaccinations against diseases such as whooping cough and diphtheria. Patients are given a unique identifier, and details of the name, date of birth and identifier of the mother of the patient are also recorded. Vaccinations are delivered by qualified health practitioners such as nurses and doctors. Each vaccination is given a unique number and is either of a single vaccine or of a multiple vaccine. The specific vaccines and dosages associated with each vaccination have to be recorded as well as the person given the vaccination, the location at which the vaccination was delivered and the date of the vaccination. Every infant will be given a number of booster injections for particular vaccines at specified intervals.

Exercise 10.14.6 Draw an information model for the following domain. You will need to make some assumptions about elements of the information model such as the cardinality and optionality of relationships. Quality Estate Agencies is a large company whose business involves managing the sale of residential properties between vendors and purchasers. Properties are each given a unique identifier and are described in terms of property type (detached, semi-detached, terraced, etc.), postcode location, number of bedrooms, whether the property has a garage, whether the property has a garden and, most importantly, the asking price. Details of the vendor are also recorded: name and contact telephone number. Purchasers make appointments to view properties. Purchaser details are taken such as name and contact number. Appointments are described in terms of the date and time of appointment and the property viewed. A given purchaser may make a number of appointments to view, and purchasers may also be vendors.

Exercise 10.14.7 Draw a complete information model for the criminal court cases domain as described in Exercise 10.2.1. You will need to make some assumptions about elements of the information model such as the cardinality and optionality of relationships.

Exercise 10.14.8 Draw a complete information model for the insurance policies domain as described in Exercise 10.2.2. You will need to make some assumptions about elements of the information model such as the cardinality and optionality of relationships.

10.15 Modelling Time

Most database systems handle events of some form: classes that must be timestamped in some way. Hence, in such databases, some way of accommodating past and future time must be utilised within design.

Exercise 10.15.1 Assume that we wish to build records of student alumni (past students) upon a database which records data about current students. How would time be accommodated into the information model for this scenario?


10.16 Connection Traps

Connection traps arise when a model makes invalid assumptions about the connection between information classes. The first type of connection trap to consider is known as a fan trap because it may occur when two 1:M relationships fan out from the same information class.

Exercise 10.16.1 Explain how the diagram in Fig. 10.1 contains a potential connection trap. You may find it useful to ask yourself the following question: If you know the employeeNo of an employee authorised to use a department pool car, will this information model allow you to determine which car has been used by the employee?

Fig. 10.1 A connection trap (diagram: classes Car with registrationNo, Department with departmentName and Employee with employeeNo)

Fig. 10.2 Yet another connection trap (diagram: classes Lecturer with staffNo, Tutorial with tutorialDate, tutorialTime and tutorialRoom, and Module with moduleName)

Exercise 10.16.2 Explain how the diagram in Fig. 10.2 contains a potential connection trap. You may find it useful to ask yourself the following question: can you tell from this diagram which tutorial was taught by which lecturer for which module?

Appendix: Solutions to Exercises

A.1 Introduction

This chapter provides a set of sample solutions to the exercises set in Chap. 10.

A.2 Information Classes

Exercise 10.2.1 Identify likely information classes for the criminal courts domain. Write these out first as institutional facts and then draw them as classes (Fig. A.1).

[Judge AKO Class]
[Prosecuting counsel AKO Class]
[Criminal case AKO Class]
[Defendant AKO Class]
[Defence counsel AKO Class]

Exercise 10.2.2 Identify information classes for the insurance domain. Write these out first as institutional facts and then draw them as classes (Fig. A.2).

[Association AKO Class]
[Area AKO Class]
[Policy holder AKO Class]
[Product AKO Class]
[Broker AKO Class]
[Reinsurer AKO Class]
[Policy AKO Class]
[Claim AKO Class]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Beynon-Davies, Information Modelling, https://doi.org/10.1007/978-3-030-98805-0


Fig. A.1 Classes in the criminal cases domain (diagram: Judge, Prosecuting counsel, Criminal case, Defendant, Defence counsel)

Fig. A.2 Classes in the insurance domain (diagram: Association, Area, Policy holder, Product, Broker, Reinsurer, Policy, Claim)

A.3 Classification

Classification involves grouping objects that share common characteristics into an information class. The opposite process to classification is said to be instantiation.

Exercise 10.3.1 ‘Paul Beynon-Davies is a customer’ is one example of classification. Provide an example of classification from the domain of criminal court cases. [Ghislaine Maxwell ISA Defendant]

Exercise 10.3.2 Provide an example of classification from the domain of insurance. [Trusted insurers ISA Broker]


Exercise 10.3.3 Biologists use classification to describe animals. Provide one example of biological classification. [Tiger ISA Panthera]

A.4 Relationships

A relationship is some association between information classes.

Exercise 10.4.1 Identify possible relationships in the criminal courts domain described in Exercise 10.2.1 (Fig. A.3).

Fig. A.3 Relationships in the criminal cases domain (diagram: Prosecuting counsel, Judge, Criminal case, Defendant, Defence counsel)

Fig. A.4 Relationships in the insurance domain (diagram: Association, Area, Policy-holder, Product, Broker, Reinsurer, Policy, Claim)

Exercise 10.4.2 Identify relationships of association from the insurance domain described in Exercise 10.2.2 (Fig. A.4).

A.5 Attributes

A class is given shape through its properties or attributes.

Exercise 10.5.1 Identify attributes of a criminal court case (Fig. A.5).

Exercise 10.5.2 Identify attributes of an insurance policy (Fig. A.6).

Fig. A.5 Attributes of a criminal court case (diagram: Criminal case with caseNo, caseDescription, crownCourt, judgeNo, prosecutingCounselNo)

Fig. A.6 Attributes of an insurance policy (diagram: Policy with policyNo, holderNo, policyType, policyStartDate, policyDuration, policyPremium)

A.6 Identifiers

An identifier is anything which can be taken to refer to some other thing across time and space to multiple actors.

Exercise 10.6.1 Where are surrogate identifiers relevant in the domain of criminal court cases? [caseNo REFERS TO Criminal case]


Exercise 10.6.2 List some surrogate identifiers that might be appropriate to the insurance domain. [policyNo REFERS TO Policy]

A.7 Constraints on Relationships of Association

To each relationship of association, we can add two types of business rule or constraint, which express for us how a given organisation works with its associated information classes. One type of rule is known as a cardinality rule, while the other type of rule is known as an optionality rule.

Exercise 10.7.1 In the insurance domain, a policy holder is related to a policy. What is the likely cardinality of this relationship? (Fig. A.7)

Exercise 10.7.2 A policy holder holds a policy and a policy is held by a policy holder. What is the likely optionality associated with each class in this relationship from the insurance domain? (Fig. A.8)

Fig. A.7 Cardinality of holds/held by relationship (diagram: Policy holder holds Policy; Policy held by Policy holder)

Fig. A.8 Optionality of holds relationship (diagram: Policy holder holds Policy; Policy held by Policy holder)

Exercise 10.7.3 Produce a possible instance diagram for the holds/held by relationship to confirm cardinality and optionality. You will need to generate some likely identifiers for objects (Fig. A.9).

Exercise 10.7.4 A lecturer may teach many modules but a particular module is taught by at most one lecturer. What is the cardinality of this relationship? (Fig. A.10)

Exercise 10.7.5 A customer must place at least one sales order to constitute being a customer of the company. What is the optionality associated with the customer class in the sales relationship? (Fig. A.11)
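Cardinality and optionality rules of the kind asked about in Exercises 10.7.1 to 10.7.5 can also be checked against a set of instances. The following Python sketch is illustrative only. The pairing of holders with policies is an assumption made here (the identifiers echo those in Fig. A.9, taking the three-digit numbers to be policy holders), and the function names are hypothetical. An instance set can only show what is consistent with a rule; it cannot establish the rule itself.

```python
from collections import defaultdict

# Hypothetical instances of the holds relationship: (policy holder, policy).
holds = [(234, 34698), (234, 37798), (237, 34888), (123, 24988)]
holders = {234, 237, 123}
policies = {34698, 37798, 34888, 24988}


def observed_cardinality(links):
    """Summarise links as '1:1', '1:M', 'M:1' or 'M:M' (left:right)."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in links:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)
    # Left side is 'M' if some right-hand object has several left-hand partners.
    left_side = "M" if any(len(s) > 1 for s in lefts_per_right.values()) else "1"
    # Right side is 'M' if some left-hand object has several right-hand partners.
    right_side = "M" if any(len(s) > 1 for s in rights_per_left.values()) else "1"
    return f"{left_side}:{right_side}"


def non_participants(members, links, position):
    """Return members of a class that take no part in the relationship."""
    linked = {pair[position] for pair in links}
    return members - linked


print(observed_cardinality(holds))
print(non_participants(holders, holds, 0))
print(non_participants(policies, holds, 1))
```

With these assumed links, one holder has several policies while each policy has a single holder, and every holder and every policy takes part in the relationship. A holder or policy appearing in `non_participants` would indicate that participation of that class is optional in the data at hand.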

Fig. A.9 Instance diagram for holds relationship (diagram: identifiers 234, 237 and 123 and identifiers 34698, 37798, 34888 and 24988, linked by holds/held by between Policy holder and Policy)

Fig. A.10 Cardinality of teaches relationship (diagram: Lecturer teaches Module)

Fig. A.11 Cardinality of places relationship (diagram: Customer places Order; Order placed by Customer)

A.8 Generalisation

Generalisation is the process of extracting common features from a group of object classes and suppressing the detailed differences between object classes. In practice, generalisation allows us to declare certain object classes as sub-classes of other object classes.

Exercise 10.8.1 The information classes person and employee might be considered as related through generalisation. Identify which is the super-class and which the sub-class in this relationship, and explain why. A Person is a generalisation of an Employee because properties of a Person such as sex and age apply to everybody, while properties of an Employee apply only to employed persons. Person is therefore the super-class and Employee the sub-class.
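The super-class and sub-class relationship in this solution maps naturally onto class inheritance. The following is a minimal Python sketch; the attributes employee_no and job_title are hypothetical, chosen only to stand for properties that apply to employed persons.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Properties that apply to everybody.
    sex: str
    age: int


@dataclass
class Employee(Person):
    # Hypothetical properties that apply only to employed persons.
    employee_no: str
    job_title: str


# An Employee carries the inherited Person properties plus its own.
worker = Employee(sex="F", age=41, employee_no="E100", job_title="Analyst")
```

Every Employee is also a Person, but not every Person is an Employee, which is why Person sits above Employee in the relationship.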

Exercise 10.8.2 A postgraduate student is a specialisation of a student: true or false? The answer is true, as a postgraduate student will inherit some of the common properties of all students.

Exercise 10.8.3 A given information class may be a sub-class of more than one super-class. Provide one example of such a situation. One way of thinking about an ambulance crew member is in terms of two sub-classes: a general ambulance operative and one trained with certain medical skills (Fig. A.12).

Fig. A.12 Example of generalisation (diagram: Crew member with sub-classes Paramedic and Ambulance operative; attribute dateQualified)

A.9 Generalisation Hierarchies

Generalisation relationships are used to form generalisation hierarchies.

Exercise 10.9.1 Programmer, employee and computing staff are three information classes that form a generalisation hierarchy. Which is the most general class and which the least general? Employee is the most general or super-class, while Computing staff is a sub-class of Employee. In turn, Programmer is a sub-class of Computing staff (Fig. A.13).

Exercise 10.9.2 Broker and MarketMaker are partial sub-classes of FinancialIntermediary. This means that other sub-classes are possible. True or false? True, a partial sub-class means that other sub-classes of the super-class are possible.

Fig. A.13 Generalisation hierarchy (diagram: Employee, Computing staff, Programmer)

Exercise 10.9.3 Share and Stock are disjoint sub-classes of Security. This means that a Security can be both a share and a stock at the same time: true or false? False, disjoint sub-classes do not overlap, so a given Security can be a Share or a Stock but not both.
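The ideas of partial and disjoint sub-classes can be checked against instances by treating each class as a set of identifiers. The Python sketch below is an illustration only; the identifiers S1 to S4 and the function names is_disjoint and is_partial are assumptions.

```python
# Hypothetical instance identifiers, invented for illustration.
securities = {"S1", "S2", "S3", "S4"}
shares = {"S1", "S2"}
stocks = {"S3"}


def is_disjoint(*sub_classes):
    """True if no instance belongs to more than one of the sub-classes."""
    seen = set()
    for members in sub_classes:
        if seen & members:
            return False
        seen |= members
    return True


def is_partial(super_class, *sub_classes):
    """True if some instance of the super-class is in none of the sub-classes."""
    covered = set().union(*sub_classes)
    return bool(super_class - covered)
```

Here shares and stocks share no member, so they are disjoint, and S4 belongs to neither, so the sub-classes are partial with respect to this sample.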

A.10 Aggregation

An aggregation relationship occurs between a whole and its parts. An aggregation is an abstraction in which a relationship between objects is considered a higher-level object.

Exercise 10.10.1 In what way might an automobile be considered as an aggregate? An automobile can be considered an assemblage of a number of component parts (Fig. A.14).

Exercise 10.10.2 An ambulance resource is made up of an ambulance, its crew members and requisite equipment. Draw this as an aggregate (Fig. A.15).
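One way to picture the whole and its parts in code is through composition. The following Python sketch reuses the attribute names shown in Fig. A.15; the data types, the sample values and the choice of exactly one ambulance per resource are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrewMember:
    memberNo: str
    memberName: str
    memberStatus: str


@dataclass
class Ambulance:
    ambulanceNo: str
    ageVehicle: int          # assumed to be in years
    vehicleMileage: int
    dateLastService: str     # assumed ISO date string


@dataclass
class Equipment:
    equipmentNo: str
    equipmentName: str
    dateLastService: str


@dataclass
class Resource:
    """The whole: an ambulance resource aggregates its parts."""
    ambulance: Ambulance
    crew: List[CrewMember] = field(default_factory=list)
    equipment: List[Equipment] = field(default_factory=list)


# Sample values are invented.
resource = Resource(
    ambulance=Ambulance("A1", 3, 42000, "2021-06-01"),
    crew=[CrewMember("M1", "J. Smith", "Paramedic")],
    equipment=[Equipment("Q1", "Defibrillator", "2021-05-15")],
)
```

The Resource object has no attributes of its own here; it exists only as the higher-level object that ties its parts together.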

Fig. A.14 Aggregation hierarchy for automobile assembly (diagram: Automobile composed of Engine, Chassis, Shell and Wheel)

Fig. A.15 Aggregate for ambulance resource (diagram: Resource composed of Crew member [memberNo, memberName, memberStatus], Ambulance [ambulanceNo, ageVehicle, vehicleMileage, dateLastService] and Equipment [equipmentNo, equipmentName, dateLastService])

A.11 Visual Notation

Information models are usually mapped out as diagrams.

Exercise 10.11.1 Classes are represented as labelled boxes. True or false? True, a class is represented as a labelled box.

Exercise 10.11.2 A class may only have one defining attribute. True or false? False, a class may have more than one attribute.

Fig. A.16 Cardinality in uses relationship (diagram: Employee, uses / used by, Company car)

Exercise 10.11.3 All classes must be related to all other classes on an information model through defined relationships. True or false? False, only some classes will be related together on an information model.

Exercise 10.11.4 An Employee uses a CompanyCar; a given CompanyCar will be used by a number of different employees. Draw the information model and include the appropriate cardinality on the diagram (Fig. A.16).

Exercise 10.11.5 Some employees do not use any company car; all company cars are used by at least one employee. Add the appropriate optionality to the information model (Fig. A.17).

Exercise 10.11.6 Draw an information model to represent the following generalisation problem: Lions and Tigers are big cats; BigCats are Mammals; Mammals are Animals (Fig. A.18).
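The chain of generalisations in Exercise 10.11.6 can also be written down as a class hierarchy. The Python sketch below is only an illustration; the helper function ancestry is an assumption introduced to list the super-classes of a class.

```python
class Animal:
    """Most general class in the hierarchy."""


class Mammal(Animal):
    """Mammals are Animals."""


class BigCat(Mammal):
    """BigCats are Mammals."""


class Lion(BigCat):
    """Lions are big cats."""


class Tiger(BigCat):
    """Tigers are big cats."""


def ancestry(cls):
    """List the class and its super-classes, most specific first."""
    return [c.__name__ for c in cls.__mro__ if c is not object]
```

Calling ancestry(Lion) walks up the hierarchy from Lion through BigCat and Mammal to Animal, mirroring the path drawn in the diagram.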

Fig. A.17 Optionality in uses relationship (diagram: Employee, uses / used by, Company car)

A.12 Strong and Weak Classes

An information class is said to be a strong class if the existence of its instances does not depend on the prior existence of the instances of some other class. In contrast, a weak class depends on the existence of some other class within the domain considered.

Exercise 10.12.1 Voter, house and registration are three classes relevant to a voting register. Which is the weak class of these three and why? Registration is the weak class as an instance of a registration relies upon the presence of both a voter and a house on the electoral register.
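The dependency of a weak class on other classes can be enforced as an existence check. The following Python sketch assumes a hypothetical ElectoralRegister class and invented identifiers; it refuses a registration unless both the voter and the house already exist.

```python
class ElectoralRegister:
    """Hypothetical register holding two strong classes and one weak class."""

    def __init__(self):
        self.voters = set()         # strong class: exists on its own
        self.houses = set()         # strong class: exists on its own
        self.registrations = set()  # weak class: (voter, house) pairs

    def add_voter(self, voter_id):
        self.voters.add(voter_id)

    def add_house(self, house_id):
        self.houses.add(house_id)

    def register(self, voter_id, house_id):
        # A registration may only be created for an existing voter and house.
        if voter_id not in self.voters or house_id not in self.houses:
            raise ValueError("registration needs an existing voter and house")
        self.registrations.add((voter_id, house_id))


register = ElectoralRegister()
register.add_voter("V1")
register.add_house("H1")
register.register("V1", "H1")
```

Voters and houses can be added in any order and without reference to anything else, whereas a registration cannot be created first.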

A.13 Recursive and Ternary Relationships

In conventional information model diagrams, the relationships are all binary, that is, we diagram two information classes and a relationship or a set of relationships between these information classes. It is possible however for association relationships to be unary. In other words, a relationship may involve only a single information class. Unary relationships are frequently described as being recursive in that they relate classes of the same type.

Fig. A.18 Generalisation example (diagram: Animal, Mammal, Big cat, Tiger, Lion)

Exercise 10.13.1 An employee of some company is a manager of other employees. These employees are in turn managers of other employees. Represent this managerial hierarchy as a unary relationship (Fig. A.19).
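A unary relationship of this kind can be held as a mapping from each employee to another employee. The Python sketch below is a minimal illustration; the employee names and the functions chain_of_command and manages are assumptions.

```python
# Hypothetical employees; each maps to the employee who manages them.
managed_by = {
    "Ann": None,   # top of the hierarchy, managed by nobody
    "Bob": "Ann",
    "Cat": "Ann",
    "Dan": "Bob",
}


def chain_of_command(employee):
    """Follow the managed by relationship upwards from one employee."""
    chain = []
    manager = managed_by[employee]
    while manager is not None:
        chain.append(manager)
        manager = managed_by[manager]
    return chain


def manages(manager):
    """Follow the manages relationship downwards by one level."""
    return sorted(e for e, m in managed_by.items() if m == manager)
```

Both ends of the relationship refer to the same class, Employee, which is what makes it recursive.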

Exercise 10.13.2 Produce an information model that represents the following domain describing a horse breeding register: A racing horse is identified by a unique name. The date of birth of the horse and its sex are also recorded. Each horse has a father and mother. The system must be able to produce a genealogy for each horse (Fig. A.20).

A ternary relationship is one which associates three or more information classes together. Ternary relationships are only used when they cannot be decomposed into a series of binary relationships.

Fig. A.19 Recursive manages relationship (diagram: Employee, manages / managed by)

Fig. A.20 Recursive parentage relationship (diagram: Horse, Father of / Paternal offspring, Mother of / Maternal offspring)

Exercise 10.13.3 Consider the table below which stores data about automobile agents, automobile companies and automobiles:

Outlets

Agent   Company    Automobile
Jones   Ford       Car
Jones   Vauxhall   Van
Smith   Ford       Van
Smith   Vauxhall   Car

If agents represent companies, companies make products and agents sell products, then we might want to record which agent sells which product for which company. To do this, we need the structure above. We cannot decompose the structure because although agent Jones sells cars made by Ford and vans made by Vauxhall, he does not sell Ford vans or Vauxhall cars. Draw an information model for this ternary relationship (Fig. A.21).
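The claim that the Outlets structure cannot be decomposed can be checked mechanically. The following Python sketch, which is an illustration and not part of the original solution, projects the table onto its three binary relationships and then joins them back together; the variable names are assumptions.

```python
from itertools import product

# The Outlets table as (agent, company, automobile) triples.
outlets = {
    ("Jones", "Ford", "Car"),
    ("Jones", "Vauxhall", "Van"),
    ("Smith", "Ford", "Van"),
    ("Smith", "Vauxhall", "Car"),
}

# The three binary relationships obtained by projection.
agent_company = {(a, c) for a, c, _ in outlets}   # agents represent companies
company_auto = {(c, p) for _, c, p in outlets}    # companies make products
agent_auto = {(a, p) for a, _, p in outlets}      # agents sell products

agents = {a for a, _, _ in outlets}
companies = {c for _, c, _ in outlets}
autos = {p for _, _, p in outlets}

# Rejoin: keep every triple that is consistent with all three binary facts.
rejoined = {
    (a, c, p)
    for a, c, p in product(agents, companies, autos)
    if (a, c) in agent_company
    and (c, p) in company_auto
    and (a, p) in agent_auto
}

# Triples that the binary relationships allow but the original table does not.
spurious = rejoined - outlets
```

The rejoined set contains combinations such as Jones selling Ford vans, which the original table excludes, so the three binary relationships together lose information that only the ternary relationship records.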


A.14 Composing an Information Model

Exercise 10.14.1 Draw an information model for the short courses domain (Fig. A.22).

Exercise 10.14.2 Produce an information model to handle the structure of a supermarket till receipt or a bill received from a company such as a utility company (Fig. A.23).

Exercise 10.14.3 Draw an information model for the car hire domain (Fig. A.24).

Exercise 10.14.4 Draw an information model for Cinemaland (Fig. A.25).

Fig. A.21 A ternary relationship (diagram: Agent, Automobile, Company)

Fig. A.22 Skeleton information model for the short courses domain (diagram: Lecturer, Course manager, Qualification, Attendance, Student, Booking, Payment, Presentation, Course, Venue, Customer)

Exercise 10.14.5 Draw an information model for infant immunisation (Fig. A.26).

Exercise 10.14.6 Draw an information model for the estate agency domain (Fig. A.27).

Exercise 10.14.7 Draw a complete information model for the criminal court cases domain as described in Exercise 10.2.1 (Fig. A.28).

Exercise 10.14.8 Draw a complete information model for the insurance policies domain as described in Exercise 10.2.2 (Fig. A.29).

Fig. A.23 Billing information model (diagram: Customer, Purchase, Product; relationship labels: makes, made by, purchases, purchased)

A.15 Modelling Time

Most database systems handle events of some form: classes that must be timestamped in some way. Hence, in such databases, some way of accommodating past and future time must be utilised within design.

Exercise 10.15.1 Assume that we wish to build records of student alumni (past students) upon a database which records data about current students. How would time be accommodated into the information model for this scenario? This could be handled in a number of ways. One way is through generalisation (Fig. A.30).
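A sketch of the generalisation approach in Python follows. It assumes that the two sub-classes differ in the dates they carry; the attribute names student_no, date_enrolled and date_graduated and the function graduate are hypothetical and are not taken from the figure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Student:
    # Properties common to current and past students (names are assumed).
    student_no: str
    name: str


@dataclass
class CurrentStudent(Student):
    date_enrolled: date


@dataclass
class AlumniStudent(Student):
    date_enrolled: date
    date_graduated: date   # the timestamp that marks the student as past


def graduate(student: CurrentStudent, when: date) -> AlumniStudent:
    """Move a student from the current sub-class to the alumni sub-class."""
    return AlumniStudent(
        student.student_no, student.name, student.date_enrolled, when
    )


current = CurrentStudent("S100", "A. Jones", date(2019, 9, 23))
alumnus = graduate(current, date(2022, 7, 15))
```

Time enters the model in two places: as timestamps on the sub-classes and as the event that moves an object from one sub-class to the other.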

Fig. A.24 Car hire information model (diagram: Hire point, Customer, Car, Car hire, Depot; relationship labels: provides, provided by, purchases, purchased by, rents, rented, picks, picked up, drops, dropped off)

A.16 Connection Traps

Connection traps are so-called because they make invalid assumptions about the connection between information classes. The first type of connection trap to consider is known as a fan trap because it may occur when two 1:M relationships fan out from the same information class.

Exercise 10.16.1 The answer to the following question identifies the connection trap: If you know the employeeNo of an employee authorised to use a department pool car, will this information model allow you to determine which car has been used by the employee? The answer is no because there is no record of which cars are used by which employees. The information model will only tell you which pool cars each department has and which employees each department has.
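The fan trap can be reproduced with a few lines of Python. In this sketch the department name, employee numbers and car identifiers are invented, and the direct car_usage relationship is only one possible way of remodelling the domain to answer the question.

```python
# Two 1:M relationships fanning out from Department (hypothetical data).
department_employees = {"Sales": ["E1", "E2"]}
department_cars = {"Sales": ["CAR-A", "CAR-B"]}


def candidate_cars(employee_no):
    """All the fan-trap model can offer: every pool car of the department."""
    for dept, employees in department_employees.items():
        if employee_no in employees:
            return list(department_cars.get(dept, []))
    return []


# One possible remodelling: record the use of a car by an employee directly.
car_usage = [("E1", "CAR-B")]


def cars_used_by(employee_no):
    """Answer the question from the direct Employee to Car relationship."""
    return [car for emp, car in car_usage if emp == employee_no]
```

Travelling through Department yields every pool car as a candidate for employee E1, whereas the added relationship identifies the single car actually used.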

Fig. A.25 Cinema information model (diagram: Film, Cinema, Venue, Attendance, Customer; relationship labels: shown at, shown by, provides, provided by, locates, located at, attends, attends at)

Fig. A.26 Infant immunisation information model (diagram: Practice, General practitioner, Patient, Vaccination, Vaccine, Booster; relationship labels: contains, works for, parent/guardian of, dependent of, administers, administered by, vaccinates, vaccinated with, delivers, delivered in, boosts, boosted with)

Fig. A.27 Estate agency information model (diagram: Vendor, Property, Appointment, Purchaser; relationship labels: offers, offered by, viewed in, for viewing, arranges, arranged by)

Fig. A.28 Criminal court cases information model (diagram: Prosecuting counsel, Judge, Criminal case, Defendant, Defence counsel, Representation; relationship labels: presides, presided by, prosecutes, prosecuted by, represents, represented by)

Fig. A.29 Insurance information model (diagram: Association, Area, Policy-holder, Broker, Product, Reinsurer, Policy, Claim; relationship labels: contains, joins, covers, covered by, covered in, opens, opened by, brokers, brokered by, reinsures, reinsured by, has against it, made against)

Fig. A.30 Alumni students (diagram: Student with sub-classes Current student and Alumni student)

Exercise 10.16.2 You cannot tell from this diagram which tutorial was taught by which lecturer for which module. The information model will only tell you which modules are taught by which lecturer and which tutorials are held by which lecturer. The information model needs to be modified to answer the question set.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Beynon-Davies, Information Modelling, https://doi.org/10.1007/978-3-030-98805-0