English Pages 578 Year 1975
COMPUTER DATA-BASE ORGANIZATION
A JAMES MARTIN BOOK
Prentice-Hall Series in Automatic Computation
AHO, ed., Currents in the Theory of Computing
AHO AND ULLMAN, The Theory of Parsing, Translation, and Compiling, Volume I: Parsing; Volume II: Compiling
ANDREE, Computer Programming: Techniques, Analysis, and Mathematics
ANSELONE, Collectively Compact Operator Approximation Theory and Applications to Integral Equations
BATES AND DOUGLAS, Programming Language/One, 2nd ed.
BLUMENTHAL, Management Information Systems
BRENT, Algorithms for Minimization without Derivatives
BRINCH HANSEN, Operating System Principles
COFFMAN AND DENNING, Operating Systems Theory
CRESS, et al., FORTRAN IV with WATFOR and WATFIV
DAHLQUIST, BJORCK, AND ANDERSON, Numerical Methods
DANIEL, The Approximate Minimization of Functionals
DEO, Graph Theory with Applications to Engineering and Computer Science
DESMONDE, Computers and Their Uses, 2nd ed.
DRUMMOND, Evaluation and Measurement Techniques for Digital Computer Systems
ECKHOUSE, Minicomputer Systems: Organization and Programming (PDP-11)
FIKE, Computer Evaluation of Mathematical Functions
FIKE, PL/1 for Scientific Programmers
FORSYTHE AND MOLER, Computer Solution of Linear Algebraic Systems
GEAR, Numerical Initial Value Problems in Ordinary Differential Equations
GORDON, System Simulation
GRISWOLD, String and List Processing in SNOBOL4: Techniques and Applications
HANSEN, A Table of Series and Products
HARTMANIS AND STEARNS, Algebraic Structure Theory of Sequential Machines
JACOBY, et al., Iterative Methods for Nonlinear Optimization Problems
JOHNSON, System Structure in Data, Programs, and Computers
KIVIAT, et al., The SIMSCRIPT II Programming Language
LAWSON AND HANSON, Solving Least Squares Problems
LORIN, Parallelism in Hardware and Software: Real and Apparent Concurrency
LOUDEN AND LEDIN, Programming the IBM 1130, 2nd ed.
MARTIN, Computer Data-Base Organization
MARTIN, Design of Man-Computer Dialogues
MARTIN, Design of Real-Time Computer Systems
MARTIN, Future Developments in Telecommunications
MARTIN, Programming Real-Time Computing Systems
MARTIN, Security, Accuracy, and Privacy in Computer Systems
MARTIN, Systems Analysis for Data Transmission
MARTIN, Telecommunications and the Computer
MARTIN, Teleprocessing Network Organization
MARTIN AND NORMAN, The Computerized Society
MCKEEMAN, et al., A Compiler Generator
MEYERS, Time-Sharing Computation in the Social Sciences
MINSKY, Computation: Finite and Infinite Machines
NIEVERGELT, et al., Computer Approaches to Mathematical Problems
PLANE AND MCMILLAN, Discrete Optimization: Integer Programming and Network Analysis for Management Decisions
POLIVKA AND PAKIN, APL: The Language and Its Usage
PRITSKER AND KIVIAT, Simulation with GASP II: A FORTRAN-based Simulation Language
PYLYSHYN, ed., Perspectives on the Computer Revolution
RICH, Internal Sorting Methods Illustrated with PL/I Programs
SACKMAN AND CITRENBAUM, eds., On-Line Planning: Towards Creative Problem-Solving
SALTON, ed., The SMART Retrieval System: Experiments in Automatic Document Processing
SAMMET, Programming Languages: History and Fundamentals
SCHAEFER, A Mathematical Theory of Global Program Optimization
SCHULTZ, Spline Analysis
SCHWARZ, et al., Numerical Analysis of Symmetric Matrices
SHAH, Engineering Simulation Using Small Scientific Computers
SHAW, The Logical Design of Operating Systems
SHERMAN, Techniques in Computer Programming
SIMON AND SIKLOSSY, eds., Representation and Meaning: Experiments with Information Processing Systems
STERBENZ, Floating-Point Computation
STOUTEMYER, PL/1 Programming for Engineering and Science
STRANG AND FIX, An Analysis of the Finite Element Method
STROUD, Approximate Calculation of Multiple Integrals
TANENBAUM, Structured Computer Organization
TAVISS, ed., The Computer Impact
UHR, Pattern Recognition, Learning, and Thought: Computer-Programmed Models of Higher Mental Processes
VAN TASSEL, Computer Security Management
VARGA, Matrix Iterative Analysis
WAITE, Implementing Software for Non-Numeric Application
WILKINSON, Rounding Errors in Algebraic Processes
WIRTH, Systematic Programming: An Introduction
YEH, ed., Applied Computation Theory: Analysis, Design, Modeling
COMPUTER DATA-BASE ORGANIZATION
JAMES MARTIN
IBM Systems Research Institute
PRENTICE-HALL, INC., Englewood Cliffs, New Jersey
Library of Congress Cataloging in Publication Data
MARTIN, JAMES.
Computer data-base organization.
(Prentice-Hall series in automatic computation)
Bibliography: p.
1. Data base management. I. Title.
QA76.M324    001.6'442    74-13981
ISBN 0-13-165506-X
Computer Data-Base Organization
James Martin

© 1975 by Prentice-Hall, Inc.
Englewood Cliffs, N.J.

All rights reserved. No part of this book may be reproduced in any form, by mimeograph or any other means, without permission in writing from the publisher.

10 9 8 7 6 5 4 3

Printed in the United States of America.

PRENTICE-HALL INTERNATIONAL, INC., London
PRENTICE-HALL OF AUSTRALIA PTY. LTD., Sydney
PRENTICE-HALL OF CANADA, LTD., Toronto
PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
PRENTICE-HALL OF JAPAN, INC., Tokyo
TO CHARITY
CONTENTS

Preface xiii
Index of Basic Concepts xv

PROLOGUE 1
1 Introduction 2
2 Basic Terminology 8

PART I LOGICAL ORGANIZATION
3 What is a Data Base? 19
4 What should be the Objectives of a Data Base Organization? 31
5 Entities and Attributes 44
6 Schemas and Subschemas 53
7 Data Base Management Systems 66
8 Tree Structures 79
9 Plex Structures 89
10 Data Description Languages 100
11 The CODASYL Data Description Language 112
12 IBM's Data Language/I 129
13 Relational Data Bases 149
14 Third Normal Form 169
15 Varieties of Data Independence 179
16 Operations Systems versus Information Systems 188

PART II PHYSICAL ORGANIZATION
17 Criteria Affecting Physical Organization 198
18 Differences Between Physical and Logical Organization 207
19 Pointers 219
20 Chains and Ring Structures 227
21 Addressing Techniques 249
22 Indexed Sequential Organizations 267
23 Hashing 291
24 Physical Representations of Tree Structures 312
25 Physical Representations of Plex Structures 332
26 Multiple-Key Retrieval 345
27 Index Organization 365
28 A Comparison of Multiple-Key Organizations 379
29 Separating Data and Relationships 391
30 Index Searching Techniques 402
31 Data Compaction 433
32 Virtual Memory and Storage Hierarchies 449
33 Inverted File Systems 478
34 Volatile Files 492
35 Fast Response Systems 504
36 Associative Memory 516

Appendix A The Mean Number of Probes in a Binary Search 525
Appendix B Sample Logical Data Descriptions 528
Class Questions 536
Index 552
PREFACE

INTENT

One of the most badly needed courses in universities and other establishments which teach computing is a course on the realities of data base technology. The 1970's is the decade of the data base. Probably the biggest difference between the next generation of computers and the present will be massive on-line storage and its software. By the end of the 1970's, much of the computing in industry and government will relate to the data bases which have been painfully constructed piece by piece, and management effectiveness will relate to the quality of their organization's data sources and the versatility with which they can be used.

At the time of writing, data base technology is widely misunderstood. Its role as the foundation stone of future data processing is often not appreciated. The techniques used in many organizations contain the seeds of immense future difficulties. Data independence is often thrown to the winds. Data organizations in use prevent the data from being employed as they should be. And most educational establishments do not yet teach a data base course. This book is being used at the IBM Systems Research Institute as the text for such a course.
ACKNOWLEDGEMENTS

The author wishes to thank many students who have reviewed the text critically. He is very grateful for the detailed comments from Mr. R. M. Gale, Dr. E. F. Codd, Mr. R. W. Holliday, Mr. C. P. Wang, Mrs. S. Snook, Mr. Andy Chiu and Mr. H. S. Meltzer. The author is especially grateful to his wife for her logical and physical assistance. Miss Cora Tangney, who helped with the manuscript, and Miss Judy Burke, who helped with the production process, were immensely appreciated by the author and admired for their professional competence. The author, having been sustained when writing late at night by consuming large quantities of cookies, feels that acknowledgements are also warranted to Pepperidge Farm and Cadbury.

James Martin
INDEX OF BASIC CONCEPTS

The basic concepts, principles, and terms that are explained in this book are listed here along with the page on which an introductory explanation or definition of them is given. There is a complete index at the end of the book.
Adaptive organization 510
Addressing 249
Algebra, relational 161
Anticipatory staging 461
Area (CODASYL) 124, 126 (Box 11.2)
Associative memory 516
Associative memory, software 521
Balanced tree 81
Binary relation 396
Binary search 251
Binary tree search 407
Binding 180
Block search 250
Bucket 15
Bucket-resolved index 367
Calculus, relational 161
Candidate key 153
Cell 9
Cellular chains 245
Cellular inverted list 359
Cellular multilist 356
Cellular splitting 285
Chain 227
Child-and-twin pointers 322
Circular file 495
CODASYL 112
CODASYL Data Description Language 112
CODASYL set 115
Compaction of data 433
Complex mapping 59
Complex plex structure 91
Compression of key 369
Control area 277
Coral ring 241
Cycle (data structure) 93
Cylinder 9
DASD (Direct-Access Storage Device) 9
Data administrator 29
Data aggregate 12
Data bank 14
Data base 19
Data-base management system 66
Data compaction 433
Data Description Language 100
Data dictionary 186
Data independence, logical 36
Data independence, physical 36
Data independence, dynamic and static 181
Data item 12
Data manipulation language 73
Data set 15
DB/DC (Data-Base-Data-Communications) 69
Demand staging 461
Device/Media Control Language 75
Dictionary, data 186
Directory 255, 326
Distributed free space 274
DL/I 129
Domain 151
Dynamic block allocation 498
Dynamic data independence 181
Embedded versus directory pointers 225
Entity 44
Entity identifier 48
Executive systems 194
Extent 15
File 13
File, inverted (and partially inverted) 479 (Box 33.1)
File organization 29
Flat file 46
Functional dependence 169
Full functional dependence 171
Global logical data base description 57
Hashing 260
Hierarchical file 83
Hierarchy of storages 449
Hit ratio 460
Homogeneous data structures 87
Huffman code 446
Independence, data 36
Index 253
Index, bucket-resolved 367
Index chains 362
Index, inverted system 479 (Box 33.1)
Index, marked 369
Index, occurrence 479
Index, range 369
Index, record-resolved 367
Index, secondary 345
Indexed sequential file 251
Indirect index 357
Information system 191
Instance of a schema 54
Intersection data 136
Inverted file 50, 479 (Box 33.1)
Inverted list 50
ISAM 274
Join of relations 161
Julian date 434
Key compression 369
Key, primary 49
Key, secondary 49
Language, Data Description 100
Left list layout 317
LFU (least frequently used) replacement 454
List 227
List, inverted 50
Logical data-base record 131
Logical data description 29
Logical DBD (in DL/I) 146
Logical data independence 31
Logical data organization 29
Logical versus physical organization 12, 200-201 (Box 17.1)
Look-aside buffers 419
Loop 93
LRU (least recently used) replacement 454
Maintenance 268
Management system, database 66
Mapping, simple and complex 57
Marked index 369
Migration, data 38
Module 9
Multilist chain 245
Multilist organization 351
Multilist, parallel cellular 356
Multiple child pointers 322
Multiple key retrieval 52
Network structure 89
Normal form, second 172
Normal form, third 175
Normalization 150
Objectives of data base 40 (Box 4.1)
Occurrence index 479
Operations system 191
Organization, file 29
Organization, global logical 29
Organization, physical 29
Overflows 280
Overflows in hashing 296 (Box 23.2)
Paging 453
Parallel cellular chains 245
Parallel cellular multilist 356
Parallel cellular organizations 210
Parsing 375
Partially inverted file 479 (Box 33.1)
Physical data base description 57
Physical data description language 75
Physical data independence 36
Physical DBD (in DL/I) 140
Physical record 14
Physical storage organization 29
Plex structure 89
Pointer 219
Primary key 49
Privacy 34
Projection of relations 159
Randomizing 260
Range index 369
Record 13
Reference stream 453
Relation 150
Relation, binary 396
Relational algebra 161
Relational calculus 161
Relational data base 150
Replacement algorithm 454
Ring structures 237
Root 79
Schema 53
Schema language 103
Search controller 511
Second normal form 172
Secondary index 345
Secondary key 49
Security 34
Segment 13
Sensitivity 140
Sequence set index 277
Sequence set operations 415
Set, CODASYL 115, 123 (Box 11.1)
Set, singular (CODASYL) 118
Simple mapping 59
Simple plex structure 91
Skip-searched chain 240
Sparse index 369
Stack distance 462
Stack replacement algorithms 461
Stacks (in hierarchical storage) 462
Staging 457
Static data independence 181
Storage hierarchy 449
Subschema 55
Subschema language 103
Success function 461
Supervisory system 194
Symbolic pointer 220
Third normal form 175
Transitive dependence 174
Transparent data 16
Tree 79
Tree index, balanced 415
Tree index, unbalanced 421
Tree search, binary 407
Triad 399
Tuple 48
Uncommitted storage list 499
Virtual data 16
Virtual storage 449
Volatility 203
Volume 9
VSAM 277
PROLOGUE

1
INTRODUCTION
The development of corporate data bases will be one of the most important data-processing activities for the rest of the 1970s. Data will be increasingly regarded as a vital corporate resource which must be organized so as to maximize their value. In addition to the data bases within an organization, a vast new demand is growing for data-base services, which will collect, organize, and sell data. Frost and Sullivan Inc. estimate that this demand will exceed $1.1 billion per year in the United States by 1978 [1].

The files of data which computers can use are growing at a staggering rate. The growth rate in the size of computer storage is greater than the growth in size or power of any other component in the exploding data-processing industry. The more data the computers have access to, the greater is their potential power. In all walks of life and in all areas of industry, data banks will change the realms of what it is possible for man to do. Centuries hence, historians will look back to the coming of computer data banks and their associated facilities as a step which changed the nature of the evolution of society, perhaps eventually having a greater effect on the human condition than even the invention of the printing press.

Some of the most impressive corporate growth stories of the generation are largely attributable to the explosive growth in the need for information. Already, as we move to an increasingly information-oriented society, about 20% of the U.S. Gross National Product is devoted to the collection, processing, and dissemination of information and knowledge in all its various forms [2]. The vast majority of this information is not yet computerized. However, the cost of data-storage hardware is dropping more rapidly than other costs in data processing. It will become cheaper to store data on computer files than to store them on paper. Not only printed information will be stored. The computer industry is improving its capability to store line drawings, data in facsimile form, photographs, human speech, etc. In fact, any form of information other than the most intimate communications between humans can be transmitted and stored digitally.

The falling cost per bit is related to the fact that increasing quantities of data are being stored. Figure 1.1 shows how the capacities of on-line data files have grown (on-line means that the data can be read directly by the computer, usually in less than a second, without any human intervention such as loading a tape). The curve shows the maximum amount of storage found on the largest working commercial systems at any time. Note that the vertical scale is logarithmic, not linear (each major division is a tenfold increase), so the rise of the curve indicates that, in absolute terms, the growth is becoming faster at a very rapid rate.
[Figure 1.1: capacity curve plotted against Year (1950-1980) on a logarithmic vertical scale with tick values from 10 to 100,000.]

Figure 1.1 Capacity of on-line data files directly accessible by the computer. The curve represents maximum amount of storage likely to be found on large systems in each area. (Note that the vertical scale does not ascend in steps of equal magnitude. Each major division represents a tenfold increase.)
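Reading a logarithmic axis such as the one in Fig. 1.1 takes a little practice. The Python sketch below uses made-up values, not readings from the figure, to show two properties of such a scale: a tenfold increase always spans exactly one major division, and growth by the same factor every year rises by equal vertical steps. The function names `decades_above` and `yearly_steps` are illustrative only.

```python
import math


def decades_above(value: float, baseline: float) -> float:
    """Major divisions (tenfold steps) that `value` sits above `baseline`
    on a base-10 logarithmic axis."""
    return math.log10(value / baseline)


def yearly_steps(start: float, annual_factor: float, years: int) -> list[float]:
    """Vertical position, in divisions above `start`, at the end of each year
    when capacity is multiplied by the same factor every year."""
    positions = []
    value = start
    for _ in range(years + 1):
        positions.append(decades_above(value, start))
        value *= annual_factor
    return positions


# A tenfold increase spans one division, whatever the starting value.
print(round(decades_above(10_000, 1_000), 6))   # 1.0
print(round(decades_above(100_000, 1_000), 6))  # 2.0

# Hypothetical doubling every year: equal steps of log10(2), a straight line.
steps = yearly_steps(start=1_000, annual_factor=2.0, years=4)
diffs = [b - a for a, b in zip(steps, steps[1:])]
print([round(d, 4) for d in diffs])             # [0.301, 0.301, 0.301, 0.301]
```

On a linear axis the same doubling would look like a curve bending ever more steeply upward; the logarithmic scale is what lets many decades of growth fit on one chart.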
[Figure 1.2: cost curve plotted against Year (1950-1980) on a logarithmic vertical scale with tick values from 10 to 1,000,000.]

Figure 1.2 The drop in cost of on-line storage.
As the capacities go up, the costs per bit of the storage in Fig. 1.1 come down; this is shown in Fig. 1.2. The costs in Fig. 1.2 refer to the storage devices and media that are used, and this cost is divided by the number of bits on-line. If the costs were divided by the number of bits which could be loaded onto the storage devices from the tape or disk library, the costs would be a fraction of those in Fig. 1.2. The largest tape and disk libraries at any point in time have typically contained an order of magnitude more data than that shown on-line in Fig. 1.1.

The curves in Figs. 1.1 and 1.2 will continue their present trends (capacity upward, cost per bit downward), possibly with the same exponential rate of change. New storage technologies that are appearing suggest that the exponential growth could continue for a decade or two if we can find suitable applications for it, and it is fascinating to reflect upon the implications of this for data-base design, software requirements, and the way data will be used in industry.
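The cost-per-bit comparison above is simple division. The Python sketch below uses hypothetical figures (they are not values from Fig. 1.2) and assumes, as the text describes, a library holding ten times the data that is on-line: the same device and media cost spread over ten times as many bits gives one tenth of the cost per bit. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical installation; these are NOT values read from Fig. 1.2.
DEVICE_AND_MEDIA_COST = 2_000_000    # dollars for the storage devices and media in use
BITS_ON_LINE = 10**11                # bits directly accessible by the computer
BITS_IN_LIBRARY = 10 * BITS_ON_LINE  # "an order of magnitude more" in the tape/disk library


def cost_per_bit(total_cost: int, bits: int) -> Fraction:
    """Storage cost divided by a bit count, kept exact."""
    return Fraction(total_cost, bits)


on_line = cost_per_bit(DEVICE_AND_MEDIA_COST, BITS_ON_LINE)
loadable = cost_per_bit(DEVICE_AND_MEDIA_COST, BITS_IN_LIBRARY)

print(on_line)             # 1/50000 (dollars per on-line bit)
print(on_line / loadable)  # 10
```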
There are two main technology developments likely to become available in the near future. First, there are electromechanical devices that will hold much more data than disks but have much longer access times. Second, there are solid-state technologies that will give microsecond access times but capacities smaller than disk. Disks themselves may be increased in capacity somewhat. For the longer-term future (7 years, say) there are a number of new technologies, currently working in research labs, which may replace disks and may provide very large microsecond-access-time devices. A steady stream of new storage devices is thus likely to reach the marketplace over the next 10 years, rapidly lowering the cost of storing data.

Given the available technologies, it is likely that on-line data bases will use two or three levels of storage: one solid-state with microsecond access times, one electromagnetic with access times of a fraction of a second, and one electromechanical with access times which may be several seconds. If two, three, or four levels of storage are used, physical storage organization will become more complex, probably with paging mechanisms to move data between the levels. Solid-state storage offers the possibility of parallel search operations and associative memory. Both of these demand data organization techniques different from those in today's software.

Both the quantity of data stored and the complexity of their organization are going up by leaps and bounds. The first trillion-bit (1,000,000,000,000 bits) on-line stores are now in use. In a few years' time, stores of this size may be common, and stores 10 times as large may be emerging.

To make use of the huge quantities of data that are being stored, two system facilities are needed in addition to the storage. These are data transmission (the ability to access the data base from remote locations where its information is needed, by means of telecommunications) and man-computer dialogues, which enable the users to make inquiries, browse in the files, modify the data stored, add new data, or solve problems which use the data. Both of these subjects are as complex in their own right as is the design of the data base itself. This book deals only with the organization of the data. The other two topics are dealt with in the author's related books.

In all three areas the designer is confronted with a complex array of alternatives. The more alternatives he can consider in a rational fashion, the more likely he is to produce an optimal design. Many of the poor designs of data-base systems (and there are many) result from a designer considering only certain of the alternatives. The majority of systems analysts have a limited range of knowledge and are sometimes enthusiastic about particular techniques which they understand well, to the exclusion of others, some of which might be better.
A particularly important consideration in data-base design is to store the data so that they can be used for a wide variety of applications and so that the way they are used can be changed quickly and easily. On computer installations prior to the data-base era it has been remarkably difficult to change the way data are used. Different programmers view the data in different ways and constantly want to modify them as new needs arise. Modifications, however, can set off a chain reaction of changes to existing programs and hence can be exceedingly expensive to accomplish. Consequently, data processing has tended to become frozen into its old data structures.

To achieve the flexibility of data usage that is essential in most commercial situations, two aspects of data-base design are important. First, the data should be independent of the programs which use them, so that they can be added to or restructured without the programs being changed. Second, it should be possible to interrogate and search the data base without the lengthy operation of writing programs in conventional programming languages; data-base query languages are used for this.

The job of designing a data base is becoming increasingly difficult, especially if it is to perform in an optimal fashion. The software is becoming increasingly elaborate, and its capabilities are often misunderstood, misused, or not used to advantage. There are many different ways in which data can be structured, and they have different advantages and disadvantages. Not the least of the complicating factors is that there are so many different types of data needing to be organized in different ways. Different data have different characteristics, which ought to affect the data organization, and different users have fundamentally different requirements. The needs are sufficiently diverse that, often, no one data organization can satisfy all of them, at least with today's hardware. Hence, the designer steers a delicate course through compromises.

Given the falling cost of data storage and the increasing capability to transmit data, it is clear that data banks will have a major part to play in the running of industry. It is a formidable task to identify all the data items that are needed for the running of a corporation and to work out where and how they can best be recorded and stored. Today there is much redundancy in the data that is used in organizations, and the same item of data is often defined slightly differently by different groups. It is often true that when computers are installed they reveal how vague was the earlier thinking or how imprecise the previous methods. Cleaning up the imprecision in the way data is defined and used must go hand in hand with the design and stage-by-stage integration of data bases. This is one of the major tasks in the development of data processing in the years ahead. Data dictionaries defining all the data items that are in use will be built up in corporations. The definitions will have to be agreed upon between one department and another. Many different ways of organizing the data items will be employed in the vast data stores.

Frequently, one sees systems today in which data-base design decisions were made in a shortsighted manner. Indeed, in many situations straightforward design calculations do not provide the answers because there are intricate trade-offs between one aspect of the design and another: trade-offs, for example, between storage utilization and time utilization, between response time and complexity of data structure, and between design which facilitates unanticipated inquiries and design for well-defined operational requirements. The trade-offs which involve user psychology are subjective. They can be made confidently only by a systems analyst who is experienced, and probably also well read, in the art of designing man-computer dialogues.

There are many interlocking questions in the design of data-base systems and many types of technique that one can use in answer to the questions, so many, in fact, that one often sees valuable approaches being overlooked in the design and vital questions not being asked. It is the intention of this book to make the reader familiar with the many alternatives possible in data organization and with the trade-offs between them.

There will soon be new storage devices, new software techniques, and new types of data bases. The details will change, but most of the principles will remain. Therefore, the reader should concentrate on the principles. The systems analyst must be able to adapt the techniques illustrated to his own needs.
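Two requirements raised in this chapter, programs that are independent of how the data are stored and a data dictionary in which each data item has one agreed definition, can be illustrated together. The following Python sketch is a toy under stated assumptions, not a description of any real data-base software: the item names, the hypothetical `DataDictionary` class, and the tuple record layouts are all invented for illustration. The application program names data items but never their positions, so the stored record can be restructured by changing only the dictionary, and a conflicting definition from another group is rejected.

```python
class DataDictionary:
    """Hypothetical: agreed definitions of data items plus the current stored layout."""

    def __init__(self) -> None:
        self.definitions: dict[str, str] = {}
        self.layout: list[str] = []  # order of items in the stored record

    def define(self, name: str, meaning: str) -> None:
        agreed = self.definitions.get(name)
        if agreed is not None and agreed != meaning:
            # Two groups defining the same item differently must agree first.
            raise ValueError(f"{name!r} is already defined as {agreed!r}")
        self.definitions[name] = meaning

    def set_layout(self, items: list[str]) -> None:
        undefined = [item for item in items if item not in self.definitions]
        if undefined:
            raise ValueError(f"undefined data items: {undefined}")
        self.layout = list(items)

    def get(self, stored_record: tuple, name: str):
        # The only code that knows where an item physically sits.
        return stored_record[self.layout.index(name)]


def overdue_balance(dd: DataDictionary, stored_record: tuple) -> int:
    """An application program: it names data items but never positions."""
    if dd.get(stored_record, "STATUS") == "OVERDUE":
        return dd.get(stored_record, "BALANCE")
    return 0


dd = DataDictionary()
dd.define("ACCOUNT-NO", "Customer account number")
dd.define("BALANCE", "Amount owed, in cents")
dd.define("STATUS", "Collection status code")
dd.set_layout(["ACCOUNT-NO", "BALANCE", "STATUS"])
print(overdue_balance(dd, ("A17", 5000, "OVERDUE")))        # 5000

# Restructure the stored record: add an item and reorder the others.
dd.define("BRANCH", "Branch holding the account")
dd.set_layout(["STATUS", "BRANCH", "ACCOUNT-NO", "BALANCE"])
print(overdue_balance(dd, ("OVERDUE", "B3", "A17", 5000)))  # 5000, program unchanged

# A second department's slightly different definition is rejected.
try:
    dd.define("BALANCE", "Amount owed, in dollars")
except ValueError as err:
    print(err)
```

Real data-base management systems separate the logical and physical descriptions far more thoroughly than this; the sketch only shows why routing every access through one shared description lets the stored form change without touching the program.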
REFERENCES

1. Markets for Data-Base Services, Frost and Sullivan Inc., New York, July 1973.
2. Ibid., p. 11.
2
BASIC TERMINOLOGY
The terminology used for describing files and data bases has varied substantially from one authority to another and even from one time to another in the same organization. It is necessary that a consistent terminology be used throughout this book, and it would help if the industry terminology became consistent. In this chapter we will describe the terminology we will use, which has been taken, where possible, from the most widely accepted sources. The knowledgeable reader may skip the chapter with a glance at the figures. We will describe the hardware terms first and then the data terms. Figure 2.1 shows the wording used to describe the file hardware.
[Figure 2.1 diagram, headed "Hardware terminology", with labels: Secondary storage device; Removable volume; Module; Cylinder; Track; Physical record.]

Figure 2.1 Wording which describes the file hardware.
Peripheral (or Secondary) Storage Device
Because the main memory of a computer is relatively small, most data are stored in storage devices connected to the computer by means of a channel. These are referred to as peripheral or secondary storage devices. They include tape and disk units, drums, and devices on which data are stored in demountable cells or cartridges.

Volumes
The demountable tapes, disk packs, and cartridges are referred to as volumes. The word volume also refers to a drum or other nondemountable storage medium. A volume is normally a single physical unit of any peripheral storage medium. It has been defined as "that portion of a single unit of storage medium which is accessible to a single read/write mechanism"; however, some devices exist in which a volume is accessible with two or more read/write mechanisms.

Module
A module of a peripheral storage device is that section of hardware which holds one volume, such as one spindle of disks.

Direct-Access Storage Device
A direct-access storage device (sometimes abbreviated to DASD) is one in which each physical record has a discrete location and a unique address. Disks and drums are direct-access devices; tape units are not. Records can be stored on a direct-access device in such a way that the location of any one record can be determined without extensive scanning of the recording medium. Records can be read or written directly, at random, rather than having to be read or written in a fixed serial sequence.

Track
A track on a direct-access device contains that data which can be read by a single reading head without changing its position. It may refer to the track on a drum or disk which rotates under a reading head.

Cylinder
An access mechanism may have many reading heads, each of which can read one track. A cylinder refers to that group of tracks which can be read without moving the access mechanism.

Cell
Cell is used as a generic word to mean either a track, cylinder, module, or other zone delimited by a natural hardware boundary such that the time required to access data increases by a step function when data extend beyond the cell boundaries. If the cells are such that data can be read from more than one cell at the same time, we will refer to them as parallel cells. This use of the word cell will appear in the names of techniques discussed later in the book, such as cellular multilist.
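The direct-access, cylinder, and cell definitions above can be made concrete with a small timing model. The Python sketch below assumes a hypothetical disk with 20 tracks per cylinder, a flat 30-ms seek, and 17 ms to read one track; these are illustrative figures, and the model deliberately ignores rotational delay and seek distance. `locate` shows that a direct-access device finds a track's position by arithmetic rather than by scanning; `read_time` shows the step function at a cell boundary, here the cylinder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskGeometry:
    tracks_per_cylinder: int  # one track per reading head on the access mechanism
    seek_ms: float            # time to move the access mechanism to a cylinder
    revolution_ms: float      # time to read one full track

    def locate(self, relative_track: int) -> tuple[int, int]:
        """Return (cylinder, head) by arithmetic alone: no scanning of the medium."""
        return divmod(relative_track, self.tracks_per_cylinder)

    def read_time(self, first_track: int, n_tracks: int) -> float:
        """Milliseconds to read n consecutive tracks, charging one seek per
        cylinder touched (including the first) plus one revolution per track."""
        first_cyl, _ = self.locate(first_track)
        last_cyl, _ = self.locate(first_track + n_tracks - 1)
        seeks = last_cyl - first_cyl + 1
        return seeks * self.seek_ms + n_tracks * self.revolution_ms


disk = DiskGeometry(tracks_per_cylinder=20, seek_ms=30.0, revolution_ms=17.0)
print(disk.locate(47))  # (2, 7): cylinder 2, head 7

for n in range(18, 23):
    print(n, disk.read_time(first_track=0, n_tracks=n))
# 18 336.0
# 19 353.0
# 20 370.0
# 21 417.0   <- crossing into cylinder 1 adds a seek
# 22 434.0
```

Each extra track within a cylinder adds 17 ms in this model; the 21st track adds 47 ms because the access mechanism must move. Treating a track or a module as the cell would place the steps at those boundaries instead.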
DATA TERMINOLOGY

Logical and Physical
Descriptions of data and of the relationships between data are of one of two forms: logical or physical. Physical data descriptions refer to the manner in which data are recorded physically on the hardware. Logical data descriptions refer to the manner in which data are presented to the application programmer or user of the data. The words logical and physical will be used to describe various aspects of data, logical referring to the way the programmer or user sees it, and physical referring to the way the data are recorded on the storage medium. A physical record may contain several logical records in order to save storage space or access time. The structure of the data and the associations between the data may be different in the programmer's view of the data and in the physical organization of the data. We use the terms logical relationship, logical structure, and logical data description to describe the programmer's or user's view. Physical relationship, physical structure, and physical data description describe actual ways in which the data are stored.

Figure 2.2 gives a simple example of the difference between a logical and a physical structure: physical records on a disk contain shorter logical records which are chained together. The program requires a file of logical records in the sequence of the chain. The programmer does not necessarily know about the chain. The software presents the program with logical records in the required sequence. Other programs might be given records in a different sequence. There can be many other types of differences between the logical and physical organization. The reasons the logical and physical views of the data are different will be discussed later in the book. It is a function of the software to convert from the programmer's logical statements and descriptions to the physical reality. There are many alternate words used for describing data.
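The arrangement in Fig. 2.2 can be sketched in a few lines of Python. This is a minimal illustration under assumed details: the block contents, the (block, slot) pointer format, and the names `LogicalRecord` and `logical_file` are all hypothetical. The physical view is a list of blocks, each holding several short records in arbitrary order; the software function follows the chain so that the program sees only logical records in chain sequence.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

Ref = tuple[int, int]  # (physical block number, slot within the block)


@dataclass
class LogicalRecord:
    key: str
    next_ref: Optional[Ref]  # where the next record of the chain is stored


# Physical view: each block on the disk holds several shorter logical records,
# stored in whatever order space happened to be available.
blocks: list[list[LogicalRecord]] = [
    [LogicalRecord("C", (1, 1)), LogicalRecord("A", (1, 0))],
    [LogicalRecord("B", (0, 0)), LogicalRecord("D", None)],
]


def logical_file(blocks: list[list[LogicalRecord]], head: Ref) -> Iterator[LogicalRecord]:
    """Logical view: yield records in chain sequence, hiding blocks and pointers."""
    ref: Optional[Ref] = head
    while ref is not None:
        block_no, slot = ref
        record = blocks[block_no][slot]  # stands in for reading a physical block
        yield record
        ref = record.next_ref


print([r.key for block in blocks for r in block])          # ['C', 'A', 'B', 'D']
print([r.key for r in logical_file(blocks, head=(0, 1))])  # ['A', 'B', 'C', 'D']
```

A second chain through the same blocks, with its own pointers and head, would give another program a different logical sequence without moving any stored record.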
A widely accepted authority on data bases that is not associated with a specific computer manufacturer is CODASYL , and this book uses their wording where possible-in p articular the wording of the CODAS Y L Data Base Task Group.
WO R DS DESC R I B I N G DATA
I n ter-
LOGICAL R ECOR D LAYOUT
The software does the conversion
Record Z Record D
Figure 2.2 A n example of the difference between physical and logical
data organization.
Prologue
It is not always possible. Some concepts, such as the IBM concept of a "segment," do not have an exactly equivalent CODASYL word, and relational data bases need a vocabulary of their own, so some non-CODASYL vocabulary appears. The choice of the CODASYL wording is merely to achieve uniformity and hence clarity in the text; it does not necessarily imply preference for the Data Base Task Group (DBTG) languages or techniques. Figure 2.3 shows the words used to describe the application programmer's view of the data.
Byte
A byte is the smallest individually addressable group of bits, conventionally eight bits.
Data Item
A data item is the smallest unit of named data. It may consist of any number of bits or bytes. A data item is often referred to as a field or data element. In COBOL it is called an elementary item.
Data Aggregate
A data aggregate is a collection of data items within a record, which is given a name and referred to as a whole. For example, a data aggregate called DATE may be composed of the data items MONTH, DAY, and YEAR. A data aggregate is called a group or group item in COBOL. There can be two types of data aggregates: vectors and repeating groups. A vector is a one-dimensional, ordered collection of data items, such as DATE above. A repeating group is a collection of data which occurs multiple times within a record occurrence, for example, deposits and withdrawals in a savings bank account record. A repeating group may consist of single data items, vector data aggregates, or other repeating groups.

[Figure 2.3: a file is made up of records; a record is made up of data items (fields, data elements; elementary items in COBOL) and data aggregates (groups of fields, segments; group items in COBOL).]
Figure 2.3 Wording which describes the application programmer's view of the data.

Chap. 2 Basic Terminology

Record
A record is a named collection of data items or data aggregates. When an application program reads data from the data base it may read one complete (logical) record. Often, however, logical data-base record refers to a data structure which incorporates multiple groups of data items (segments), not all of which need be read at one time. There is no upper limit to the number of possible occurrences of a particular record type (given sufficient hardware), whereas there is normally an upper limit to the number of repeating groups within a record.
Segment
It is the view of some authorities that there is no need to differentiate between a data aggregate and a record because each is a collection of data items. Each is referred to as a segment in terminology used by IBM and others. A segment contains one or more data items (usually more) and is the basic quantum of data which passes to and from the application programs under control of the data-base management software.
File
A file is the named collection of all occurrences of a given type of (logical) record. In a simple file every logical record has the same number of data items, as in Fig. 2.3. In a more complex file the records may have a varying number of data items because of the existence of repeating groups.
Data Base
A data base is a collection of the occurrences of multiple record types, containing the relationships between records, data aggregates, and data items. In the next chapter we will discuss the nature of a data base.
Data-Base System
In most systems the term data base does not refer to all the record types but to a specified collection of them. There can be several data bases in one system; however, the contents of the different data bases are assumed to be separate and disjoint. A term is needed for the collection of data bases, and data-base system is used.
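As a minimal sketch of how these terms nest, the savings-account example above can be written as Python dataclasses. The record layout, the field names (account_number, amount_cents, and so on), and the sample values are hypothetical; a COBOL data division or a data-description language would express the same structure in its own syntax.

```python
from dataclasses import dataclass, field
from typing import List

# A vector data aggregate: DATE is composed of the data items MONTH, DAY, YEAR.
@dataclass
class Date:
    month: int
    day: int
    year: int

# One occurrence of a repeating group: a deposit or a withdrawal.
@dataclass
class Transaction:
    date: Date          # a vector aggregate nested inside the repeating group
    amount_cents: int   # positive for a deposit, negative for a withdrawal

# A record: a named collection of data items and data aggregates.
@dataclass
class SavingsAccount:
    account_number: str   # data item
    customer_name: str    # data item
    opened: Date          # vector data aggregate
    transactions: List[Transaction] = field(default_factory=list)  # repeating group

    def balance_cents(self) -> int:
        return sum(t.amount_cents for t in self.transactions)

# A file: the collection of all occurrences of one record type.
savings_file: List[SavingsAccount] = [
    SavingsAccount("1001", "A. Smith", Date(1, 15, 1974),
                   [Transaction(Date(2, 1, 1974), 50000),
                    Transaction(Date(3, 1, 1974), -12000)]),
    SavingsAccount("1002", "B. Jones", Date(6, 2, 1974)),
]
```

The number of occurrences of the repeating group differs from record to record (two for the first account, none for the second), which is what makes this a "more complex" file in the sense defined above.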
Data bank sometimes refers to a collection of data bases. Other authorities interchange the meanings of data base and data bank, saying that a data base is a collection of data banks. The term data bank is often used in the literature in an imprecisely defined fashion. To avoid confusion, the term data bank will not be used in this text.
The form in which the data are stored physically is often quite different from their logical form. The reasons for the differences will be clarified later in the book. Figure 2.4 shows the wording used to describe the physical storage of data.
[Figure 2.4: physical storage. A data-base system consists of data sets; a data set contains physical records (blocks of stored records); a stored record contains data items and data aggregates (segments). In one data set shown, the records are not physically contiguous but are scattered within one or more volumes.]
Figure 2.4 Wording which describes the stored data.
Physical Record
A physical record is that basic unit of data which is read or written by a single input/output command to the computer. It is those data which are recorded between gaps on tape or address markers on disk. One physical record often contains multiple logical records, or segments. On most systems the length of a physical record is determined by the system programmer; on some devices it is of fixed length.
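To make the physical/logical distinction concrete (compare Fig. 2.2), here is a minimal sketch, assuming each physical record is a small list of stored logical records and each logical record carries a pointer, written as (block number, slot), to the next record in its chain. The block contents and the read_block helper are hypothetical; read_block merely stands in for one input/output command.

```python
from typing import Dict, List, Optional, Tuple

# A logical record address: (physical record number, slot within that block).
Address = Tuple[int, int]
Stored = Tuple[str, str, Optional[Address]]   # (key, data, pointer to next in chain)

# Two physical records (blocks), each holding two shorter logical records.
blocks: Dict[int, List[Stored]] = {
    0: [("Z", "data for Z", (1, 1)), ("B", "data for B", (1, 0))],
    1: [("C", "data for C", None),   ("D", "data for D", (0, 1))],
}

def read_block(block_no: int) -> List[Stored]:
    """Stands in for a single input/output command that reads a whole physical record."""
    return blocks[block_no]

def logical_sequence(first: Address) -> List[str]:
    """Follow the chain, handing the program logical records in chain order."""
    keys: List[str] = []
    current: Optional[Address] = first
    while current is not None:
        block_no, slot = current
        key, _data, nxt = read_block(block_no)[slot]
        keys.append(key)
        current = nxt
    return keys
```

Physically the records sit in the order Z, B, C, D, but a program asking for the chain starting at (0, 0) receives Z, D, B, C and never needs to see the pointers.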
Block of Stored Records
The group of data which comprise a physical record is referred to as a block.
Extent
An extent is a collection of physical records which are contiguous in secondary storage. How many records are in an extent depends on the physical size of the volume and the user's request for space allocation. Associated records are not necessarily stored contiguously; this depends on the storage organization.
Data Set
A data set is a named collection of physical records. It includes data used for locating the records, such as indices. A data set may occupy one or more extents. It may be contained within one volume or spread over many volumes.
Bucket
Some addressing and indexing techniques provide as their output the address of a stored record. Others are less precise and provide the address of an area where a group of records is stored. We will refer to an area holding a group of records which are addressed jointly as a bucket. The bucket could be a physical record, a track, or a cell, but often it is a grouping determined by an addressing technique such as hashing and not necessarily related to the hardware. Some authorities use other words for this grouping, such as pocket or slot.
Different data-base software systems employ different words to describe the data. Figure 2.5 shows equivalent words used in some of the most common software. This text uses the wording on the top line of Fig. 2.5 except when describing a specific software product. A data item may be a quantity on an invoice. A data aggregate may be a line item on the invoice which is repeated multiple times and is hence a repeating group. The line item may be an individually addressable group of data in IBM (or other) software, which refers to it as a segment. The logical record may be the entire invoice. The physical record may be as many logical records as can fit on one track. The data set may be the entire
SUMMARY
[Figure 2.5 residue: a table of equivalent terms with column headings including "The wording of CODASYL" and "Wording commonly used for non-"; the rest of the table is not recoverable.]
[Figure 7.6: a table of two-digit error-condition codes for each data-manipulation command. The conditions listed are: data-items invalid or inconsistent; violation of DUPLICATES NOT ALLOWED clause; end of set or area; database-key inconsistent with area-name; current of set, area, or record-name not known; referenced record or set-name not in sub-schema; incorrect usage mode for area; privacy breach attempted; media space not available; database-key not available; no current record of run-unit; object record is mandatory in named set; record already a member of named set; deleted record involved; implicitly referenced area not available; conversion of value of data-item not possible; affected area not open; current record of run-unit not of record-name; record not current member of named or implied set; illegal area-name; no set occurrence satisfies argument values; area already open; violation of optional deadlock protection rule; unqualified DELETE attempted on non-empty set; removed record involved; value of string data-item truncated in program work area.]
Figure 7.6 Typical error-condition codes which the data-base management system returns to the application program run-unit (event 10 in Fig. 7.1). (Reproduced from Reference 1.)
ERROR CONDITIONS
A variety of error conditions can occur when a data-base management system attempts to execute the commands which an application program gives it. It will return an error code to the application program to inform it of the status. Figure 7.6 shows some typical error codes for the above commands. (Run-unit in Fig. 7.6 is the CODASYL word for a single application program execution or task. Area is a named grouping of records which could be independent of the schema.)

Part I Logical Organization

REFERENCES
1. James Martin, Security, Accuracy and Privacy in Computer Systems, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1974.
2. The proposed CODASYL specifications for a Data Manipulation Language: Proposal DBLTG-73001,00, available from Technical Services Branch, Dept. of Supply and Services, 88 Metcalfe Street, Ottawa, Ontario, Canada K1A 0S5.
3. R. W. Engles, "An Analysis of the April 1971 Data Base Task Group Report," in Proceedings of the ACM SIGFIDET Workshop on Data Description, Access and Control, ACM (Association for Computing Machinery), New York, London, and Amsterdam, 1972.
4. Information Management System Virtual Storage (IMS/VS) General Information Manual GH20-1260, IBM, White Plains, N.Y., 1974.
5. Data Base Management System Requirements, a report of the Joint GUIDE-SHARE Data Base Requirements Group, Nov. 1971, available from GUIDE or SHARE, New York.
6. EDP Analyzer, March 1972, The Debate on Data Base Management (the whole issue), Canning Publications, Inc., California.
7. EDP Analyzer, February 1974, The Current Status of Data Management (the whole issue), Canning Publications, Inc., California.
8. EDP Analyzer, March 1974, Problem Areas in Data Management (the whole issue), Canning Publications, Inc., California.
8
TREE STRUCTURES
In this chapter and the next we will discuss the types of structures that are found in data-base relationships. We will then illustrate how these structures can be described in formal languages. We described the data layout in Fig. 5.2 as a flat file. Each record has a similar set of fields, and hence the file can be represented by a two-dimensional matrix. Many logical file structures are used which are not "flat." They are described with words such as hierarchical files, CODASYL sets, tree structures, and plex structures. All these types of structure can be classed as either trees or networks. We will discuss trees in this chapter and networks in the following chapter. In Chapters 24 and 25 we will discuss the physical representations of trees and networks. It may be noted before we begin discussing trees and networks that these more complicated file structures can be broken down into groups of flat files with redundant data items. As we will see later, trees and networks may not be the best methods of logical representation of a data base. However, they are the methods in most common use today.
TREES
Figure 8.1 shows a tree. A tree is composed of a hierarchy of elements, called nodes. The uppermost level of the hierarchy has only one node, called the root. With the exception of the root, every node has one node related to it at a higher level, and this is called its parent. No element can have more than one parent. Each element can have one or more elements related to it at a lower level. These are called children. (The terms father and son nodes were used in the days before women's liberation.) Elements at the end of the branches, i.e., with no children, are called leaves. (The computer industry likes to mix its metaphors.)

[Figure 8.1: a tree of elements numbered 1 to 22 arranged on four levels (Level 1 to Level 4).]
Figure 8.1 A tree: no element has more than one parent.

In Fig. 8.1, element 1 is the root. Elements 5, 6, 8 through 12, and 14 through 22 are leaves. Trees are normally drawn upside down, with the root at the top and the leaves at the bottom.
Trees, such as that in Fig. 8.1, are used in both logical and physical data descriptions. In logical data descriptions they are used to describe relations between segment types or record types. In physical data organizations they are used to describe sets of pointers and relations between entries in indices.
A tree can be defined as a hierarchy of nodes with binodal relationships such that
1. The highest level in the hierarchy has one node, called a root.
2. All nodes except the root are related to one and only one node on a higher level than themselves.
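A minimal sketch of the two conditions above as a check on a child-to-parent map. The representation (a Python dict from each node to its parent, with None for the root) is an assumption made for illustration.

```python
from typing import Dict, Hashable, Optional

def is_tree(parent: Dict[Hashable, Optional[Hashable]]) -> bool:
    """Check the definition above on a child -> parent map (the root maps to None).

    A dict stores at most one parent per node, so "one and only one" parent is
    built in. What remains: exactly one root, and every node's chain of parents
    must reach that root without looping or naming an unknown node.
    """
    roots = [node for node, p in parent.items() if p is None]
    if len(roots) != 1:
        return False
    for start in parent:
        seen = set()
        node = start
        while parent[node] is not None:
            if node in seen or parent[node] not in parent:
                return False   # a cycle, or a parent that is not in the map
            seen.add(node)
            node = parent[node]
    return True
```

Because each node can name only one parent, the map itself cannot express a plex structure; the check only has to rule out a second root, a loop, or a dangling parent.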
Knuth [1] defines a tree with a recursive definition as follows: "a finite set T of one or more nodes such that
1. There is one specially designated node called the root of the tree.
2. The remaining nodes are partitioned into m ≥ 0 disjoint (i.e., not connected) sets T1, ..., Tm, and each of these sets in turn is a tree. The trees T1, ..., Tm are called the subtrees of the root."
Chap. 8 Tree Structures
[Figure 8.2: a tree annotated with descriptive terms: a node of degree 4; a family; dimension; level 4 has a count of 5; the tree has height 4 (number of levels), weight 22 (number of nodes), 16 leaves (number of leaves), and radix 1 (number of roots).]
Figure 8.2 Terms used for describing trees.
Knuth claims that defining trees in terms of trees seems most appropriate, since recursion is an innate characteristic of tree structures. Any node can grow a subtree, and its nodes in turn can sprout, just as buds in nature grow subtrees with buds of their own, and so on. Figure 8.2 shows the common terms that are used in describing trees.
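Following Knuth's recursive definition, each of the measures in Fig. 8.2 can be computed by a function that calls itself on the subtrees. This is a sketch; the Node class and the small example tree are hypothetical, not a transcription of Fig. 8.1 or 8.2.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

# Each function mirrors the recursive definition: a tree is a root plus
# zero or more subtrees, each of which is itself a tree.
def height(t: Node) -> int:
    """Number of levels."""
    return 1 + max((height(c) for c in t.children), default=0)

def weight(t: Node) -> int:
    """Number of nodes."""
    return 1 + sum(weight(c) for c in t.children)

def leaves(t: Node) -> int:
    """Number of nodes with no children."""
    return 1 if not t.children else sum(leaves(c) for c in t.children)

def max_degree(t: Node) -> int:
    """Largest number of branches leaving any single node."""
    return max([len(t.children)] + [max_degree(c) for c in t.children])

# Hypothetical example tree with four levels.
example = Node("1", [
    Node("2", [Node("5"), Node("6")]),
    Node("3", [Node("7", [Node("8"), Node("9")])]),
    Node("4"),
])
```

For the example tree, height is 4, weight is 9, there are 5 leaves, and the root has the largest degree, 3.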
BALANCED AND BINARY TREES
The term balanced tree is sometimes used. In a balanced tree each node can have the same number of branches, and the branching capacity of the tree is filled starting at the top and working down, progressing from left to right in each row. Figure 8.3 shows balanced and unbalanced trees. It is somewhat easier to implement a physical data organization for a tree with a fixed number of branches than for one with a variable number. Most logical data organizations, however, do not fit naturally into a balanced tree structure but require a variable number of branches per node. Indices and search algorithms can fit naturally into balanced tree structures, as will be discussed in Chapter 30.
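One reason a fixed number of branches is easier to handle physically: if every node has k branch positions and the tree is filled top-down and left to right, a node's position can be computed arithmetically instead of being stored as a pointer. The sketch below uses the standard implicit array layout (the one used for binary heaps); it illustrates the idea rather than a method the text describes, and numbering slots from 0 is an assumption.

```python
from typing import List, Optional

def children_positions(i: int, k: int) -> List[int]:
    """Slots of the children of the node in slot i of a k-way tree that is
    stored level by level, left to right, starting at slot 0."""
    return [k * i + j for j in range(1, k + 1)]

def parent_position(i: int, k: int) -> Optional[int]:
    """Slot of the parent of the node in slot i (None for the root)."""
    return None if i == 0 else (i - 1) // k

def level_of(i: int, k: int) -> int:
    """Level (root = 1) of the node in slot i."""
    level, first, width = 1, 0, 1   # first slot and width of the current level
    while i >= first + width:
        first += width
        width *= k
        level += 1
    return level
```

With k = 3, the root's children occupy slots 1 to 3, the children of slot 1 occupy slots 4 to 6, and no pointer fields are needed at all. An unbalanced or variable-branching tree cannot be laid out this way without wasting slots.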
[Figure 8.3: two panels, "Balanced trees" and "Unbalanced trees".]
Figure 8.3 Balanced and unbalanced trees.
A special category of balanced tree structure is one which permits up to two branches per node. This is called a binary tree. Figure 8.4 shows an unbalanced binary tree. Any tree relationship can be represented as a binary tree in the manner shown in Fig. 8.5. A few logical data organizations fit naturally into binary tree structures. A dog's pedigree, for example, could be represented as a binary tree. Binary trees, like other balanced trees, are mainly of interest in the physical representation of data, not the logical representation.
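Fig. 8.5 is not reproduced here, so this sketch assumes the usual reading of heir and twin pointers: the heir points to a node's first child and the twin to its next sibling (the technique often called left-child, right-sibling). With only these two pointers per element, any tree becomes a binary tree. The dict-based input format is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class BinaryNode:
    label: str
    heir: Optional["BinaryNode"] = None   # first child
    twin: Optional["BinaryNode"] = None   # next sibling to the right

def to_binary(label: str, children: Dict[str, List[str]]) -> BinaryNode:
    """Convert a general tree, given as a label -> children map, to heir/twin form."""
    node = BinaryNode(label)
    previous: Optional[BinaryNode] = None
    for child_label in children.get(label, []):
        child = to_binary(child_label, children)
        if previous is None:
            node.heir = child        # the first child hangs from the heir pointer
        else:
            previous.twin = child    # later children are chained as twins
        previous = child
    return node

def children_of(node: BinaryNode) -> List[str]:
    """Recover the original children: follow the heir, then the chain of twins."""
    result, child = [], node.heir
    while child is not None:
        result.append(child.label)
        child = child.twin
    return result
```

A node with many children still needs only two pointers; the price is that reaching its last child means walking the whole twin chain.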
Figure 8.4 A binary tree (unbalanced).
Figures 8.6 to 8.9 show examples of tree structures, the nodes being data aggregates, segments, or records. Figure 8.6 shows a family tree. It can only be described as a tree structure because each item is shown as having only one parent. If two parents were shown it would be a more complex structure.
SIMPLE AND COMPLEX MAPPING
A tree structure usually implies that there is simple mapping from child to parent (i.e., a child has one parent) and that the inverse map is complex (one-to-many), as in Fig. 8.7. Figure 8.7 shows a schema and an instance of that schema for a simple two-level tree structure. Occasionally there is a simple mapping in both directions, as in Fig. 8.8, where the tree structure relates to records concerning the same entity, which are stored separately.
HIERARCHICAL FILES
The term hierarchical file refers to a file with a tree-structure relationship between the records. Figure 8.7 shows a master-detail file, a common type of hierarchical file with two record types. Figure 8.9 shows a four-level hierarchical file. Some data-base software is designed to handle only flat files and hierarchical files. This is satisfactory for many applications, but as we will see in the next chapter, some important data structures are not of tree form, in that one record type can have more than one parent. Hence, software designed solely to handle flat and hierarchical files is limited in its capability.
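A master-detail file like that of Fig. 8.7 can be sketched as follows. Each detail record names exactly one master (the simple child-to-parent map), while the inverse, parent-to-children map has to be built as lists (the complex, one-to-many map). The AMOUNT field and all values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Master records keyed by customer NUMBER (fields as in Fig. 8.7).
masters = {
    "1001": {"NAME": "A. Smith", "BALANCE": 380},
    "1002": {"NAME": "B. Jones", "BALANCE": 75},
}

# Detail records: each refers to exactly one owning master record.
details = [
    {"TRANSACTION": 1, "NUMBER": "1001", "AMOUNT": 500},
    {"TRANSACTION": 2, "NUMBER": "1001", "AMOUNT": -120},
    {"TRANSACTION": 3, "NUMBER": "1002", "AMOUNT": 75},
]

def children_by_parent(detail_records: List[dict]) -> Dict[str, List[int]]:
    """Build the inverse map parent -> children, which is one-to-many."""
    inverse: Dict[str, List[int]] = defaultdict(list)
    for d in detail_records:
        inverse[d["NUMBER"]].append(d["TRANSACTION"])
    return dict(inverse)
```

Looking up a detail record's master is a single field reference; finding a master's details requires the list-valued inverse, which is what a hierarchical data-base system maintains for the programmer.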
[Figure 8.5: a tree in which each element carries heir pointers and twin pointers.]
Figure 8.5 Any tree relationship can be represented by a binary tree in which each element can have an heir and a twin pointer.
[Figure 8.6: a family tree descending from Edward III, King, died 1377, through persons including Edward the Black Prince, Lionel Duke of Clarence, John of Gaunt Duke of Lancaster, Edmund Duke of York, Thomas Duke of Gloucester, Richard II, Henry IV, Henry V, and the Dukes of Somerset.]
Figure 8.6 A tree: each item in a tree has only one parent. If two parents were shown for each person, it would be a plex structure. Note: this figure shows a homogeneous tree structure of variable depth, whereas the following figures show heterogeneous structures of fixed depth. Different techniques of physical representation are applicable to homogeneous structures.
[Figure 8.7: the schema, a bank customer master record (NUMBER, NAME, BALANCE) with detail records; and an instance of the schema, one master record with detail records TRANSACTION 1, TRANSACTION 2, and TRANSACTION 3.]
Figure 8.7 A hierarchical file with only two record-types.
[Figure 8.8: a diagram labeled "Balance sheet".]
Figure 8.8 A tree with one-to-one mapping.
[Figure 8.9: segment types and their fields: Department (DEPT#, DEPT NAME, REPORTS TO, MANAGER, BUDGET); Employee (EMPLOYEE#, EMPLOYEE NAME, DEPT#, SEX, SALARY GRADE, LOCATION); Job (JOB#, JOB DESCRIPTION); Job history (JOB DATE, TITLE); Children (CHILD NAME, CHILD AGE, CHILD SEX); Salary history (SALARY DATE, SALARY).]
Figure 8.9 A schema for a multilevel hierarchical file.
[Figure 8.10: sets drawn among the record types Department, Job, Employee, and Contract.]
Figure 8.10 A DBTG set (CODASYL definition) is a two-level tree of records. The parent record is referred to as the owner; the children records are referred to as members of the set.
The CODASYL Data Base Task Group makes much use of a relationship called a set. A set is a two-level tree of records. A file of sets all of the same type is a two-level hierarchical file. The parent record type is referred to as the owner record type and the children as member record types. Figures 8.7 and 8.10 both show sets. Each set type is given a name. A multilevel hierarchical file can be regarded as being composed of multiple sets. Figure 8.9 can be regarded as being composed of three, four, or five sets. We will discuss CODASYL sets in detail in Chapter 11.
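To see why a multilevel hierarchical file can be regarded as several sets, this sketch turns each parent-child link of a schema into an (owner, member) pair. The schema is hypothetical and is not a transcription of Fig. 8.9. CODASYL also allows one set type to have several member record types, which may be one reason the count of sets for a given schema can vary; the second function shows that alternative.

```python
from typing import Dict, List, Tuple

# Hypothetical four-level schema: record type -> child record types.
schema: Dict[str, List[str]] = {
    "DEPARTMENT": ["EMPLOYEE"],
    "EMPLOYEE": ["JOB HISTORY", "CHILDREN"],
    "JOB HISTORY": ["SALARY HISTORY"],
}

def sets_one_member_type(tree: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """One two-level set type per parent-child link: (owner, member)."""
    return [(owner, member) for owner, members in tree.items() for member in members]

def sets_per_owner(tree: Dict[str, List[str]]) -> List[Tuple[str, Tuple[str, ...]]]:
    """One set type per owner, allowing several member record types in a set."""
    return [(owner, tuple(members)) for owner, members in tree.items() if members]
```

The same four-level schema decomposes into four single-member sets or three sets if EMPLOYEE owns one set with two member types.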
HOMOGENEOUS STRUCTURES
The family tree of Fig. 8.6 is fundamentally different in structure from the trees in the subsequent figures. Each node of the family tree could be of the same record type. In the other diagrams each node is a different record type (or segment type). Fig. 8.6 thus shows a homogeneous tree of variable depth, whereas the subsequent figures show heterogeneous trees of fixed depth. Most data-base software is designed to handle heterogeneous trees of stated depth. A different physical representation could be used to represent a homogeneous tree. The distinction between homogeneous and heterogeneous structures is important in the next chapter as well, where plex structures are discussed. An important example of a homogeneous plex structure is a bill-of-materials data base used in a manufacturing operation, showing the components and subcomponents in each product, as illustrated in Fig. 9.13.
REFERENCES
1. Donald E. Knuth, The Art of Computer Programming, Volume 1: Fundamental Algorithms, Addison-Wesley, Reading, Mass., 1968.
9
PLEX STRUCTURES
If a child in a data relationship has more than one parent, the relationship cannot be described as a tree or hierarchical structure. Instead it is described as a network or plex structure. The words network and plex structure are synonymous. As the term network is overworked in the data communications world, we will use plex structure. Any item in a plex structure can be linked to any other item. Figure 9.1 shows some examples of plex structures. As with a tree structure, a plex structure can be described in terms of children and parents and drawn in such a way that the children are lower than the parents. In a plex structure, a child can have more than one parent. In the first example in Fig. 9.1 each child has two parents. In the second
Figure 9.1 Examples of plex structures (networks). One or more nodes have multiple parents.
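The defining test for a plex structure, some child with more than one parent, can be checked mechanically on a list of (parent, child) links. This sketch checks only that condition (it does not verify a single root or the absence of cycles); the bill-of-materials labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

def classify(links: Iterable[Tuple[str, str]]) -> str:
    """Return "plex" if any child has more than one parent, else "tree"."""
    parents_per_child = Counter(child for _parent, child in set(links))
    return "plex" if any(n > 1 for n in parents_per_child.values()) else "tree"

# Hypothetical bill-of-materials links: (assembly, component).
single_use = [("PUMP", "MOTOR"), ("PUMP", "HOUSING"), ("MOTOR", "BOLT-A")]
shared_part = single_use + [("HOUSING", "BOLT-A")]   # BOLT-A now has two parents
```

Adding one link that lets a common component serve two assemblies is enough to turn the hierarchy into a plex structure.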
[Figure residue, caption not recoverable: a family tree descending from Edward III, King, died 1377, to Edward IV, King, died 1483, and Elizabeth, died 1503, including persons such as Catherine of France and Owen Tudor.]
[Figure 13.8 residue; the symbols themselves are lost. Surviving meanings include "equal to, not equal to, less than, greater than" and "the literal value of x".]
Figure 13.8 Symbols used in a relational calculus.
Q(EMPLOYEE.EMPLOYEE NAME, EMPLOYEE.SALARY) : EMPLOYEE.DEPT# = 721 ∧ EMPLOYEE.SALARY > 2000
3. Using the relations P123 and A125 in Fig. 13.7, produce a relation, Q, showing CONVICTION TYPE and CONVICTION LENGTH for persons whose profession is ACCOUNTANT:
Q(P123.CONVICTION TYPE, P123.CONVICTION LENGTH) : ∃A125 (A125.PROFESSION = 'ACCOUNTANT' ∧ A125.IDENTIFICATION# = P123.IDENTIFICATION#)
4. Using the relations
STUDENT(STUD#, STUD-NAME, STUD-DETAILS)
INSTRUCTOR(INST#, INST-NAME, INST-DETAILS)
S-I(STUD#, INST#)
produce a relation showing which students are taught by every instructor:
Q(STUDENT.STUD#, STUDENT.STUDENT NAME) :
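The quantifiers in examples 3 and 4 map directly onto Python's any() and all() over lists of dicts. Example 4 is cut off in this copy, so the universal form used here (a student qualifies if, for every instructor, some S-I tuple links the two) is an assumed completion, not the text's expression. All tuples are hypothetical.

```python
# Hypothetical relations, each a list of dicts; attribute names follow the text.
P123 = [
    {"IDENTIFICATION#": 1, "CONVICTION TYPE": "FRAUD", "CONVICTION LENGTH": 3},
    {"IDENTIFICATION#": 2, "CONVICTION TYPE": "THEFT", "CONVICTION LENGTH": 1},
]
A125 = [
    {"IDENTIFICATION#": 1, "PROFESSION": "ACCOUNTANT"},
    {"IDENTIFICATION#": 2, "PROFESSION": "ENGINEER"},
]
STUDENT = [{"STUD#": 1, "STUD-NAME": "ADAMS"}, {"STUD#": 2, "STUD-NAME": "BAKER"}]
INSTRUCTOR = [{"INST#": 10}, {"INST#": 20}]
S_I = [{"STUD#": 1, "INST#": 10}, {"STUD#": 1, "INST#": 20}, {"STUD#": 2, "INST#": 10}]

# Example 3: the existential quantifier over A125 becomes any().
q3 = [(p["CONVICTION TYPE"], p["CONVICTION LENGTH"]) for p in P123
      if any(a["PROFESSION"] == "ACCOUNTANT"
             and a["IDENTIFICATION#"] == p["IDENTIFICATION#"] for a in A125)]

# Example 4 (assumed completion): "for every instructor" becomes all(),
# and "some S-I tuple links them" becomes any().
q4 = [(s["STUD#"], s["STUD-NAME"]) for s in STUDENT
      if all(any(si["STUD#"] == s["STUD#"] and si["INST#"] == i["INST#"] for si in S_I)
             for i in INSTRUCTOR)]
```

Only person 1 is an accountant, so q3 holds one tuple; only student 1 appears in S-I with both instructors, so q4 holds one tuple.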
[Figure 18.4: two curves plotted against the number of random file references per second (0 to 30).]
Figure 18.4 If the majority of accesses are to the file in Fig. 18.3, then the queuing times when accessing these records will be greater with the conventional serial organization than with the parallel cellular organization.
[Figure 18.5: record positions on cylinders 0 and 1 of disk modules 0 through 9.]
Figure 18.5 A record layout designed for fast response time and random access. The system in question has ten disk modules. A file consisting of a thousand or so records is spread across two of the cylinders.
Chap. 18 Differences Between Physical and Logical Organization
queue for the access mechanism. This queue is much shorter with the parallel organization than with the serial organization. The total time for accessing the record is therefore shorter with the parallel organization, as shown in the two curves. As the throughput is raised, the queue for the access mechanism will be the limiting factor in the serial organization. The throughput will be choked at a lower level than with the parallel organization, in which the main transaction stream is distributed between several access mechanisms. In designing a system it is desirable that the queuing times and maximum throughputs be investigated with queuing theory or simulation. (See the author's Design of Real-Time Computer Systems, Section Y.) Figure 18.5 shows a more detailed illustration of a file of fixed-length records spread across 10 disk drives. The file in question has 1000 or so records and occupies between one and two cylinders. First the top tracks of the first cylinder are filled, then the next-to-top tracks are filled, and so forth until finally the bottom tracks of the cylinder are full. Then the top tracks of the next cylinder are filled. The record-addressing scheme uses an arithmetic algorithm which converts a sequential record number into the requisite module + cylinder + track + record position.
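The arithmetic addressing just described can be sketched as below. The fill order follows the text, but the geometry (20 tracks per cylinder, 4 records per track) and the choice to fill one module's track before moving to the same track on the next module are assumptions. With these numbers one cylinder position across all ten modules holds 800 records, so a file of about 1000 records spans two cylinders, as in Fig. 18.5.

```python
from typing import NamedTuple

class DiskAddress(NamedTuple):
    module: int
    cylinder: int
    track: int
    position: int

def record_address(n: int, modules: int = 10, tracks_per_cylinder: int = 20,
                   records_per_track: int = 4, first_cylinder: int = 0) -> DiskAddress:
    """Map sequential record number n (from 0) to module + cylinder + track + position.

    Fill order: the top track (track 0) of the first cylinder on each module in
    turn, then the next-to-top track on each module, and so on; only when the
    bottom tracks are full does the next cylinder start.
    """
    position = n % records_per_track
    track_slot = n // records_per_track      # index of the track-sized slot in fill order
    module = track_slot % modules
    track_level = track_slot // modules      # how many complete track levels precede it
    track = track_level % tracks_per_cylinder
    cylinder = first_cylinder + track_level // tracks_per_cylinder
    return DiskAddress(module, cylinder, track, position)
```

Consecutive groups of four records land on different modules, so a burst of random references is spread across the ten access mechanisms rather than queuing on one.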
POSITIONING BY FREQUENCY OF USE
In some storages which are organized in a parallel fashion the data-base management system divides up the storage according to frequency of use of the data. Figure 18.6 illustrates this technique. The most frequently used data are stored on the innermost cylinders of the disks in Fig. 18.6. The least frequently used data are stored on the outermost cylinders. Data with an intermediate frequency of use are stored between these inner and outer zones.

[Figure 18.6: disk cylinders, with the most frequently used records on the innermost cylinders.]
Figure 18.6 An organization in which the most frequently referenced records are kept together on the innermost cylinders. The least frequently referenced records are on the outermost cylinders. The dotted cylinders show the positions of the access mechanism at one instant in time.

Part II Physical Organization

Most airline reservation systems operate in this manner. They have a high transaction throughput and a need for fast response times. Some records are referred to very frequently, such as the records giving the availability of seats on today's flights. Some records are referred to very infrequently, such as records giving details of passengers booked on flights many months hence. The file-addressing scheme automatically allocates the records to the appropriate group of cylinders, and the records are spread across the cylinders as in Fig. 18.5.
Many systems employ more than one type of storage device, the types having widely differing access times. The frequently used data are stored on the faster-access device, and the infrequently used data on the slower-access device. If an index to the data is used, the index may be on a fast-access device and the data on a slower-access device. When very infrequently referenced data are kept, they may be stored on a cheap off-line serial medium such as magnetic tape, whereas the more often-used data may be on more expensive direct-access devices. Archival records, for example, are dumped onto magnetic tape.
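A simple way to allocate records to zones by frequency of use is to rank them by reference count and fill zones from the innermost outward. This is a sketch under assumed inputs (per-record reference counts and zone capacities, with hypothetical airline record names); a real file-addressing scheme would also have to relocate records as their usage changes.

```python
from typing import Dict, List

def assign_zones(access_counts: Dict[str, int], zone_sizes: List[int]) -> Dict[str, int]:
    """Place the most frequently referenced records in zone 0 (innermost
    cylinders), the next most frequent in zone 1, and so on outward.

    zone_sizes[i] is the number of records zone i can hold.
    """
    if sum(zone_sizes) < len(access_counts):
        raise ValueError("not enough space for all records")
    ranked = sorted(access_counts, key=lambda k: access_counts[k], reverse=True)
    placement: Dict[str, int] = {}
    zone, used = 0, 0
    for key in ranked:
        while used >= zone_sizes[zone]:   # current zone full: move one zone outward
            zone, used = zone + 1, 0
        placement[key] = zone
        used += 1
    return placement
```

For example, seat-availability records for today's flights would rank first and fill zone 0, while passenger details for flights months ahead would fall to the outermost zone.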
PAGING
It seems likely that many complex systems for the near-term future will have at least three levels of storage: (1) solid-state electronic storage with a capacity of millions of bytes (this figure rising rapidly as the cost of large-scale-integration memory drops), (2) disks giving capacities totaling billions of bytes, and (3) tape or direct-access cartridges having much larger capacities than disks but longer access times. Data may be moved between the different levels of storage in fixed-length pages, as in today's virtual memory systems. The page size is a parameter of the system and is not determined by the record lengths used by individual programmers. Where paging is used the data will be logically independent of page size but will have to be physically packaged to fit into the pages. Where multiple levels of storage hardware are used, blocks of data, like pages, may be passed across the interfaces between the levels. This technique is called staging and is discussed in Chapter 32.
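As a rough illustration of why physical packaging matters in a paged system, this sketch counts how often following a chain crosses from one page to another. The byte offsets and the 4096-byte page size are hypothetical.

```python
from typing import List

def page_changes(chain: List[int], page_size: int) -> int:
    """Count how often following a chain moves to a different page.

    chain holds the byte offsets of successive records in chain order.
    """
    pages = [offset // page_size for offset in chain]
    return sum(1 for a, b in zip(pages, pages[1:]) if a != b)

straggling = [100, 9000, 300, 12500, 700]    # jumps back and forth between pages
tailored = [100, 300, 700, 9000, 12500]      # same records, linked page by page
```

Both chains visit the same five records, but the straggling one changes pages four times and the tailored one only twice; in a paging system each change may mean a page swap.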
Where paging or staging is used, the chains, rings, and indices that comprise the physical accessing methods, as we will discuss, may be tailored to page sizes so as to minimize the page swapping that is necessary. A chain, for example, which straggles backward and forward across many pages can seriously degrade the performance of a paging system.
A FINITE SET OF VALUES
Often an attribute can take on one of a finite set of values. It is often possible to save storage space by giving each of the attribute values a binary number. The binary number is then stored in the record instead of the attribute value. If there are N possible attribute values, then the binary number has ⌈log2 N⌉* bits, and this is often much smaller than the number of bits needed to store the attribute value itself. A penalty of storing the attribute values in the form of binary numbers is that a further conversion is needed to obtain the actual attribute value. If N is small enough, the table for converting the binary numbers into attribute values may be in main memory. If it is not small enough for main memory, it will be in a fast-access storage device. The possibility of storing data in this manner can present a trade-off between storage space and time.
*⌈x⌉ refers to the next largest integer when x is not an integer. Thus ⌈2.7⌉ = 3 and ⌈log2 15⌉ = 4.

LOGICAL REPRESENTATION
Employee number   Job category
4732              Accountant
3119              Engineer
0604              Clerk
7230              Consultant
6112              Clerk
1147              Secretary
0991              Accountant
1237              Secretary
3743              Engineer
5150              Clerk
1751              Consultant
1296              Engineer
2914              Clerk
2117              Consultant

PHYSICAL REPRESENTATION
Employee number   Bits
4732              001
3119              010
0604              011
7230              100
6112              011
1147              101
0991              001
1237              101
3743              010
5150              011
1751              100
1296              010
2914              011
2117              100

Conversion table:
Bits   Job category
001    Accountant
010    Engineer
011    Clerk
100    Consultant
101    Secretary

Figure 18.7 A data-item-type with a finite set of possible values is replaced with a bit pattern which acts as a pointer to the value in question.
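The encoding of Fig. 18.7 can be sketched as a pair of conversion tables. This version numbers the values from 000 (Fig. 18.7 starts at 001) so that N values fit in exactly ⌈log2 N⌉ bits; the alphabetical assignment of codes is an arbitrary choice, so the bit patterns differ from the figure's. Only the first five employees of the figure are used.

```python
import math
from typing import Dict, List, Tuple

def code_width(n_values: int) -> int:
    """Bits needed to give n_values distinct codes 0 .. n_values - 1: ceil(log2 N)."""
    return max(1, math.ceil(math.log2(n_values)))

def build_tables(values: List[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (value -> bit pattern, bit pattern -> value) conversion tables."""
    distinct = sorted(set(values))
    width = code_width(len(distinct))
    encode = {v: format(i, f"0{width}b") for i, v in enumerate(distinct)}
    decode = {bits: v for v, bits in encode.items()}
    return encode, decode

logical = [("4732", "Accountant"), ("3119", "Engineer"), ("0604", "Clerk"),
           ("7230", "Consultant"), ("1147", "Secretary")]
encode, decode = build_tables([job for _emp, job in logical])
physical = [(emp, encode[job]) for emp, job in logical]     # what would be stored
restored = [(emp, decode[bits]) for emp, bits in physical]  # the extra conversion step
```

Five job categories need 3 bits each instead of a character string; the decode table is the small conversion table that the text suggests keeping in main memory.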
The conversion from attribute value to binary number could be performed by the application programmer or by the data-base management software. If it is performed by software, the logical representation of data will contain the normal representation of the attribute value, and the physical representation will contain the binary number. This difference is illustrated in Fig. 18.7. The binary number in this case could be regarded as a pointer to the actual attribute value.
VARIABLE-LENGTH ATTRIBUTE LISTS
Sometimes it is necessary to store a variable-length list of values of the same attribute, for example, a list of the part numbers of which an item is
[Figure: two ways of storing a variable-length list of values of one attribute. Method 1, variable-length lists: each student number is followed by its list of course numbers (for example, student 54381: courses 177, 179, 184, 185, 187). Method 2, a bit matrix: one row per student number (54381, 54407, 54408, 54503, 54504) and one column per course number.]
Figure 20.3 Master records, with chained detail records.
be inserted at the start of the chain, thereby avoiding the need to follow the chain when inserting it. In Fig. 20.3, because new items are added at the beginning of the chain, the chain is in reverse chronological sequence. When a bank statement is printed or the account details are displayed on a terminal, the items will be read in chronological sequence, i.e., starting with the far end of the chain. It is a first-in-first-out (FIFO) chain. Such is the case with many chains. To speed up the retrieval process, the address of the end of the chain is stored in the record at the head of the chain. This end-of-chain pointer is shown as a dotted line in Fig. 20.3. The physical child (last) pointers of Figs. 19.3 and 19.4 serve a similar function.
OPTIMIZATION

A nonsequenced chain can be organized so as to avoid, where possible, lengthy seeks, such as moving an access arm needlessly or skipping between pages in a paging mechanism. The file may be periodically reorganized and the chain linkages between newly added items reconnected to lessen the access time. No such optimization is possible on a sequenced chain. Automatic optimization is possible on a nonvolatile chain. One form of this is referred to as percolation and has the objective of moving frequently referenced items close to the head of the chain. Each record may contain a percolation count, say 3 bits. The count is updated each time the record is used. When the record has been used, say, eight times, it is moved up one position in the chain and its percolation count is reset to zero. To move it up, no data are moved, but the pointers linking the records are changed. Percolation will tend to cut down the time to access the most active records, and inactive records will languish at the far end of the chain.
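The percolation scheme can be sketched as a one-way chain held in memory. The threshold of eight uses and the class names `Record` and `PercolatingChain` are illustrative assumptions. Moving a record up one place is done purely by relinking three pointers, as the text describes, and no record data is copied.

```python
from dataclasses import dataclass, field
from typing import List, Optional

THRESHOLD = 8  # assumed: a 3-bit count that triggers a move on the eighth use


@dataclass(eq=False)
class Record:
    key: str
    next: Optional["Record"] = field(default=None, repr=False)
    count: int = 0  # percolation count


class PercolatingChain:
    def __init__(self, keys: List[str]):
        self.head: Optional[Record] = None
        for k in reversed(keys):  # build head -> keys[0] -> keys[1] -> ...
            self.head = Record(k, self.head)

    def keys(self) -> List[str]:
        out, rec = [], self.head
        while rec is not None:
            out.append(rec.key)
            rec = rec.next
        return out

    def access(self, key: str) -> Optional[Record]:
        """Search from the head; every THRESHOLD uses, move the record up one place."""
        before, prev, rec = None, None, self.head
        while rec is not None and rec.key != key:
            before, prev, rec = prev, rec, rec.next
        if rec is None:
            return None
        rec.count += 1
        if rec.count == THRESHOLD:
            rec.count = 0
            if prev is not None:  # a record already at the head stays there
                # Only pointers change: before -> rec -> prev -> rec's old successor.
                prev.next = rec.next
                rec.next = prev
                if before is None:
                    self.head = rec
                else:
                    before.next = rec
        return rec
```

In a chain A, B, C, D, eight accesses to D give A, B, D, C. Eight more give A, D, B, C. A frequently used record therefore drifts toward the head one position at a time.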
ADDITIONS AND DELETIONS

When a new record is added to a sequenced chain, the item prior to the addition must have its pointer changed to the address of the new item, and the address of the next link in the chain will be written in the new item. To find the item prior to the addition, the pointers will be followed from the start of the chain. An item can be deleted from the chain with a similar process of relinking the pointers.

A problem arises if the item to be deleted was found without going to the head of the chain. A different type of record may have pointed to the item which must now be deleted, or the item may have been addressed by some means other than following the chain. This often happens with records having more than one key, which we will discuss in Chapter 26. When a simple chain such as that in Fig. 20.1 is entered in the middle, there is no way of finding the prior link. The only way to delete the record from the chain is to set a bit in it saying that it is effectively deleted but to leave the record in storage so that the string of pointers is not broken. Marking the record in this way has two disadvantages. First, storage space is taken up with deleted records, and second, the chain is longer than it would otherwise be. A file in which records are deleted from chains will be periodically reorganized, the chains being rethreaded and the unwanted records removed.

INTERSECTING CHAINS

Figure 20.4 shows two chains of records which intersect. One record belongs to both chains. This record may be deleted either from chain 1 or from chain 2, or it may be deleted from the file entirely. Such a record may therefore contain positions for 3 delete bits, as shown in Fig. 20.5. If records can belong to many chains, as is the case in examples we will discuss in Chapter 26, a delete bit for each chain is needed in the records.

The delete bits and their associated inefficiency can be avoided if it is possible to find the prior link in each chain so that the records can be relinked around the deleted record. This can be accomplished in two ways. First, the chains could be linked in both directions. Each record would contain a reverse pointer as well as a forward pointer, as shown in Fig. 20.6. Starting from record 3 in Fig. 20.6, all four neighbors of record 3 could be found and relinked. Two-way chains have double the storage overhead of one-way chains, and more operations are needed when items are added to them or deleted from them. Often data-base designers have felt that the added overhead cannot be tolerated. A more important need for two-way chains, however, is to give recovery capability, as we will discuss shortly.

A second method would be to link the end of the chains back to the beginning. Record 5, for example, would point to record 1. When the end of a chain is linked back to the beginning, the chain is referred to as a ring. To relink across the deleted record 3, the pointers would be followed around the entire ring until the record prior to record 3 is found, and the chains could then be relinked around record 3. This method uses little extra storage overhead but is slow because the entire ring must be followed.

Chap. 20 Chains and Ring Structures

Figure 20.4 Intersecting chains. Problem: How can record 3 be deleted?

Figure 20.5 Delete bits in a record belonging to two chains. The record holds a chain 1 address, a chain 2 address, a chain 1 delete bit, a chain 2 delete bit, and a record delete bit.

Figure 20.6 Chains with two-way pointers.

Figure 20.7 A ring with pointers in one direction only. Problem: What happens if a record is accidentally destroyed?
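The three maintenance cases above can be sketched with in-memory records. Object references stand in for disk addresses, and all names are hypothetical:

- Insertion into a sequenced one-way chain follows pointers from the head to find the prior item.
- A record reached mid-chain can only have its delete bit set until a periodic reorganization rethreads the chain.
- A two-way chain can relink around a record using the record's own pointers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Rec:
    key: int
    next: Optional["Rec"] = field(default=None, repr=False)
    prior: Optional["Rec"] = field(default=None, repr=False)  # used only in two-way chains
    deleted: bool = False                                     # delete bit


def insert_sequenced(head, key):
    """Insert key into a one-way chain kept in ascending key order; return the head."""
    new = Rec(key)
    if head is None or key < head.key:
        new.next = head
        return new
    prev = head
    while prev.next is not None and prev.next.key < key:  # follow pointers from the head
        prev = prev.next
    new.next = prev.next  # the new item receives the address of the next link
    prev.next = new       # the prior item now points to the new item
    return head


def reorganize(head):
    """Periodic maintenance: rethread the chain, dropping records whose delete bit is set."""
    while head is not None and head.deleted:
        head = head.next
    node = head
    while node is not None:
        while node.next is not None and node.next.deleted:
            node.next = node.next.next
        node = node.next
    return head


def unlink_two_way(rec):
    """Relink both neighbours of rec using its own pointers; no search is needed.
    If rec was first in the chain, the caller must also update its head pointer."""
    if rec.prior is not None:
        rec.prior.next = rec.next
    if rec.next is not None:
        rec.next.prior = rec.prior
    rec.next = rec.prior = None


def chain_keys(head):
    keys = []
    while head is not None:
        keys.append(head.key)
        head = head.next
    return keys
```

If a one-way chain is built by inserting 30, 10, 20 and 40, it reads 10, 20, 30, 40. Setting the delete bit on the record holding 20 leaves it in the chain, so the chain is just as long, until `reorganize` removes it.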
Figure 20.8 A ring with pointers in both directions.

RING STRUCTURES

Rings have been used in many file organizations. They are used to eliminate redundancy, as we will illustrate in a later chapter. Figure 20.7 shows a simple ring with one-way pointers.
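A one-way ring can be sketched to show why deletion is slow. Assume a record was located by some other route. The only way to find the record that points to it is to travel all the way around the ring. The names below are hypothetical, and the sketch assumes the ring holds at least two records.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class RingNode:
    key: int
    next: Optional["RingNode"] = field(default=None, repr=False)


def build_ring(keys):
    nodes = [RingNode(k) for k in keys]
    for i, node in enumerate(nodes):
        node.next = nodes[(i + 1) % len(nodes)]  # the last record points back to the first
    return nodes


def ring_keys(start):
    keys, node = [start.key], start.next
    while node is not start:
        keys.append(node.key)
        node = node.next
    return keys


def delete_from_one_way_ring(target):
    """Unlink target by walking round the ring to the record that points to it.
    Returns the number of records stepped through to find that prior record."""
    prev, steps = target, 0
    while prev.next is not target:
        prev = prev.next
        steps += 1
    prev.next = target.next
    target.next = None
    return steps
```

In a ring of five records, deleting record 3 steps through records 4, 5, 1 and 2 before relinking record 2 to record 4. In general the walk visits every other record in the ring.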
When a ring or a chain is entered at a point some distance from its head, it may be desirable to obtain the information at the head quickly, i.e., without stepping through all the intervening links. A direct linkage to the head of the ring or chain may be provided by means of a pointer in each record, as shown in Fig. 20.8.

Figure 20.9 shows an illustration of rings in which pointers to the head of the rings would be useful. Records of items which customers purchase are linked to the CUSTOMER record by a ring, and this ring is scanned when the computer produces periodic reports about what a customer has purchased. Records in an ITEM file are chained to the records of PURCHASES, and this ring is used for producing a periodic report of which customers have purchased each item. Whenever a new purchase occurs, it is linked into both a CUSTOMER ring and an ITEM ring.

The file of PURCHASES does not contain the customer name and location, which is in the CUSTOMER record at the head of the purchases ring for that customer. Nor does it contain the item name or description, which is in the ITEM record at the head of the item ring. The report of what purchases a customer has made, however, must give the item names and descriptions, and so the head of each item ring must be read for each item printed on the report. Finding the head of the ring would be a lengthy operation without pointers to the head. Similarly, the report stating who has bought each item must give the customer names and locations, and so the head of the purchases ring must be read for every purchase in the item ring.

Figure 20.9 Intersecting rings. Pointers (shaded) to the head of the ring are needed, as in Fig. 20.8. The rings linking each CUSTOMER record to its purchases are used for periodic reports of what each customer has purchased; the rings linking each ITEM record to its purchases are used for periodic reports of who has purchased each item.
RECOVERY

A chain or ring can be broken by either a hardware or a programming fault. A disk track may become scratched. A program error may erase a record. A machine failure may occur while a pointer is being written, with the result that the pointer is not written correctly. In practice, many systems have lost records because of damaged pointers. It is generally important that a system should be able to recover from damage to pointers without records being lost.

Where a pointer is the only means of accessing a record, the record is always vulnerable to pointer damage. Some of the bank account items in Fig. 20.3 will be inaccessible if one chain link is broken. A chain is as weak as its weakest link. When the pointers are embedded in data records, there is generally no quick way of periodically dumping them for backup purposes.

The need for recovery is the strongest argument for using two-way rather than one-way lists. Figure 20.10 shows a ring with pointers in both directions. If any one record in the ring is destroyed, the other records are still accessible, and if any one pointer is damaged, it can be reconstructed. The ring in Fig. 20.11 combines some of the benefits of those in Figs. 20.8 and 20.10 but still uses only two pointers. If record 1 or any even-numbered record is destroyed, all the others can still be reached. If an odd-numbered record other than record 1 is lost, one other record also becomes inaccessible. This structure is referred to as a coral ring.
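The two recovery properties of a two-way ring can be sketched in memory, using hypothetical helper names:

- When one record is unreadable, every other record can still be reached. The search goes forward from the head until it meets the damage, then backward from the head.
- A damaged forward pointer can be recomputed from the reverse pointers.

The sketch assumes the head record itself is intact.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class RingRecord:
    key: int
    next: Optional["RingRecord"] = field(default=None, repr=False)   # forward pointer
    prior: Optional["RingRecord"] = field(default=None, repr=False)  # reverse pointer


def build_two_way_ring(keys):
    recs = [RingRecord(k) for k in keys]
    for i, rec in enumerate(recs):
        rec.next = recs[(i + 1) % len(recs)]
        rec.prior = recs[i - 1]
    return recs


def reachable_keys(head, destroyed):
    """Keys reachable from head when the record `destroyed` (or None) cannot be read."""
    found = [head.key]
    node = head.next
    while node is not head and node is not destroyed:  # forward until the damage
        found.append(node.key)
        node = node.next
    if node is head:
        return found  # the forward direction was intact
    node = head.prior
    while node is not destroyed:  # then round the other way
        found.append(node.key)
        node = node.prior
    return found


def rebuild_forward_pointers(head):
    """Recompute every forward pointer from the (undamaged) reverse pointers."""
    node = head
    while True:
        node.prior.next = node
        node = node.prior
        if node is head:
            break
```

With a one-way ring, losing record 4 of six would also cut off records 5 and 6. Here the backward pass reaches them.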
SEARCH TIMES

The main disadvantage of chains and rings is that they can take a long time to search. If there are $N_c$ items in the chain and every item has an equal probability of being searched for, the mean number of items that must be examined in order to find the required item is

$$\sum_{k=1}^{N_c} \left(k \times \text{Probability that the } k\text{th item examined is the one required}\right) \qquad (20.1)$$
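With every item equally likely to be the one required, each probability in Eq. (20.1) is $1/N_c$. The sum then has a simple closed form, shown here as a short worked step:

$$\sum_{k=1}^{N_c} \left(k \times \frac{1}{N_c}\right) = \frac{1}{N_c} \cdot \frac{N_c (N_c + 1)}{2} = \frac{N_c + 1}{2}$$

For example, a chain of 1,000 items requires about 500 items to be examined on average. This is the cost that the techniques below try to reduce.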
Figure 20.10 A ring in which the second pointer points to the head of the ring, so that a data path entering the ring part way around can be routed quickly to information at the head of the ring. Problem: Again, what happens if a record is accidentally destroyed?
SKIP-SEARCHED CHAINS

If the items are chained together in sequence, several techniques can be used for shortening the chain search time. The first enables the search to skip along the chain (or ring), missing many of the items. The chain is divided into groups of items which are connected by reverse pointers in the first item of each group, as shown in Fig. 20.12. The search begins at the high end of the sequence of chained items and follows the reverse pointers until an item is found that is lower than the item sought or is the item itself.

Figure 20.11 A coral ring combines the advantages of the two-way ring (Fig. 20.10) and those of the ring with pointers to its head (Fig. 20.8) with only two pointers per record.
Box 20.1 Optimum Skip Length in a Skip-Searched Chain

Let $N_c$ be the number of items in a skip-searched chain and $N_g$ be the number of items in a group that is skipped. There are $\lceil N_c/N_g \rceil$ groups.

The mean number of groups that must be examined in order to find the required item is

$$\sum_{k=1}^{\lceil N_c/N_g \rceil} \left(k \times \text{Probability that the } k\text{th group examined contains the required item}\right)$$

If the required record is equally likely to be in any group, then this mean number of groups examined is

$$\sum_{k=1}^{\lceil N_c/N_g \rceil} \frac{k}{\lceil N_c/N_g \rceil} = \frac{\lceil N_c/N_g \rceil + 1}{2}$$

Having found the right group in the chain, the search then examines the items in the group. The first item in the group has already been examined. The search must therefore examine between 0 and $N_g - 1$ other items. The mean number of items that must be examined is

$$\sum_{k=0}^{N_g - 1} \left(k \times \text{Probability that the } k\text{th item examined is the one required}\right)$$

If the required record is equally likely to be in any position in the group, then this mean number of items examined is

$$\sum_{k=0}^{N_g - 1} \frac{k}{N_g} = \frac{N_g - 1}{2}$$

Let $N_e$ be the number of items that must be examined in total in searching the chain. The mean number of items that must be examined, $E(N_e)$, is the sum of the mean number of groups that are examined, $(\lceil N_c/N_g \rceil + 1)/2$, and the mean number of items within a group that are examined, $(N_g - 1)/2$:

$$E(N_e) = \frac{\lceil N_c/N_g \rceil + 1}{2} + \frac{N_g - 1}{2} = \frac{\lceil N_c/N_g \rceil + N_g}{2} \qquad (20.2)$$

We can adjust $N_g$, the number of items in each group, to give the minimum value of $E(N_e)$. Treating $\lceil N_c/N_g \rceil$ as $N_c/N_g$:

$$\frac{dE(N_e)}{dN_g} = -\frac{N_c}{2N_g^2} + \frac{1}{2} = 0 \quad \text{when} \quad \frac{N_c}{2N_g^2} = \frac{1}{2} \qquad (20.3)$$

Thus, the optimum number of items in a chain group is $\sqrt{N_c}$ ($N_c$ = the number of items in the chain). The skip pointers point to the item $\sqrt{N_c}$ items away. The number of items in the chain that must be inspected if there are $\lceil \sqrt{N_c} \rceil$ items in each group is [from Eq. (20.2)]

$$E(N_e) = \frac{\lceil N_c / \lceil \sqrt{N_c} \rceil \rceil}{2} + \frac{\lceil \sqrt{N_c} \rceil}{2} \approx \sqrt{N_c} \qquad (20.4)$$
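Box 20.1 can be checked numerically. The sketch below builds a skip-linked chain in memory. The first item of each group holds a reverse pointer to the first item of the previous group. The search starts at the high end, follows those reverse pointers to the first item not above the target, and then follows forward pointers. All names are hypothetical, and the result is computed by exhaustive search rather than taken from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Item:
    key: int
    next: Optional["Item"] = field(default=None, repr=False)       # forward pointer
    skip_back: Optional["Item"] = field(default=None, repr=False)  # only in a group's first item


def build_skip_chain(keys, group_size):
    """Chain sorted keys; link the first items of successive groups by reverse pointers.
    Returns the first item of the highest group, where the search begins."""
    items = [Item(k) for k in sorted(keys)]
    for a, b in zip(items, items[1:]):
        a.next = b
    leaders = items[::group_size]
    for lower, higher in zip(leaders, leaders[1:]):
        higher.skip_back = lower
    return leaders[-1]


def skip_search(entry, target):
    """Return (item or None, number of items examined)."""
    examined, item = 0, entry
    while item is not None:  # skip backward group by group
        examined += 1
        if item.key <= target:
            break
        item = item.skip_back
    if item is None or item.key == target:
        return item, examined
    item = item.next
    while item is not None:  # then step forward within the group
        examined += 1
        if item.key == target:
            return item, examined
        if item.key > target:
            break
        item = item.next
    return None, examined


def mean_examined(n_items, group_size):
    keys = range(n_items)
    entry = build_skip_chain(keys, group_size)
    return sum(skip_search(entry, k)[1] for k in keys) / n_items


best = min(range(1, 101), key=lambda g: mean_examined(100, g))
```

For a chain of 100 items the best group size found this way is 10, which is $\sqrt{100}$. It gives a mean of exactly 10.0 items examined, in agreement with Eq. (20.4), compared with 50.5 for the unskipped chain.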
Figure 20.12 A skip-linked chain, for faster searching of sequential chained items. The optimum skip length is the square root of the length of the chain.

Figure 20.13 A skip-linked coral ring.

If it is lower than the item sought, the forward pointers are followed to the item in question. Where a chain is divided into groups of items for skip searching, the optimum size of a group is $\sqrt{N_c}$, where $N_c$ is the number of items in the chain. (See Box 20.1.) The mean number of items that must be examined is then approximately $\sqrt{N_c}$. Rings may similarly be organized with skip pointers. Figure 20.13 shows a coral ring in which skip pointers alternate with pointers to the head of the ring.

MULTILIST CHAINS

A second approach to lowering the search time for a sequentially ordered chain is shown in Fig. 20.14. The chain is divided into segments, and an index gives the value of the first item in each segment and its starting address. This is sometimes known as a multilist chain. If there are $N_i$ index entries and the chain is chopped into equal segments, the mean number of items that must be examined when searching the chain is

$$E(N_e) = \frac{N_c/N_i + 1}{2} \qquad (20.5)$$
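A minimal sketch of a multilist search follows. The index holds the first key of each segment, a binary search of the index picks the segment, and only that segment's chain is followed. Keys double as record addresses here, and the function names are hypothetical. Like Eq. (20.5), the count covers chain items only, not the index look-up.

```python
import bisect
import math


def build_multilist(keys, n_index):
    """Split the sorted chain into n_index roughly equal segments.
    Returns the index (first key of each segment) and forward pointers
    (key -> next key), with each segment's chain ending at None."""
    keys = sorted(keys)
    seg_len = math.ceil(len(keys) / n_index)
    index, forward = [], {}
    for start in range(0, len(keys), seg_len):
        segment = keys[start:start + seg_len]
        index.append(segment[0])
        for a, b in zip(segment, segment[1:] + [None]):
            forward[a] = b
    return index, forward


def multilist_search(index, forward, target):
    """Return (found, chain items examined) for target."""
    i = bisect.bisect_right(index, target) - 1  # segment whose first key <= target
    if i < 0:
        return False, 0
    key, examined = index[i], 0
    while key is not None:
        examined += 1
        if key == target:
            return True, examined
        if key > target:
            break
        key = forward[key]
    return False, examined
```

With 100 items and 10 index entries, every item is found after examining between 1 and 10 chain items, and the mean is 5.5. This matches $(N_c/N_i + 1)/2 = (100/10 + 1)/2$.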
Figure 20.14 A multilist chain, for faster searching of sequential chained items.

CELLULAR CHAINS

The total of the access times required when searching a chain is more important than the total of items inspected. The segments of a multilist chain may be organized so that no piece extends beyond a certain hardware cell or boundary selected to minimize access times. For example, the segments may each be confined to a cylinder so that no seeks occur when following a chain. On a smaller scale, they may be confined to a track so that each segment is in core when it is searched.

PARALLEL CELLULAR CHAINS

Cellular chains may be organized so that different segments of them can be searched in parallel. The segments are spread out in modules on which simultaneous seeks and reads can occur. Figure 20.15 illustrates such a layout. This technique can be effective where the data sets are laid out with the sequence skipping from module to module, as in the lower half of Fig. 18.3, rather than the more common sequential within-cylinder layout of the top half of Fig. 18.3. We will discuss uses of these chain organizations later in the book.

Figure 20.15 Parallel cellular chains. Separate chain segments can be searched simultaneously. This type of organization can be effective when the file is spread across several cells, as in the bottom half of Fig. 18.3, rather than the top half of Fig. 18.3.
SUMMARY

Box 20.2 summarizes the types of chains and rings.

Box 20.2 Summary of the Types of Chains and Rings

| Type | Number of pointers per record | Illustrated in Fig. | Comments |
|---|---|---|---|
| Simple one-way chain | 1 | 20.2 | |
| One-way chain with tail link | 1 | 20.3 | Pointer to the end of a sequenced chain enables new items to be added quickly. |
| One-way ring | 1 | 20.7 | The chain head can be found after locating a chain member. |
| Two-way ring | 2 | 20.8 | Recovery capability when the chain breaks. |
| Ring with head pointers | 2 | 20.10 | Quick access to the head of the ring. |
| Coral ring | 2 | 20.11 | Combines the advantages of two-way rings and rings with head pointers. |
| Skip-searched chain | 2 | 20.12 | Faster searching. $N_c$ items per chain; $\sqrt{N_c}$ items per skip group. |
| Skip-searched coral ring | 2 | 20.13 | Combines the advantages of two-way rings, rings with head pointers, and skip searching. |
| Multilist chain | | 20.14 | Fragmented chain with index permits faster searching. |
| Cellular chains | | | Chain fragmented into physical cells, or pages. |
| Parallel cellular chains | | 20.15 | Chain fragmented into cells or pages which can be searched simultaneously. |
| Optimized chains | | | Most frequently referenced items percolate to head of chain. |
21 ADDRESSING TECHNIQUES

Records in a logical file are identified by means of a unique number or group of characters called a key. The key is usually a fixed-length field which is in an identical position in each record. It may be an account number in a bank or a part number in a factory. It may be necessary to join two or more fields together in order to produce a unique key, and this is called a concatenated key. For example, the key which identifies the flight record in an airline is a combination of the flight number and date of takeoff. The flight number alone is not unique, as a flight with the same number may take off every day.

In some files the records contain more than one key. A purchased item may have a different supplier's number and user's number, both of which are employed as keys. Many applications need to identify records on the basis of keys which are not unique. One key, however, must be unique because that is the key which is used for determining where the record should be located on the file unit and for retrieving the record from the file. This is called the prime key or identifier.

The basic problem of file addressing is this: Given a prime key, such as an account number, how does the computer locate the record for that key? There are several different techniques for addressing records. In the remainder of this chapter we will discuss them and their effect on the organization of the files.

Technique 1: Scanning the File

The simplest and crudest way of locating a record is to scan the file, inspecting the key of each record. This method is far too slow for most purposes and is only likely to be used on a batch-processing operation using a serial file, such as tape, in which each record must be read anyway.
[Figure: successive inspections of key-sequenced records. The first four inspections find the sought key K lower than the key inspected; the fifth finds it higher, and the area between is then scanned sequentially.]

Figure 21.5 Hashing.
will be found with one seek, but some need a second (overflow) seek. A very small proportion need a third or fourth seek. In Chapter 23 we will discuss hash addressing.

INSERTIONS AND DELETIONS

A major concern in the design of file-addressing schemes and their associated record layouts is how new records will be inserted into the file and old ones deleted. Insertions and deletions, and the subsequent maintenance operations they cause, will be discussed when addressing techniques and indices are described in more detail later in the book.
COMBINATIONS OF TECHNIQUES

With some files, combinations of the above techniques are used to address the records. An index, for example, may locate an area of the file, and that area is then scanned or binary-searched. A direct-addressing algorithm may locate a section of an index so that it is not necessary to search the entire index.
An example of combined addressing techniques is the location of the CITY-PAIR records in an airline reservation system, illustrated in Box 21.2.

Box 21.2

The CITY-PAIR records in an airline reservation system give details of what flights fly between any two cities served. Seat availability on the flights is kept in AVAILABILITY records, but all other information necessary for displaying schedules or availability is in the CITY-PAIR records.

The key for addressing a CITY-PAIR record is the combination of two city names. Any city to which the airline flies can be included in the pair. A typical airline may serve up to 255 cities and hence need up to 65,025 (= 255²) CITY-PAIR records. However, these records are of fixed length, and some city pairs have so many flights that not all flights can be included in one record; hence overflow records are needed. Some city pairs are not served by the airline and hence need no CITY-PAIR record. To further complicate matters, the journey between some city pairs requires more than one flight via one or more connecting cities, and some of these flights could be with another airline.

Figure 21.6 illustrates the addressing mechanism that is used. The first step converts both of the city mnemonics in the terminal operator's input into ordinal numbers. As not more than 255 cities will be served, 1 byte can be used for each ordinal number. The conversion is done by means of a table look-up operation in core. Some of the cities, such as New York, are referred to very frequently, but the majority of the cities are small and referred to much less frequently. The table is scanned serially, with the popular cities at the front and the infrequently referenced ones at the back. The same table can be used for converting city ordinal numbers back into city mnemonics.

The pair of city ordinal numbers is then used as a key to locate a CITY-INDEX record. The following algorithm is used, the division giving a quotient Q and a remainder R:

$$\frac{(\mathrm{CON1} \times \mathrm{NC}) + \mathrm{CON2}}{\mathrm{NI}} = Q + R$$

where
CON1 = ordinal number of the departure city
CON2 = ordinal number of the arrival city
NC = total number of cities served (including an allowance for expansion)
NI = number of entries in the CITY-INDEX record
Q = quotient of this division
R = remainder of this division

The quotient Q is used for calculating the machine address of the CITY-INDEX record, using an algorithm called the file address compute program. The remainder of the division, R, is used for calculating the field address in the CITY-INDEX record, as follows:

$$\mathrm{BA} = R \times \mathrm{NE} + \mathrm{NH}$$

where
NE = number of bytes per entry
NH = number of bytes in the header
BA = relative byte address of the required field in the record

The entry that is located in the CITY-INDEX record is an ordinal number for the CITY-PAIR. This number is given to the file address compute program to calculate the address of the required CITY-PAIR record. Each CITY-PAIR record contains spaces for pointers to overflow CITY-PAIR records. The files of records are not stored contiguously but are distributed across many disks, as in Fig. 18.5, to maximize the likelihood of simultaneous access and hence to lower the mean access time. The file address compute program converts the ordinal numbers given to it into the correct addresses of the scattered CITY-INDEX and CITY-PAIR records.
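The arithmetic in Box 21.2 can be sketched as follows. The city table contents and mnemonics are hypothetical, and so are the values of NI, NE and NH in the example. Ordinals are assumed to start at 1. The file address compute program that turns Q into a machine address is not modelled.

```python
# Hypothetical in-core city table, most frequently referenced cities first.
CITY_TABLE = ["NYC", "CHI", "LAX", "BOS", "MIA"]


def city_ordinal(mnemonic):
    """Serial scan of the table; popular cities are found after few comparisons.
    Ordinals are assumed to start at 1."""
    for position, city in enumerate(CITY_TABLE, start=1):
        if city == mnemonic:
            return position
    raise KeyError(mnemonic)


def city_index_location(con1, con2, nc, ni, ne, nh):
    """Box 21.2: Q selects the CITY-INDEX record; R selects the entry within it."""
    q, r = divmod(con1 * nc + con2, ni)
    ba = r * ne + nh  # relative byte address of the entry in the CITY-INDEX record
    return q, r, ba


# Example with assumed parameters: NC = 255, NI = 50 entries, NE = 2 bytes, NH = 8 bytes.
q, r, ba = city_index_location(city_ordinal("NYC"), city_ordinal("CHI"),
                               nc=255, ni=50, ne=2, nh=8)
```

For New York to Chicago the key is 1 × 255 + 2 = 257, so Q = 5, R = 7, and the entry starts at byte 7 × 2 + 8 = 22 of that CITY-INDEX record. Because CON2 never exceeds NC, each city pair yields a distinct value of CON1 × NC + CON2.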
Figure 21.6 A combination of direct and indexed nonsequential addressing tailored for a specific application. (See Box 21.2.) [The figure traces the input departure and arrival cities through a table in main memory of city ordinal numbers (in order of decreasing popularity), the calculation (CON1 × NC + CON2)/NI = Q (quotient) + R (remainder), the file address of the CITY-INDEX record, the field address BA = R × NE + NH within the index record, and finally the CITY-PAIR records, whose overflow addresses point to OVERFLOW CITY-PAIR records.]
ASSOCIATIVE MEMORY

There is a type of storage device with which no addressing of the types discussed is necessary. It is called associative storage. Associative memories are not accessed by an address but by content. Associative storage is not widely used yet but will be very important one day. It is discussed in Chapter 36.
SUMMARY

Box 21.3 summarizes the file-addressing methods.

Box 21.3 Summary of File-Addressing Methods

| Technique | Record sequence in storage | Remarks |
|---|---|---|
| Serial scan | Key sequence | Suitable only for sequential batch processing. |
| Block search | Key sequence | Not recommended for searching data records; used for searching index entries. |
| Binary search | Key sequence | Not recommended if it requires time-consuming seeks. Suitable only if the data are in solid-state storage. |
| Indexed sequential (Chapter 22) | Key sequence | The most commonly used method. Advantages: good storage utilization; records in key sequence suitable for batch processing. Disadvantages: care needed with the handling of inserted and deleted records; poor with highly clustered insertions. |
| Indexed nonsequential | Any sequence | Advantages: no problem with insertions; physical sequence can be employed for some other purpose, e.g., optimizing access time; useful for secondary-key addressing. Disadvantage: much larger indices than indexed sequential. |
| Key = address | Key sequence | Very limited. Highly inflexible. |
| Algorithm | Sequence determined by algorithm | Addressing algorithms tailored to applications can give fast access times but often give poor storage utilization and destroy data independence. |
| Hashing (Chapter 23) | Sequence determined by hashing transform | Useful and efficient technique. Advantages: faster than indexing; no problem with insertions and deletions; can take advantage of variations in reference density. Disadvantages: may give lower storage utilization than indexed methods; records not in sequence for batch processing. |
22 INDEXED SEQUENTIAL ORGANIZATIONS
Records exist on storage devices in a given physical sequence. This sequencing may be employed for some purpose. The most common purpose is that records are needed in a given sequence by certain data-processing operations, and so they are stored in that sequence. A weekly run may be made, for example, listing the details of all customer accounts in account number order. Different applications may need records in different sequences. In batch-processing operations using tape or card files, much time is spent sorting the files from one sequence to another. On direct-access files the records are stored in one sequence only, and it is desirable to select the most useful sequence.

The most common method of ordering records is to have them in sequence by a key, namely the key which is most commonly used for addressing them. Unless the keys follow a completely regular pattern, direct addressing is not then possible, and an index is required in order to find any record without a lengthy search of the file.

SEQUENTIAL OR RANDOM PROCESSING?

Two types of processing may be used with an indexed sequential file: (1) sequential processing, in which records will be referred to in the same sequence as their file layout, and (2) random processing, in which records are accessed in any sequence with no consideration of their physical organization. The ratio of the amount of sequential processing to random processing in the usage of the file may affect the choice of indexed sequential organization. Some files are used mainly for sequential processing with a small number of random accesses. Other files are addressed predominantly at random. This difference is reflected in the organization techniques that are used. Some are basically sequential organizations with an auxiliary index for random file accesses. Others are designed to maximize the efficiency of accessing at random, with sequential file processing being a secondary consideration.

If the data records are laid out sequentially by key, the index for that key can be much smaller than if they are nonsequential. Figure 22.1 shows an index for sequential records. Figure 22.2 shows an index for the same records laid out randomly (perhaps in order of arrival; perhaps in sequence by some other key). The index for the nonsequential file is more than N times larger than that for the sequential file, where N is the number of entries per index block.
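The size difference can be sketched with a sparse index over key-sequenced blocks. The index holds only the first key of each block. A binary search of the index picks the block, and a search within the block finds the record. A nonsequential layout would instead need one index entry per record. The block size of 4 and the function names are illustrative assumptions, and the keys are taken from Fig. 22.1.

```python
import bisect


def build_sparse_index(keys, block_size):
    """Lay the records out in key sequence, block_size per block, and index
    each block by its first key (one entry per block, not per record)."""
    ordered = sorted(keys)
    blocks = [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]
    index = [block[0] for block in blocks]
    return index, blocks


def lookup(index, blocks, key):
    """Return (block number, position in block), or None if the key is absent."""
    i = bisect.bisect_right(index, key) - 1  # last block whose first key <= key
    if i < 0:
        return None
    block = blocks[i]
    j = bisect.bisect_left(block, key)  # search within the block
    if j < len(block) and block[j] == key:
        return i, j
    return None


NAMES = ["ANNE", "BETTY", "CANDICE", "CAROL", "CHLOE", "CLEOPATRA",
         "DELILAH", "DIANA", "ELECTRA", "ELIZABETH", "ELLEN", "FLOSSY"]
index, blocks = build_sparse_index(NAMES, block_size=4)
# index == ['ANNE', 'CHLOE', 'ELECTRA']; a dense index would need 12 entries.
```

The sparse index shrinks by a factor equal to the number of records per block. This holds only while the records stay in key sequence, which is the maintenance problem discussed next.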
MAINTENANCE

Although the sequential data organization of Fig. 22.1 requires a smaller index, it is more difficult to maintain. When new records are added, the file must either be reshuffled to place them in sequence or else they must be placed in a separate location with pointers to them. Figure 22.3 shows four new records added to the sequential file. They are placed in an overflow area. The disadvantage of having records in an overflow area is that an extra read operation, and possibly an extra seek, is needed every time they are retrieved. In practice a file with overflows like that in Fig. 22.3 will be reorganized periodically, with all the records rewritten in sequence and the index reconstructed. This operation is referred to as maintenance. Whereas maintenance of the file in Fig. 22.3 is not too difficult, maintenance of some data-base structures, which we will discuss later in the book, is a serious problem.

Figure 22.4 shows the addition of new records to the nonsequential file of Fig. 22.2. The new records are simply added on at the end of the file. No overflow pointers are needed and no subsequent maintenance run. Some reshuffling of the index entries is necessary, however, when some of the new records are added. If a file is very large or has new records added frequently (is volatile), then the avoidance of maintenance operations is worthwhile, and the data records may be stored nonsequentially.

Another way to lessen the need for overflows and maintenance with a sequential file is to leave empty record positions throughout the file, as shown in Fig. 22.5. This technique lessens the need for overflows or record reshuffling but does not avoid it entirely, because some of the insertions will be clustered. The addition of LESLIE and JOAN to the records in Fig. 22.5 causes no problem, but then the addition of JENNIFER cannot be accomplished without either adding an overflow record and pointer or moving records. The record group beginning with KRISTEN could be moved
DATA R E C O R DS
I NDEX
i
Relative address
2 3 4 5 6
ANNE B E TTY CAN D I C E CAROL C H LO E
C L EOPAT R A 7 DE LI LAH 8 DIANA 9 E L ECTRA 1 0 E L I ZA B E T H
1 1 ELLEN 1 2
____ 1 3 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
ANNE
F LO S S Y
FRED GEORG I E G E RT G R ACI E JA N E T JUNE K R I ST E N LARA L I ZA LO U I S E MAB L E - S A R A H MARY MO L LY NANCY N AT A L I A
NEFE RTITI 29 O L G A 30 PAM 3 1 PAT I E N C E 3 2 P E N NY 1-� - �����..�.. 33 PO L LY 34 P R I SC I L L A 35 P R U D E N C E
JANET
PO L L Y
36 37 38 39
R OS E M A R Y R UT H SAM A N T H A SCA R L ETT
40 TAMMY 4 1 VA L E R Y 4 2 VAN ESSA 43 W I L LY 44 XANTH I PPE 45 Y V O N N E 46 ZOE
Figure 22. 1 Fig. 2 2 . 2 .
An
indexed
seq uential
organization.
Com pare
with 269
DATA R E C O R D S
INDEX ANNE 1 5 B E TTY 1 CANDI 40
CAROL 24 CHLOE 3 CLEOP 38
Relative address
t
1 B ETTY 2 JUNE
D E L I L 44 D I ANA 7 ELECT 8 E L I ZA 27
3 C H LO E 4 K R ISTEN 5 YVO N N E 6 MOLLY
E L L E N 29
7 D I ANA
FLOSS 1 9
B E L ECT R A 9 OLGA
F R E D 16
10 G R AC I E
GEORGE 39
1 1 LARA 1 2 NANCY
G E RT 45
GRACI 1 0 JANET 20 JUNE 2
13 P R U D E N C E 1 4 SAMANTHA 1 5 ANNE 16 FRED
K R IST 4 LARA 1 1 L I ZA 43
1 7 M A B L E -SA R A H 18 MARY 1 9 F LOSSY 20 JAN E T
LOUIS 36 MABLE 1 7 MARY 1 8
MOLLY 6 NANCY 1 2 NATAL 41
21 PAM 22 XAN T H I PP E 23 P R I S C I L LA 24 C A R O L 25 ROSE M A R Y 26 R U T H 27 E L I ZA B E T H
N E F E R 28
28 N E F E R T I T I
OLGA 9 PAM 21
29 E L L E N 30 ZO E 3 1 PAT I E N C E
PATIE 3 1
32 PENNY
PENNY 32 POLLY 4 2
PR ISC 2 3 PRUDE 13
33 VAN ESSA 34 W I L L Y 3 5 VALE R Y 36 LOUISE
ROSEM 2 5
37 SCAR L E TT
R U T H 26
38 C L E OPAT RA 39 G E O R G I E
SAMAN 1 4 SCl\ R L 37 TAMMY 46 VALER 35 VANES 33
WILLY 34
40 CAN D I C E 41 NATA L I A 4 2 PO L L Y 43 LIZA 4 4 D E L I LAH 45 G E R T 4 6 TAMMY
XANTH 22 YVONN 5
ZOE 30
Figure 22.2 I n an inde xed n o nsequential organization, a much larger index is needed than with an indexed sequential organization. 270
I NDEX
DATA R E CO R DS Relative address
i
Overflow
1 ANNE 2 B ETTY 3 CAN D I C E 4 CA R O L 5 C H LO E 6 C L EOPAT R A 7 D E L I LAH
8 D I ANA
ANNE CH LOE E L E CT F R ED
E LECTRA E L I ZA B E T H 1 1 E LLEN F LOSSY F R ED GEORG I E 1 5 G E RT 1 6 G R ACI E 1 8 JUNE
1 02 �
1 9 K R ISTEN 20 LA R A 2 1 L I ZA
1 00 ,...
22 LO U I S E 23 MAB L E-SA R A H
1 03
1 7 JA N E T
JANET L I ZA MOLLY OLGA
1------�
24 M A R Y 25 MO L LY
'\
\
26 N A N CY ANNE JA N E T PO L L Y
27 NATA L I A 28 N E F E R T I T I 29 O LG A 30 PAM 3 1 PAT I E N C E 32 P E N N Y
PO L L Y 1------ 33 PO L L Y RUTH 34 P R I SC I L L A VA L E R 35 P R U D E N CE YVONN
36 R OS E M A R Y 37 R UT H 38 SAM A N T H A 39 SCA R L ETT 40 TAM M Y 4 1 VALE R Y 42 VAN ESSA 43 W I L L Y 44 XAN T H I PPE NOT USED
New records
'
�
added
Overflow
1 00 L E S L I E 1 0 1 JOAN 1 02 J E N N I F E R 1 03 M A RTA
45 YVON N E 46 Z O E
Figure 22. 3 Four new records added to the file illustrated in Figure 2 2 . 1 . 27 1
1 01
\
__/
INDEX
+
DATA R E CO R DS
Relative address
1
ANNE CAROL DELIL E U ZA
BETTY
2
JUNE
3
CHLOE
4
K R ISTEN Y VO N N E
6
MOLLY DIANA
FRED GRACI JUNE LIZA
ANNE FRED '•ARY RUTH
MARY NE F E R PATIE PAISC
272
B
E LECTR A
9
OLGA
10
G R AC I E
11
LARA
12
NANCY
13
P R UD E N C E
14
SA'JIANTHA
15
ANNE
16
FRED
17
M A B L E-SAR A '"'
18
MARY
19
F LOSSY
20
JAN E T
21
PAM
22
XANT H I PP E
23
P R I SC I L L A
24
CAROL
25
ROSEMARY
26
R UTH
27
E L I ZA B E T H
2B
N E F E RTITI
29
ELLEN
30
ZOE
31
PAT I E N C E
32
PENNY
33
VAN ESSA
34
WI LLY
35
VALERY
36
LO U I S E
37
SCARLETT
3B
CLEOPATRA
39
G E OR G I E
40
CAN D I C E
41
NATALIA
42
POLLY
43
GERT
44
D E L I LAH
45
LIZA
46
TAM M Y '
RUTH TAMMY
47
LESLIE
WILLY
48
JOAN
ZOE
49
JENNIFER
50
MARTA
Figure 22. 4 Four new records added to the file illustrated in Figure
22.3.
DATA R E CO R DS
INDEX 1
ANNE
2 BETTY
3 CANDICE 4
5 CAROL
6 CH LOE
7 CL EOPATRA
8 9
DE LI L A H
0 DIANA
1
2
E LECTRA
5
F L OSSY
7
FRED
3 E L I Z AB E T H
4 E LLEN 6
8 GEORG I E 9 GE RT 0 1
GRACIE
2
JA N E T
3 JUNE 4
5 K R ISTEN
6 LARA 7
LIZA
8 9 LOUISE 0 M A B L E -SA R A H 1
2
MARY
3 MO L L Y 4 NANCY
5 NATAL I A 6
7 NEFERTITI
8 OLGA 9 PAM 0 1
2
PAT I E N CE PENNY
3 PO L L Y
4
5 P R ISCI L L A
6 PRUDE NCE
7 R OS E M A R Y
8
9 RUTH 0 SAM A N T H A
1
2 3
SCA R L E T T TAMMY VALERY
5 V A N E SSA
6
7 WILLY
8 X A N T H I PPE 9 0 1
YVONNE
2 ZOE
3
4
Figure 22.5 Em pty positions left among the data records to accom m odate insertions. Clustered insertions will still require overflow records or record reshuffling. 273
274
Physical Orga n i zation
Part 1 1
down one position, but this group i s now full and so the group beginning with LOUISE would also have to be pushed down one position . Then t he addition of MARION would necessitate a further shuffle . The technique of leaving gaps in the fil e i n anticipation of new items being inserted is called distribu ted free space. Although overflow records could be avoided in this way, a periodic maintenance run would still be desirable to reestablish free space in the file.
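The reshuffling that a clustered insertion forces can be sketched in Python. This is a simplified model and an assumption: the file is a list of key-sequenced groups with a fixed number of positions each, and a full group pushes its highest record into the next group. That is the cascade described for the KRISTEN and LOUISE groups. The group contents below are hypothetical, not the exact layout of Fig. 22.5.

```python
from bisect import insort


def insert_with_free_space(groups, key, capacity):
    """Insert key into a file held as key-sequenced groups of at most
    `capacity` records, with the gaps acting as distributed free space.
    A group that overflows pushes its highest record into the next group.
    Returns how many records had to move to a neighboring group."""
    # The target is the first group whose highest key is >= key, else the last group.
    i = next((g for g, grp in enumerate(groups) if grp and grp[-1] >= key),
             len(groups) - 1)
    insort(groups[i], key)
    moved = 0
    while len(groups[i]) > capacity:       # no gap left: shuffle downward
        spill = groups[i].pop()            # highest key in the full group
        if i + 1 == len(groups):
            groups.append([])              # assumption: open a new group at the end
        groups[i + 1].insert(0, spill)
        moved += 1
        i += 1
    return moved


# Hypothetical groups of four positions, each loaded with three records.
groups = [["JANET", "JUNE", "KRISTEN"],
          ["LARA", "LIZA", "LOUISE"],
          ["MARY", "MOLLY", "NANCY"]]
print(insert_with_free_space(groups, "JOAN", 4))      # 0: fills the gap
print(insert_with_free_space(groups, "JENNIFER", 4))  # 1: KRISTEN moves down
print(groups[1])  # ['KRISTEN', 'LARA', 'LIZA', 'LOUISE']
```

Each further insertion into a full region moves more records, which is why free space must periodically be reestablished.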
HARDWARE CONSIDERATIONS
An indexed sequential organization can be tailored to fit a specific file hardware configuration. To do so saves both access time and storage space. The argument against fitting the organization to the hardware is that the type of storage devices may be changed or the file may be moved from one type of storage device to another. In such cases it is advantageous if the physical organization is hardware-independent. We will illustrate the differences in indexed sequential organizations by reference to two "access methods" used with the IBM System/370. The first is ISAM (Indexed Sequential Access Method), in which the indexes and blocks are designed to fit specific file units. The second is VSAM (Virtual Storage Access Method), which is hardware-independent.
ISAM
With ISAM files the records are grouped so as to fit onto physical disk tracks, and one track on each cylinder contains an index to the records stored in that cylinder. When new records are inserted after the original sequential file has been set up, these are stored in an overflow area. The index track contains pointers both to the prime data area and to the overflow area. The overflow area, as shown in Fig. 22.6, can be on a track in the same cylinder as the prime data into which the overflow items are inserted. Alternatively, it can be in an entirely separate location. The advantage of the former technique is that the overflow area is on the same cylinder as the track index which refers to it. As indicated in the top illustration of Fig. 22.6, one read and no seek is needed to go from the track index to an item in the overflow area. In the bottom illustration of Fig. 22.6, a seek is shown from the track index to the overflow item. However, the overflow items might be on a different module capable of being accessed in parallel with the prime data module. In this case the accessing of the next item could begin while the overflow item is being read.
An overflow track for each cylinder, as in the upper diagram of Fig. 22.6, is commonly used. Unfortunately, however, the overflow track may become filled with overflow records. There is always a chance that this might happen quite suddenly. Therefore, many ISAM systems use both a cylinder overflow area (upper diagram of Fig. 22.6) and an independent overflow area (lower diagram of Fig. 22.6). A track can then be found in the independent overflow area when the cylinder overflow track fills up.

Figure 22.6 The overflow area in an indexed sequential file can be either on the same cylinders as the prime data area or in an entirely separate part of the file unit. Usually both are used. (Diagram labels: track indexes, prime data area, independent overflow area.)

Chap. 22 Indexed Sequential Organizations
BLOCKING
As discussed in Chapter 18, a sequential file is often blocked, whereas a random file may not be, and the blocking substantially increases the packing density on direct-access devices (Fig. 18.1). When an indexed sequential file is originally loaded the records are usually blocked. If variable-length records are used, they may be stored in fixed-length blocks (Fig. 18.2).
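Why blocking raises packing density is easy to see with a small calculation. This Python sketch assumes a fixed overhead (gap) per block and blocks that do not span tracks. The track size, record size, and gap size are hypothetical figures, not those of any particular disk unit.

```python
def records_per_track(track_bytes, record_bytes, records_per_block, gap_bytes):
    """Records that fit on one track when each block holds `records_per_block`
    fixed-length records and costs a fixed `gap_bytes` of overhead.
    Blocks are assumed not to span tracks."""
    block_bytes = records_per_block * record_bytes + gap_bytes
    return (track_bytes // block_bytes) * records_per_block


def packing_density(track_bytes, record_bytes, records_per_block, gap_bytes):
    """Fraction of the track occupied by record data."""
    n = records_per_track(track_bytes, record_bytes, records_per_block, gap_bytes)
    return n * record_bytes / track_bytes


# Hypothetical device figures, not those of any particular disk.
for rpb in (1, 5, 10):
    print(rpb, records_per_track(13000, 100, rpb, 200))
# 1 43
# 5 90
# 10 100
```

With one record per block, two-thirds of each block in this example is gap. Grouping ten records per block spreads that overhead across all ten.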
MULTIPLE LEVELS OF INDEX
Most indexed files need multiple levels of index, i.e., indices to the index as in Fig. 21.3. The lowest level of index is usually dispersed among the data records in order to minimize seek times, as with the track indices in Fig. 22.6. In ISAM the level of index above the track index is called the cylinder index. The track index contains the highest-value key on each track and points to that track. The cylinder index contains the highest-value key on each cylinder and points to the track index of that cylinder. The cylinder index is on a cylinder that is separate from most of the data. It may be on the same module as the data, but the access operations are usually faster if it is on a different module, so that seeking the cylinder index can occur while the previous data access is still taking place.
The level of index above the cylinder index is called the master index. The cylinder index is organized by track, and the master index contains the highest-value key on each track of the cylinder index, along with a pointer to that track. These indices are shown in Fig. 22.7. There may be only one level of master index, or there may be up to three levels, as in Fig. 22.7. The master indices, like the cylinder index, are organized by track. The highest level is small enough to fit onto one track; it is this fact which determines how many levels of index are needed.
Let us suppose that a computer using the ISAM files illustrated in Fig. 22.7 is required to read the record whose key is 144. The sequence of events is as follows:
1. The computer examines the highest level of the master index, which will normally be in main memory. The lowest entry in this index is 30500, which has a pointer to the first track of the level 2 master index.
2. The computer reads the relevant track of the level 2 master index. The lowest entry is 2100, which points to the first track of the level 3 master index.
3. The computer reads the relevant track of the level 3 master index, which is likely to be on the same cylinder as the level 2 master index, in which case no seek is needed. The next highest entry above 144 is 230, which points to the first track of the cylinder index.
4. The computer reads the relevant track of the cylinder index. The next highest entry above 144 is 164. The entry points to the track index of cylinder 3.
5. The computer seeks cylinder 3 and reads the track index. The next highest entry above 144 is 146. This entry points to track 6.
6. The computer reads the relevant block of records into core from track 6 and scans the block to find logical record 144. The ISAM software gives this record to the application program which requested it.
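The six steps above amount to a descent through index levels. Each level holds (highest key, pointer) pairs, and at each level the search takes the first entry whose key is at least the search key. The Python sketch below follows that rule and counts block reads, assuming the top level is already in main memory as in step 1. Only the keys 30500, 2100, 230, 164, and 146 and "track 6" come from the worked example. Every other key and every block name is hypothetical.

```python
from bisect import bisect_left


def find_entry(entries, key):
    """entries: (highest_key, pointer) pairs in key order. Return the pointer of
    the first entry whose highest key is >= key, or None if key is beyond them all."""
    i = bisect_left([k for k, _ in entries], key)
    return entries[i][1] if i < len(entries) else None


def lookup(top_level, index_blocks, data_blocks, key):
    """Descend the index levels to the data block that should hold key.
    The top level is assumed to be in main memory; each index block or data
    block visited counts as one read. Returns (record or None, reads)."""
    reads = 0
    ptr = find_entry(top_level, key)
    while ptr in index_blocks:            # still pointing at an index block
        reads += 1
        ptr = find_entry(index_blocks[ptr], key)
    if ptr is None:                       # key is higher than any key in the file
        return None, reads
    reads += 1                            # read the data block and scan it
    block = data_blocks.get(ptr, [])
    return (key if key in block else None), reads


# Keys 30500, 2100, 230, 164, 146 and "track 6" follow the worked example;
# every other key and every block name is hypothetical.
top_level = [(30500, "M2")]
index_blocks = {
    "M2": [(2100, "M3")],                                # master index, level 2
    "M3": [(230, "CI")],                                 # master index, level 3
    "CI": [(73, "TI-2"), (164, "TI-3"), (230, "TI-4")],  # cylinder index
    "TI-3": [(140, "T5"), (146, "T6"), (164, "T7")],     # track index, cylinder 3
}
data_blocks = {"T6": [141, 143, 144, 146]}
print(lookup(top_level, index_blocks, data_blocks, 144))  # (144, 5)
```

The five reads correspond to steps 2 through 6: two master index tracks, the cylinder index, the track index, and the data block.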
VSAM
The way VSAM reads a record is somewhat similar to ISAM except that, since it is hardware-independent, we no longer describe the operation in terms of tracks and cylinders. Figure 22.8 illustrates VSAM. Instead of cylinders subdivided into tracks, the diagram shows control areas subdivided into control intervals. The control intervals in one data set are all the same length, and widely varying lengths can be selected. There may be several control intervals per track, or the control interval may spread over several tracks. Just as ISAM has a track index, so VSAM has a control interval index. It is called a sequence set. Just as ISAM has one track index per cylinder, so VSAM has one sequence set index per control area. The sequence set index is itself indexed by a hierarchy (tree) of indices not unlike the ISAM master indices. These are called the index set. The sequence set index contains the highest key value in each control interval and points to that control interval. The lowest level of the index set contains the highest-value key in each control area and points to the sequence set index block for that control area. To find a record, VSAM starts with the uppermost block of the index set, as did ISAM, and works its way down. The structure of the VSAM index blocks differs from that of the ISAM index blocks. Index structures are discussed in Chapter 30. The most important difference between ISAM and VSAM, apart from the question of device independence, is the way they handle the insertion and deletion of records.
Figure 22.7 IBM's ISAM (Indexed Sequential Access Method). Other illustrations of ISAM are in Figs. 22.1 and 22.3.
Figure 22.8 IBM's VSAM (Virtual Storage Access Method). See also Fig. 22.5. (Diagram labels: index set (see Fig. 30.7); control intervals 0 to 9.)

INSERTIONS AND DELETIONS
The organization of an indexed sequential file would be a simple matter if no new records had to be added to the file. The handling of insertions and deletions is the main reason for differences in indexed sequential techniques. In the traditional magnetic tape file, insertion and deletion are no problem. The required changes are sorted into the same key sequence as the file; they are read in with it, and a new file is written with the additions and deletions incorporated. With a direct-access file it may also be possible to delay the insertions until the end of the day or the end of the month, as with batch processing. (There is no need to delay the deletions; a record can be marked to indicate that it is effectively deleted.) Where such delay is permissible the file might be periodically rewritten, like a tape file, to incorporate the changes. This approach is occasionally used, but two snags occur on large and complex files. First, the file may be so large that rewriting it is excessively time-consuming and should be postponed as long as the maintenance and recovery techniques permit. Second, rewriting it may be complex, and hence time-consuming, because the file is structured with multiple pointers, chains, or secondary indices. Except with very static (nonvolatile) files, it is generally desirable to avoid having to rewrite direct-access files in order to handle insertions.
On some systems it is imperative that new records be inserted in real time. This is true on an airline reservation system, for example, but there the files are not sequentially ordered. Where a very high volume of real-time file inserts is needed, sequential files are usually avoided. On many, perhaps the majority, of the systems using sequential files, insertions can be made at the end of the day rather than in real time.
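The tape-style update run described above is a merge of two key-sorted streams. Here is a minimal Python sketch, assuming in-memory lists stand in for the old master file, the sorted change file, and the new master file. The record layout and the "add"/"delete" action names are hypothetical. Treating an "add" whose key already exists as a replacement is also an assumption.

```python
def merge_update(master, changes):
    """Produce a new master file from a key-sorted old master and key-sorted
    changes, as in a magnetic tape update run.
    master:  list of (key, record), sorted by key
    changes: list of (key, action, record), sorted by key; action is "add" or "delete"
    """
    new_master, i = [], 0
    for key, action, record in changes:
        # Copy across every unchanged master record that precedes this change.
        while i < len(master) and master[i][0] < key:
            new_master.append(master[i])
            i += 1
        matches = i < len(master) and master[i][0] == key
        if action == "add":
            if matches:
                i += 1        # assumption: an add with an existing key replaces it
            new_master.append((key, record))
        elif action == "delete":
            if matches:
                i += 1        # drop the deleted record
        else:
            raise ValueError(f"unknown action: {action!r}")
    new_master.extend(master[i:])   # remaining master records are unchanged
    return new_master


old = [(1, "a"), (3, "c"), (5, "e")]
changes = [(2, "add", "b"), (3, "delete", None), (6, "add", "f")]
print(merge_update(old, changes))   # [(1, 'a'), (2, 'b'), (5, 'e'), (6, 'f')]
```

Every record is read and written once per run. That cost is what makes the approach painful for very large direct-access files.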
OVERFLOWS
As illustrated earlier, there are two methods of accommodating insertions without having to rewrite the file. First, they may be stored in an area specially reserved for overflows, as in Fig. 22.6. A means of locating the items in the overflow area is needed. The items in it will sometimes be needed in sequence, although they were not inserted into the file in sequence. To make matters worse, the insertions sometimes arrive in clusters. It is always more difficult to devise a satisfactory means to handle clustered insertions than insertions which arrive singly and with random key values. Some applications have highly clustered insertions. In a factory, for example, a block of new part numbers may be added whose keys fit between two existing keys. There may be 1000 new part numbers in the insertion. This is a rare circumstance, but one which has on occasion played havoc with overflow schemes.
There are three main ways of addressing the overflow area. The ISAM method of addressing the overflow is to use pointers from the track index and then chains to indicate the key sequence of the inserted items. A field exists in the track index for one pointer for each prime data track.
The ISAM overflow procedure is illustrated in Fig. 22.9. The data records have alphabetical keys, of which the first three letters are shown. When the file is first loaded the overflow track is empty. The first record to be inserted has a key that begins with the letters ARK. It is therefore fitted into prime data track 1 between APU and ARM. The three records on the track with a key higher in sequence (alphabetical order) than ARK have to be slid to the right. There is then no room for the record ASP on the track, and so it is written on the overflow track. On the index track there is a position for an overflow entry for track 1, and the new address of ASP is written in that position. Next a cluster of inserts is added: BED, BEG, BEN, and BET. These go onto track 2, with the result that four records, BIN, BIT, BUZ, and CAD, go to the overflow track. The highest key of these four is written in the overflow index entry for track 2, along with the address of BIN, the one with the lowest key. BIN is chained to BIT, BIT to BUZ, and BUZ to CAD. After three more records have been inserted the overflow track is full, and so the next record to be displaced, BEN, is written on a track in the independent overflow area. It is chained to the other track 2 items on the original cylinder overflow track. The next record cast out to reside in the independent overflow area is ARL, and this is chained to the items from its native track 1. This scheme is excellent for a not-too-large number of scattered insertions. If there is only one overflow from a prime data track, it can be retrieved as quickly as the items still on the prime data track. If several items have overflowed but still reside on the overflow track of the original cylinder, the chains between them can be quickly followed in core.
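The chaining scheme can be sketched in Python. The model is simplified and rests on several assumptions. Each prime track holds a fixed number of records. Records pushed off a track go into a shared overflow area and are linked in key order from a single head pointer, which plays the role of the track index's overflow entry. Independent overflow areas and ISAM's real record formats are not modeled. The track contents loosely follow Fig. 22.9. The first key on track 1 and the capacity of six records are assumptions.

```python
from bisect import insort


class PrimeTrack:
    """One prime data track with its overflow records chained in key order.
    Overflow records live in a shared overflow area: a list of [key, next]
    cells, where next is the index of the next cell in the chain or None."""

    def __init__(self, keys, capacity, overflow_area):
        self.prime = sorted(keys)
        self.capacity = capacity
        self.area = overflow_area
        self.head = None          # overflow index entry: lowest-key overflow record

    def insert(self, key):
        insort(self.prime, key)                 # slide higher keys to the right
        if len(self.prime) > self.capacity:
            self._chain(self.prime.pop())       # highest key is pushed off the track

    def _chain(self, key):
        self.area.append([key, None])
        new = len(self.area) - 1
        prev, cur = None, self.head
        while cur is not None and self.area[cur][0] < key:
            prev, cur = cur, self.area[cur][1]
        self.area[new][1] = cur
        if prev is None:
            self.head = new
        else:
            self.area[prev][1] = new

    def overflow_keys(self):
        keys, cur = [], self.head
        while cur is not None:
            keys.append(self.area[cur][0])
            cur = self.area[cur][1]
        return keys

    def find(self, key):
        """Return (found, overflow records examined along the chain)."""
        if self.prime and key <= self.prime[-1]:
            return key in self.prime, 0
        steps, cur = 0, self.head
        while cur is not None:
            steps += 1
            k, cur = self.area[cur]
            if k >= key:
                return k == key, steps
        return False, steps


area = []   # cylinder overflow track, shared by both prime tracks
track1 = PrimeTrack(["AIL", "APE", "APU", "ARM", "ART", "ASP"], 6, area)
track2 = PrimeTrack(["BAD", "BAZ", "BIN", "BIT", "BUZ", "CAD"], 6, area)
track1.insert("ARK")                    # ASP goes to the overflow track
for k in ["BED", "BEG", "BEN", "BET"]:  # a cluster of inserts on track 2
    track2.insert(k)
print(track1.overflow_keys())  # ['ASP']
print(track2.overflow_keys())  # ['BIN', 'BIT', 'BUZ', 'CAD']
print(track2.find("BUZ"))      # (True, 3)
```

Finding BUZ means following three links. In a real file each link may cost a read, and a seek if it leads into the independent overflow area.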
If, however, there are many clustered overflows scattered around the independent overflow area, following the chains requires repeated reads and seeks and can become time-consuming. There are stories associated with ISAM files about computer operators watching the seek arm clicking away for many minutes while the system waits for the record it is attempting to read.
The second way to address the overflow items avoids the chains between items by having multiple pointers to each of the overflow records. Instead of having one pointer for each prime data track overflow, the index track is designed so that there could be many pointers. This scheme complicates the index track organization and leaves the possibility of running out of space for pointers.
A third way to organize the overflow items is to put them into an independent overflow area which has its own index, as shown in Fig. 22.6. The reading of any overflow record then requires a seek to the overflow cylinder, the reading and inspection of the overflow index, and then the reading of the track containing the record. This procedure takes longer than reading an ISAM overflow record without chains, but the possibility of having very lengthy chains is avoided.
An ISAM file behaves well as long as the overflow chains are short. A maintenance run is therefore carried out periodically to rewrite the file, putting all the records on the prime data tracks. If the interval between maintenance runs is too long, performance degradation in the form of lengthy file accesses will occur.

Figure 22.9 The handling of insertions and deletions with separate overflow areas (the technique used in IBM's ISAM). Each panel shows the track index, prime tracks 1 and 2, and the overflow track. Panel titles: 1. The file as originally loaded. 2. The file after the insertion of record ARK. 4. After the insertion of records ACE, BAR, BAT, and AMY, in that order (this panel also shows the independent overflow area).
Figure 22.10 The handling of insertions and deletions with distributed free space and cellular splitting (the technique used in IBM's VSAM). Panel titles: 1. The file as originally loaded (control areas containing four control intervals, with distributed free space, control information, and a sequence set index). 2. The file after the insertion of records ARK, BED, and BEG. 3. After the insertion of record BEN, a control interval splits. 4. Records BET, ARL, ACE, BAR, and BAT are inserted, and then the insertion of record AMY causes the control area to split (new control area at the end of the data set).

DISTRIBUTED FREE SPACE
The use of overflow areas always increases the access time needed to read some of the inserted records. An alternative is to use distributed free space, which incurs little penalty in time. It would not be economical to leave a gap after every logical record; therefore, the logical records are arranged into groups (possibly physical records), and a gap is left in each group. Some entire groups may be left empty. When an item is inserted into the distributed free space this can be done in one of two ways. First, the item could be placed in the gap without reshuffling the other items. To find it then, the entire group would have to be scanned. Second, the other items in the group could be moved when the new item is inserted, so that the items in the group as a whole remain in sequence. It takes longer to insert an item because data records are moved, but the subsequent reading of the items will be quicker. As reading occurs far more often than the insertion of new items, the latter approach is the better.
VSAM uses two kinds of distributed free space, as shown in Fig. 22.10. First, each control interval is not completely filled with records when the file is set up. Second, some entire control intervals within a control area are left empty. Each index entry in the sequence set index points to one control interval, and if that control interval is empty, there will be a free-space entry in the index. The systems programmer who sets up a file specifies how much free space and how many empty control intervals that file will have. When a record is deleted from a VSAM file the remaining records in the control interval are slid to the left. All the empty space in the control interval is then contiguous and so is more easily allocated to new records. If deletion of records empties a control interval entirely, then a free-space entry will be made for that control interval in the index. These processes are referred to as dynamic space reclamation.
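Here is a Python sketch of the second insertion method. Records are kept in key sequence within a fixed-size group, and deletion compacts the group so that its free space stays contiguous. The group is modeled loosely on a VSAM control interval. The byte sizes, the class name, and the method names are hypothetical. Real control intervals also carry control information, which is not modeled here.

```python
from bisect import bisect_left


class ControlInterval:
    """A fixed-size control interval holding variable-length records in key
    sequence, packed at the front so that all free space is contiguous."""

    def __init__(self, size):
        self.size = size       # bytes available for records (hypothetical)
        self.records = []      # (key, length) pairs in key sequence

    def free(self):
        return self.size - sum(length for _, length in self.records)

    def insert(self, key, length):
        """Slide higher-keyed records right to make room. Returns False when the
        record will not fit, which is where a control interval split is needed."""
        if length > self.free():
            return False
        i = bisect_left([k for k, _ in self.records], key)
        self.records.insert(i, (key, length))
        return True

    def delete(self, key):
        """Remove a record; later records slide left. Returns True if the
        interval is now empty (it would then get a free-space index entry)."""
        self.records = [(k, n) for k, n in self.records if k != key]
        return not self.records

    def offsets(self):
        """Starting byte of each record within the interval."""
        out, pos = {}, 0
        for k, n in self.records:
            out[k] = pos
            pos += n
        return out


ci = ControlInterval(100)
ci.insert("APE", 20)
ci.insert("ARM", 30)
ci.insert("ARK", 25)            # ARM slides right from byte 20 to byte 45
print(ci.offsets(), ci.free())  # {'APE': 0, 'ARK': 20, 'ARM': 45} 25
```

Reading the interval in order needs no search for stray insertions, because the records are always physically in key sequence.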
CELLULAR SPLITTING
If an appropriate amount of free space is left in the control intervals, most new records can be inserted by sliding existing records to the right within the control intervals. Occasionally, however, there will not be enough free space left in a control interval to accommodate the new record. In this case a control interval split must take place. The software moves approximately half the records from that control interval into an empty control interval in the same control area. The software finds empty control intervals by means of the free-space entries that are in the index. Occasionally there will not be a free control interval left in the control area in question. In this case a control area split must take place, which is similar to a control interval split. The software establishes a new control area at the end of the data set and moves about half the control intervals from the full control area into it, with all their data records. The new control area may be created from space already allocated for it in the data set, or new space may be created by adding an additional extent to the data set. When a control interval or control area split occurs the indices are adjusted to reflect it. The technique of successively splitting groups of records is sometimes called cellular splitting. There is, perhaps, a certain esthetic appeal to cellular splitting, for it enables a file to grow rather like a biological organism splitting its cells. It can be designed so that no periodic maintenance is required. The maintenance is done in effect when the splits occur.
The upper diagram of Fig. 22.10 shows a VSAM file as originally set up, with its empty space in the control intervals and, in this case, one empty control interval in each control area. The second diagram shows how the records with keys beginning with ARK, BED, and BEG are inserted without splitting any control intervals. Note that the records can be fully variable in length. An update operation may shorten or lengthen a record, and the record is refitted into the file like an insertion or deletion. When the record BEN is added to the file in Fig. 22.10, it splits the second control interval. Two new control intervals result, each with five records in key sequence. Before the split the sequence set index indicated that the fourth control interval was free. After the split it gives the value of the highest key in that control interval. Five more records, BET, ARL, ACE, BAR, and BAT, are inserted with no further splitting. The next record, AMY, is too long to fit in the first control interval. As no free control interval is left, the control area splits. The control intervals are divided sequentially between the two control areas, and the index blocks are modified to reflect the change. After a control interval split, the records are in sequence within the control intervals but not within the control area as a whole.
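Both kinds of split can be sketched in Python under simplifying assumptions. Capacity is counted in records rather than bytes. A control area is a list of control intervals, each a sorted list of keys, with None marking a free interval. The choice of interval by highest key stands in for a sequence set lookup. The function names and the data are hypothetical, and after a control area split the sketch leaves the caller to pick the right area, which VSAM does through its index set.

```python
from bisect import insort


def insert_key(area, key, ci_capacity):
    """Insert key into a control area: a list of control intervals, each a
    sorted list of keys, or None when the interval is free. Assumes at least
    one interval is in use. Returns "fit", "ci_split", or "ca_split_needed"."""
    used = sorted((i for i, ci in enumerate(area) if ci is not None),
                  key=lambda i: area[i][-1])     # sequence-set order
    target = next((i for i in used if area[i][-1] >= key), used[-1])
    ci = area[target]
    if len(ci) < ci_capacity:
        insort(ci, key)
        return "fit"
    free = [i for i, c in enumerate(area) if c is None]
    if not free:
        return "ca_split_needed"
    insort(ci, key)
    half = len(ci) // 2
    area[free[0]] = ci[half:]     # upper half moves to a free interval
    del ci[half:]
    return "ci_split"


def split_area(area):
    """Control area split: the higher-keyed half of the used intervals move to a
    new control area (appended at the end of the data set); their old slots
    become free. Returns the new control area."""
    used = sorted((i for i, ci in enumerate(area) if ci is not None),
                  key=lambda i: area[i][-1])
    moving = used[len(used) // 2:]
    new_area = [area[i] for i in moving] + [None] * (len(area) - len(moving))
    for i in moving:
        area[i] = None
    return new_area


area = [["AIL", "APE", "ARM"], ["BAD", "BAZ", "BIN"], None]   # one free interval
print(insert_key(area, "ARK", 4))   # fit
print(insert_key(area, "APU", 4))   # ci_split
print(area)  # [['AIL', 'APE'], ['BAD', 'BAZ', 'BIN'], ['APU', 'ARK', 'ARM']]
print(insert_key(area, "BED", 4))   # fit
print(insert_key(area, "BEG", 4))   # ca_split_needed
new_area = split_area(area)
print(insert_key(new_area, "BEG", 4))  # ci_split
```

After the control interval split, the intervals are physically out of key order but still in order by highest key. A sequence set exploits exactly this property.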
The records can still be retrieved sequentially because the entries in the sequence set index are in sequence (third diagram of Fig. 22.10). When the control area splits, the control intervals in the two resulting control areas are in sequence once more. The control area split is in effect a file maintenance operation. After the control area split, however, the control areas are no longer in sequence, and neither are the sequence set index records which relate to them. The entries in the lowest level of the index set are in sequence. The file can be left with its control areas out of sequence. This does no harm except to increase the access time for sequential processing very slightly, because a seek is needed to go to an inserted control area and back. The sequence set index blocks, each of which relates to a control area, are chained together horizontally as shown in Fig. 22.8. If the control areas are out of sequence, these horizontal pointers will indicate the correct key sequence. Sequential processing takes place without using the index set, by following the horizontal pointers of the sequence set. The index set is used only for direct accessing. A VSAM key-sequenced file can thus be left with no periodic maintenance runs.
The use of distributed free space generally results in higher storage requirements than an overflow chaining method. However, it permits much faster retrievals of inserted records. The time taken to retrieve an inserted record with VSAM is the same as the time taken to retrieve one of the original records. Control interval and control area splitting make possible the accommodation of highly clustered insertions without paying the penalty of either leaving an excessive amount of distributed free space or else having time-consuming overflow chains. Box 22.1 summarizes the techniques for handling insertions in an indexed sequential file.
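Sequential processing through the horizontal chain can be sketched in a few lines of Python. This is an illustration under stated assumptions. Each sequence set block is represented by its control area's intervals, listed in the order of that block's index entries, plus a pointer to the next block in key sequence. The control area names and keys are hypothetical.

```python
def sequential_scan(sequence_set, first_block):
    """Yield keys in key sequence by following the horizontal pointers that
    chain the sequence set blocks; the index set is never consulted.
    sequence_set maps a block id to (intervals, next_block_id), where intervals
    lists the control intervals of one control area in the order of that
    block's index entries, i.e. in key sequence."""
    block = first_block
    while block is not None:
        intervals, block = sequence_set[block]
        for ci in intervals:
            yield from ci


# Hypothetical data set: CA2 was created at the end of the data set by a
# control area split, so physical order is CA0, CA1, CA2 but key order is
# CA0, CA2, CA1.
sequence_set = {
    "CA0": ([["ACE", "AIL"], ["APE", "APU"]], "CA2"),
    "CA1": ([["BED", "BEG"], ["BIN"]], None),
    "CA2": ([["ARK", "ARM"], ["BAD", "BAZ"]], "CA1"),
}
print(list(sequential_scan(sequence_set, "CA0")))
```

The scan visits CA2 between CA0 and CA1, so the only cost of the out-of-sequence control area is the extra seek to it and back.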
POSITIONING THE INDICES
The positioning of the indices on the file units has a considerable effect on the access times with an indexed sequential file.
Figure 22.11 The sequence set index record is placed next to the control area it indexes to avoid unnecessary access time, and is replicated several times around a track to lessen the rotational delay. (Diagram: within a control area, the first track holds the sequence set index and identical copies of it, "repeated copies of the sequence set index, to shorten access time"; the second to fifth tracks hold control intervals.)
Box 22.1 A Summary of the Techniques for Handling Insertions in an Indexed Sequential File

Time requirements: new records can be inserted
  - In real time
  - At the end of the day
  - At the end of a longer period when file maintenance takes place

Techniques:
Serial-access file (e.g., magnetic tape)
  - Rewrite the file periodically with the changes merged into it
Direct-access file
  1. Rewrite the file periodically with the changes merged into it
  2. Store the new items in an overflow area
     Location of overflow area: same cylinder; separate file area; both
     Overflow addressing method: chains; multiple pointers (from index to each overflow record); separate index in each overflow block
  3. Distributed free space, so that new items can be merged into groups of items
     Device-independent: free space in each physical block (control interval); free physical blocks (control intervals); free groups of blocks (control areas)
     Device-dependent: free space in each track; free space in each cylinder; free space in separate cylinders
  4. Cellular splitting (used with technique 3)
     Splitting physical blocks (control intervals); splitting groups of blocks (control areas)
Note: Periodic maintenance is needed with techniques 2, 3, and 4.
Figure 22.12 The mean record-access time (to access the index and then the record) varies with the positioning of the index in the data. In this illustration the index is on one cylinder of a disk unit, and the data is on another cylinder of the same module. The best cylinder for the index varies with the distribution of file activity. (Four panels, (a) to (d), each with an upper chart of cylinder activity ratio against cylinder number, 0 to 199, and a lower chart of mean access time against the position of the cylinder index.)
The lowest-level index (track index or sequence set index) is usually interspersed among the data records so as to avoid seeks in going from this index to the data. In addition, it may be repeated several times around a track to minimize the time spent waiting for track rotation when it is read, as shown in Fig. 22.11. The cylinder index in ISAM, or the lowest level of the index set in VSAM, should be placed in a position that minimizes both the index seek time and the subsequent data seek time. An effective way to do this is to place the index on a separate module where an access mechanism sits over it doing nothing but index reads. The indices have characteristics different from the data, and there is much to be said for storing them on different types of devices. It is particularly important to do so when the data are on a file with a single access mechanism, such as a data cell drive. It is time-consuming to move the data cell access mechanism backward and forward between the higher levels of index and the data. The higher levels of index should be moved to a separate, smaller, and faster device, such as a small disk unit. Often a data set is on one disk pack, and that disk pack contains its own cylinder index (or virtual equivalent). The systems programmer usually has a choice of where to position the cylinder index, and he should choose the cylinder which minimizes the seek times. Surprisingly often, the cylinder index has been placed on cylinder 0, and this is usually the worst place for it. If the frequency of reference is evenly distributed across the file, the optimum position for the cylinder index is in the center of the data. Often the file has a far from even reference density, and then a calculation may be needed to determine the optimum position for the cylinder index. Figure 22.12 shows two cases. The upper chart for each plots the reference density across the file.
The lower chart shows the mean and standard deviation of access time for the two reads (seeking and reading the cylinder index and then seeking and reading the data cylinder). It will be seen that the best cylinder on which to position the index varies with the distribution of file activity.
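The kind of calculation involved can be sketched in a few lines. This is an illustration, not the calculation used to draw Fig. 22.12, and it rests on two assumptions of ours: seek cost grows linearly with the number of cylinders crossed, and before each access the arm rests on the cylinder of the previous data access, which follows the same activity distribution. Under those assumptions the best index cylinder is the weighted median of the activity, which is why an evenly used file wants its index in the center and why cylinder 0 is rarely a good choice.

```python
from typing import Sequence


def expected_seek_cylinders(index_cyl: int, activity: Sequence[float]) -> float:
    """Mean cylinders crossed for one record access:
    arm (at the previous data cylinder) -> cylinder index -> data cylinder.

    Assumes seek cost is linear in distance and that the previous and the next
    data cylinder both follow the activity distribution independently.
    """
    total = sum(activity)
    one_leg = sum(w * abs(cyl - index_cyl) for cyl, w in enumerate(activity)) / total
    return 2 * one_leg


def best_index_cylinder(activity: Sequence[float]) -> int:
    """Cylinder minimizing the expected seek distance (lowest cylinder on ties)."""
    return min(range(len(activity)), key=lambda c: expected_seek_cylinders(c, activity))


# Even activity over 200 cylinders: the optimum is in the middle of the data.
uniform = [1.0] * 200
print(best_index_cylinder(uniform))  # 99 (ties with 100)

# Activity five times heavier on cylinders 0-19: the optimum moves toward the
# busy end, but cylinder 0 remains a poor choice.
skewed = [5.0] * 20 + [1.0] * 180
best = best_index_cylinder(skewed)
print(best, expected_seek_cylinders(best, skewed), expected_seek_cylinders(0, skewed))
```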
23
HASHING
Hashing (Fig. 21.5) has been used for addressing random-access storages since they first came into existence in the mid-1950s, but nobody had the temerity to use the word hashing until 1968. The word randomizing was used until it was pointed out that not only did the key conversion process fail to produce a random number, but, contrary to early belief, it was undesirable that it should. Many systems analysts have avoided the use of hashing in the suspicion that it is complicated. In fact it is simple to use and has two important advantages over indexing. First, it finds most records with only one seek, and, second, insertions and deletions can be handled without added complexity. Indexing, however, can be used with a file which is sequential by prime key, and this is an overriding advantage for some batch-processing applications. There are many variations in the techniques available for hashing. They have been compared in many different studies [1, 2, 3], and from these studies we will draw certain guidelines about which are good techniques and which are best avoided.
FACTORS AFFECTING EFFICIENCY
The factors which the systems analyst can vary when using hash addressing are as follows:
1. The bucket size.
2. The packing density, i.e., the number of buckets for a file of a given size.
3. The hashing key-to-address transform.
4. The method of handling overflows.
Optimal decisions concerning these factors have a substantial effect on the efficiency of the file organization. We will review them in the above sequence.
BUCKET SIZE
A certain number of address spaces are made available, called home buckets. A bucket can hold one or more records, and the systems analyst can select the bucket capacity. As shown in Fig. 21.5, the hashing routine scatters records into the home buckets somewhat like a roulette wheel tossing balls into its compartments. Let us suppose that a roulette wheel has 100 balls which it will distribute to its compartments. Each ball represents a record, and each compartment represents a bucket. The wheel's compartments can hold 100 balls in total; however, we can vary the size of the compartments. If a ball is sent to a compartment which is full, it must be removed from the roulette wheel and placed in an overflow area. If we have 100 compartments which can hold only one ball each, the wheel will often send a ball to a compartment which is already full. There will be a high proportion of overflows. If we have 10 compartments which can hold 10 balls each, there will be far fewer overflows. It is an exercise in basic statistics to calculate the expected number of overflows, and Fig. 23.1 shows the result.

Figure 23.1 [Plot: percentage of overflows against capacity of compartments on roulette wheel, from 1 to 100.]

If a systems analyst chooses a small bucket size, he will have a relatively high proportion of overflows, which will necessitate additional bucket reads and possibly additional seeks. A larger bucket capacity will incur fewer overflows, but the larger bucket will have to be read into main memory and searched for the required record. Figures 23.2 and 23.3 illustrate a simple hashing process with a bucket capacity of 2. If a direct-access device is used which has a long access time, it is desirable to minimize the number of overflows. A bucket capacity of 10 or more should be used to achieve this end. On the other hand, if the file is stored in a solid-state, or core, storage, the overflow accesses can be carried out as rapidly as the read operations used when searching a bucket. In this case it is desirable to minimize the bucket-searching operation at the expense of more overflows. In such a case a bucket size of 1 is economical. Later we will discuss systems using paging in which a page containing many items is read into solid-state storage and hashing is used for finding an item on the page as quickly as possible. A bucket size of 1 is used. In practice, bucket capacity is often tailored to the hardware characteristics. For example, one track of a disk unit, or half a track, may be made one bucket.
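The roulette-wheel argument can also be checked experimentally. The sketch below spins 100 balls into compartments of a chosen capacity (100 ball positions in all), counts the balls that find their compartment full, and averages over many trials. Python's `random` module plays the part of the wheel; the seed and the number of trials are arbitrary choices. It simulates the same model as the statistical calculation behind Fig. 23.1 rather than reproducing that calculation.

```python
import random


def simulated_overflow_percent(capacity: int, total_slots: int = 100, balls: int = 100,
                               trials: int = 1000, seed: int = 1) -> float:
    """Percentage of balls that overflow when `balls` balls are spun into
    total_slots // capacity compartments, each holding `capacity` balls."""
    if total_slots % capacity:
        raise ValueError("capacity must divide total_slots")
    compartments = total_slots // capacity
    rng = random.Random(seed)                  # fixed seed: repeatable results
    overflows = 0
    for _ in range(trials):
        counts = [0] * compartments
        for _ in range(balls):
            counts[rng.randrange(compartments)] += 1
        # every ball beyond a compartment's capacity goes to the overflow area
        overflows += sum(max(0, n - capacity) for n in counts)
    return 100.0 * overflows / (balls * trials)


for cap in (1, 2, 5, 10, 25, 50, 100):
    print(f"capacity {cap:3d}: {simulated_overflow_percent(cap):5.1f}% overflows")
```

With a capacity of 100 there is a single compartment and nothing can overflow; as the capacity falls, the overflow percentage rises steeply, which is the shape of the curve in Fig. 23.1.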
PACKING DENSITY
The proportion of overflows is also affected by the density with which records are packed into the home buckets. If the roulette wheel can hold 100 balls in total and 100 balls are spun into its compartments, there will be a high probability that some balls will overflow, even if the compartments are quite large. If only 80 balls are spun, the probability of overflow will be much lower.

$$\text{Packing density} = \frac{\text{Number of records stored in home buckets}}{\text{Maximum number of records that could be stored in them}}$$

When we use this ratio to refer to the home buckets only, ignoring overflow records, we will call it the prime packing density. The roulette wheel above, spinning 80 balls, is operating with a prime packing density of 80%.
Figure 23.2 A simple illustration of hashing to a storage with 29 prime buckets, each of capacity 2.

Seq.  Key          Key converted to digits   Remainder after dividing by 29
  1   BETTY        25338                     21
  2   JUNE         1455                      5
  3   CHLOE        38365                     27
  4   KRISTEN      2992355                   19
  5   YVONNE       856555                    11
  6   MOLLY        46338                     25
  7   DIANA        49151                     25
  8   ELECTRA      5353391                   20
  9   OLGA         6371                      20
 10   GRACIE       791395                    14
 11   LARA         3191                      1
 12   NANCY        51538                     5
 13   PRUDENCE     79445535                  6
 14   SAMANTHA     21415381                  12
 15   ANNE         1555                      18
 16   FRED         6954                      23
 17   MABLE-SARAH  41235021918               15
 18   MARY         4198                      22
 19   FLOSSY       636228                    26
 20   JANET        11553                     11
 21   PAM          714                       18
 22   XANTHIPPE    715389775                 27
 23   PRISCILLA    799239331                 27
 24   CAROL        31963                     5
 25   ROSEMARY     96254198                  8
 26   RUTH         9438                      13
 27   ELIZABETH    539912538                 21
 28   NEFERTITI    556593939                 13
 29   ELLEN        53355                     24
 30   ZOE          965                       8
 31   PATIENCE     71395535                  0
 32   PENNY        75558                     13
 33   VANESSA      5155221                   7
 34   WILLY        69338                     28
 35   VALERY       513598                    8
 36   LOUISE       364925                    18
 37   SCARLETT     23193533                  0
 38   CLEOPATRA    335671391                 16
 39   GEORGIE      7569795                   12
 40   CANDICE      3154935                   25
 41   NATALIA      5131391                   15
 42   POLLY        76338                     10
 43   HOPE         8675                      4
 44   DELILAH      4539318                   6
 45   GERT         7593                      24
 46   DOBBY        46228                     2

Storage (bucket capacity 2):
 0  PATIENCE, SCARLETT
 1  LARA
 2  DOBBY
 3  (empty)
 4  HOPE
 5  JUNE, NANCY
 6  PRUDENCE, DELILAH
 7  VANESSA
 8  ROSEMARY, ZOE
 9  (empty)
10  POLLY
11  YVONNE, JANET
12  SAMANTHA, GEORGIE
13  RUTH, NEFERTITI
14  GRACIE
15  MABLE-SARAH, NATALIA
16  CLEOPATRA
17  (empty)
18  ANNE, PAM
19  KRISTEN
20  ELECTRA, OLGA
21  BETTY, ELIZABETH
22  MARY
23  FRED
24  ELLEN, GERT
25  MOLLY, DIANA
26  FLOSSY
27  CHLOE, XANTHIPPE
28  WILLY
Overflow bucket: PRISCILLA, CAROL, PENNY, VALERY, LOUISE, CANDICE

The key is converted into digits by retaining only the four bits which represent numbers in BCD code (see Fig. 31.2). This method should not be used in practice because it throws away information in the key. It is used here to provide an easily followed illustration.
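The illustration in Fig. 23.2 can be reproduced with a short sketch. It assumes the character codes are EBCDIC and uses Python's standard `cp037` codec as a stand-in for the coding of Fig. 31.2 (not shown here): keeping the low-order four bits of each code gives one digit per letter and 0 for the hyphen. Under that assumption the sketch yields the same digits and remainders as the figure and the same six overflows. Like the figure, it throws information away and is meant only as an illustration.

```python
# Keys in the order in which they are loaded, as listed in Fig. 23.2.
NAMES = [
    "BETTY", "JUNE", "CHLOE", "KRISTEN", "YVONNE", "MOLLY", "DIANA", "ELECTRA",
    "OLGA", "GRACIE", "LARA", "NANCY", "PRUDENCE", "SAMANTHA", "ANNE", "FRED",
    "MABLE-SARAH", "MARY", "FLOSSY", "JANET", "PAM", "XANTHIPPE", "PRISCILLA",
    "CAROL", "ROSEMARY", "RUTH", "ELIZABETH", "NEFERTITI", "ELLEN", "ZOE",
    "PATIENCE", "PENNY", "VANESSA", "WILLY", "VALERY", "LOUISE", "SCARLETT",
    "CLEOPATRA", "GEORGIE", "CANDICE", "NATALIA", "POLLY", "HOPE", "DELILAH",
    "GERT", "DOBBY",
]


def key_to_digits(key: str) -> str:
    """Keep the low-order four bits of each EBCDIC code (cp037 codec).
    Assumes the key holds only upper-case letters and hyphens."""
    return "".join(str(byte & 0x0F) for byte in key.encode("cp037"))


def home_bucket(key: str, buckets: int = 29) -> int:
    """Division transform: remainder after dividing the digits by 29."""
    return int(key_to_digits(key)) % buckets


def load(keys, buckets: int = 29, capacity: int = 2):
    """Store keys in loading order; a key whose home bucket is full overflows."""
    home = [[] for _ in range(buckets)]
    overflow = []
    for key in keys:
        bucket = home[home_bucket(key, buckets)]
        if len(bucket) < capacity:
            bucket.append(key)
        else:
            overflow.append(key)
    return home, overflow


home, overflow = load(NAMES)
print(key_to_digits("BETTY"), home_bucket("BETTY"))  # 25338 21
print(overflow)  # ['PRISCILLA', 'CAROL', 'PENNY', 'VALERY', 'LOUISE', 'CANDICE']
print(f"prime packing density: {len(NAMES) / (29 * 2):.0%}")  # 79%
```

Loading `NAMES + ["LESLIE", "JOAN", "JENNIFER", "MARTA"]` the same way adds the four records of Fig. 23.3; only JOAN overflows, because its home bucket 20 already holds ELECTRA and OLGA.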
The systems analyst can exercise a trade-off between saving storage and saving time by adjusting the prime packing density. If the key-to-address conversion algorithm scatters the records into buckets at random like the roulette wheel, we could calculate statistically the percentage of overflows for different prime packing densities. Box 23.1 shows the calculation, and the curves in Fig. 23.4 give the results of this calculation. The systems analyst ought to be able to find key-to-address conversion algorithms that do better than the roulette wheel, as we will see. Many are worse. The equations in Box 23.1 and the curves in Fig. 23.4 give a useful guideline to the trade-off among prime packing density, bucket size, and the number of overflows.

Figure 23.3 Four new records added to the file in Fig. 23.2.

Seq.  Key          Key converted to digits   Remainder after dividing by 29
 47   LESLIE       352395                    16
 48   JOAN         1615                      20
 49   JENNIFER     15559659                  28
 50   MARTA        41931                     26

LESLIE, JENNIFER, and MARTA join CLEOPATRA, WILLY, and FLOSSY in buckets 16, 28, and 26; bucket 20 already holds ELECTRA and OLGA, so JOAN overflows.
Box 23.1 Prime packing density, bucket capacity, and numbers of overflows

Let N = the total number of balls, M = the total number of compartments in the roulette wheel, and C = the capacity of a compartment. Then the prime packing density = N/(CM). Using the binomial distribution, the probability that a given compartment will have x balls sent to it in N spins of the roulette wheel is

$$\mathrm{Prob}(x) = \frac{N!}{x!\,(N-x)!}\left(\frac{1}{M}\right)^{x}\left(1-\frac{1}{M}\right)^{N-x}$$

The probability that there will be Y overflows from a given compartment is Prob(C + Y). The mean number of overflows from a given compartment is

$$\sum_{Y=1}^{\infty} \mathrm{Prob}(C+Y)\cdot Y$$

The percentage of overflows for the entire roulette wheel is therefore

$$100 \times \frac{M}{N} \sum_{Y=1}^{\infty} \mathrm{Prob}(C+Y)\cdot Y$$

From this we can explore the relationship between bucket capacity, C, prime packing density, N/(CM), and percentage of overflows in a hashed file. The results are plotted in Fig. 23.4.
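The Box 23.1 calculation is easy to evaluate numerically. The sketch below transcribes the formulas directly; computing the binomial term through `math.lgamma` (log-factorials) rather than factorials is our choice, to keep large N from overflowing floating point. It assumes M ≥ 2. Terms with C + Y > N are zero, so the sum stops there.

```python
from math import exp, lgamma, log, log1p


def prob_balls(x: int, n: int, m: int) -> float:
    """Prob(x): chance that a given compartment receives exactly x of the
    N = n balls when there are M = m compartments (binomial, p = 1/M)."""
    log_p = (lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
             + x * log(1.0 / m) + (n - x) * log1p(-1.0 / m))
    return exp(log_p)


def overflow_percent(n: int, m: int, c: int) -> float:
    """100 * (M/N) * sum over Y >= 1 of Prob(C + Y) * Y."""
    mean_overflows = sum(prob_balls(c + y, n, m) * y for y in range(1, n - c + 1))
    return 100.0 * m / n * mean_overflows


def packing_density(n: int, m: int, c: int) -> float:
    """Prime packing density N / (C * M)."""
    return n / (c * m)


# 100 balls, 100 one-ball compartments (capacity 1 in Fig. 23.1).
print(round(overflow_percent(100, 100, 1), 1))  # 36.6

# 70% prime packing density with buckets of 20 records.
n, m, c = 1400, 100, 20
print(packing_density(n, m, c), round(overflow_percent(n, m, c), 2))
```

The second case is consistent with the guideline that follows: at 70% packing density with a bucket capacity of 20, fewer than 1% of the records overflow.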
If the file is on an electromechanical storage with a long access time, the primary concern may be to cut down the number of accesses. The systems analyst may decide to hold the overflow percentage to 1%. As seen in Fig. 23.4, he may do so by having a prime packing density of 70% and a bucket capacity of 20 or more. On the other hand, access time may be of less concern than the efficient use of storage space. He may decide to use a prime packing density of 95% and again use a large bucket size to reduce overflows. Conversely, he may want to avoid the computer time needed to search a large bucket. If the file is in solid-state memory so that overflow access is not time-consuming, he may use a bucket capacity of 1 and use a high packing density because this storage is expensive.

Figure 23.4 The systems analyst can exercise a trade-off between prime packing density, bucket capacity, and percentage of overflows. These curves are drawn from a key-to-address transform which perfectly randomizes the key set, like a roulette wheel. Compare with Fig. 23.7. [Plot: percentage of overflows against prime packing density, one curve per bucket capacity.]
KEY-TO-ADDRESS CONVERSION ALGORITHMS
The key-to-address conversion algorithm generally has three steps:
1. If the key is not numeric, it is converted into a numeric form ready for manipulation. The conversion should be done without losing information in the key. For example, an alphabetical character should not be converted into one digit (as in Fig. 23.2) but into two. Alphanumeric data may be manipulated in the form of binary strings.
2. The keys are operated on by an algorithm which converts them into a spread of numbers of the order of magnitude of the address numbers required. The key set should be distributed as evenly as possible across this range of addresses.
3. The resulting numbers are multiplied by a constant which compresses them to the precise range of addresses. The second step may, for example, give four digits when 7000 buckets are to be used. The four-digit number is multiplied by 0.7 to put it in the range 0000 to 6999. This relative bucket number is then converted into a machine address of the bucket.
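Steps 1 and 3 can be sketched in a few lines; step 2 is whichever transform is chosen from those described next. The two-digit character coding below is a hypothetical choice (any coding that gives each character two digits would serve), and integer arithmetic stands in for multiplying by 0.7.

```python
# Hypothetical alphabet for step 1: each character maps to a code 10..47, so
# every character gives exactly two digits and no code has a leading zero.
ALPHABET = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-"


def key_to_number(key: str) -> int:
    """Step 1: convert an alphanumeric key to a number, two digits per character,
    so that no information in the key is lost (ValueError for other characters)."""
    return int("".join(str(10 + ALPHABET.index(ch)) for ch in key.upper()))


def compress(value: int, digits: int, buckets: int) -> int:
    """Step 3: map a number in 0 .. 10**digits - 1 onto 0 .. buckets - 1,
    i.e. multiply by buckets / 10**digits (0.7 for 7000 buckets) and truncate."""
    return value * buckets // 10 ** digits


print(key_to_number("AB"))      # 2122
print(compress(9999, 4, 7000))  # 6999
```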
For the second step many transforms have been proposed and tested. It is desirable that the transform distribute the keys as evenly as possible between the available buckets. Realistic transforms distribute the keys very imperfectly, and so overflows result. The following are some of the more useful candidates:

1. Mid-square method

The key is multiplied by itself, and the central digits of the square are taken and adjusted to fit the range of addresses. Thus, if the records have 6-digit keys and 7000 buckets are used, the key may be squared to form a 12-digit field of which digits 5 to 8 are used. Thus, if the key is 172148, the square is 029634933904. The central four digits are multiplied by 0.7: 3493 × 0.7 = 2445.1, which is truncated to 2445. 2445 is used as the bucket address. This is close to roulette-wheel randomization, and the results are usually found to be close to the theoretical results of Fig. 23.4.
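The mid-square example can be checked with a short sketch. The parameter names are ours, and the compression step uses integer arithmetic in place of multiplying by 0.7.

```python
def mid_square(key: int, key_digits: int = 6, take: int = 4, buckets: int = 7000) -> int:
    """Square the key, keep the central `take` digits of the zero-padded
    (2 * key_digits)-digit square, and compress them onto 0 .. buckets - 1."""
    square = str(key * key).zfill(2 * key_digits)   # 172148**2 -> "029634933904"
    start = (len(square) - take) // 2                # digits 5 to 8 of 12
    central = int(square[start:start + take])        # "3493"
    return central * buckets // 10 ** take           # 3493 * 0.7 -> 2445


print(mid_square(172148))  # 2445
```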
2. Dividing

It is possible to find a method which gives better results than a random number generator. A simple division method is such a method. The key is divided by a number approximately equal to the number of available addresses, and the remainder is taken as the relative bucket address, as in Fig. 23.2. A prime number or a number with no small factors is used. Thus, if the key is 172148 again and there are 7000 buckets, 172148 might be divided by 6997. The remainder is 4220, and this is taken as the relative bucket address. One reason division tends to give fewer overflows than a randomizing algorithm is that many key sets have runs of consecutive numbers. The remainders after dividing by, say, 6997 also tend to contain runs of consecutive numbers, thereby distributing the keys to different buckets.

3. Shifting

The outer digits of the key at both ends are shifted inward to overlap by an amount equal to the address length, as shown in Fig. 23.5. The digits are then added, and the result is adjusted to fit the range of bucket addresses.

Figure 23.5 Key-to-address conversion by shifting. [Diagram: key, address length, and a worked example.]

4. Folding

Digits in the key are folded inward like folding paper, as shown in Fig. 23.6. The digits are then added and adjusted as before. Folding tends to be more appropriate for large keys.

Figure 23.6 Key-to-address conversion by folding. [Diagram: key, address length, and a worked example.]
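The division transform, and one reading of shifting and folding, can be sketched as follows. The division function reproduces the worked example (172148 divided by 6997 leaves 4220) and shows a run of consecutive keys landing in consecutive buckets. The worked examples of Figs. 23.5 and 23.6 are not legible here, so the shifting and folding functions are our interpretation: the digits are split from the left into groups as long as the address, the groups are added (with alternate groups reversed for folding), and any carry beyond the address length is discarded.

```python
def division_hash(key: int, divisor: int = 6997) -> int:
    """Relative bucket address = remainder after dividing the key by `divisor`."""
    return key % divisor


def _groups(key: str, width: int) -> list:
    """Split the key's digits, from the left, into groups of `width` digits."""
    return [key[i:i + width] for i in range(0, len(key), width)]


def shift_hash(key: str, width: int) -> int:
    """Shifting (interpretation): lay the digit groups on top of one another,
    add them, and drop any carry beyond `width` digits."""
    return sum(int(g) for g in _groups(key, width)) % 10 ** width


def fold_hash(key: str, width: int) -> int:
    """Folding (interpretation): as shifting, but the outer groups are reversed,
    as they would be when paper is folded over the middle."""
    groups = _groups(key, width)
    total = sum(int(g[::-1] if i % 2 == 0 else g) for i, g in enumerate(groups))
    return total % 10 ** width


print(division_hash(172148))                               # 4220
print([division_hash(k) for k in range(172148, 172152)])   # [4220, 4221, 4222, 4223]
print(shift_hash("123456789", 3), fold_hash("123456789", 3))  # 368 764
```

For the key 123456789 and a three-digit address, shifting adds 123 + 456 + 789 = 1368 and keeps 368; folding adds 321 + 456 + 987 = 1764 and keeps 764.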
5. Digit Analysis

Some attempts at achieving an even spread of bucket addresses have analyzed the distribution of values of each digit or character in the key. Those positions having the most skewed distributions are deleted from the key in the hope that any transform applied to the other digits will have a better chance of giving a uniform spread.

6. Radix Conversion

The radix of a number may be converted, for example, to radix 11. The excess high-order digits may then be truncated. The key 172148 is converted to

$$1 \times 11^5 + 7 \times 11^4 + 2 \times 11^3 + 1 \times 11^2 + 4 \times 11 + 8 = 266373$$

and the digits 6373 are multiplied by 0.7 to give the relative bucket address 4461. Radix 11 conversion can be performed more quickly in a computer by a series of shifts and additions.

7. Lin's Method [7]

In this method a key is expressed in radix p, and the result is taken modulo $q^m$, where p and q are prime numbers (or numbers without small prime factors) and m is a positive integer. The key 172148 would first be written as a binary string (in BCD, four bits per decimal digit): 0001 0111 0010 0001 0100 1000. Grouping the string into groups of three bits, we obtain 000 101 110 010 000 101 001 000 = 05620510. This is expressed as a decimal number and divided by a constant $q^m$. The remainder is used to obtain the relative bucket address.
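The radix-11 and Lin examples can be checked with a sketch. The radix-11 function follows the worked example exactly. For Lin's method the sketch follows the example's first stage (BCD bits regrouped in threes and read as octal digits, giving 05620510 for key 172148); the text does not give q and m, so q = 7 and m = 4 below are purely illustrative.

```python
def radix11_hash(key: int, keep: int = 4, buckets: int = 7000) -> int:
    """Read the decimal digits as radix-11 digits, keep the low-order `keep`
    decimal digits of the result, and compress them onto 0 .. buckets - 1."""
    value = 0
    for digit in str(key):
        value = value * 11 + int(digit)   # 172148 -> 266373
    low = value % 10 ** keep              # 6373
    return low * buckets // 10 ** keep    # 6373 * 0.7 -> 4461


def lin_digits(key: int) -> str:
    """Write the key in BCD (four bits per decimal digit), then read the bits
    three at a time as octal digits."""
    bits = "".join(f"{int(d):04b}" for d in str(key))
    bits = bits.zfill(-(-len(bits) // 3) * 3)    # pad to a multiple of 3 bits
    return "".join(str(int(bits[i:i + 3], 2)) for i in range(0, len(bits), 3))


def lin_hash(key: int, q: int, m: int) -> int:
    """Treat the octal digit string as a decimal number and take it modulo q**m."""
    return int(lin_digits(key)) % q ** m


print(radix11_hash(172148))        # 4461
print(lin_digits(172148))          # 05620510
print(lin_hash(172148, q=7, m=4))  # 2170 (q and m are illustrative)
```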
8. Polynomial Division

Each digit of the key is regarded as a polynomial coefficient; thus, the key 172148 is regarded as $x^5 + 7x^4 + 2x^3 + x^2 + 4x + 8$. The polynomial so obtained is divided by another, unchanging polynomial. The coefficients of the remainder form the basis of the relative bucket address.
CHOICE OF TRANSFORM
The best way to choose a transform is to take the key set for the file in question and simulate the behavior of many possible transforms. For each transform, all the keys in the file will be converted and the numbers of records in each bucket counted. The percentage of overflows for given bucket sizes will be evaluated. Several researchers have conducted experiments on typical key sets searching for the ideal transform [3]. Their overall conclusion is that the simple method of division seems to be the best general transform. Buchholz [1] recommends dividing by a prime slightly smaller than the number of buckets. Lum et al. [3] say that the divisor does not have to be a prime; a nonprime number with no prime factors less than 20 will work as well. Figure 23.7 shows some typical results. The dotted curve shows the theoretical behavior of a perfectly randomizing transform like a roulette wheel. The points plotted show the average overflow percentages given by three common transforms on eight widely differing but typical key sets. The mid-square method is close to a theoretical randomizing transform. The division method performs consistently better than the randomizing transform. The folding method is erratic in its performance, and so is the shifting method, probably because of the uneven distribution of characters in the key sets. Shifting and folding almost always perform less well than division. The more complex methods, such as radix transformation, Lin's method, and polynomial division, also perform less well, often because their behavior is close to that of an imperfect random number generator. The ideal transform is not one which distributes the key set randomly but one which distributes it uniformly across the address space.
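The simulation approach described above can be sketched as follows. The key set (a run of 100 consecutive six-digit keys), the 53 home buckets, and the bucket capacity of 2 are all hypothetical; a real study would load the file's own keys and try several bucket sizes and packing densities. With consecutive keys, division by a prime spreads the keys perfectly, which illustrates how division can beat a randomizing transform.

```python
from collections import Counter

BUCKETS = 53   # hypothetical: a prime number of home buckets
CAPACITY = 2   # hypothetical bucket capacity


def bucket_by_division(key: int) -> int:
    return key % BUCKETS


def bucket_by_mid_square(key: int) -> int:
    """Central four digits of the 12-digit square, compressed onto the buckets
    (assumes keys of at most six digits)."""
    square = str(key * key).zfill(12)
    return int(square[4:8]) * BUCKETS // 10_000


def percent_overflows(keys, transform, capacity: int = CAPACITY) -> float:
    """Load every key and count the records that do not fit in their home bucket."""
    counts = Counter(transform(k) for k in keys)
    overflows = sum(max(0, n - capacity) for n in counts.values())
    return 100.0 * overflows / len(keys)


# A run of consecutive numbers, as many real key sets contain.
keys = list(range(100000, 100100))
print(f"prime packing density {len(keys) / (BUCKETS * CAPACITY):.0%}")
for name, transform in (("division", bucket_by_division),
                        ("mid-square", bucket_by_mid_square)):
    print(f"{name:10s} {percent_overflows(keys, transform):5.1f}% overflows")
```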
DESIGN RECOMMENDATION
The behavior of the good transforms on actual files is usually somewhat better than that of a perfectly randomizing transform but is fairly close to it. A systems analyst who is designing file layouts would therefore be employing a prudently conservative assumption if he used the roulette wheel calculation of Box 23.1 or the curves in Fig. 23.4 for making estimates of file packing density and percentage of overflows. He should use these curves along with knowledge of the hardware characteristics to select appropriate bucket sizes.
Figure 23.7 A comparison of three popular hashing algorithms with a perfectly random transform. (Plotted from data averaging the results from eight different files, in Reference 3.) [Two plots of percentage of overflows: one against bucket capacity (1 to 50) at a loading factor of 90%, the other against loading factor (50% to 90%). Curve labels: folding, as in Fig. 23.6 (erratic); mid-square hashing (like a randomizing transform); division (better than a randomizing transform); perfectly random transform (roulette wheel).]
WHAT SHOULD WE DO WITH OVERFLOWS?
There are two main alternative places to store overflows. They may be stored in a separate overflow area or in the prime area. The calculation of Fig. 23.4 assumes that they are stored separately from the prime area. If separate space is set aside for overflows, the question arises: Should there be an overflow area for each bucket that overflows, or should the overflows from many buckets be pooled? There are two primary techniques in use. One is called overflow chaining, and the other we will call distributed overflow space because it is similar to the distributed free space discussed in the previous chapter.
OVERFLOW CHAINING
Overflow chaining is straightforward when the overflow bucket capacity is 1. If a record has the misfortune to be assigned to a full home bucket, a free bucket in the overflow area is selected, and the record is stored in that overflow bucket. Its address is recorded in the home bucket. If another record is assigned to the same full home bucket, it is stored in another overflow bucket, and its address is stored in the first overflow bucket. In this way a chain of overflows from the home bucket is built, as in Fig. 23.8. The home bucket may have a capacity of one or many records. If the home bucket size and load factor are selected appropriately, the mean chain length can be kept low. An overflow chain as long as that in Fig. 23.8 should be a rarity. Nevertheless, the risk of multiple seeks to find one record can be reduced if the overflow buckets have a capacity greater than 1. In Fig. 23.9 the home and overflow buckets each have the same capacity, say, 10 records. The first bucket to overflow is assigned a bucket in the overflow area. It is unlikely to fill that bucket, so the next buckets to overflow are also assigned the same overflow bucket. The overflow bucket will be unlikely to overflow itself, but such an event will happen occasionally. When it does happen, the overflow bucket will be assigned another overflow bucket just as the home buckets are. If a record is deleted from the single-record chain in Fig. 23.8, the chain is reconnected. If a record is deleted from the bucket chain of Fig. 23.9, the chain cannot be reconnected. Instead the empty record location is left, and another overflow may fill it at a later time. If the overflow buckets have many deletions, it may be desirable to reorganize the overflow area periodically.
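A minimal in-memory sketch of overflow chaining with single-record overflow buckets, as in Fig. 23.8, may make the mechanics concrete. The assumptions are ours: integer keys, the division transform for the home bucket, an overflow area modeled as a Python list whose indices play the part of chain addresses, and deleted overflow buckets that are simply unlinked rather than reused. `reads_to_find` counts bucket reads, which is what a long chain costs on a disk.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HomeBucket:
    records: List[int] = field(default_factory=list)
    chain: Optional[int] = None      # chain address of the first overflow bucket


@dataclass
class OverflowBucket:                # overflow bucket capacity of 1 record
    record: int
    next: Optional[int] = None       # chain address of the next overflow bucket


class ChainedHashFile:
    def __init__(self, buckets: int, capacity: int):
        self.capacity = capacity
        self.home = [HomeBucket() for _ in range(buckets)]
        self.overflow: List[OverflowBucket] = []   # freed buckets are not reused here

    def _home(self, key: int) -> HomeBucket:
        return self.home[key % len(self.home)]     # division transform

    def insert(self, key: int) -> None:
        bucket = self._home(key)
        if len(bucket.records) < self.capacity:
            bucket.records.append(key)
            return
        self.overflow.append(OverflowBucket(key))
        new = len(self.overflow) - 1
        if bucket.chain is None:
            bucket.chain = new                      # address recorded in the home bucket
            return
        i = bucket.chain
        while self.overflow[i].next is not None:   # walk to the end of the chain
            i = self.overflow[i].next
        self.overflow[i].next = new                 # address stored in the last overflow

    def reads_to_find(self, key: int) -> Optional[int]:
        """Bucket reads needed to find `key` (home bucket = 1), or None if absent."""
        bucket = self._home(key)
        if key in bucket.records:
            return 1
        reads, i = 1, bucket.chain
        while i is not None:
            reads += 1
            if self.overflow[i].record == key:
                return reads
            i = self.overflow[i].next
        return None

    def delete(self, key: int) -> bool:
        bucket = self._home(key)
        if key in bucket.records:
            bucket.records.remove(key)
            return True
        prev, i = None, bucket.chain
        while i is not None:
            slot = self.overflow[i]
            if slot.record == key:                  # reconnect the chain around it
                if prev is None:
                    bucket.chain = slot.next
                else:
                    self.overflow[prev].next = slot.next
                return True
            prev, i = i, slot.next
        return False


f = ChainedHashFile(buckets=5, capacity=2)
for k in (3, 8, 13, 18, 23):                        # all hash to home bucket 3
    f.insert(k)
print([f.reads_to_find(k) for k in (3, 8, 13, 18, 23)])  # [1, 1, 2, 3, 4]
f.delete(18)
print(f.reads_to_find(23))                          # 3
```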
Figure 23.8 Overflow chain with overflow bucket capacities of 1 record. [Prime area and overflow area, each bucket with a chain address. The hashing routine allocates records E2, E3, E4, and E5 to a bucket which is already full.]

DISTRIBUTED OVERFLOW SPACE
In Fig. 23.10 there is no chain. Instead, overflow buckets are distributed at regular intervals among the home buckets. If a home bucket overflows, the
Figure 23.9 Overflow chaining with overflow bucket capacities of many records. [Home buckets and overflow buckets, each with a chain address; keys K1 to K4 pass through the hashing routine. The numbers represent the sequence in which overflows take place.]
[Figure 23.10: overflow buckets placed among the home buckets. Keys K1 and K2 pass through a hashing routine, which converts a key to a bucket number, and an algorithm which converts a bucket number to a bucket address; the diagram marks the first overflow access.]
[Part of Box 27.1, categories of index types:
Key compression: rear-end truncation; front-end compression; parsing.
Types of function.
Index output: machine address; pointer to bucket; relative address; symbolic address; single pointer; multiple pointer list (of variable length); bit stream representing possible items.]
In any index it is necessary to have a mechanism for insertions and deletions (we will discuss this in Chapter 30). A further advantage of the index in Fig. 27.4 is that the file can be highly volatile, with many names being added and deleted, and yet there can be very little change to the top three levels of index. The reader can observe this property by taking new names from a telephone directory and adding them to the portion of the file shown. If no index entries are ever deleted from the top three levels, these levels do not increase in size very much from that shown in Fig. 27.4, and they eventually become completely nonvolatile. The insertion and deletion mechanisms concentrate on the fourth level of index.
SUMMARY
Box 27.1 summarizes the categories of index types.
28
A COMPARISON OF MULTIPLE-KEY ORGANIZATIONS
To illustrate in a simple fashion the multiple-key organizations and indexing methods discussed in the previous two chapters, this chapter considers a data base of only 28 records, shown in Fig. 28.1, and shows 10 methods of organizing it. There are five indexable attributes in the records: A0, man number (the unique identifier); A1, his name; A2, the department he works in; A3, his skill code; and A4, his salary. The remaining, and larger, part of the records consists of nonindexable details. The records in most cases are laid out in ascending sequence of the prime key, MAN NUMBER. They are shown occupying four hardware zones. The record addresses are written in the form XY, where X is the number of the hardware zone and Y is the number of the record within that zone. Figures 28.2 to 28.11 are diagrams of organizations of these data. Many variations are possible on the organizations shown. When the reader inspects the diagrams he should extend them in his mind to a large file, perhaps occupying several disk modules, with more secondary keys and many more attribute values. He should consider how the organizations may differ in the response times incurred in responding to multiple-key queries. The reader should also consider the question of maintenance. Some of the organizations are difficult or time-consuming to maintain, whereas others are easier. The attribute SALARY is a continuous range of numbers. To produce indices for this attribute it is quantized into discrete ranges. Ranges of $250 are used. Similarly, the attribute NAME is referred to by alphabetical ranges. In some of the illustrations the first letter of the surname is used as an index entry. (In practice it would be better to divide both the SALARY and the NAME data-items into ranges chosen not for equal key separation, as here, but so that each range relates to roughly an equal number of records, giving roughly equal chain lengths, for example.)
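Quantizing SALARY into index ranges can be sketched two ways: the fixed $250 ranges used in the illustrations, and ranges chosen so that each holds roughly the same number of records, as the text recommends for practice. The sample salaries and the function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def equal_width_range(salary: int, width: int = 250) -> Tuple[int, int]:
    """Fixed-width quantization: the $250 range containing `salary`."""
    low = salary // width * width
    return low, low + width - 1


def equal_frequency_boundaries(values: Sequence[int], ranges: int) -> List[int]:
    """Lower boundaries of ranges 1 .. ranges-1, chosen so that each range
    holds roughly the same number of records."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // ranges] for i in range(1, ranges)]


def range_number(value: int, boundaries: Sequence[int]) -> int:
    """Index (0 .. len(boundaries)) of the range containing `value`."""
    return bisect_right(boundaries, value)


print(equal_width_range(1900))  # (1750, 1999)

sample = [1000, 1200, 1750, 1900, 1950, 2000, 2100, 2200, 2650, 3000, 3100, 3150]
bounds = equal_frequency_boundaries(sample, 4)
print(bounds, [range_number(s, bounds) for s in sample])
```

With equal-frequency ranges each index entry points to about the same number of records, so the chains hanging from the index stay roughly equal in length.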
Keys: A0 man number; A1 name; A2 department; A3 skill code; A4 salary. The remainder of each record holds non-indexed details.

A0 Man number   A1 Name   A2 Department   A3 Skill code   A4 Salary
07642           MARTJT    220             PL              1900
07643           GREEJW    119             SE              2700
07650           HALSPD    210             SE              2000
07668           FEINPE    220             PL              1950
07670           SCHAWE    119             AD              3100
07671           MARSJJ    119             FI              1200
07672           ALBEHA    210             SE              2100
07700           LONDAJ    220             AD              3000
07702           ANDEWF    119             FI              1000
07710           MARTCH    220             PL              1750
07715           FLINGA    119             AD              3000
07716           MERLCH    220             FO              2200
07760           JONEKB    119             PL              2200
07761           REDFBB    119             SE              2650
07780           BLANJE    220             FO              2100
07805           ROPEES    220             PL              1900
07806           KALNTD    119             MA              2300
07815           EDWARB    220             PL              2040
07850           DALLJE    119             FI              1050
07883           JONETW    210             SE              2010
07888           WEINSH    119             MA              2450
07889           KLEINM    220             PL              1830
07961           FREIHN    220             PL              1780
07970           MANKCA    119             MA              2410
07972           FIKETE    210             SE              2500
08000           SCHEDR    210             FI              2100
08001           FLANJE    119             PL              1920
08100           JOOSWE    210             SE              3150

Figure 28.1
It is assumed that fixed-length entries are used in the name index. The secondary key NAME is therefore truncated to the first four characters of the surname followed by the first two initials. The remainder of a person's name is stored in the nonindexable details portion of the record. Two people will occasionally have the same NAME key, in which case both their records will have to be examined to find the required name.

Assume that a file consists of 100,000 records of 500 bytes each. The records have one primary key of 0 bytes in total.
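The fixed-length NAME key described above can be sketched as a small function plus an index that maps each key to record addresses. Two people who share a key both appear under it, which is why both records must be examined. The surnames and the record addresses (in the XY form of the text) are hypothetical.

```python
def name_key(surname: str, initials: str) -> str:
    """Six-character NAME key: first four characters of the surname followed
    by the first two initials, padded with blanks if either is shorter."""
    return surname.upper()[:4].ljust(4) + initials.upper()[:2].ljust(2)


index = {}  # NAME key -> list of record addresses
for surname, initials, address in [("Martin", "JT", "11"),     # hypothetical records
                                   ("Martinez", "JT", "32"),
                                   ("Green", "JW", "12")]:
    index.setdefault(name_key(surname, initials), []).append(address)

print(index["MARTJT"])  # ['11', '32']: both records must be examined
```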
[Diagram: indices on A1 (name), A2 (department), A3 (skill code), and A4 (salary).]
Figure 30.4 An indication of the effect of index block size on the number of probes needed to search the three types of tree index. Small blocks are preferable. [Plot: number of probes against N_B, number of entries in index block (0 to 60); curve labels include "The node is block-searched" and "The node is binary-searched".]
The mean number of probes for searching the entire index is therefore

$$E(N_p) = \ldots$$

As in Eq. (30.2), … Therefore,

$$E(N_p) = \ldots \qquad (30.5)$$
Figure 30.5 A curve showing how the index size varies with the number of entries per index block. Compare with Fig. 30.4. [Plot: index size (vertical axis, 100 to 140) against N_B, number of entries in index block (0 to 60).]
This equation, as before, can be used to calculate the index block size which results in the minimum number of probes. Figure 30.4 shows the result of this calculation also. Again a fairly small block size is desirable. On the other hand, the smaller the block size, the greater the total number of index entries. Figure 30.5 plots Eq. (30.3) for an index to 100,000 items. Comparing Fig. 30.5 with the curve for a block search in Fig. 30.4, it might be concluded that a block size of 20 to 30 items gives a reasonable compromise between saving index space and minimizing the number of probes. This is much smaller than the number of index entries that can be stored on a typical disk track. A disk track or other hardware subdivision should therefore contain more than one block of a tree index. In reality the number of entries per block may be tailored to fit the hardware that is used. One higher-level block may, for example, point to all the index blocks that are stored on a track. An index on each cylinder is often used to point to each track on a cylinder. A hardware-independent storage organization, on the other hand, such as that shown in Fig. 22.4, may use the values suggested by the above equations.

Chap. 30 Index-Searching Techniques 415

Technique 7: A Balanced Tree Index with a Binary Search of the Nodes
One of the objections to a binary search is that time-consuming seeks may be required in the early stages of the search if the index is on an electromechanical file. Most of the seeks can be avoided if a tree index is used as in Fig. 30.3 and the nodes of the tree are binary-searched in main memory. Suppose, again, that the tree has $L$ levels and that the nodes contain $N_B$ items. As before, where $N$ is the total number of items indexed,

$$L = \frac{\log N}{\log N_B} \tag{30.2}$$
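The number of probes needed to binary-search a block is given in Appendix A [Eq. (A.1)] as

$$\sum_{j=1}^{\lfloor \log_2 N_B \rfloor} \frac{j\,2^{j-1}}{N_B} + \left(\lfloor \log_2 N_B \rfloor + 1\right)\left(1 - \sum_{j=1}^{\lfloor \log_2 N_B \rfloor} \frac{2^{j-1}}{N_B}\right)$$

since $2^{j-1}$ of the $N_B$ entries are found on the $j$th probe and the remaining entries need $\lfloor \log_2 N_B \rfloor + 1$ probes. (We need to be more accurate than the approximation $\log_2 N_B - 1$.) The mean number of probes needed to search the index is therefore

$$E(N_p) = \frac{\log_2 N}{\log_2 N_B}\left[\sum_{j=1}^{\lfloor \log_2 N_B \rfloor} \frac{j\,2^{j-1}}{N_B} + \left(\lfloor \log_2 N_B \rfloor + 1\right)\left(1 - \sum_{j=1}^{\lfloor \log_2 N_B \rfloor} \frac{2^{j-1}}{N_B}\right)\right] \tag{30.6}$$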
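The Appendix A expression can be checked against a direct count. This Python sketch (function names hypothetical) computes the exact per-node mean from the closed form, compares it with the probes an ordinary binary search uses on every key of a node, and applies Eq. (30.6). It counts one probe per key comparison and uses the unrounded number of levels.

```python
import math


def binary_search_probes(node, key):
    """Probes (key comparisons) a binary search of a sorted node uses to find key."""
    lo, hi, probes = 0, len(node) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if node[mid] == key:
            return probes
        if node[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


def mean_binary_probes(n_b):
    """Exact mean probes per node, following the Appendix A expression."""
    k = n_b.bit_length() - 1                      # floor(log2 n_b), computed exactly
    found_early = sum(j * 2 ** (j - 1) for j in range(1, k + 1)) / n_b
    last_probe = (k + 1) * (1 - (2 ** k - 1) / n_b)
    return found_early + last_probe


def expected_index_probes(n, n_b):
    """Eq. (30.6): number of levels times the exact per-node mean."""
    return math.log2(n) / math.log2(n_b) * mean_binary_probes(n_b)


# The closed form agrees with a direct count over every key in the node.
for n_b in (4, 20, 100):
    direct = sum(binary_search_probes(list(range(n_b)), k) for k in range(n_b)) / n_b
    print(n_b, mean_binary_probes(n_b), direct, math.log2(n_b) - 1)
```

The last column shows how far the rough approximation $\log_2 N_B - 1$ falls below the exact figure for small nodes.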
The lower curve in Fig. 30.4 shows the effect of varying the block size on a tree index with binary searching. When binary searching is used, the number of probes is less sensitive to the block size. If time is the overriding consideration, a binary search of an index tailored into blocks so as to avoid seeks where possible is the best of the table look-up techniques. However, as we commented before, a simple
(Figure: curves for a serial scan, a balanced tree index with a block search of the nodes, and a balanced tree index with a scan of the nodes.)

(Figure: the unbalanced tree of Fig. 30.9 when the frequency of ...)
Table 30.1. A Table for Calculated Guess Addressing.

Letter  First Letter  Second Letter
A  0  0
B  0.0500  0.0781
C  0.1285  0.0909
D  0.2005  0.1202
E  0.2425  0.1613
F  0.2661  0.2918
G  0.3108  0.3206
H  0.3688  0.3345
I  0.4220  0.3930
J  0.4333  0.4607
K  0.4554  0.4630
L  0.4968  0.4672
M  0.5495  0.5032
N  0.6323  0.5294
O  0.6608  0.6022
P  0.6753  0.6843
Q  0.7220  0.7058
R  0.7237  0.7072
S  0.7952  0.7736
T  0.8833  0.8382
U  0.9167  0.9284
V  0.9269  0.9561
W  0.9403  0.9661
X  0.9871  0.9810
Y  0.9871  0.9840
Z  0.9925  0.9991
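The calculation that Table 30.1 supports can be sketched in Python. This is a minimal illustration, assuming surnames that begin with a letter and a sorted in-memory list standing in for the index; the function names are hypothetical. The guess uses the first two letters, and the search then scans forward or backward from the guessed entry.

```python
import string

# Table 30.1 (New York surnames): fraction of the index at which each letter begins.
FIRST = [0, 0.0500, 0.1285, 0.2005, 0.2425, 0.2661, 0.3108, 0.3688, 0.4220, 0.4333,
         0.4554, 0.4968, 0.5495, 0.6323, 0.6608, 0.6753, 0.7220, 0.7237, 0.7952,
         0.8833, 0.9167, 0.9269, 0.9403, 0.9871, 0.9871, 0.9925]
SECOND = [0, 0.0781, 0.0909, 0.1202, 0.1613, 0.2918, 0.3206, 0.3345, 0.3930, 0.4607,
          0.4630, 0.4672, 0.5032, 0.5294, 0.6022, 0.6843, 0.7058, 0.7072, 0.7736,
          0.8382, 0.9284, 0.9561, 0.9661, 0.9810, 0.9840, 0.9991]
LETTER = {c: i for i, c in enumerate(string.ascii_uppercase)}


def guess_fraction(surname):
    """Estimated fraction of the way through the index at which surname lies."""
    s = surname.upper()
    i = LETTER[s[0]]
    start = FIRST[i]
    end = FIRST[i + 1] if i + 1 < 26 else 1.0   # where the next first letter begins
    if len(s) > 1 and s[1] in LETTER:
        return start + SECOND[LETTER[s[1]]] * (end - start)
    return start


def calculated_guess_search(index, surname):
    """Start at the guessed entry of a sorted index, then scan forward or back.

    Returns (position or None, number of entries examined).
    """
    key = surname.upper()
    i = min(int(guess_fraction(key) * len(index)), len(index) - 1)
    probes = 1
    if index[i] < key:
        while i + 1 < len(index) and index[i] < key:
            i += 1
            probes += 1
    else:
        while i > 0 and index[i - 1] >= key:
            i -= 1
            probes += 1
    return (i if index[i] == key else None), probes


print(round(guess_fraction("KELLY"), 4))  # 0.4621, i.e. 0.4554 + 0.1613 x (0.4968 - 0.4554)
```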
alphabetical order. The second column relates to the first letter of the surname and indicates after what fraction of the index that name begins. The third column relates in a similar fashion to the second letter of the surname. Items beginning with K begin approximately 0.4554 of the way through the index. Items beginning with KE begin approximately 0.4554 + 0.1613 × (0.4968 − 0.4554) ≈ 0.4621 of the way through the index. Knowing the number of entries in the index, the address of the required index entry can be estimated. The search begins at that point in the index. (The figures in Table 30.1 relate to names in New York and would be somewhat different for names elsewhere.)

Technique 12: Algorithm Indexing
In Chapter 21 we discussed various algorithms used as a means of file addressing. Any such schemes could be used to locate an index block rather than the data themselves. They can give a faster means of locating the required record than a tree index because they provide a single-seek access to an index block which is equivalent to the bottom level of the tree (Fig. 30.11). On the other hand, they are not as fast as direct addressing without an index. What are the possible advantages of using this combination of algorithm and index? First, the combination makes possible a large variety of algorithms which are not sufficiently precise to find the record but which could locate an indexable group of records. Second, the method overcomes a major disadvantage of direct addressing, namely that there is no independence between the physical positioning of a record on the files and its logical key: the physical positioning cannot be changed independently of the addressing algorithm. An example of algorithm indexing which illustrates both advantages is the location of passenger name records on an airline reservation system. These records contain details of the passenger booking and are several hundred characters in length. Fast access is needed to them because of the tight response-time requirements. The file is both large (a million passenger name records are stored by the large airlines) and highly volatile (many thousand new records are added each day). Because of the high volatility, it is undesirable to store the records sequentially: too many out-of-sequence additions would have to be chained to the records. Because of the fast response-time requirements, it is desirable that most accesses should require no more than two seeks, and the first of these seeks should be short if possible. In other words, an addressing scheme faster than a tree index is desirable. The passenger names fall into natural groups: those passengers who are booked on the same flight.
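The combination of algorithm and index can be sketched as follows, using the airline case. Everything here is hypothetical and deliberately simplified: a small main-memory table of flight numbers, a fixed booking horizon, and dictionaries standing in for the file. The algorithm computes the address of a flight's passenger name index record from the flight number and date; that record holds pointers to passenger records that may sit anywhere in a pool, so a record can be moved without changing the algorithm.

```python
# Hypothetical main-memory table: flight number -> slot among the flights served.
FLIGHT_SLOT = {101: 0, 205: 1, 317: 2}
DAYS_HELD = 365  # hypothetical booking horizon in days


def pni_address(flight_number, day_number):
    """Algorithm: flight and date -> unique address of that flight's index record."""
    return (day_number % DAYS_HELD) * len(FLIGHT_SLOT) + FLIGHT_SLOT[flight_number]


class Reservations:
    def __init__(self):
        self.pni = {}    # index-record address -> pointers to that flight's passenger records
        self.pool = {}   # pool location -> passenger name record (any free location will do)
        self.next_free = 0

    def book(self, flight_number, day_number, passenger):
        location = self.next_free          # placement is independent of the algorithm
        self.next_free += 1
        self.pool[location] = passenger
        self.pni.setdefault(pni_address(flight_number, day_number), []).append(location)
        return location

    def move(self, old, new):
        """Relocate a record to a free location; only a pointer changes, not the algorithm."""
        self.pool[new] = self.pool.pop(old)
        for pointers in self.pni.values():
            if old in pointers:
                pointers[pointers.index(old)] = new

    def passengers(self, flight_number, day_number):
        pointers = self.pni.get(pni_address(flight_number, day_number), [])
        return [self.pool[p] for p in pointers]
```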
An algorithm can be devised, based on the flight number and date, with which a simple calculation and a main memory table reference can produce a unique file address: the address of a passenger name index record for that flight. The passenger name index record is in a fixed location on a part of the storage that is rapidly accessible. It contains an index, or set of pointers, to the passenger name records for that flight, which, because of their volatility, can be scattered anywhere within a large "pool" of fixed-length record locations. The index in this example is in segments among the data records. This is sometimes referred to as an embedded index. The variable pointer lists discussed in Chapters 24 and 25 for the representation of tree and plex structures (for example, that in Fig. 25.8) are in a sense embedded indices.

Technique 13: Hash Indexing
A particularly important type of algorithm is hashing, discussed in Chapter 23. Hashing is usually used to locate a record directly. It could, however, be used to locate an index block, thereby avoiding the need for higher levels of indexing (Fig. 30.11). The combination of hashing and indexing can be better than indexing alone in that fewer index probes or access arm movements are needed. The main advantage of pure hash addressing, namely that records can be found with one seek, is lost. However, hash indexing may be preferable to pure hash addressing because it allows the records to be stored in an order determined by other considerations. The records could be stored sequentially by prime key, for example, or stored with their parents in a tree structure. The most frequently referenced records can be placed on rapidly accessible parts of the storage devices. On some systems it may be valuable to be able to change the record layout without changing the hashing algorithm. Furthermore, the empty spaces in the buckets, which are characteristic of hashing techniques, are spaces in the index, not spaces in the data storage location. Hash indexing is thus less wasteful of space than pure hash addressing of records. In addition, the overflows that are inevitable with hashing can be handled in the index rather than in the data records, where they incur time-consuming seeks. The bucket sizes can be large enough to give a low proportion of overflows (see Fig. 23.4).

Figure 30.11 Algorithm or hash indexing. (An algorithm, or hashing, is used to find the exact or approximate position in an index sequence set, which points to the data records.)
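A minimal sketch of hash indexing, assuming an in-memory list of buckets: the key is hashed to an index bucket holding (key, record address) pairs, so the data records themselves can stay in prime-key sequence. The bucket count, bucket size, and the use of CRC-32 as the hashing function are illustrative choices, not taken from the text. Empty slots and overflow entries both live in the index.

```python
import zlib


class HashIndex:
    """Hash to an index bucket; the bucket holds (key, record address) pairs."""

    def __init__(self, n_buckets, bucket_size):
        self.buckets = [[] for _ in range(n_buckets)]
        self.bucket_size = bucket_size
        self.overflow = {}  # bucket number -> overflow entries, kept in the index

    def _bucket(self, key):
        # crc32 gives a hash that is stable across runs (Python's str hash is randomized).
        return zlib.crc32(key.encode()) % len(self.buckets)

    def insert(self, key, address):
        b = self._bucket(key)
        if len(self.buckets[b]) < self.bucket_size:
            self.buckets[b].append((key, address))
        else:
            self.overflow.setdefault(b, []).append((key, address))

    def lookup(self, key):
        b = self._bucket(key)
        for k, address in self.buckets[b] + self.overflow.get(b, []):
            if k == key:
                return address
        return None


# Data records stored sequentially by prime key; the index records where each one is.
data = sorted(["A100", "A205", "B311", "C042", "C977", "D500", "E123", "F808"])
index = HashIndex(n_buckets=3, bucket_size=2)
for address, key in enumerate(data):
    index.insert(key, address)
```

Because the index, not the data, absorbs the hashing, the records could later be rearranged and only the addresses in the index would need updating.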
HOW ARE THE INDEX BLOCKS LAID OUT?
Where a tree index is used, there are various possible sequences in which the blocks can be laid out. In some software packages the levels of the tree have corresponded to sections of hardware such as a disk track. However, it is clear from Fig. 30.4 that the optimum block size is quite small, and hence two or possibly three compacted index levels could fit onto one disk track. The index may spread over many contiguous tracks or reside in a large solid-state module, in which case four, five, or six levels might be contiguously stored. Consider the three-level tree index in Fig. 30.12. The subscript 1 indicates a pointer to level 1, and the subscript 2 indicates a pointer to level 2. Figure 30.12 shows three common ways of laying out such an index sequentially. Other ways are also possible. The first method lays out the three levels each without a break. This method has the advantage that all the sequence set items are contiguous, which facilitates sequential operations. The second method is similar to the method of laying out tree structures sequentially, discussed in Chapter 24 (Fig. 24.7). Such a layout is advantageous with data in certain applications because the records are needed in a top-down-left-right sequence. The sequence set items may be chained together to facilitate sequential processing. The third method stores each parent node after its children. If an entry is changed, all the necessary modifications can be made scanning in a left-right direction: first the level 1 entry is changed, then the level 2 entry if necessary, then the level 3 entry, and so forth if higher levels exist. Left-right operations like this are convenient on serial storage devices which do not permit a backward scan, such as some magnetic tape units.

Figure 30.12 Three sequential layouts for a tree index: 1. level-by-level layout; 2. top-down-left-right layout; 3. bottom-up-left-right layout. (In the level-by-level layout the entries appear as K2 R2 Z2, then B1 H1 K1 M1 P1 R1 T1 W1 Z1, then A B G H I K L M N P Q R S T U W X Z. The bottom-up-left-right layout begins A B G H I.)
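The three layouts of Fig. 30.12 can be generated mechanically from one tree. In this Python sketch the bottom-level blocks of two entries each are inferred from the level-by-level layout in the figure, and the names are hypothetical. A digit after a key marks a pointer to that level, standing in for the figure's subscripts.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    level: int                      # 1 = sequence set (bottom level)
    entries: list                   # keys held in this index block
    children: list = field(default_factory=list)


def build_tree(leaf_blocks, fan_out):
    """Build upper levels whose entries are the highest key in each child block."""
    nodes, level = [Node(1, block) for block in leaf_blocks], 1
    while len(nodes) > 1:
        level += 1
        nodes = [Node(level, [c.entries[-1] for c in nodes[i:i + fan_out]], nodes[i:i + fan_out])
                 for i in range(0, len(nodes), fan_out)]
    return nodes[0]


def label(node, key):
    # An entry above level 1 is a pointer; suffix it with the level it points to.
    return key if node.level == 1 else f"{key}{node.level - 1}"


def level_by_level(root):
    out, frontier = [], [root]
    while frontier:
        out += [label(n, k) for n in frontier for k in n.entries]
        frontier = [c for n in frontier for c in n.children]
    return out


def top_down_left_right(node):
    out = [label(node, k) for k in node.entries]
    for child in node.children:
        out += top_down_left_right(child)
    return out


def bottom_up_left_right(node):
    out = []
    for child in node.children:
        out += bottom_up_left_right(child)
    return out + [label(node, k) for k in node.entries]   # parent after its children


leaves = [["A", "B"], ["G", "H"], ["I", "K"], ["L", "M"], ["N", "P"],
          ["Q", "R"], ["S", "T"], ["U", "W"], ["X", "Z"]]
root = build_tree(leaves, fan_out=3)
for layout in (level_by_level, top_down_left_right, bottom_up_left_right):
    print(" ".join(layout(root)))
```

The first line printed reproduces the level-by-level layout of the figure; the third shows why a change can be propagated in a single left-right pass, since every parent follows its children.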
INSERTIONS AND DELETIONS
Perhaps more important than the layout sequence is the method of inserting and deleting entries. As before, either distributed free space or overflows are applicable, and often both are used together, overflows coming into operation when the free space in an area runs out.
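A minimal sketch of distributed free space backed by overflows at the bottom index level, assuming in-memory lists for blocks and overflow chains (class and method names are hypothetical). Each block is loaded partly empty; a new entry is shuffled into its block while space remains and goes to that block's overflow chain once the block is full. For simplicity it never moves entries between neighboring blocks, as Fig. 30.13 does.

```python
import bisect


class SequenceSet:
    """Bottom-level index blocks with distributed free space and per-block overflow."""

    def __init__(self, blocks, capacity):
        self.blocks = [sorted(b) for b in blocks]   # each block starts partly empty
        self.capacity = capacity
        self.overflow = [[] for _ in blocks]         # overflow chain for each block

    def _block_for(self, key):
        highs = [b[-1] for b in self.blocks]         # assumes no block is empty
        return min(bisect.bisect_left(highs, key), len(self.blocks) - 1)

    def insert(self, key):
        i = self._block_for(key)
        if len(self.blocks[i]) < self.capacity:
            bisect.insort(self.blocks[i], key)       # shuffle entries within the block
            return "free space"
        bisect.insort(self.overflow[i], key)         # free space has run out
        return "overflow"
```

Because each key goes into a block whose highest key already bounds it, the higher-level entries are unaffected; a key beyond the last block would change them, a case this sketch ignores.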
Figure 30.13 A bottom-up, left-right, uniform height tree index with distributed free space at the bottom level only. (Panels: 1. 18 entries, with three entries per block; 2. entry C is added; 3. entries D and E are added; 4. entry F is added, and this insertion affects level 3 of the index.)
Figure 30.14 The same index and entries as in Fig. 30.13, but with distributed free space at the second level only. (Panels: 1. 18 entries, with three entries per block; 2. entry C is added, and a level 1 overflow block is created; 3. entries D and E are added; 4. entry F is added, and another overflow block is needed.)
If distributed free space and overflows are used in conjunction, at what level in the index should the free space be distributed? Figure 30.13 shows the index of Fig. 30.12 with free space distributed at level 1. Figure 30.14 shows the same index but with the free space at level 2. The shaded parts of the diagrams are the free space. The diagrams show four new entries being added, and, to make life difficult, the entries are highly clustered. In Fig. 30.13, because the free space is at the lowest level, a shuffling of entries at the lowest level accommodates the new arrivals. Only when the fourth item in the cluster is added does level 3 of the index have to be modified. In Fig. 30.14 the total storage space utilized is less because the free space is at level 2. However, lengthy jumps are needed when following the sequence set. If these jumps do not extend beyond a cell boundary, i.e., do not incur a seek, then they are of little concern, and the layout in Fig. 30.14 is better than that in Fig. 30.13. The free space could be pushed to higher index levels or could be eliminated entirely in favor of overflows. This decision should depend on where seeks are incurred and on the trade-off between saving space and saving time.
SUMMARY
Box 30.1 summarizes index-searching techniques.

Box 30.1 Index-Searching Techniques
Technique (illustrated in Fig.): Comment
Serial scan: Poor. Use only for small blocks.
Block search (Fig. 21.1): Suitable for a small index or a portion of a large index.
Binary search (Fig. 21.2): Advantage: fast for an index in main memory. Disadvantages: very poor if it involves mechanical seeks to and fro; entry compaction not possible; insertions and deletions difficult.
Binary-tree search (Fig. 30.2): Advantages: fast; insertions and deletions handled efficiently. Disadvantages: space taken up with pointers; entry compaction not possible.
Balanced tree index (Fig. 30.3): Nodes searched by serial scan, poor; by block search, better; by binary search, no compaction. Fast. Economical. Can be tailored to hardware. Techniques are needed to handle insertions and deletions.
Unbalanced tree index (Fig. 30.9): Advantage: frequently used records found more quickly. Disadvantages: infrequently used records found more slowly; complex maintenance.
Algorithm index (Fig. 30.11): Faster than a pure index. May destroy data independence.
Hash index (Fig. 30.11): Faster than a pure index. Better space utilization than pure hashing.
Other techniques:
Look-aside buffers (Fig. 30.8): Can take advantage of clustered file reference patterns to speed up addressing.
(Fig. 30.7): Can speed up sequential or skip-sequential operations.
31
DATA COMPACTION
A variety of techniques is available for compressing data in order to reduce the space it occupies. In many instances data compaction can achieve dramatic savings. Most files can be cut to half their size, and some existing commercial files can be cut by as much as 80 or even 90%. Data compaction techniques are not new, and in view of the savings they can give it is surprising that they have not been used more frequently. Compaction techniques may be used when storing any data but are perhaps of most value with large archival files that are infrequently read. We have already discussed the use of compaction to reduce the size of indices (Figs. 27.2, 27.3, and 27.4). Compaction may also be used when transmitting data and can often double or triple the effective speed of a communication line. Compaction methods fall into two categories: first, those which are dependent on the structure of the records or the content of the data and hence must be specially written for a given application, and, second, those which can be applied to many applications and hence can be built into general-purpose software, hardware, or microcode. Packages for reducing file size are available from a number of software firms. The work of reducing the size of a file may begin with methods which are dependent on the content of the data and then continue by using application-independent coding methods. We will review the content-dependent methods first.

1. Elimination of Redundant Data Items
An important method of reducing data-base storage size is to eliminate the redundancy that exists due to the multiple storage of identical data items in separate files. This is one of the major objectives of data-base management systems, and it uses the techniques discussed in earlier chapters.

Figure 31.1 Two ways of representing dates with 16 bits. (Julian date: 4 packed decimal digits counting days sequentially from May 23, 1968 = 0000. Binary date: year 7 bits, month 4 bits, day 5 bits.)

2. Conversion from Human Notation to Compact Notation
When fields are stored in the form in which humans prefer to read them, they often contain more characters than are necessary. Dates, for example, we may write as 12 NOV 1976, or, in our most compact written form, 11.12.76, and so dates are often stored as 6 bytes in computer files. In the machine, however, the month needs no more than 4 bits, the day can be encoded in 5 bits, and the year usually needs no more than 7 bits: a total of 16 bits, or two 8-bit bytes (Fig. 31.1). Conversion from the 2-byte form to human-readable form needs only a few lines of code. Another common way of representing dates is the Julian day number, used by astronomers, which counts the days elapsed since Jan. 1, 4713 B.C., the start of the Julian Period devised by Joseph Scaliger in 1583. Using this scheme, Jan. 1, 1976 is 2,442,779. Often, only the four low-order digits are used. May 23, 1968 is then 0000, and dates are counted from that day, requiring 16 bits in packed decimal notation. Many other items, such as part numbers and street addresses, can often be compressed similarly.

3. Suppression of Repeated Characters
Numeric fields in some files contain a high proportion of leading or trailing zeros. More than two zeros can be encoded into two (packed decimal) characters: one character to indicate repetition and the next to say how much repetition. Some files contain repetitive blanks or other characters, and these can be dealt with in a similar manner. One character suppression scheme uses a unique group of characters to indicate that the character following that group is repeated. Where the conventional 8-bit EBCDIC character encoding is employed, the majority of bit combinations are usually not used to represent data characters (see Fig. 31.2). Any character with a zero in the second bit position is not normally
Chap. 31 Data Compaction 435
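Two of the content-dependent methods of this chapter, the 16-bit binary date of Fig. 31.1 and the suppression of repeated characters, can be sketched in a few lines of Python. The base year of 1900 for the 7-bit year field, the marker character, and the one-character run count are assumptions for illustration, and the marker is assumed never to occur in the data. The four-digit Julian form counts days from May 23, 1968, as in the text.

```python
import datetime

BASE_YEAR = 1900                          # assumption: the 7-bit year counts from 1900
DAY_ZERO = datetime.date(1968, 5, 23)     # four low-order Julian digits are 0000


def pack_date(d):
    """Binary date: year (7 bits) | month (4 bits) | day (5 bits) = 16 bits."""
    year = d.year - BASE_YEAR
    if not 0 <= year < 128:
        raise ValueError("year does not fit in 7 bits")
    return (year << 9) | (d.month << 5) | d.day


def unpack_date(word):
    return datetime.date(BASE_YEAR + (word >> 9), (word >> 5) & 0xF, word & 0x1F)


def short_julian(d):
    """Four low-order digits of the Julian day number (days since May 23, 1968)."""
    return (d - DAY_ZERO).days % 10000


MARK = "\x1f"  # assumed never to occur in the data


def suppress(text, min_run=4, max_run=255):
    """Replace runs of one character by MARK, the character, and a count character."""
    out, i = [], 0
    while i < len(text):
        run = 1
        while i + run < len(text) and text[i + run] == text[i] and run < max_run:
            run += 1
        out.append(MARK + text[i] + chr(run) if run >= min_run else text[i] * run)
        i += run
    return "".join(out)


def expand(coded):
    out, i = [], 0
    while i < len(coded):
        if coded[i] == MARK:
            out.append(coded[i + 1] * ord(coded[i + 2]))
            i += 3
        else:
            out.append(coded[i])
            i += 1
    return "".join(out)


d = datetime.date(1976, 11, 12)
print(pack_date(d), unpack_date(pack_date(d)))   # 39276 1976-11-12
print(short_julian(datetime.date(1976, 1, 1)))   # 2779
sample = "00001234500000000"
print(len(sample), len(suppress(sample)))        # 17 11
```

Runs shorter than four characters are left alone here because the three-character code would save nothing; the two-character packed-decimal scheme in the text can profit from shorter runs because its marker implies the repeated digit.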
Figure 31.2 (EBCDIC code chart, with columns grouped by bit positions 0 and 1 and by bit positions 2 and 3.)

(Figure 34.1: a circular file of incident records. The relative file address Ax of incident number x is calculated from x + A1 − 1, where A1 is the address of that day's first incident, with wraparound at the end of the file.)
Chap. 34 Volatile Files 497

In Fig. 34.1, the first incident on Monday is referred to by an incident number M1. There are 8001 incidents on Monday, and the last of these, number M8001, is written in relative file address 5201. The first incident on Tuesday, number T1, is then written in relative file address 5202. The 4000th incident on Tuesday overwrites incident M1. When reference to a specified incident record is communicated within the machines, program to program, the relative file address is used. When reference to the incident involves communication with people, the incident number, e.g., T1, is used. Incident records are normally dumped from the file onto tape from 24 to 25 hours after they occur. The dumping of a few hundred contiguous records thus takes place every hour. The tapes are then available for batch processing if needed. The records could be dumped onto a separate nonvolatile information system data base, which could be utilized by detectives. It is regarded as highly improbable that there will be more than 12,000 incidents in one day. If the city is in more turmoil than normal and the incidents pass the 12,000 count, then the last incidents for that day will begin to overwrite the first. The system will detect the high volume long before the overwriting takes place and will ensure that incidents are logged onto tape before they are destroyed. If the city repeatedly approaches a number of incidents near to 12,000, the size of the circular file will have to be increased.

This volatile file is accessed in a variety of different ways. It may be accessed by INCIDENT-NUMBER, which is, in effect, its prime key. It may be accessed by pointers from the records of the status of police vehicles which have been assigned to the incident. A member of the public may telephone and say "Why has nothing been done about the incident I reported half an hour ago?" The operator receiving the call does not know the number of the incident in question, so he enters its location, for example, a street intersection. A record exists for every such location in the city, and this record contains a pointer to the most recent incident which occurred at that location (Fig. 34.2). Each INCIDENT record is chained to the previous INCIDENT which occurred at the same location. The computer can follow this chain looking for an incident of a given type or types or an incident reported at a stated time. The INCIDENT records contain pointers to the records of police vehicles (primary and secondary vehicles) assigned to the incident. Note that although the file is highly volatile, there is no problem with linking and unlinking chains.

Figure 34.2 The volatile circular file contains codes and pointers to static files which permit a variety of inquiries to be processed. (Labels: static file of locations; pointer to most recent incident at this location; incident type; time; pointers to records of police vehicles assigned to the incident.)

OVERFLOW
The major part of the incident record is a verbal description of the incident and comments about it. The incident records are of fixed length, and sometimes the description extends far beyond the allotted record size. An overflow pointer is therefore used to a block or chain of blocks in which the verbiage can continue. This file of overflow blocks is as volatile as its parent file. It could therefore be constructed as another circular file, as in Fig. 34.3. In reality, however, it is often more economical to use a technique of assigning blocks at random from a pool of available blocks. This technique is called dynamic block allocation.
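The circular incident file can be sketched in Python. The file size of 12,000 slots and the address of M1 (9201) are inferred from the worked figures in the text (M8001 lands at address 5201, and T4000 overwrites M1); the record fields, the class name, and the dictionary standing in for the static file of locations are hypothetical. Each new incident is chained to the previous one at the same location, and following the chain stops once an older incident has been overwritten.

```python
class IncidentFile:
    """Circular file of fixed-length incident records (sizes and fields are assumptions)."""

    def __init__(self, size=12000, first_address=1):
        self.size = size
        self.slots = {}                  # relative file address (1..size) -> incident record
        self.next_address = first_address
        self.day_start = {}              # day code -> A1, address of that day's incident 1
        self.day_count = {}              # day code -> incidents logged so far that day
        self.latest_at = {}              # static location file: location -> (address, number)

    def start_day(self, day):
        self.day_start[day] = self.next_address
        self.day_count[day] = 0

    def address(self, day, x):
        """Address of incident x of a day: x + A1 - 1, wrapping round after `size`."""
        return (self.day_start[day] + x - 2) % self.size + 1

    def log(self, day, location, kind):
        self.day_count[day] += 1
        number = f"{day}{self.day_count[day]}"
        addr = self.next_address
        # Writing a slot overwrites whatever old incident it held.
        self.slots[addr] = {"number": number, "location": location, "kind": kind,
                            "previous": self.latest_at.get(location)}
        self.latest_at[location] = (addr, number)
        self.next_address = addr % self.size + 1
        return number, addr

    def history(self, location):
        """Incident numbers at a location, newest first, following the chain."""
        found, link = [], self.latest_at.get(location)
        while link is not None:
            addr, number = link
            record = self.slots.get(addr)
            if record is None or record["number"] != number:
                break                    # the older incidents have been overwritten
            found.append(number)
            link = record["previous"]
        return found


f = IncidentFile(first_address=9201)
f.start_day("M")
for i in range(8001):
    last = f.log("M", f"corner {i % 50}", "traffic")
print(last, f.address("M", 8001))       # ('M8001', 5201) 5201
f.start_day("T")
for i in range(4000):
    last = f.log("T", f"corner {i % 50}", "burglary")
print(last)                               # ('T4000', 9201): incident M1 is overwritten
```

Checking the incident number stored in each slot is what lets the location chain be followed safely even though old records are overwritten without being unlinked.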
Figure 34.3 Overflows from the file in Fig. 34.1 could be handled by a second circular file, as here. In reality, it is usually more economical to use dynamically allocated storage blocks as in Fig. 34.6. (Labels: relative file address; incident records from M800 to M7800 and T799 to T3799; overflows from the fixed-length blocks in the circular file.)
DYNAMIC BLOCK ALLOCATION
Using dynamic block allocation, a pool of fixed-length blocks is set up with a mechanism for allocating an unused block to any file which needs the space. Many different files may share the pool of blocks. A file may be stored in a chain of blocks, as in Fig. 34.4. This file, unlike a circular file, can be expanded or contracted to meet changing requirements. On the other hand, the blocks are not contiguous as with a circular file, and so a seek may be needed in going from one record to the next. A circular file can thus give faster response time. A chain of blocks can be organized as a first-in-first-out chain, as in Fig. 34.4, or as a push-down stack in which blocks are removed from the same end of the chain to which new blocks are attached, i.e., first-in-last-out. The control mechanism must keep track of all the blocks which are not in use at any one time. The easiest way to do this is to chain together all the unused blocks, as in Fig. 34.5. This chain is referred to as an uncommitted storage list. Records which are an overflow from another file because the other file did not have large enough blocks, as with the police INCIDENT record overflows, would not be stored in one chain. Instead, there would be pointers from the original file, as shown in Fig. 34.6. A large number of extra blocks could be added to a record if it needed the space. When the parent record is deleted, the overflow blocks will be chained back to the uncommitted storage list.
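Dynamic block allocation with an uncommitted storage list can be sketched as below. The block size, pool size, and names are hypothetical, and Python lists stand in for the blocks and their chain pointers. Free blocks are chained together, allocation takes the head of that chain, and deleting a parent record returns its whole overflow chain to the list.

```python
class BlockPool:
    """Pool of fixed-length blocks shared by many files, with an uncommitted storage list."""

    def __init__(self, n_blocks, block_size):
        self.block_size = block_size
        self.data = [None] * n_blocks
        # Each block carries a chain pointer; initially every block is on the free chain.
        self.link = list(range(1, n_blocks)) + [None]
        self.free_head = 0

    def allocate(self):
        """Take a block from the head of the uncommitted storage list."""
        if self.free_head is None:
            raise MemoryError("block pool exhausted")
        block = self.free_head
        self.free_head = self.link[block]
        self.link[block] = None
        return block

    def release(self, block):
        """Chain a block back onto the uncommitted storage list."""
        self.data[block] = None
        self.link[block] = self.free_head
        self.free_head = block

    def store(self, text):
        """Store variable-length text (e.g. incident verbiage) in a chain of blocks."""
        pieces = [text[i:i + self.block_size] for i in range(0, len(text), self.block_size)]
        first = previous = None
        for piece in pieces or [""]:
            block = self.allocate()
            self.data[block] = piece
            if previous is None:
                first = block
            else:
                self.link[previous] = block
            previous = block
        return first

    def read(self, first):
        parts, block = [], first
        while block is not None:
            parts.append(self.data[block])
            block = self.link[block]
        return "".join(parts)

    def free(self, first):
        """Parent record deleted: return its whole overflow chain to the free list."""
        block = first
        while block is not None:
            following = self.link[block]
            self.release(block)
            block = following

    def free_count(self):
        count, block = 0, self.free_head
        while block is not None:
            count += 1
            block = self.link[block]
        return count


pool = BlockPool(n_blocks=4, block_size=5)
head = pool.store("ABCDEFGHIJKL")
print(pool.read(head), pool.free_count())   # ABCDEFGHIJKL 1
pool.free(head)
print(pool.free_count())                     # 4
```

If the pool runs out part way through storing a chain, the blocks already taken stay allocated in this sketch; a production allocator would hand them back.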
Figure 34.4 A volatile file may be constructed by chaining together blocks from a pool of blocks, as here. (New items are added to one end of the chain; old items are deleted from the other end.)

Figure 34.5 Uncommitted storage list. The blocks in the pool which are not in use are also chained together. (New blocks are obtained from the end of the chain, and released blocks are added to it.)

Figure 34.6 Overflows from a circular file handled by dynamic block allocation. (Panels: circular file, with relative file addresses and incident records such as M800 and T3799; overflows in dynamic block-allocation pool.)
The circular file itself could be constructed from blocks in the pool, but in order to eliminate needless seeks they may be contiguous blocks chained together in a ring. With this mechanism the size of the circular file can be expanded when necessary.
FILES WHICH VARY GREATLY IN LENGTH
The designers of the police INCIDENT file had one advantage. Although it was highly volatile, they could make an assumption about its length and use a fixed number of records. Some files are both volatile and vary greatly in length. For these it is usually uneconomical to use a fixed-length circular file like that in Fig. 34.1.

In an airline the passenger booking records constitute a volatile file. Thousands of new bookings are made each day, hundreds of cancellations occur, and thousands of old bookings are purged from the files. The record giving details of a passenger and his booking must be linked to the flight record. The flight record may have no passengers yet, or it may have any number up to 400 (for certain 747s). We thus have a two-level structure, but the file which constitutes the lower level is both volatile and widely varying in length. In practice the higher-level record type is referred to as a PNI (passenger name index) record, and the lower-level record type is referred to as a PNR (passenger name record). The PNR contains full details about the passenger and his booking, and the PNI contains some brief information about the flight followed by a set of pointers to the PNRs of passengers on that flight. Figure 34.7 shows the schema for this and has the added complications of wait lists for a flight and "other airlines" (passengers may book journeys which include legs on other airlines). A typical airline reservation system handles its volatile and variable-length files by using two dynamically allocated block pools, one with larger and one with smaller blocks. The left-hand side of Fig. 34.8 shows the PNR and PNI records with their overflows. The remainder of the figure shows the other major records which share the block pools. The flights which require few blocks will tend to balance the flights which require many blocks, and that is the reason for dynamic block allocation. Once stored in a block, the records will never have to be moved until they are purged from the storage.

Figure 34.7 Schema for airline booking records. (Record types: OTHER AIRLINE FLIGHT PNI, FLIGHT PNI, WAIT LIST PNI, and PNR (PASSENGER DETAILS).)

Figure 34.8 The volatile records in an airline reservation system are stored in two pools of dynamically allocated blocks, one of shorter and one of longer blocks. Once stored, no record need be moved. (Records shown include the PNI and other airline PNI with their overflows; the waitlist PNI; passenger details (PNR) and PNR overflow; inventory master and inventory detail with their overflows; the city-pair table, city-pair records, and city-pair record overflow; and flight information.)
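The two-pool arrangement can be sketched like this. The block sizes (128 and 1,024 bytes) and pool sizes are hypothetical, and plain Python lists replace chained free lists. A record that fits goes into a single small block; a longer one gets a chain of large blocks. Either way it keeps its addresses until it is purged, so no record need be moved.

```python
SMALL, LARGE = 128, 1024   # hypothetical block sizes in bytes


class TwoPoolStore:
    """Variable-length records placed in a small-block or a large-block pool."""

    def __init__(self, n_small, n_large):
        self.free = {SMALL: list(range(n_small)), LARGE: list(range(n_large))}
        self.blocks = {}   # (block size, block number) -> contents

    def store(self, record):
        """Use one small block if the record fits; otherwise chain large blocks."""
        size = SMALL if len(record) <= SMALL else LARGE
        addresses = []
        for start in range(0, max(len(record), 1), size):
            number = self.free[size].pop()      # IndexError if that pool is empty
            self.blocks[(size, number)] = record[start:start + size]
            addresses.append((size, number))
        return addresses   # the record stays at these addresses until it is purged

    def fetch(self, addresses):
        return b"".join(self.blocks[a] for a in addresses)

    def purge(self, addresses):
        for size, number in addresses:
            del self.blocks[(size, number)]
            self.free[size].append(number)
```

Short and long records then draw on different pools, so a flight with many passengers takes more blocks without forcing any existing record to be relocated.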
35
FAST-RESPONSE SYSTEMS
A design requirement of many real-time systems is that they should give suitably fast responses to messages originating at terminals. How fast the responses should be depends on the nature of the messages and the application. I have discussed this subject elsewhere [1]. Many dialogue systems need a response in about 2 seconds, not much longer than the response time required in human conversations. The contracts for some systems have stated that 90% of the messages will have a response time of 3 seconds or less, where response time is defined as the interval between the last action the terminal operator takes on input and the first character of the response being printed or displayed. If a real-time system has this 90-percentile response time of 3 seconds, its mean response time will normally be less than 2 seconds [2].

The requirement for fast responses constrains the design of both the teleprocessing subsystem and the data-base subsystem. If the teleprocessing subsystem introduces delays (perhaps of a second or so), then the data-base subsystem must react correspondingly faster. Sometimes the response requires the accessing of two or more files or of many records; in this case the access times must be low. The need for fast access is still greater on systems which handle a high transaction volume, in which several, or many, responses must be generated each second.

Of the techniques we have discussed for physical storage organization, some are inherently fast and others inherently slow. High-volume, short-response-time systems must use the fast ones. Fast retrieval techniques can generally be found when all accesses are to single records via their prime keys. It is often much more difficult to find a fast retrieval method when files have to be searched or accessed with multiple keys. A general-purpose high-speed "search engine" has yet to be built, although the associative memory techniques discussed in the following chapter are promising.

In general the methods for giving fast responses include combinations of the following:

1. Storing frequently referenced data on fast devices.
2. Positioning data so that lengthy seeks are avoided where possible.
3. Selecting addressing and search schemes which require few seeks, preferably only one per retrieval.
4. Using multiple operations in parallel.
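The text does not say which distribution lies behind the rule that a 3-second 90-percentile implies a mean under 2 seconds. As a rough check, assume (our assumption, not the author's) that response times are exponentially distributed. The 90-percentile is then ln 10, about 2.30, times the mean, so a 3-second 90-percentile implies a mean of about 1.3 seconds. A minimal sketch:

```python
import math

def exponential_mean_from_quantile(quantile_s: float, p: float = 0.90) -> float:
    """Mean of an exponential distribution whose p-quantile is quantile_s.

    Assumption: response times are exponential, so P(T <= t) = 1 - exp(-t/m)
    and the p-quantile is -m * ln(1 - p). Solving for m gives the line below.
    """
    return quantile_s / -math.log(1.0 - p)

def exponential_quantile(mean_s: float, p: float = 0.90) -> float:
    """p-quantile of an exponential distribution with the given mean."""
    return -mean_s * math.log(1.0 - p)

# A contract requiring 90% of responses within 3 seconds:
mean = exponential_mean_from_quantile(3.0)
print(f"implied mean response time: {mean:.2f} s")  # implied mean response time: 1.30 s
```

A less skewed distribution would put the mean closer to the 90-percentile, so the exact figure depends on the shape assumed.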
Box 35.1 categorizes techniques described in earlier chapters according to whether they tend to be good or bad for achieving fast responses. If Box 35.1 is compared with Box 34.1, it will be seen that some of the techniques which tend to be good for fast response are also good for volatile files. This fact may be helpful, as some fast-response systems are indeed volatile.

EMBEDDED CHAINS
The first block of items in Box 35.1 relates to embedded chains. As we have discussed before, there is no hope of achieving fast responses if lengthy chains have to be followed from record to record with seeks between the reading of the records. Rather than use embedded pointers it is better to store the relationships separately from the data where they can be accessed rapidly. For fast multiple-key searches, inverted list files rather than chained files are used. Inverted list structures take more storage, and, as is often the case, there is a trade-off between retrieval speed and storage utilization.

ADDRESSING TECHNIQUES
As indicated earlier, the trade-off between storage and time exists also in the techniques available for addressing single records. Indices can be designed to minimize storage space or to minimize index-searching time. Algorithm addressing can avoid index searching entirely but is likely to waste storage space. Binary searches of indices are fast but prohibit most forms of entry compaction.
SERIAL VERSUS PARALLEL OPERATIONS
Where a lengthy search is necessary, the more it can be broken into fragments which proceed simultaneously, the faster the results will be obtained. On systems in which a queue builds up, the more servers serve the queue simultaneously, the shorter the queuing time will be. Queues build up for the file channel and also for the file-access mechanism. The latter is usually the more serious because the access time is substantially longer than the channel time. It is therefore desirable on some fast-response systems to have multiple channels and, more important, multiple access mechanisms operating in parallel. To achieve this on a disk unit the files are sometimes laid out spanning the disks as in Figs. 18.3 and 18.5, with the most frequently used files clustered together on the central cylinders as in Fig. 18.6.

Box 35.1  (Compare with Box 34.1)

Techniques which tend to be poor for fast-response systems:
- Embedded chain and ring structures
- Indices designed for space optimization (or low storage cost)
- Scanning large cells
- Single-server queues for channel or access mechanism
- Serial organization
- Machine-independent organizations
- Application-independent organizations
- Insistence on similar timing for all responses
- Real-time updating
- Real-time insertions and deletions
- Poor techniques for storing insertions (especially clustered insertions)

Techniques which tend to be faster:
- Relations in a unit separate from the data
- Inverted list structures
- Indices in main memory
- Algorithm or hash addressing
- Binary searches in main memory
- Layout of files to permit multiple simultaneous operations
- Parallel organization
- Organization tailored to hardware
- Organization tailored to application
- Man-machine dialogue tailored to storage organization to minimize the effects of time-consuming actions
- Lengthy updates deferred until later
- Insertions deferred to off-line operation
- Informational data base separate from operational data base
- Redundant storage of certain items, designed to minimize response times
- Adaptive organizations
- Use of peripheral machines in handling dialogue
- Hardware designed to facilitate parallel searching
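The claim that more servers shorten the queue can be made concrete with a short calculation. This sketch assumes an M/M/c queue (Poisson arrivals, exponentially distributed access times), which the text does not specify, and uses the standard Erlang C formula. The load of 15 file requests per second and the 50 ms access time are hypothetical.

```python
import math

def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that a request has to wait in an M/M/c queue.

    offered_load = arrival rate * mean service time (in Erlangs).
    """
    if offered_load >= servers:
        raise ValueError("unstable queue: offered load must be below the server count")
    below = sum(offered_load**k / math.factorial(k) for k in range(servers))
    at_c = (offered_load**servers / math.factorial(servers)) * servers / (servers - offered_load)
    return at_c / (below + at_c)

def mean_queue_wait(servers: int, arrival_rate: float, service_time: float) -> float:
    """Mean time (seconds) a request waits before its access begins."""
    load = arrival_rate * service_time
    service_rate = 1.0 / service_time
    return erlang_c(servers, load) / (servers * service_rate - arrival_rate)

# Hypothetical load: 15 file requests per second, 50 ms per access.
for mechanisms in (1, 2):
    wait = mean_queue_wait(mechanisms, arrival_rate=15.0, service_time=0.050)
    print(f"{mechanisms} access mechanism(s): mean queue wait {wait * 1000:.1f} ms")
```

With these numbers the mean wait falls from 150.0 ms to about 8.2 ms. Pooling both mechanisms into one queue is an idealization. On a real disk a request can be served only by the mechanism over its data, which is one reason the file layout matters.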
MACHINE AND APPLICATION INDEPENDENCE
It is generally desirable to construct data-base software which is application-independent; in other words, it can be used on a wide variety of different applications. Software producers also strive for techniques which are machine-independent, i.e., can be run on different computers and with different file hardware. The more machines and applications a given software package can be used with, the greater the return on the cost of writing it. Nevertheless, much of the software which is successful in achieving fast response times has its physical storage mechanisms tailored to specific file units (for example, it takes advantage of specific track lengths or puts an index at the head of each cylinder). Further, some of the applications which are most demanding in their response-time requirements use physical storage techniques which are specifically tailored to the applications. This is especially true when short response-time requirements are combined with a high transaction volume. It is usually easier to achieve fast response times when the accessing techniques are tailored to the hardware and to the application. There may thus be a trade-off between generality and speed of response at a given cost.
DIALOGUE CONSIDERATIONS
The reason for fast response times is usually to provide a psychologically attractive man-machine dialogue. In dialogues, however, it is not necessarily the case that every terminal action needs a fast retrieval from the data base. At certain points in the dialogue an operator may have completed a set of entries and is mentally prepared to relax for a moment while the machine acts upon what it has been told. Some psychologists refer to such a strategic moment in the dialogue as a closure [3, 4]. The closure points can be useful in designing the programming, because exceptionally lengthy retrieval operations can be saved for these moments.

In an airline reservation dialogue, for example, a sequence of fast (average < 2 seconds) responses is needed as the terminal operator establishes what seats are available and what flight connections the traveler will make, and as details of the passenger are entered. The operator then presses an END TRANSACTION key and relaxes. This is a moment of closure. At this point a 5-second response would be acceptable. The closure concept can be put to use in the data-base design, because the time-consuming updates of the files and the insertion of the new passenger record are performed when the closure occurs. System designers often make the mistake of demanding a blanket response-time requirement (90% of the responses in less than 3 seconds) rather than tailoring the response times to the dialogue structure.

Again, an otherwise reasonable file design may not be able to provide a fast response to certain retrieval operations. An operation may need a search which takes many seconds or a lengthy paging access to a backup store. In such cases the dialogue should be designed to disguise the delay, saying something to the operator and perhaps eliciting his response while the retrieval is taking place. The operator must not suddenly be left in prolonged suspense in the middle of an otherwise fast dialogue. If the designer can "keep him talking" while the search goes on, he has gained time which might allow more economical but slower searching techniques.
NON-REAL-TIME UPDATES
It usually takes longer to update a record than to read it. The updated copy has to be written back on the files and checked, and sometimes chains or secondary indices have to be modified. On some systems, however, it is not necessary to update all of the records in real time. Certain files can be updated later, or at least new records can be saved until later before insertion. The access mechanisms and channels are then less utilized, and it is easier to achieve fast responses on the remaining real-time operations.

Figure 16.2 illustrated the separation of an operations system and an information system using the same data. This separation facilitated the handling of volatile files. It can also facilitate the provision of fast responses. The operational system files may be updated in real time, while the information system files need not be, because it does not matter if the information searched for is 24 hours out of date. In practice many information systems use this approach and may obtain their data input, off-line, from more than one operational system.
REDUNDANT DATA
Separating the information and operations data introduces a form of redundancy into the storage: much of the data is kept twice. One of the objectives of data-base organization is to minimize redundancy. Nevertheless, to achieve fast responses many systems duplicate certain critical portions of the data and place a copy, often highly condensed, where it can be accessed quickly.

Airline reservation systems, for example, have a high throughput of transactions as well as stringent requirements for fast response times. To meet the response-time requirements the data base contains a carefully planned measure of redundancy in the data elements, the data paths, and the data structures. It is designed so that the most frequently occurring data accesses can be handled with one seek and less frequently occurring accesses with two seeks. As few accesses as possible need more than two seeks.

By far the most common transactions are the availability requests, which ask whether seats are available on certain flights. Inventory records are maintained for all flights, giving details of the seats sold and the number of seats on the plane. These records could provide an answer to the seat availability requests. Most airline systems, however, provide another record which gives the answer in a much more compact form. Only 4 bits are required to indicate whether seats are available for a given class of a given flight. The first 3 bits indicate the number of seats available from 0 to 7 (bit pattern 111 indicates 7 or more seats available), and the fourth bit indicates whether there is any restriction on selling the seats. (The seats may be for a single leg, e.g., London to Rome, of a long multileg flight, e.g., London to Australia, and selling them may be prohibited unless the passenger books a long multileg journey; this restriction is designed to maximize the profitability of the operation.) The 4-bit groups for many flights, legs, and classes may be stored compactly in one record. In many airlines, the availability status of an entire day's flights is contained in a single record. These AVAILABILITY records are then stored in a location selected for fast access. The AVAILABILITY record is entirely redundant, but it takes up only a small fraction of the storage and substantially improves performance.
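The 4-bit availability code described above can be sketched as follows. The bit layout follows the text (a 3-bit seat count saturating at 111, then one restriction bit). Packing two codes per byte, high nibble first, and indexing entries by position in the record are assumptions made for illustration.

```python
def encode_availability(seats_available: int, restricted: bool) -> int:
    """Pack one flight/leg/class status into 4 bits: 3-bit seat count + restriction bit."""
    count = min(max(seats_available, 0), 7)   # bit pattern 111 means "7 or more"
    return (count << 1) | (1 if restricted else 0)

def pack_availability(statuses):
    """Pack (seats_available, restricted) pairs two to a byte, high nibble first."""
    record = bytearray((len(statuses) + 1) // 2)
    for i, (seats, restricted) in enumerate(statuses):
        nibble = encode_availability(seats, restricted)
        record[i // 2] |= (nibble << 4) if i % 2 == 0 else nibble
    return bytes(record)

def read_availability(record: bytes, index: int):
    """Return (seat count 0-7, restricted flag) for entry number `index`."""
    byte = record[index // 2]
    nibble = (byte >> 4) if index % 2 == 0 else (byte & 0x0F)
    return nibble >> 1, bool(nibble & 1)

day = pack_availability([(12, False), (3, True), (0, False)])
print(day.hex(), read_availability(day, 0), read_availability(day, 1))
# e700 (7, False) (3, True)
```

At half a byte per entry, a thousand flight/leg/class combinations fit in 500 bytes. That is why a whole day's availability can sit in one quickly accessed record.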
ADAPTIVE ORGANIZATIONS
Another way to reduce the mean response time is to employ an adaptive organization which uses some automatic method of moving the most frequently referenced items into locations in which they can be referenced most quickly. In Chapter 20 we mentioned percolation for moving frequently referenced items to the head of a chain. This is an adaptive organization but is usually not fast enough for real-time systems. A multilist organization (Fig. 26.3) can be made adaptive by using variable-length lists (chains). Short lists can be used for frequently used keys and long lists for infrequently used keys. In other words, there is a higher degree of inversion for the frequently referenced items. Such an organization may be modified automatically at maintenance time on the basis of key-usage statistics, which are continuously recorded. A better technique than either of these, given appropriate hardware, is the use of the staging mechanisms discussed in Chapter 32. An algorithm is used which attempts to keep those items most likely to be referenced in the fastest level of storage.

A systems analyst today often does not have adaptive multilist software or storage hierarchies with suitable staging control available to him. To make his system adaptive he must construct some means of reorganizing the data periodically. One stockbroker system, for example, uses tape and disk units. The data stored about stocks are sufficiently bulky that they have to be on tape. On any given day, however, a relatively small number of stocks are highly active. Data for this active group are stored on disk where they can be accessed quickly. The members of the active group change as new stocks become "hot." Consequently, a periodic batch run transfers the newly hot stocks from tape to disk.

Sometimes the systems analyst can predict ahead of time that certain records will be more active than others but that the active group will change. For example, in an airline data base most of the activity relates to flights that take off in the next few days. A large quantity of bookings may exist for flights months ahead, but these will rarely be read. Because of this, two types of inventory records are used which are entirely different in structure: DETAIL records, which relate to a selected number of days into the future called the current period, and GROSS records, which relate to the time beyond that period. The records contain similar information, but the DETAIL records are more quickly accessible at the expense of taking up more storage. Box 35.2 illustrates these.
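The choice between DETAIL and GROSS inventory records, described here and in Box 35.2, comes down to a small rule. This sketch assumes a 14-day current period, a hypothetical setting, and uses Box 35.2's example value N = 20. It also assumes the flight date is today or later.

```python
from datetime import date

CURRENT_PERIOD_DAYS = 14     # hypothetical setting of the "current period" parameter
DETAIL_THRESHOLD = 20        # Box 35.2's example value of N

def inventory_record_type(flight_date: date, today: date, seats_booked: int,
                          peak_days: frozenset = frozenset()) -> str:
    """Choose the inventory record type for a flight date (assumes flight_date >= today)."""
    days_ahead = (flight_date - today).days
    if days_ahead < CURRENT_PERIOD_DAYS:
        return "DETAIL"                  # inside the current period
    if seats_booked > DETAIL_THRESHOLD:
        return "DETAIL"                  # heavily loaded flight beyond the current period
    if flight_date in peak_days:
        return "DETAIL"                  # created in advance for peak days such as Christmas
    return "GROSS"

today = date(1975, 6, 1)
print(inventory_record_type(date(1975, 9, 1), today, seats_booked=5))   # GROSS
```

Both parameters trade speed against storage, as the text notes. A longer current period or a lower N creates more DETAIL records, which speeds access but uses more storage.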
Box 35.2  Use of a Double Record Type to Improve Mean Access Time

A selected number of days into the future on an airline reservation system are called the current period. Separate PASSENGER NAME INDEX (PNI) and seat INVENTORY records are maintained for the current period, and these can be accessed quickly. Also, the seat availability information in the INVENTORY records is stored in a highly abbreviated form in the AVAILABILITY records, which are referred to very frequently and are hence quickly accessible.

For the current period a DETAIL INVENTORY record is maintained. It gives full details of the seats sold for each class, leg, and segment of the flight. (Segment refers to a combination of a boarding and leaving point; thus a three-leg flight involving London-Rome-Athens-Tehran will have six segments, London-Rome, London-Athens, Rome-Tehran, etc., and the DETAIL INVENTORY will keep a count for each.) For the noncurrent period a GROSS INVENTORY record is kept with a single combined total of all seats sold on a flight for any segment or class. One such record contains the gross inventory for a flight for 61 days, and six records keep it for the whole year. When a flight moves into the current period a DETAIL INVENTORY record must be created from the passenger booking records. Figure 35.1 shows the separation of gross and detail records.

DETAIL INVENTORY records are also created for heavily loaded flights outside the current period: a DETAIL INVENTORY record will be automatically created for a flight that has more than N seats booked, say 20. N, like the number of days in the current period, is a system parameter which can be adjusted to achieve the best compromise between speedy operation and conservation of storage.

A similar strategy is used with the storage of passenger names. Two different structures are used, a DETAIL PASSENGER NAME INDEX for the current period and a GROSS PASSENGER NAME INDEX for the noncurrent period. As with the inventory records, a DETAIL record will be created in advance for peak days such as Christmas. The gross index contains an indication of those days for which a detail index has been redundantly created.

[Figure 35.1: record types sharing the block pools: current detail PNI, non-current detail PNI, gross PNI, and other airline PNI, each with overflow; passenger details (PNR) with non-current detail PNR overflow; inventory master, current detail inventory, and gross inventory, with overflows; availability; flight information.]
Figure 35.1  Figure 34.8 redrawn to show how some of the data is stored redundantly to speed up the system response time.

FUTURE HARDWARE
There are several ways in which hardware can be designed in the future to facilitate fast-response file operations. First, disk or other electromechanical devices can be designed with more logic in their control units to permit parallel searching. With today's disks, if a track is being searched for a record with a given key, it is not necessary for the entire contents of the track to be read into main memory. A controlling device can be used to recognize the requisite key and read only the record with that key. It would be possible to build a "search engine" which looks for specified secondary keys as well as primary keys and which searches more than one track at a time. The upper part of Fig. 35.2 shows a disk unit with a module search controller on each module. The lower part extends the idea, showing a head search controller on each track, all searching in parallel for records with certain specified key characteristics. The module search controller may have the capability to advance the access mechanism as the search progresses. A similar arrangement could be employed on any future electromechanical storage unit. Before the advent of large-scale-integration circuitry and microcomputers, search controllers on each read head were too expensive to be practical. As the cost of mass-produced logic circuitry drops, parallel search controllers become more attractive.

[Figure 35.2: two configurations of CPU, channel, and file unit controller; in one, a module search controller is attached to each disk module; in the other, a search controller is attached to each read head.]
Figure 35.2  Distributed intelligence in the file control hardware could speed up data retrieval by facilitating parallel operations. This technique could be employed on any form of electromechanical storage and become economical as the cost of large-scale integration circuitry drops.

An entirely different approach to fast file operations is to build a very large solid-state buffer, let us say between 10 million and 1 billion bytes. In this buffer search operations can proceed with an access time of a microsecond or so. The buffer may contain a relational data base, indices, and directories, so that a search takes place in the buffer and the records located are then read from a larger, more permanent backup storage. The indices and directories need not reside permanently in the solid-state storage; permanent residence would probably be undesirable, if only because most solid-state devices, unlike magnetic storage, lose their data when power fails. Instead, a paging mechanism would read them into the buffer when needed, as described in Chapter 32. Segments of files would also be paged into the buffer on demand. The algorithms for controlling the demand paging will be critical to the goal of achieving fast response times. Again, before the advent of large-scale integration, such a large data buffer would have been far too expensive to be practical. Now it is economically feasible.

Given the capability to build large solid-state storage devices, a technique which has existed for 20 years in laboratories begins to appear viable: associative memory. An associative memory is one in which records are addressable by their contents rather than by their physical addresses. In the next chapter we will discuss associative memory. The mechanisms of data-base storage can be changed dramatically with the use of large solid-state buffers, demand paging, parallel searching both in the buffers and in electromechanical devices, and associative memory.

REFERENCES
1. James Martin, Design of Man-Computer Dialogues, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1973, Chapter 18.
2. James Martin, Systems Analysis for Data Transmission, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1972, Chapter 31.
3. Robert B. Miller, "Response Time in Man-Computer Conversational Transactions," in AFIPS Conference Proceedings (Fall Joint Computer Conference 1968), Thompson Book Company, Washington, D.C., 1968.
4. James Martin, Design of Man-Computer Dialogues, op. cit., Chapter 18.
36
ASSOCIATIVE MEMORY
An idea can be powerful when its time has come. Associative memory is an old idea whose time does not seem to have come yet, but it may come soon. When it does it will change much in physical storage structures. Small associative memories have been in existence since the early 1960s.

In associative memory the storage locations are addressed by data content rather than by a hardware address. Figure 36.1 shows the difference between a conventional memory array and an associative memory array. Both arrays in the diagram contain a set of words of M bits. In the conventional array an address of N bits is used to read out one of the words. In the associative array a search argument of M bits is employed as input, and the output consists of a bit for each of the words, indicating whether it is identical to the search argument or not. A 1 bit indicates a match and a 0 bit a mismatch. Associative arrays usually have facilities for masking the search argument so that only selected bits in the words are compared.

The output of an associative array may be employed to read data from a conventional storage array. In Fig. 36.2, 2^N lines could go from the associative array to the storage array. Often, however, this is far too many lines, and so the output of the associative array is encoded and taken to the conventional storage on a smaller number of lines. N lines connect the associative and conventional arrays in Fig. 36.2. One set of N bits may be transmitted for each match that occurs in the associative array, giving the address of the word in question and causing the cell with that address in the conventional storage to be read.
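The behavior of Figs. 36.1 and 36.2 (masked compare, one response bit per word, and encoding of each match into an address for the conventional array) can be modeled in software. This is a sketch only. Real hardware compares all words at once, whereas the list comprehension here visits them one by one. The word values and record names are hypothetical.

```python
from typing import List, Optional

class AssociativeArray:
    """Software model of an associative array of fixed-width words."""

    def __init__(self, words: List[int], width: int):
        self.words = words
        self.full_mask = (1 << width) - 1

    def search(self, argument: int, mask: Optional[int] = None) -> List[int]:
        """Response store: 1 for each word equal to the argument on the masked bits."""
        mask = self.full_mask if mask is None else mask
        return [1 if ((word ^ argument) & mask) == 0 else 0 for word in self.words]

def encode_matches(response: List[int]) -> List[int]:
    """Encoder of Fig. 36.2: turn each 1 bit into the address of that word."""
    return [address for address, bit in enumerate(response) if bit]

# 8-bit words whose high 4 bits hold a key; the conventional array holds the data.
keys = AssociativeArray([0x1A, 0x2B, 0x1C, 0x3D], width=8)
data = ["record A", "record B", "record C", "record D"]
response = keys.search(0x10, mask=0xF0)        # compare only the high 4 bits
print(response, [data[a] for a in encode_matches(response)])
# [1, 0, 1, 0] ['record A', 'record C']
```

When several words match, the encoder emits one address per match in turn. This corresponds to one set of N bits being transmitted for each match.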
[Figure 36.1: a conventional array (directly addressed) of 2^N words of M bits, read with an N-bit address to give an M-bit output, beside an associative array (addressed by content) of 2^N words of M bits, given an M-bit search argument and producing one output bit per word indicating whether it matches the search argument.]
Figure 36.1  Comparison of a conventional memory array and an associative memory array.
THE RESPONSE STORAGE
The store which contains the output of the associative array in Fig. 36.1 is called the response store. In Fig. 36.1, the response store has at least 1 bit for each storage location in the array. In another design of associative storage the response is the contents of the storage location satisfying the search criterion. In this case the device must have the ability to output the contents of successive locations when more than one satisfy the search criterion.
[Figure 36.2: an M-bit search argument is applied to an associative array of 2^N words of M bits; N lines carry the encoded match addresses to a conventional array of 2^N cells, whose output is the data.]
Figure 36.2  An associative array used as an index to data stored in a conventional memory.

MORE COMPLEX OPERATIONS
Associative memory hardware thus carries out a series of compare operations in a parallel-by-word, serial-by-bit fashion on a selected group of bits in each storage location. In addition to carrying out compare operations it may also execute other operations in parallel upon the selected bits. It may be able to test for equality, inequality, greater than, greater than or equal to, less than, less than or equal to, between limits, maximum, minimum, next higher, or next lower. All these operations are of value when searching data. More complex logic may be used which can carry out Boolean operations within the words. However, such operations could be carried out less expensively by multiple searches of the same data with different search criteria.

There have been proposals that associative memories carry out arithmetic or logical operations in parallel on the stored data as well as merely executing search functions. They could, for example, add a constant to a specific field in every storage location or increment the field if it contains a certain value. The greater the variety of operations, the greater the cost, and as cost is the main objection to associative memory we will confine our attention to search functions.
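To show how a maximum or minimum search can work parallel-by-word and serial-by-bit, here is one standard algorithm; the text does not describe the hardware's own method. Every word starts as a candidate. For each bit position, most significant first, if any candidate has a 1 there, the candidates with a 0 are dropped. The minimum is found the same way on the complemented words.

```python
from typing import List

def associative_max(words: List[int], width: int) -> List[int]:
    """Indices of the largest unsigned words, found one bit position at a time.

    The per-word work inside each step is what associative hardware does in parallel.
    """
    candidates = [True] * len(words)            # response store: all words start as candidates
    for bit in range(width - 1, -1, -1):        # most significant bit first
        has_one = [c and bool((w >> bit) & 1) for c, w in zip(candidates, words)]
        if any(has_one):                        # keep only candidates with a 1 in this position
            candidates = has_one
    return [i for i, c in enumerate(candidates) if c]

def associative_min(words: List[int], width: int) -> List[int]:
    """Minimum = maximum of the bitwise complements within the word width."""
    mask = (1 << width) - 1
    return associative_max([~w & mask for w in words], width)

values = [5, 12, 9, 12]
print(associative_max(values, 4), associative_min(values, 4))   # [1, 3] [0]
```

The number of steps depends only on the field width, not on how many words are stored. This is the property that makes such searches attractive in hardware.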
PHYSICAL CONSTRUCTION
Small associative arrays on single LSI (large-scale-integration) chips are purchasable from semiconductor manufacturers. As LSI technology improves, the number of bits stored per chip will increase. However, to be of value in data-base construction much larger associative memories are needed. These can be built from multiple chips. The cost of producing an LSI chip drops dramatically as the quantity produced rises. When associative storage becomes a widely accepted technology the cost of LSI associative arrays may become low because of the economics of mass production of memories containing many identical arrays.

Associative designs have also used magnetic disks on which many tracks can be scanned in parallel. Figure 36.3 shows a system built by the Goodyear Aerospace Corporation [1] that is used for military applications. The disk at the left of the diagram has 72 tracks, of which 64 are used to store data. There is one head per track, and the 64 tracks are read in parallel into 64 storage locations of an associative array like that in Fig. 36.1. Each sector of the disk holds 256 bits per track, and each word of the associative array holds 256 bits. It takes 100 microseconds to read 64 sectors into the array and 100 microseconds to search the array. Therefore, as the disk rotates, alternate sectors are read and searched. The entire disk may be searched in this manner in two disk rotations. There are 384 sectors on the disk, and the disk takes approximately 38.4 milliseconds to rotate. The contents of the disk, about 6.29 million bits, may thus be searched associatively in about 76.8 milliseconds. The associative array is attached to a conventional computer, an XDS SIGMA 5, which controls the searches, giving appropriate search arguments and instructions to the array. The associative array has a capacity of 256 words, and only 64 of these are used. If four disk surfaces were employed for four times the reading facilities, four times as much data could be searched associatively.

[Figure 36.3: head-per-track disk with 384 sectors of 256 bits on each track; 64 data tracks are read simultaneously over 64 parallel channels and a parallel input/output unit into an associative array of 64 words of 256 bits (search argument up to 256 bits), attached to a conventional XDS Sigma 5 computer; 100 µsec to read a sector, 100 µsec to perform an associative search.]
Figure 36.3  An associative memory system using a head-per-track disk implemented by the Goodyear Aerospace Corporation [1].

THE BEST OF TWO WORLDS
In view of the small size of associative arrays, the way in which storage may be organized is to store data conventionally in pages but to read the pages into an associative memory for searching. This, in effect, is what is done in Fig. 36.3. Any conventionally organized storage could be used instead of the disk in Fig. 36.3 provided that its read rate is high enough to fill the associative memory sufficiently fast for effective searching. Richard Moulder [1], discussing the system in Fig. 36.3, claims: "An associative processor working in conjunction with a sequential processor affords the best configuration. With this marriage comes the best of two computer worlds, each performing what it is best capable of doing. Associative processing brings to data base management designers and users the ability to query and update data bases, in a fast and efficient manner with a minimum amount of software." No additional storage is needed for inverted files, and updates can be made straightforwardly without the problems of inverted files described in Chapter 33.
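The figures quoted for the Goodyear system can be checked directly, and the page-at-a-time scheme can be sketched alongside them. The constants come from the text. The `paged_search` function is a hypothetical illustration: each "page" is one sector from every data track, loaded into the array and then searched. Because searching takes as long as reading, only alternate sectors are loaded on each revolution.

```python
from typing import Callable, Iterator, List, Tuple

# Figures quoted in the text for the Goodyear system.
DATA_TRACKS = 64
SECTORS_PER_TRACK = 384
BITS_PER_SECTOR = 256
READ_US = 100      # load one sector from all 64 tracks into the array
SEARCH_US = 100    # one associative search of the 64 loaded words

def disk_bits() -> int:
    return DATA_TRACKS * SECTORS_PER_TRACK * BITS_PER_SECTOR

def full_search_ms() -> float:
    # One sector passes under the heads every READ_US microseconds. While a loaded
    # page is being searched (SEARCH_US), the next sector goes by unread, so one
    # rotation covers alternate sectors and a second rotation covers the rest.
    rotation_ms = SECTORS_PER_TRACK * READ_US / 1000.0
    return 2 * rotation_ms

def paged_search(tracks: List[List[int]],
                 matches: Callable[[int], bool]) -> Iterator[Tuple[int, int]]:
    """Yield (sector, track) for each matching word, one page of sectors at a time."""
    for first_sector in (0, 1):                        # rotation 1: even sectors, rotation 2: odd
        for sector in range(first_sector, len(tracks[0]), 2):
            page = [track[sector] for track in tracks]     # fills the associative array
            for track_no, word in enumerate(page):         # hardware compares these at once
                if matches(word):
                    yield sector, track_no

print(disk_bits(), full_search_ms())   # 6291456 76.8
```

The same loop would work with any conventionally organized storage in place of the disk, provided pages arrive fast enough to keep the array busy.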
ASSOCIATIVE PAGE ORGANIZATION
If data are to be divided into pages which will be associatively searched, how should the data be organized? An associative processor searches flat files: every "word" in the associative array must contain the same set of data-item types. The data should therefore be in a relational (normalized) form, as described in Chapters 13 and 14. Some of the normalized relations may contain more bits per tuple than the number of bits per word in the associative array. Such oversize relations must be divided physically into domain groups that fit into the associative array.

In Chapter 29 we discussed separating data and relationships. If associative storage is used in data-base management systems, it seems likely that it should be applied to a relatively small file of relationships, not to the larger file of data. It is possible, for example, to imagine an "index box" built associatively for searching primary or secondary indices. If such a device were to become a mass-produced unit, it could have a major effect on the design of interactive information systems.

BINARY RELATIONS
Because of the limited size of associative arrays, it is desirable to load them with only those data which are actively involved in the search. If third normal form tuples are loaded into the array, some of the data items in the tuples may not be involved in the search when responding to a particular query. For this reason binary relations may be preferable to third normal form or n-ary relations (although the logical data description may still be in third normal form, as discussed in Chapter 14). As we discussed in Chapter 29, any data-base schema, no matter how complex, can be represented by binary relations. Furthermore, complex queries can be answered by dealing with the data-item types in the query one pair at a time. The binary relations will be loaded into the associative memory for searching. The word length of the associative memory need not be long in this case. The 256-bit words used in the Goodyear associative array would be adequate for most binary relations but insufficient for most third normal form relations.
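Splitting n-ary tuples into binary relations, and answering a query one pair of data-item types at a time, can be sketched as below. The EMP/DEPT/SALARY/CITY data are hypothetical. Each `search` call touches only one binary relation, which is all that would need to be loaded into the associative array for that step.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

def to_binary_relations(key: str, rows: List[dict]) -> Dict[Tuple[str, str], List[tuple]]:
    """Split n-ary tuples into binary relations (key item, other item), one per non-key item."""
    relations = defaultdict(list)
    for row in rows:
        for item, value in row.items():
            if item != key:
                relations[(key, item)].append((row[key], value))
    return dict(relations)

def search(relation: List[tuple], predicate: Callable[[object], bool]) -> Set[object]:
    """Search one binary relation, as if only it were loaded into the associative array."""
    return {k for k, v in relation if predicate(v)}

# Hypothetical third normal form tuples.
employees = [
    {"EMP": "E1", "DEPT": "SALES", "SALARY": 12000, "CITY": "LEEDS"},
    {"EMP": "E2", "DEPT": "SALES", "SALARY": 9000, "CITY": "YORK"},
    {"EMP": "E3", "DEPT": "ACCOUNTS", "SALARY": 15000, "CITY": "LEEDS"},
    {"EMP": "E4", "DEPT": "SALES", "SALARY": 11000, "CITY": "BATH"},
]
rels = to_binary_relations("EMP", employees)

# Query: employees in SALES earning over 10000, one pair of item types at a time.
answer = (search(rels[("EMP", "DEPT")], lambda d: d == "SALES")
          & search(rels[("EMP", "SALARY")], lambda s: s > 10000))
print(sorted(answer))   # ['E1', 'E4']
```

The CITY relation is never loaded for this query. This is the saving the text attributes to binary relations over third normal form tuples.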
SOFTWARE ASSOCIATIVE MEMORIES
Associative memory hardware large enough for data-base use has not been available for most systems. However, file storage with properties similar to associative memories can be built from conventional storage hardware by using software techniques. Such file organizations are sometimes referred to as software associative memories. Software associative memory is slow, clumsy, and often error-prone compared with its hardware equivalent, but at the time of writing it is cheaper. If a manufacturer provides only conventional storage, associativity has to be provided by appropriate data structures such as hashing and rings.
CONTENT ADDRESSABILITY

The term content addressable is used to describe associative storage. The primary property of content addressable memory is:
Physical Organ ization
522
Part I I
Property 1: The address of a segment of data is a function of its information content.

With software associative memory the term is sometimes used to imply a second property:

Property 2: Segments similar in content are stored close in address space.
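The two properties can be illustrated with a small Python sketch under stated assumptions. A hash of a data value (CRC-32, chosen only for illustration) gives the page address, which satisfies property 1. Records are grouped on one chosen data-item type (a hypothetical DEPT field), so records sharing a value land on the same page, which satisfies property 2. `NUM_PAGES` and the helper names are hypothetical.

```python
import zlib

NUM_PAGES = 16  # hypothetical file size in pages


def page_address(value) -> int:
    """Property 1: the address is computed from the data, not looked up.

    CRC-32 is used only as a deterministic hash (Python's str hash is salted).
    """
    return zlib.crc32(str(value).encode()) % NUM_PAGES


def store(records, group_item):
    """Property 2: records sharing group_item's value go to the same page."""
    pages = {}
    for r in records:
        pages.setdefault(page_address(r[group_item]), []).append(r)
    return pages


def fetch(pages, group_item, value):
    """All records with group_item == value, from a single page read."""
    page = pages.get(page_address(value), [])
    # Other values may hash to the same page, so filter after reading it.
    return [r for r in page if r[group_item] == value]


records = [
    {"EMPNO": 101, "DEPT": "D1"},
    {"EMPNO": 102, "DEPT": "D2"},
    {"EMPNO": 103, "DEPT": "D1"},
]
pages = store(records, "DEPT")
print([r["EMPNO"] for r in fetch(pages, "DEPT", "D1")])  # [101, 103]
```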
The second property may imply that items similar in content, for example, having the same secondary key, are on the same page so that they can be searched for rapidly when that page is brought into main memory. This technique will minimize access time when retrieving related pieces of information.

There have been several implementations of software associative memories [2-7]. One has been built at the IBM Cambridge Scientific Center [8, 9] and used for a variety of applications with complex data structures requiring fast responses. The physical data structure employed provides both properties of content addressability. It is called RAM (relational access method). The system employs binary relations, and these are in the form of triads, as discussed in Chapter 29. A triad consists of three elements, A, B, and C, of which one is the relation name and the other two are related data items. These can be in any sequence, but in the following description A or B will usually be the relation name.

To ensure that triads similar in content are close in address space (property 2 of content addressability), the Cambridge group required an addressing method which would place similar triads on the same page. The page could then be searched at high speed in main memory. The technique used is illustrated in Fig. 36.4. Triads with the same value of A are stored on the same page, with overflow to a page close in address space if necessary. A page-addressing algorithm is employed to convert the value of A to the page on which A is stored. Hashing on the combined values of A and B is used to convert their values to the location within the page where the value of B associated with A is stored. All the values of B associated with a given A value are chained together into a B ring, as shown in Fig. 36.4. The head of the B ring is the location where that A value is stored.
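The page-and-ring arrangement of Fig. 36.4 can be sketched in Python as follows. This is an illustrative simulation, not the Cambridge RAM code, and it rests on several assumptions:

- The page count and page size are hypothetical.
- The A entry is assumed to be located within its page by hashing A alone.
- Synonyms are resolved by linear probing rather than by the separate conflict ring the system used.
- C entries take any free slot on the page.
- Overflow pages are not modeled.

Pages are in-memory lists, so "one page read" is only simulated. Following the text, a single C value is held inline in its B entry, and a C ring is started only when a second C value arrives.

```python
import zlib
from dataclasses import dataclass
from typing import Optional

NUM_PAGES = 4    # hypothetical number of pages in the file
PAGE_SLOTS = 64  # hypothetical number of entry slots per page


def h(*values: str) -> int:
    """Deterministic hash of one or more values (Python's str hash is salted)."""
    return zlib.crc32("\x1f".join(values).encode())


@dataclass
class Entry:
    kind: str                   # "A", "B" or "C"
    a: str                      # the A value this entry belongs to
    b: str = ""                 # the B value (B and C entries)
    c: str = ""                 # a C value; on a B entry, the single inline C
    next: Optional[int] = None  # next slot in the ring this entry is a member of
    head: Optional[int] = None  # first member of the ring this entry heads


class TriadStore:
    def __init__(self) -> None:
        self.pages = [[None] * PAGE_SLOTS for _ in range(NUM_PAGES)]

    def _page(self, a):
        # Page-addressing algorithm: all triads with the same A share one page.
        return self.pages[h(a) % NUM_PAGES]

    def _probe(self, page, start, match):
        # Linear probing from `start`: (slot, True) for a matching entry,
        # (first empty slot, False) otherwise. Stands in for the conflict ring.
        for i in range(PAGE_SLOTS):
            s = (start + i) % PAGE_SLOTS
            if page[s] is None:
                return s, False
            if match(page[s]):
                return s, True
        raise OverflowError("page full; the real system overflows to a nearby page")

    def _ring(self, page, header):
        # Yield the member slots of the ring headed by the entry at `header`.
        s = page[header].head
        while s is not None and s != header:
            yield s
            s = page[s].next

    def _find_a(self, page, a):
        return self._probe(page, h(a), lambda e: e.kind == "A" and e.a == a)

    def _find_b(self, page, a, b):
        # Hash on the combined A and B values gives the B location in the page.
        return self._probe(page, h(a, b),
                           lambda e: e.kind == "B" and e.a == a and e.b == b)

    def _link(self, page, header, slot, entry):
        # Place `entry` at `slot` and splice it into the ring headed at `header`;
        # the last member points back to the header, closing the ring.
        hd = page[header]
        entry.next = hd.head if hd.head is not None else header
        page[slot] = entry
        hd.head = slot

    def _add_c(self, page, sb, a, b, c):
        sc, _ = self._probe(page, h(a, b, c), lambda e: False)  # any free slot
        self._link(page, sb, sc, Entry("C", a, b, c=c))

    def insert(self, a, b, c):
        page = self._page(a)
        sa, found = self._find_a(page, a)
        if not found:
            page[sa] = Entry("A", a)
        sb, found = self._find_b(page, a, b)
        if not found:
            # New AB pair joins A's B ring; its first C is held inline.
            self._link(page, sa, sb, Entry("B", a, b, c=c))
            return
        eb = page[sb]
        if c == eb.c or any(page[s].c == c for s in self._ring(page, sb)):
            return  # triad already stored
        if eb.head is None:
            # Second C value: move the inline C into a new C ring.
            old, eb.c = eb.c, ""
            self._add_c(page, sb, a, b, old)
        self._add_c(page, sb, a, b, c)

    def c_values(self, a, b):
        page = self._page(a)  # the only page consulted
        sb, found = self._find_b(page, a, b)
        if not found:
            return []
        eb = page[sb]
        return [eb.c] if eb.head is None else sorted(page[s].c for s in self._ring(page, sb))

    def b_values(self, a):
        page = self._page(a)
        sa, found = self._find_a(page, a)
        return sorted(page[s].b for s in self._ring(page, sa)) if found else []


store = TriadStore()
store.insert("SKILL", "ADAMS", "COBOL")
store.insert("SKILL", "ADAMS", "PL/I")
store.insert("SKILL", "BAKER", "FORTRAN")
print(store.b_values("SKILL"))           # ['ADAMS', 'BAKER']
print(store.c_values("SKILL", "ADAMS"))  # ['COBOL', 'PL/I']
```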
All the values of C associated with a given AB pair are chained together into a C ring. The head of the C ring is the location where that related B value is stored. If the AB pair has only one associated C value, there is no C ring. The C value is stored along with the related B value, as with C6 and B7 in Fig. 36.4.

[Figure 36.4: A software associative memory structure, using triads [8, 9]. Labels: page-addressing algorithm; hash on A and B values; one page containing A entries (B-ring headers), B entries (C-ring headers), and C entries.]

The hashing technique allows AB pairs to be found most of the time with one page read. The rings on the page can then be searched fast in main memory, thus providing an effective simulation of associative storage. Synonyms are dealt with by using a separate conflict ring, not shown in Fig. 36.4.

As large-scale-integration circuits improve in the years ahead, the cost of associative arrays will drop. On the other hand, the cost of conventional solid-state storage will drop also, and it may still remain more economical to build software associative memory than hardware. As technology changes there should be continual reassessment of the merits and role of associative storage.

REFERENCES

1. Richard Moulder, "An Implementation of a Data Management System on an Associative Processor," in Proceedings of the National Computer Conference, New York, 1973.
2. J. A. Feldman and P. D. Rovner, "An ALGOL-Based Associative Language," Comm. ACM 12, No. 8, Aug. 1969, 439-449.
3. W. L. Ash, "A Compiler for an Associative Object Machine," CONCOMP Technical Report 17, University of Michigan, Ann Arbor, May 1969.
4. W. Ash and E. Sibley, "TRAMP: A Relational Memory with an Associative Base," CONCOMP Technical Report 5, University of Michigan, Ann Arbor, June 1967.
5. A. J. Symonds, "Auxiliary-Storage Associative Data Structure for PL/I," IBM Systems J. 7, Nos. 3 and 4, 1968, 229-245.
6. A. J. Symonds, "Use of Relational Programming to Manipulate a Structure in a Software Associative Memory," presented at 1969 ACM/SIAM/IEEE Conference on Mathematical and Computer Aids to Design, Oct. 1969.
7. Jack Minker, "An Overview of Associative and Content-Addressable Memory Systems and a KWIC Index to the Literature: 1956-1970," Computing Rev., Oct. 1971.
8. M. F. C. Crick, R. A. Lorie, E. J. Mosher, and A. J. Symonds, "A Data-Base System for Interactive Applications," Report G320-2058, Cambridge Scientific Center, IBM Corporation, July 1970.
9. M. F. C. Crick and A. J. Symonds, "A Software Associative Memory for Complex Data Structures," Report G320-2060, Cambridge Scientific Center, IBM Corporation, Aug. 1970.
Appendix A

THE MEAN NUMBER OF PROBES IN A BINARY SEARCH
Parts of this book discussed a binary search of files or indices (Fig. 21.2) and referred to the mean number of reads or probes necessary in executing a binary search. This mean number of probes is calculated here. The calculation assumes that each item is equally likely to be searched for. Let $N_I$ be the number of items that are to be searched, and let $N_p$ be the number of reads or probes necessary. The maximum possible numbers of probes are shown in the following table:
$N_I$      Maximum possible number of probes
2-3        2
4-7        3
8-15       4
In general the maximum possible number of probes is $\lfloor \log_2 N_I \rfloor + 1$.* Let $P_j$ be the probability of finding the item on the $j$th probe. The mean number of probes is then

$$E(N_p) = \sum_{j=1}^{\lfloor \log_2 N_I \rfloor + 1} j\,P_j \qquad \text{(A.1)}$$

* $\lfloor x \rfloor$ denotes the largest integer not greater than $x$.
First Probe

There is only one possible candidate for inspection by the first probe. The probability of the first probe finding the required item is therefore $P_1 = 1/N_I$.

Second Probe

There are two possible candidates for inspection by the second probe. The probability that the second probe will locate the required item is therefore $P_2 = 2/N_I$.

$j$th Probe, Where $j \le \lfloor \log_2 N_I \rfloor$

There are $2^{j-1}$ possible candidates for inspection by the $j$th probe if $j \le \lfloor \log_2 N_I \rfloor$. The probability that the $j$th probe will locate the required item is therefore $P_j = 2^{j-1}/N_I$.

Last Probe
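If there is a final probe after the $\lfloor \log_2 N_I \rfloor$th probe, there may be fewer than $2^{j-1}$ items remaining to be inspected. The probability that this $(\lfloor \log_2 N_I \rfloor + 1)$th probe will locate the required item is 1 minus the sum of the probabilities that one of the previous probes located it. That is,

$$P_j = 1 - \sum_{i=1}^{\lfloor \log_2 N_I \rfloor} \frac{2^{i-1}}{N_I} \qquad \text{where } j = \lfloor \log_2 N_I \rfloor + 1$$

Substituting these probabilities into Eq. (A.1),

$$E(N_p) = \sum_{j=1}^{\lfloor \log_2 N_I \rfloor} \frac{j\,2^{j-1}}{N_I} + \left(\lfloor \log_2 N_I \rfloor + 1\right)\left(1 - \sum_{i=1}^{\lfloor \log_2 N_I \rfloor} \frac{2^{i-1}}{N_I}\right) \qquad \text{(A.2)}$$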
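As a worked check of Eq. (A.2), take $N_I = 10$, so that $\lfloor \log_2 N_I \rfloor = 3$. This small case is chosen only for illustration:

$$P_1 = \frac{1}{10}, \quad P_2 = \frac{2}{10}, \quad P_3 = \frac{4}{10}, \quad P_4 = 1 - \frac{1 + 2 + 4}{10} = \frac{3}{10}$$

$$E(N_p) = 1\cdot\frac{1}{10} + 2\cdot\frac{2}{10} + 3\cdot\frac{4}{10} + 4\cdot\frac{3}{10} = \frac{29}{10} = 2.9$$

A binary search of 10 items therefore needs 2.9 probes on average and at most 4, in agreement with the table entry for $N_I$ = 8-15.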
$E(N_p)$ is plotted in Fig. A.1. It will be observed that the mean number of probes approximates $\log_2 N_I - 1$ when $N_I$ is large.
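Equation (A.2) can be checked numerically with a short Python sketch; the function names are illustrative. `mean_probes` evaluates Eq. (A.2). `simulated` counts the probes that an ordinary binary search, using the midpoint `(lo + hi) // 2`, actually makes for every item of a sorted file. For this midpoint rule every level of the search tree except the last is full, which is the assumption behind the probabilities $P_j$, so the two results should agree. Other search variants are not covered by this check.

```python
def max_probes(n: int) -> int:
    """floor(log2 n) + 1; for n >= 1, n.bit_length() equals exactly this."""
    return n.bit_length()


def mean_probes(n: int) -> float:
    """E(N_p) from Eq. (A.2) for n equally likely items."""
    k = n.bit_length() - 1  # floor(log2 n), exact for integers
    full_levels = sum(j * 2 ** (j - 1) for j in range(1, k + 1)) / n
    p_last = 1 - sum(2 ** (i - 1) for i in range(1, k + 1)) / n
    return full_levels + (k + 1) * p_last


def probes_to_find(items, target) -> int:
    """Count the probes an ordinary binary search makes to reach target."""
    lo, hi, probes = 0, len(items) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if items[mid] == target:
            return probes
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    raise ValueError("target not in items")


def simulated(n: int):
    """(mean, max) probes over every item of a sorted n-item file."""
    items = list(range(n))
    counts = [probes_to_find(items, t) for t in items]
    return sum(counts) / n, max(counts)


for n in (3, 10, 1000):
    print(n, max_probes(n), round(mean_probes(n), 3), simulated(n))
```

`int.bit_length()` is used instead of `math.log2` so that $\lfloor \log_2 N_I \rfloor$ is computed exactly, with no floating-point rounding at powers of two.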
[Figure A.1: The mean number of probes in a binary search. Horizontal axis: $N_I$, number of items searched (2 to 4096, logarithmic scale). Vertical axis: mean number of probes. When a large number, $N_I$, of items are searched, the mean number of probes approximates $\log_2 N_I - 1$.]
Appendix B

SAMPLE LOGICAL DATA DESCRIPTIONS†
Different data-base languages have different methods of describing data. The following pages give samples of typical data descriptions using the schema in Fig. B.1. For various reasons the data structures coded in the different systems are not exactly the same.

[Figure B.1: Hierarchical schema. Root ORG (ORGCODE, ORGNAME, REPORTO, BUDGET) owns JOB (JOBCODE, AUTHQUAN, AUTHSAL) through set JOBS and PERSON (EMPNO, EMPNAME, SEX, EMPJCODE, LEVEL, SALARY) through set PERSONS, and has the repeating group SUBORG (SUBCODE). PERSON has the groups BIRTH (MONTH, DAY, YEAR) and SKILLS (SKILCODE, SKLYRS).]

† Reproduced from the CODASYL Systems Committee Technical Report, May 1971.

1. COBOL
    DATA DIVISION.
    FILE SECTION.
    FD  SDCDATA.
    01  ORG.
        02  ORGCODE; PICTURE IS 9999.
        02  ORGNAME; PICTURE IS A(25).
        02  REPORTO; PICTURE IS 9999.
        02  BUDGET; PICTURE IS Z(8); USAGE IS COMPUTATIONAL-1.
        02  JOB; OCCURS 1 TO 50 TIMES ASCENDING KEY IS JOBCODE.
            03  JOBCODE; PICTURE IS 9999.
            03  AUTHQUAN; PICTURE IS 99; USAGE IS COMPUTATIONAL.
            03  AUTHSAL; PIC ZZZZZZ; USAGE COMP-1.
        02  SUBORG; OCCURS 0 TO 20 TIMES ASCENDING KEY IS SUBCODE.
            03  SUBCODE; PIC 9999.
        02  PERSON; OCCURS 1 TO 999 TIMES ASCENDING KEYS ARE EMPJCODE, EMPNO.
            03  EMPNO; PIC Z9999.
            03  EMPNAME; PIC A(20).
            03  SEX; PIC A.
            03  EMPJCODE; PIC 9999.
            03  LEVEL; PIC AAAA.
            03  SALARY; PIC ZZZZZ; USAGE COMP-1.
            03  BIRTH.
                04  MONTH; PIC 99.
                04  DAY; PIC 99.
                04  YEAR; PIC 99.
            03  SKILLS; OCCURS 1 TO 9 TIMES ASCENDING KEY IS SKILCODE.
                04  SKILCODE; PIC 9999.
                04  SKLYRS; PIC 99.
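For comparison with the COBOL description, the same hierarchy can be sketched as nested Python dataclasses. This is an illustrative modern analogue, not part of the CODASYL report. The field types only approximate the PICTURE clauses. The hypothetical helper `violations` checks the OCCURS limits and the ASCENDING KEY order; in COBOL the ASCENDING KEY clause only states the order the data are assumed to follow.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Occurrence limits copied from the OCCURS clauses of the COBOL description.
LIMITS = {"jobs": (1, 50), "suborgs": (0, 20), "persons": (1, 999), "skills": (1, 9)}


@dataclass
class Skill:
    skilcode: int  # PIC 9999
    sklyrs: int    # PIC 99


@dataclass
class Person:
    empno: int
    empname: str
    sex: str
    empjcode: int
    level: str
    salary: float
    birth: Tuple[int, int, int]                         # (MONTH, DAY, YEAR)
    skills: List[Skill] = field(default_factory=list)   # OCCURS 1 TO 9 TIMES


@dataclass
class Job:
    jobcode: int
    authquan: int
    authsal: float


@dataclass
class Org:
    orgcode: int
    orgname: str
    reporto: int
    budget: float
    jobs: List[Job] = field(default_factory=list)        # OCCURS 1 TO 50 TIMES
    suborgs: List[int] = field(default_factory=list)     # SUBCODE, OCCURS 0 TO 20
    persons: List[Person] = field(default_factory=list)  # OCCURS 1 TO 999 TIMES


def violations(org: Org) -> List[str]:
    """Check occurrence limits and ascending key order; return problems found."""
    problems = []

    def check(name, items, key):
        lo, hi = LIMITS[name]
        if not lo <= len(items) <= hi:
            problems.append(f"{name}: {len(items)} occurrences, allowed {lo}-{hi}")
        keys = [key(x) for x in items]
        if keys != sorted(keys):
            problems.append(f"{name}: not in ascending key order")

    check("jobs", org.jobs, lambda j: j.jobcode)
    check("suborgs", org.suborgs, lambda s: s)
    check("persons", org.persons, lambda p: (p.empjcode, p.empno))
    for p in org.persons:
        check("skills", p.skills, lambda s: s.skilcode)
    return problems


alice = Person(10231, "ADAMS", "F", 1200, "A1", 9000.0, (5, 14, 41), [Skill(1, 3), Skill(7, 2)])
org = Org(100, "RESEARCH", 1, 250000.0, jobs=[Job(1200, 3, 27000.0)],
          suborgs=[110, 120], persons=[alice])
print(violations(org))  # []
```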
2. The Proposed CODASYL Data Description Language (Chapter 11)

    SCHEMA NAME IS ORGDATA.
    AREA NAME IS ORGPART.
    RECORD NAME IS ORG; PRIVACY LOCK IS SESAME.
      01 ORGCODE PICTURE IS "9(4)".
      01 ORGNAME TYPE IS CHARACTER 25.
      01 REPORTO PICTURE IS "9999".
      01 BUDGET TYPE DECIMAL FLOAT; IS ACTUAL RESULT OF SALSUM ON MEMBERS OF PERSONS.
      01 NOSUBORG TYPE BINARY.
      01 SUBORG OCCURS NOSUBORG TIMES.
         02 SUBCODE PICTURE "9999".
    RECORD NAME IS JOB.
      01 JOBCODE PICTURE "9999".
      01 AUTHQUAN PICTURE "99".
      01 AUTHSAL TYPE FLOAT.
    RECORD NAME IS PERSON.
      01 EMPNO PICTURE "9(5)".
      01 EMPNAME TYPE CHARACTER 20.
      01 SEX PICTURE "A".
      01 EMPJCODE PICTURE "9999".
      01 LEVEL PICTURE "X(4)".
      01 SALARY PICTURE "9(5)V99"; PRIVACY LOCK FOR GET IS PROCEDURE AUTHENT.
      01 BIRTH.
         02 MONTH PICTURE "99".
         02 DAY PICTURE "99".
         02 YEAR PICTURE "99".
      01 NOSKILLS TYPE BINARY.
      01 SKILLS OCCURS NOSKILLS TIMES.
         02 SKILCODE PICTURE "9999".
         02 SKLYRS PICTURE "99".
    SET NAME IS JOBS; ORDER IS SORTED.
      OWNER IS ORG.
      MEMBER IS JOB OPTIONAL AUTOMATIC; ASCENDING KEY IS JOBCODE DUPLICATES NOT ALLOWED.
    SET NAME IS PERSONS; ORDER IS SORTED.
      OWNER IS ORG.
      MEMBER IS PERSON OPTIONAL AUTOMATIC; ASCENDING KEY IS EMPJCODE, EMPNO DUPLICATES NOT ALLOWED.
3. Informatics Mark IV File Management System
The data are described by filling in this form :
[Form not reproduced: Informatics Mark IV data-definition form, with one row per field and columns that include location, length, and characteristics.]