English Pages 262 Year 2023
The Creation and Management of Database Systems
Edited by: Adele Kuzmiakova
Arcler Press
www.arclerpress.com
The Creation and Management of Database Systems Adele Kuzmiakova
Arcler Press 224 Shoreacres Road Burlington, ON L7L 2H2 Canada www.arclerpress.com Email: [email protected]
e-book Edition 2023 ISBN: 978-1-77469-673-6 (e-book)
This book contains information obtained from highly regarded resources. Reprinted material sources are indicated, and copyright remains with the original owners. Copyright for images and other graphics remains with the original owners as indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data. The authors, editors, and publisher are not responsible for the accuracy of the information in the published chapters or for the consequences of its use. The publisher assumes no responsibility for any damage or grievance to persons or property arising from the use of any materials, instructions, methods, or ideas in the book. The authors, editors, and publisher have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission has not been obtained. If any copyright holder has not been acknowledged, please write to us so we may rectify the omission. Notice: Registered trademarks of products or corporate names are used only for explanation and identification, without intent to infringe.
© 2023 Arcler Press ISBN: 978-1-77469-442-8 (Hardcover)
Arcler Press publishes a wide variety of books and eBooks. For more information about Arcler Press and its products, visit our website at www.arclerpress.com
ABOUT THE EDITOR
Adele Kuzmiakova is a machine learning engineer focusing on problems in machine learning, deep learning, and computer vision. She currently works as a senior machine learning engineer at Ifolor, where she focuses on creating engaging photo stories and products. Adele attended Cornell University in New York, United States, for her undergraduate studies, where she studied engineering with a focus on applied math. Some of the deep learning problems she has worked on include predicting air quality from public webcams, developing real-time human movement tracking, and using 3D computer vision to create 3D avatars from selfies in order to bring online clothes shopping closer to reality. She is also passionate about exchanging ideas and inspiring other people, and served as a workshop organizer at the Women in Data Science conference in Geneva, Switzerland.
TABLE OF CONTENTS
List of Figures.........................................................................................................xi List of Tables........................................................................................................xiii List of Abbreviations.............................................................................................xv Preface............................................................................................................ ....xix Chapter 1
Introduction to Database Systems.............................................................. 1 1.1. Introduction......................................................................................... 2 1.2. Why Databases?.................................................................................. 2 1.3. Data Vs. Information............................................................................ 4 1.4. Introducing the Database..................................................................... 6 1.5. Importance of Database Design......................................................... 12 1.6. Evolution of File System Data Processing........................................... 13 1.7. Problems with File System Data Processing....................................... 18 1.8. Database Systems.............................................................................. 24 References................................................................................................ 34
Chapter 2
Data Models............................................................................................. 45 2.1. Introduction....................................................................................... 46 2.2. Importance of Data Models............................................................... 47 2.3. Data Model Basic Building Blocks..................................................... 48 2.4. Business Rules................................................................................... 50 2.5. The Evolution of Data Models............................................................ 54 References................................................................................................ 66
Chapter 3
Database Environment............................................................................. 73 3.1. Introduction....................................................................................... 74 3.2. Three-Level Ansi-Sparc Architecture................................................... 74 3.3. Database Languages.......................................................................... 81 3.4. Conceptual Modeling and Data Models............................................. 86
3.5. Functions of a DBMS......................................................................... 91 3.6. Components of a DBMS.................................................................... 97 References.............................................................................................. 102 Chapter 4
The Relational Model............................................................................. 111 4.1. Introduction..................................................................................... 112 4.2. Brief History of the Relational Model............................................... 112 4.3. Terminology..................................................................................... 114 4.4. Integrity Constraints......................................................................... 125 4.5. Views............................................................................................... 128 References.............................................................................................. 132
Chapter 5
Database Planning and Design............................................................... 139 5.1. Introduction..................................................................................... 140 5.2. The Database System Development Lifecycle................................... 141 5.3. Database Planning........................................................................... 143 5.4. Definition of the System................................................................... 144 5.5. Requirements Collection and Analysis............................................. 145 5.6. Database Design.............................................................................. 149 References.............................................................................................. 154
Chapter 6
Data Manipulation................................................................................. 159 6.1. Introduction..................................................................................... 160 6.2. Introduction to SQL......................................................................... 160 6.3. Writing SQL Commands.................................................................. 165 6.4. Data Manipulation........................................................................... 167 References.............................................................................................. 173
Chapter 7
Database Connectivity and Web Technologies...................................... 179 7.1. Introduction..................................................................................... 180 7.2. Database Connectivity..................................................................... 180 7.3. Internet Databases........................................................................... 194 7.4. Extensible Markup Language........................................................... 204 References.............................................................................................. 208
Chapter 8
Database Administration and Security................................................... 217 8.1. Introduction..................................................................................... 218 8.2. The Role of a Database in an Organization...................................... 220 8.3. Introduction of a Database............................................................... 222 8.4. The Evolution of Database Administration Function......................... 224 References.............................................................................................. 230
Index...................................................................................................... 235
LIST OF FIGURES
Figure 1.1. Taking raw data and turning it into information
Figure 1.2. The database management system (DBMS) controls the interface between the end user and the database
Figure 1.3. The customer file's contents
Figure 1.4. The agent file's contents
Figure 1.5. A straightforward file structure
Figure 1.6. Comparing and contrasting database and file management organizations
Figure 1.7. The database system's environment
Figure 1.8. Using Microsoft SQL Server Express to visualize information
Figure 1.9. Using Oracle to demonstrate data storage management
Figure 2.1. Creating connections between relational tables
Figure 2.2. A diagram that shows how things are connected
Figure 2.3. Notations for Chen and Crow's Foot
Figure 2.4. OO, UML, and ER models are compared
Figure 3.1. The ANSI-SPARC 3-level design
Figure 3.2. The distinctions between the 3 levels
Figure 3.3. The ANSI-SPARC 3-level architecture and data independence
Figure 3.4. This is an example of a relational schema
Figure 3.5. This is an example of a network schema
Figure 3.6. This is an example of a hierarchical schema
Figure 3.7. The lost update
Figure 3.8. Main elements of a DBMS
Figure 3.9. Elements of a database manager
Figure 4.1. Relationships between the Branch and the Staff relations
Figure 4.2. Domains of some attributes of the Branch and Staff relations
Figure 4.3. A DreamHome rental database instance
Figure 5.1. The steps in the building of a database system
Figure 5.2. User views (1, 2, and 3) and (5 and 6) contain overlapping requirements (represented as hatched regions), whereas user view 4 has distinct requirements
Figure 5.3. Managing multiple user views 1 to 3 in a centralized manner
Figure 5.4. Controlling multiple user views 1 to 3 using the view integration technique
Figure 6.1. A graphical example of the manipulation of data files
Figure 6.2. SQL: A basic overview
Figure 7.1. Oracle native connectivity
Figure 7.2. Utilizing ODBC, RDO, and DAO to access databases
Figure 7.3. Setting up an Oracle ODBC data source
Figure 7.4. MS Excel uses ODBC to link to the Oracle database
Figure 7.5. OLE-DB design
Figure 7.6. Framework of ADO.NET
Figure 7.7. Framework of JDBC
Figure 7.8. Web-to-database middleware
Figure 7.9. Web server API and CGI interfaces
Figure 7.10. The productlist.xml page's contents
Figure 8.1. The data-information-decision-making cycle
Figure 8.2. The IS department's internal organization
Figure 8.3. The placement of the DBA function
Figure 8.4. A DBA functional organization
Figure 8.5. Multiple database administrators in an organization
LIST OF TABLES
Table 1.1. The characteristics of several popular file organization systems are compared
Table 1.2. Basic file terminology
Table 2.1. How major data models have evolved over time
Table 4.1. Alternative names for relational model terms
Table 5.1. A summary of the principal activities associated with each phase of the DSDLC
Table 5.2. The requirements for an optimal data model
Table 6.1. Result table for example #1
Table 6.2. Result table for example #2
Table 6.3. Result table for example #3, with duplicates
Table 6.4. Result table for example #3, with duplicates removed
Table 6.5. Result table for example #4
Table 7.1. Example OLE-DB interfaces and classes
Table 7.2. Example ADO objects
Table 7.3. Features and advantages of internet technologies
LIST OF ABBREVIATIONS
3GL    third-generation language
4GL    fourth-generation language
ADO    ActiveX data objects
ANSI    American National Standards Institute
API    application programming interface
ASP    active server pages
ATM    automated teller machine
B2C    business to consumer
BNF    Backus Naur form
CGI    common gateway interface
COM    component object model
DBA    database administrator
DBLC    database life cycle
DBMS    database management system
DBTG    Data Base Task Group
DCM    data communication manager
DDL    data definition language
DFD    data flow diagram
DLL    dynamic-link library
DM    database manager
DML    data manipulation language
DP    data processing
DSDLC    database system development lifecycle
DSN    data source name
EDP    electronic data processing
ER    entity-relationship
ERD    entity-relationship diagram
FIPS    federal information processing standard
fName    first name
GUI    graphical user interface
HIPO    hierarchical input process output
INGRES    interactive graphics retrieval system
IRDS    information resource dictionary system
IRM    information resource manager
IS    information systems
ISAPI    internet server API
ISLC    information systems lifecycle
ISO    International Organization for Standardization
JDBC    Java database connectivity
lName    last name
MDM    master data management
NDL    network database language
ODBC    open database connectivity
OLAP    online analytical processing
OLE-DB    object linking and embedding for database
OODM    object-oriented data model
QBE    query-by-example
RDA    remote data access
RDBMS    relational database management system
RDL    relational database language
RDO    remote data objects
SAA    systems application architecture
SAD    structured analysis and design
SDLC    software development lifecycle
SGML    standard generalized markup language
sNo    staff number
SPARC    Standards Planning and Requirements Committee
SQL    structured query language
UDA    universal data access
UML    unified modeling language
UoD    universe of discourse
W3C    World Wide Web Consortium
WSAPI    WebSite API
XML    extensible markup language
PREFACE
The database system has become one of the most significant developments in software engineering, thanks to an extraordinarily productive history of database research over the past 30 years. The database is now the foundation of information systems (IS) and has fundamentally changed how organizations store and access information. The technology has advanced significantly in recent years, producing designs that are more robust and easier to use, and database systems are therefore becoming accessible to an ever-wider audience. Unfortunately, the apparent simplicity of these systems has led users to create databases and applications without the skills needed to build a reliable and successful system. And so the "software crisis," or "software depression," as it is frequently called, persists. The initial impetus for this book came from the authors' experience in industry, where they provided database design consulting for new software systems and addressed the shortcomings of existing ones. The authors' move to academia brought the same issues from a different set of users: students. The goal of this book is therefore to explain database theory as simply as possible. The methodology it presents for relational Database Management Systems (DBMSs), the current industry standard for business applications, has been tried and tested over the years in both academic and industrial settings. The methodology consists of three key phases: conceptual, logical, and physical database design. The first phase produces a conceptual data model that is independent of all physical considerations. In the second phase, this model is refined into a logical data model by removing constructs that cannot be represented in relational systems. In the third phase, the logical data model is translated into a physical design for the target DBMS.
The physical design phase considers the storage structures and access methods needed for efficient and secure access to the database on secondary storage. The book is divided into eight chapters. Chapter 1 introduces the reader to the principles of database systems. Chapter 2 discusses data models in considerable depth. Chapter 3 covers the database environment in detail. Chapter 4 presents the relational model. Chapter 5 is devoted to database planning and design. Chapter 6 explains the concept of data manipulation. Chapter 7 discusses database connectivity and web technologies. The final chapter, Chapter 8, discusses database administration and security.
This book provides an overview of the many different aspects of database systems. The content is presented in such a way that even an untrained reader should have no trouble understanding the fundamental ideas behind DBMSs.
CHAPTER 1

INTRODUCTION TO DATABASE SYSTEMS
CONTENTS 1.1. Introduction......................................................................................... 2 1.2. Why Databases?.................................................................................. 2 1.3. Data Vs. Information............................................................................ 4 1.4. Introducing the Database..................................................................... 6 1.5. Importance of Database Design......................................................... 12 1.6. Evolution of File System Data Processing........................................... 13 1.7. Problems with File System Data Processing....................................... 18 1.8. Database Systems.............................................................................. 24 References................................................................................................ 34
1.1. INTRODUCTION
Databases and the systems that manage them are an indispensable part of life in today's society. Most of us engage daily in activities that involve some interaction with a database (Valduriez, 1993). For example, if we go to a bank to deposit or withdraw money, make a hotel or airline reservation, use a computerized library catalog to search for a bibliographic item, or order something online such as a book, toy, or computer, chances are that our activities will involve someone, or some computer program, accessing a database. This is true whether or not we are the ones accessing it directly. Even purchasing items at a supermarket may automatically update the database that maintains the store's inventory of grocery items (Güting, 1994). Such interactions are examples of what we may call traditional database applications, in which most of the data that is stored and retrieved is either textual or numeric. In recent years, advances in technology have led to exciting new applications of database management systems (DBMSs) (Zaniolo et al., 1997). The proliferation of social media sites, including Facebook, Twitter, and Flickr, among many others, has required the creation of huge databases that store nontraditional data such as posts, tweets, images, and short videos. New types of database systems, often referred to as big data storage systems or NoSQL platforms, were developed to manage data for social media applications (Jukic et al., 2014).
These types of systems are also used by companies such as Amazon and Google to manage the data required by their search engines and to provide cloud storage services. Cloud storage gives customers the ability to store and manage all kinds of data, including files, photos, videos, and e-mail, on servers accessed over the Web; it is offered by companies such as Google, Yahoo, and Amazon (Abadi et al., 2013).
1.2. WHY DATABASES?
Imagine trying to operate a business without knowing who your customers are, which products you sell, who works for you, who owes you money, and whom you owe money. All businesses have to keep this sort of data, and much more, and they must have that data available to decision makers when they need it. It can be argued that the ultimate purpose of any business information system is to help businesses use information as an organizational resource. At the core of all these systems are the gathering, storage, aggregation, manipulation, dissemination, and management of data (Karp et al., 2000). Depending on the type of information system and the characteristics of the business, this data might range from a few megabytes on just one or two topics to terabytes covering hundreds of topics within the firm's internal and external environment. Telecommunications companies such as Sprint and AT&T are known to have systems that keep data on billions of phone calls (Tolar et al., 2019), with new data being added to the system at rates of up to around 70,000 calls at a time. Such businesses must not only store and manage massive amounts of data, but also be able to quickly find any given piece of information within it. Consider the case of Google, the Web's most popular search engine. Although Google is reluctant to disclose many details about its data storage specifications, it is estimated that the company responds to over 91 million searches per day across a collection of data that spans many terabytes. Remarkably, the results of these searches are available almost instantly (Zhulin, 2015). How do these companies manage all of this data? How can they store it all and then quickly retrieve just the data that decision makers need, when they need it? The answer is that they use databases. Databases, as explained in detail in this textbook, are specialized structures that allow computer-based systems to store, manage, and retrieve data very quickly (Grimm et al., 2013).
Practically all modern business organizations depend on databases; therefore, every data management professional needs a systematic grasp of how these structures are created and how to use them properly. Even if your career path does not take you down the fascinating route of database design and development, databases will be a key component of the systems you work with. In any case, you will probably make decisions in your career based on information generated from data. Thus, understanding the distinction between data and information is critical (Thomson, 1996).
1.3. DATA VS. INFORMATION
To understand what drives database design, you must first understand the difference between data and information (Madnick et al., 2009). Data consists of raw facts. The word "raw" indicates that the facts have not yet been processed to reveal their meaning. For example, suppose you want to know what the users of a computer lab think of its services. Typically, you would begin by surveying users to assess the computer lab's performance (Changnon & Kunkel, 1999). Panel A of Figure 1.1 shows the Web survey form that enables users to respond to your questions. When the survey form has been completed, the raw data is stored in a data repository such as the one shown in Panel B of Figure 1.1. Although you now have the data, reading page after page of zeros and ones is not likely to provide much insight (Chen et al., 2008). Therefore, you transform the raw data into the data summary shown in Panel C of Figure 1.1. Now it is possible to get quick answers to questions such as "How diverse is our lab's customer base?" In this case, you can quickly see what proportions of the customers are juniors (24.59%) and seniors (53.01%). Because graphics can enhance your ability to quickly extract meaning from data, you also present the summary as the bar graph shown in Panel D of Figure 1.1 (Van Rijsbergen, 1977). Information is the result of processing raw data to reveal its meaning. Data processing can be as simple as organizing data to reveal patterns or as complex as making forecasts or drawing inferences using statistical modeling (Batini & Scannapieco, 2016). To reveal meaning, information requires context. A temperature reading of 105°, for example, means little until you know its context: Is it in degrees Fahrenheit or Celsius? Is it a machine temperature, a body temperature, or an outside air temperature? Information can serve as the foundation for decision making (Destaillats et al., 2008).
Figure 1.1. Taking raw data and turning it into information. Source: https://www.researchgate.net/figure/The-transformation-process-fromraw-data-into-knowledge-3-US-Joint-Publication-3–13_fig1_281102148.
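The raw-data-to-information transformation illustrated in Figure 1.1 can be sketched in a few lines of code. The sample below is a minimal illustration, not the book's actual survey data: the response records and category names are hypothetical, and the percentages come from the made-up sample.

```python
# Hypothetical raw facts as they might arrive from a web survey form (Panel B).
raw_responses = [
    {"classification": "Junior", "satisfied": "yes"},
    {"classification": "Senior", "satisfied": "no"},
    {"classification": "Senior", "satisfied": "yes"},
    {"classification": "Freshman", "satisfied": "yes"},
]

# Format the yes/no answers into a Y/N scheme suitable for storage.
formatted = [
    {"classification": r["classification"],
     "satisfied": "Y" if r["satisfied"].lower() == "yes" else "N"}
    for r in raw_responses
]

# Process the stored data into information (Panel C): the share of
# respondents in each student classification.
counts = {}
for r in formatted:
    counts[r["classification"]] = counts.get(r["classification"], 0) + 1

total = len(formatted)
summary = {c: round(100 * n / total, 2) for c, n in counts.items()}
print(summary)  # {'Junior': 25.0, 'Senior': 50.0, 'Freshman': 25.0}
```

The raw rows by themselves reveal little; the summary dictionary is the "information" that answers a question about the customer base.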
The data summary for each question on the survey form, for example, can point out the lab's strengths and weaknesses, helping you make more informed decisions about how to better serve lab customers. Keep in mind that raw data must be properly formatted for storage, processing, and presentation. For example, in Panel C of Figure 1.1, the student classification is formatted to show results based on the Freshman, Sophomore, Junior, Senior, and Master's Student categories (McKelvey & Ordeshook, 1985). The respondents' yes/no answers might need to be converted to a Y/N format for data storage. More complex data types, such as sounds, videos, or images, require more extensive formatting before they can be stored, processed, or presented (Cheng et al., 2002). Generating accurate, relevant, and timely information is the key to good decision making in the "information age," and good decision making is critical to an organization's survival in a global market (Stowe et al., 1997). The "knowledge age" is now said to be upon us. Facts are the basis of knowledge, which is the body of facts and information
regarding a particular subject. Knowledge implies familiarity, awareness, and understanding of information as it applies to a certain environment. A key characteristic of knowledge is that "new" knowledge can be derived from "old" knowledge (Schultze & Avital, 2011). Let's take a look at some key points (Rapps & Weyuker, 1985):
•	Data constitutes the building blocks of information.
•	Information is produced by processing data.
•	Information is used to reveal the meaning of data.
•	Accurate, relevant, and timely information is the key to good decision making.
•	Good decision making is the key to organizational survival in a global environment.
Timely and useful information requires accurate data. Such data must be generated properly and stored in a format that is easy to access and process (Sidorov et al., 2002). Like any basic resource, the data environment must be managed carefully. Data management is a discipline that focuses on the proper generation, storage, and retrieval of data. Given the importance of data, it should not surprise you that data management is a core activity for any business, government agency, service organization, or charity (Kovach & Cathcart Jr, 1999).
1.4. INTRODUCING THE DATABASE
Efficient data management typically requires the use of a computer database. A database is a shared, integrated computer structure that stores a collection of the following (Cox et al., 1964):
•	End-user data, that is, raw facts of interest to the end user.
•	Metadata, or data about data, through which the end-user data is integrated and managed.
The metadata describes the data characteristics as well as the set of relationships that links the data found within the database. For example, the metadata component stores information such as the name of each data element, the type of values (numeric, dates, or text) stored in each data element, and whether the data element can be left empty. The metadata provides information that complements and expands the value and use of the data (Becerik-Gerber et al., 2014).
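The end-user data / metadata distinction can be made concrete with a small sketch. The example below uses Python's built-in sqlite3 module as a stand-in DBMS; the `customer` table and its columns are hypothetical, not from the book. The catalog query shows the kind of metadata a DBMS keeps about each data element.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# End-user data: raw facts of interest to the end user.
conn.execute("CREATE TABLE customer ("
             "cus_id INTEGER PRIMARY KEY, "
             "cus_name TEXT NOT NULL, "
             "cus_balance REAL)")
conn.execute("INSERT INTO customer VALUES (1, 'Bill Brown', 45.95)")

# Metadata: data about data. The DBMS catalog records each column's
# name, its data type, and whether it may be left empty (NULL).
columns = list(conn.execute("PRAGMA table_info(customer)"))
for cid, name, col_type, notnull, default, pk in columns:
    print(name, col_type, "NOT NULL" if notnull else "nullable")
# cus_id INTEGER nullable
# cus_name TEXT NOT NULL
# cus_balance REAL nullable
```

Note that the metadata was never inserted by the application: the DBMS derived and stored it from the table definition, which is exactly what makes the database "self-describing."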
Metadata, in a nutshell, provides a more complete picture of the data in a database. Given this characteristic, a database might be described as a "collection of self-describing data" (Martinikorena et al., 2018). A database management system (DBMS) is a collection of programs that manages the database structure and controls access to the data stored in the database. In some ways, a database resembles a very well-organized electronic filing cabinet, with a sophisticated software program (the DBMS) helping to manage the cabinet's contents (Zins, 2007).
1.4.1. Role and Advantages of the DBMS
The DBMS serves as the intermediary between the user and the database. The database structure itself is stored as a collection of files, and the only way to access the data in those files is through the DBMS. As shown in Figure 1.2, the DBMS presents the end user (or application program) with a single, integrated view of the data in the database. The DBMS receives all application requests and translates them into the complex operations required to fulfill those requests. The DBMS hides much of the database's internal complexity from application programs and users. The application program might be written by a programmer in a language such as Visual Basic, Java, or C#, or it might be created through a DBMS utility program (Sigmund, 2006).
Figure 1.2. The database management system (DBMS) controls the interface between the end user and the database. Source: https://slideplayer.com/slide/14646685/.
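The flow in Figure 1.2 can be sketched as follows, using Python's sqlite3 module as the DBMS under the assumption of a hypothetical `agent` table (the names and IDs are illustrative only). The key point is that the application never reads the underlying storage directly; it hands a declarative request to the DBMS, which translates it into the low-level operations needed to fulfill it.

```python
import sqlite3

# The DBMS manages the database structure and controls access to it.
dbms = sqlite3.connect(":memory:")
dbms.execute("CREATE TABLE agent (agent_id INTEGER PRIMARY KEY, "
             "agent_name TEXT)")
dbms.executemany("INSERT INTO agent VALUES (?, ?)",
                 [(501, "Alex Alby"), (502, "Leah Hahn")])

# The application request states *what* data it wants, not how to find
# it in the files; the DBMS performs the search and returns the result.
result = dbms.execute(
    "SELECT agent_name FROM agent WHERE agent_id = ?", (502,)).fetchone()
print(result[0])  # Leah Hahn
```

Swapping SQLite for another relational DBMS would change the connection call but not the application's view of the data, which is precisely the insulation the DBMS provides.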
Having a DBMS between the end user's applications and the database offers some important advantages. First, the DBMS enables the data in the database to be shared among multiple applications and users. Second, the DBMS integrates the many different users' views of the data into a single, all-encompassing data repository (Hübner, 2016). Because data is the crucial raw material from which information is derived, you must have a good way of managing such data. As you will discover in this book, the DBMS helps make data management more efficient and effective. In particular, a DBMS provides advantages such as the following (LaFree & Dugan, 2007):
•	Improved Data Sharing: The DBMS helps create an environment in which end users have more efficient access to better-managed data. Such access makes it possible for end users to respond quickly to changes in their environment.
•	Improved Data Security: The more users access the data, the greater the risk of data security breaches. Companies invest considerable amounts of time, effort, and money to ensure that corporate data is used properly. A DBMS provides a framework for better enforcement of data privacy and security policies.
•	Better Data Integration: Wider access to well-managed data promotes an integrated view of the organization's operations and a clearer grasp of the big picture. It becomes much easier to see how actions in one segment of the company affect other segments (Freilich et al., 2014).
•	Minimized Data Inconsistency: Data inconsistency exists when different versions of the same data appear in different places. For example, data inconsistency exists when a company's sales department stores a sales representative's name as "Bill Brown" while its personnel department stores that same person's name as "William G. Brown," or when one regional sales office shows a total of $45.95 while another office shows the same total as $43.95. The probability of data inconsistency is greatly reduced in a properly designed database (Broatch et al., 2019).
Introduction to Database Systems
•	Improved data access: The DBMS makes it possible to produce quick answers to ad hoc queries. A query is a specific request for data manipulation issued to the DBMS, for example, to read or update the data. Simply put, a query is a question, and an ad hoc query is a spur-of-the-moment question. The DBMS sends back an answer (called the query result set) to the application. End users working with large amounts of sales data, for example, might want quick answers to questions (ad hoc queries) such as (Degrandi et al., 2020):
1. What was the dollar volume of sales by product during the past 6 months?
2. What was the sales bonus figure for each of our sales representatives during the past 3 months?
3. How many of our customers have credit balances of $3,000 or more?
•	Improved decision making: Better-managed data and improved data access make it possible to generate better-quality information, on which better decisions are based. The quality of the information generated depends on the quality of the underlying data. Data quality is a comprehensive approach to ensuring the accuracy, validity, and timeliness of data. While the DBMS does not guarantee data quality, it provides a framework to facilitate data-quality initiatives.
•	Increased end-user productivity: The availability of data, combined with the tools that transform data into usable information, empowers end users to make quick, informed decisions that can make the difference between success and failure in the global economy.
The advantages of using a DBMS are not limited to the ones listed above. In fact, as you gain a better understanding of the technical details of database systems and their proper design, you will discover many more (Gerardi et al., 2021).
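To make the notion of an ad hoc query concrete, the first two questions above might be phrased in SQL roughly as follows. This is a sketch using Python's built-in sqlite3 module; the schema and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product TEXT, rep TEXT, amount REAL, sale_date TEXT);
    INSERT INTO sales VALUES
        ('Policy A', 'Hahn', 500.0, '2023-01-15'),
        ('Policy A', 'Okon', 300.0, '2023-02-10'),
        ('Policy B', 'Hahn', 200.0, '2023-03-05');
""")

# Ad hoc query 1: dollar volume of sales by product.
by_product = conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()
print(by_product)  # -> [('Policy A', 800.0), ('Policy B', 200.0)]

# Ad hoc query 2: sales totals per representative (a bonus figure
# could be computed from these totals).
by_rep = conn.execute(
    "SELECT rep, SUM(amount) FROM sales GROUP BY rep ORDER BY rep"
).fetchall()
print(by_rep)  # -> [('Hahn', 700.0), ('Okon', 300.0)]
```

The key point is that no new program had to be written for each question; the DBMS answers the spur-of-the-moment request directly.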
1.4.2. Types of Databases

A DBMS can support many different types of databases. Databases can be classified according to the number of users, the database location(s), and the expected type and extent of use. The number of users determines whether the database is classified as single-user or multiuser. A single-user database supports only one user at a time. In other words, if user A is using the database, users B and C must wait until user A is done. A single-user database that runs on a personal computer is called a desktop database (Chen et al., 2021). In contrast, a multiuser database supports multiple users at the same time. A multiuser database that supports a relatively small number of users (usually fewer than 50) or a specific department within an organization is called a workgroup database. A database that is used by the entire organization and supports a large number of users (usually hundreds) across many departments is known as an enterprise database (Johnson, 2021; Kodera et al., 2019).

Location might also be used to classify a database. For example, a database that supports data located at a single site is called a centralized database. A distributed database is one that supports data distributed across several different sites. The extent to which a database can be distributed, and the way in which that distribution is managed, are addressed in detail later in this book (Gavazzi et al., 2003).

The most popular way of classifying databases today, however, is based on how they will be used and on the time sensitivity of the information gathered from them. For example, transactions such as product or service sales, payments, and supply purchases reflect critical day-to-day operations. Such transactions must be recorded accurately and immediately. A database that is designed primarily to support a company's day-to-day operations is classified as an operational database (sometimes referred to as a transactional or production database) (Urvoy et al., 2012; Beets et al., 2015). In contrast, a data warehouse focuses primarily on storing data used to generate information required to make tactical or strategic decisions.
Such decisions often require extensive "data massaging" (data manipulation) to produce pricing decisions, sales forecasts, market positioning, and the like (Joshi & Darby, 2013). Most decision-support data are derived from operational databases and stored in data warehouses over time. Furthermore, the data warehouse can store data derived from many sources. To make it easier to retrieve such data, the data warehouse structure is quite different from that of an operational database. The design, implementation, and use of data warehouses are covered by the related fields of data warehousing and business intelligence (Cole et al., 2007).

Databases can also be classified by the degree to which the data are structured. Unstructured data are data that exist in their original (raw) state, that is, in the format in which they were collected. Consequently, unstructured data exist in a format that does not lend itself to the processing that yields information (Rosselló-Móra et al., 2017). Structured data are the result of taking unstructured data and formatting (structuring) them to facilitate storage, use, and the generation of information. You apply structure (format) based on the type of processing that you intend to perform on the data. Some data might be unready (unstructured) for some types of processing but ready (structured) for others. For example, the data value 37890 might refer to a zip code, a sales value, or a product code. If this value represents a zip code or a product code and is stored as text, you cannot perform mathematical computations with it. If it represents a sales transaction, however, it must be formatted as numeric (Yoon et al., 2017).

To further illustrate the structure concept, imagine a stack of printed paper invoices. If you merely want to store the invoices as images for future retrieval and display, you can scan them and save them in a graphic format. On the other hand, if you want to derive information such as monthly totals or average sales, such graphic storage would not be useful. Instead, you could store the invoice data in a (structured) spreadsheet format so that you can perform the requisite computations. In fact, most of the data you encounter are best classified as semistructured (Kim et al., 2012). Semistructured data have already been processed to some extent. For example, when you look at a typical Web page, the content is presented to you in a prearranged format to convey some information. The database types discussed so far all focus on the storage and management of highly structured data. Corporations, however, are not limited to the use of structured data.
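Returning to the data value 37890 mentioned above, a few lines of code show how the structure applied to the same digits determines which processing is meaningful (the variable names are invented for illustration):

```python
# The same digits behave differently depending on the structure applied.
zip_code = "37890"      # stored as text: a label, not a quantity
sale_amount = 37890     # stored as numeric: arithmetic is meaningful

# Doubling a sales figure is a sensible computation...
print(sale_amount * 2)  # -> 75780

# ...but "doubling" a zip code stored as text merely repeats the label.
print(zip_code * 2)     # -> 3789037890 (repetition, not arithmetic)
```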
Semistructured and unstructured data are used as well. Think of the very valuable information stored in company e-mails, memos, procedure and policy documents, Web pages, and so on. To address the storage and management of unstructured and semistructured data, a newer generation of databases known as XML databases has emerged. Extensible Markup Language (XML) is a text-based markup language used to represent and manipulate data elements. An XML database supports the storage and management of semistructured XML data (El Haddad et al., 2017) (Table 1.1).
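As a small illustration of semistructured data, the following sketch parses an XML fragment with Python's standard xml.etree module. The element names are invented for illustration; note that the second record omits a field entirely, which a rigid structured format would not allow.

```python
import xml.etree.ElementTree as ET

# Tags impose some structure, but records need not share a rigid layout:
# the second agent has no phone element at all.
doc = """
<agents>
    <agent><name>Leah F. Hahn</name><phone>615-882-1244</phone></agent>
    <agent><name>John T. Okon</name></agent>
</agents>
"""
root = ET.fromstring(doc)

# Elements can still be queried; a missing field simply comes back as None.
names = [a.findtext("name") for a in root.findall("agent")]
phones = [a.findtext("phone") for a in root.findall("agent")]
print(names)   # -> ['Leah F. Hahn', 'John T. Okon']
print(phones)  # -> ['615-882-1244', None]
```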
Table 1.1. A Comparison of the Characteristics of Several Popular Database Management Systems
1.5. IMPORTANCE OF DATABASE DESIGN

Database design refers to the activities that focus on the design of the database structure used to store and manage end-user data. A database that meets all user requirements does not just happen; its structure must be designed carefully. In fact, database design is such a crucial aspect of working with databases that most of this book is dedicated to the development of good database design techniques. Even a good DBMS will perform poorly with a badly designed database (Bell & Badanjak, 2019; Herodotou, 2016).

To design a good database, the designer must first identify what the database will be used for. Designing a transactional database emphasizes accurate and consistent data and operational speed. Designing a data warehouse emphasizes the use of historical and aggregated data. Designing a centralized, single-user database requires a different approach than designing a distributed, multiuser database. A well-designed database that uses the multiple tables common to most databases requires a process of decomposing related data into their component parts. Each piece of data should be properly decomposed and stored in its own table (Bissett et al., 2016). Furthermore, the relationships between those tables must be carefully considered and implemented so that the integrated view of the data can later be re-created as information for the end user. A well-designed database makes data management easier while generating accurate and valuable information (Kam & Matthewson, 2017). A poorly designed database, by contrast, is likely to become a breeding ground for hard-to-trace errors that may lead to poor decision making, and poor decision making can lead to an organization's failure. Database design is simply too important to be left to luck. That is why database design is a popular subject among college students, why organizations of all kinds send personnel to database design seminars, and why database design consultants can earn an excellent living (Tiirikka & Moilanen, 2015).
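The decomposition described in this section can be sketched in SQL, here run through Python's sqlite3 module. The schema below is a simplified, hypothetical version of the CUSTOMER/AGENT example: each fact is stored once in its own table, and the integrated view is re-created with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Agent facts are stored once, in their own table.
    CREATE TABLE agent (
        agent_id INTEGER PRIMARY KEY,
        a_name   TEXT,
        a_phone  TEXT
    );
    -- Each customer row refers to its agent instead of repeating the data.
    CREATE TABLE customer (
        cust_id  INTEGER PRIMARY KEY,
        c_name   TEXT,
        agent_id INTEGER REFERENCES agent(agent_id)
    );
    INSERT INTO agent VALUES (1, 'Leah F. Hahn', '615-882-1244');
    INSERT INTO customer VALUES (10, 'Amy B. O''Brian', 1);
""")

# The unified view of the data is reproduced on demand via a join.
row = conn.execute("""
    SELECT c.c_name, a.a_name, a.a_phone
    FROM customer c JOIN agent a ON c.agent_id = a.agent_id
""").fetchone()
print(row)  # the customer joined with its agent's data
```

Because the agent's phone number lives in exactly one row, a change to it is made once and is immediately reflected in every joined view.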
1.6. EVOLUTION OF FILE SYSTEM DATA PROCESSING

Understanding what a database is, what it does, and how to use it properly is clarified by considering what a database is not. A brief description of the evolution of file system data processing helps in recognizing the data access limitations that databases attempt to overcome. Understanding these limitations is important for database designers because database technologies do not make these problems magically disappear (Smith & Seltzer, 1997); rather, they make it easier to create solutions that avoid them. Creating database designs that avoid the pitfalls of earlier systems requires that the designer understand the previous systems' problems and how to avoid them; otherwise, the database technologies are no better (and potentially even worse!) than the technologies and methods they have replaced (Maneas & Schroeder, 2018).
1.6.1. Manual File Systems

To remain effective, every organization must develop systems for handling its core business tasks. Historically, such systems were often manual, paper-and-pencil systems. The papers within these systems were organized to facilitate the expected use of the data. Typically, this was accomplished through a system of file folders and filing cabinets. As long as the data collection was relatively small and an organization's business users had few reporting requirements, the manual system served its role well as a data repository. As organizations grew and reporting requirements became more complex, however, keeping track of data in a manual file system became increasingly difficult. Consequently, companies turned to computer technology for help (McKusick & Quinlan, 2009).
1.6.2. Computerized File Systems

Generating reports from manual file systems was slow and cumbersome. In fact, some business managers faced government-imposed reporting requirements that demanded weeks of intensive effort each quarter, even when a well-designed manual system was used (Kakoulli & Herodotou, 2017). Therefore, a data processing (DP) specialist was hired to create a computer-based system that would track data and produce the required reports. Initially, the computer files within the file system were similar to the manual files. A simple example of a customer data file for a small insurance company is shown in Figure 1.3. (You will discover later that, although the file system shown in Figure 1.3 is a common data-storage arrangement, it is unsuitable for a database.) (Sivaraman & Manickachezian, 2014). The description of computer files requires a specialized vocabulary; every discipline develops its own terminology to enable its practitioners to communicate clearly. The basic file vocabulary shown in Table 1.2 will help you follow the remainder of the discussion (Heidemann & Popek, 1994).
Figure 1.3. The customer file’s contents. Source: https://slideplayer.com/slide/16127191/.
Table 1.2. Terminology for Basic Files
Using the proper file terminology from Table 1.2, you can identify the file components shown in Figure 1.3. The CUSTOMER file shown in Figure 1.3 contains ten records. Each record is composed of nine fields: C_NAME, C_PHONE, C_ADDRESS, C_ZIP, A_NAME, A_PHONE, TP, AMT, and REN. The ten records are stored in a named file. Because the file in Figure 1.3 contains customer data for the insurance company, its filename is CUSTOMER (Pal & Memon, 2009).

Business users sent requests to the DP specialist for information drawn from the computer file. For each request, the DP specialist had to write programs to retrieve the data from the file, manipulate them as the user had requested, and print them out. If a user requested a report that had been run previously, the DP specialist could rerun the program and produce the results. As other business users saw the new and innovative ways in which customer data were being presented, they wanted to view their own data in similar ways (Blomer et al., 2015; Zhang et al., 2016). This generated more requests for the DP specialist to create computer files of other business data, which in turn required the creation of new data management programs and new requests for reports. For example, the insurance company's sales department created a SALES file to track daily sales efforts. The sales department's success was so obvious that the personnel department wanted access to the DP specialist to automate payroll processing and other personnel functions. Consequently, the DP specialist was asked to create the AGENT file shown in Figure 1.4. The data in the AGENT file were used for a variety of purposes, including writing checks, keeping track of taxes paid, and summarizing insurance coverage (Adde et al., 2015).
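The file terminology above can be made concrete with a short sketch: a "file" of fixed-width text records, each of which is split into named fields. The field layout and sample data below are invented for illustration and are not the actual layout of Figure 1.3.

```python
# A tiny "file" of two fixed-width records: characters 0-19 hold the
# customer name field, and the remaining characters hold the phone field.
raw_file = [
    "Alfred A. Ramas     615-844-2573",
    "Leona K. Dunne      713-894-1238",
]

def parse_record(line):
    """Split one fixed-width record into its named fields."""
    return {"C_NAME": line[:20].rstrip(), "C_PHONE": line[20:].rstrip()}

records = [parse_record(line) for line in raw_file]
print(records[0]["C_NAME"])   # -> Alfred A. Ramas
print(records[1]["C_PHONE"])  # -> 713-894-1238
```

Note that every program reading this file must know the exact character positions of each field, a point that becomes important in the discussion of structural dependence below.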
Figure 1.4. The agent file’s contents. Source: https://slideplayer.com/slide/6206909/.
As more computerized files were created, the drawbacks of this type of file system became evident. While those problems are explored in detail in the next section, in short they centered on having many files containing related, often overlapping data, with no way to control or manage the data consistently across the files. As shown in Figure 1.5, each file in the system used its own application programs to store, retrieve, and modify data. In addition, each file was owned by the department or individual that had commissioned its creation (Merceedi & Sabry, 2021).
Figure 1.5. A straightforward file structure. Source: https://slideplayer.com/slide/8951940/.
The introduction of computer systems to store company data was significant: it not only marked a milestone in the use of computer technology but also represented a major step forward in a business's ability to process data. Previously, users had direct, hands-on access to all of the business data, but they lacked the tools to convert those data into the information they needed (Saurabh & Parikh, 2021). The creation of computerized file systems gave them improved tools for manipulating company data and allowed them to create new information. Unfortunately, it also had the unintended effect of introducing a schism between the end users and their data. The desire to close that gap influenced the development of many types of computer technologies and system designs, along with the use (and misuse) of many technologies and techniques. These developments, however, created a split in the way the data were perceived by DP specialists and end users (Ramesh et al., 2016):
•	From the DP specialist's perspective, the computer files within the file system were created to be similar to the manual files. Data management programs were written to add, update, and delete data in the files.
•	From the end user's perspective, the new systems separated the users from the data. As the users' competitive environment pushed them to make more and more decisions in less and less time, the delay between conceiving a new way of creating information from the data and the point at which the DP specialist could deliver the programs to generate that information became a source of great frustration.
1.6.3. File System Redux

The users' desire for direct, hands-on access to data helped fuel the adoption of personal computers for business use. Although not directly related to the evolution of file systems, the widespread use of personal productivity tools can introduce the same problems as the older file systems (Johnson & Laing, 1996). Personal computer spreadsheet programs such as Microsoft Excel are popular among business users because they allow users to enter data in a series of rows and columns and manipulate the data with a wide range of functions. The widespread use of spreadsheet applications has enabled users to conduct sophisticated data analysis, greatly enhancing their ability to understand data and make better decisions. However, users have become so adept at working with spreadsheets that they tend to use them for tasks for which spreadsheets are not suited; as the old saying goes, "When all you have is a hammer, every problem looks like a nail" (Prabhakaran et al., 2005).

One of the most common ways spreadsheets are misused is as a substitute for a database. Users, for example, often take the limited data to which they have direct access and store them in a spreadsheet in a format similar to that of the traditional, manual data storage systems, which is precisely what the early DP specialists did when creating computerized data files. Owing to the large number of users working with spreadsheets, each creating independent copies of the data, the resulting "file system" of spreadsheets suffers from the same problems as the file systems created by the early DP specialists, which are discussed in the next section (Oldfield & Kotz, 2001).
1.7. PROBLEMS WITH FILE SYSTEM DATA PROCESSING

The file system method of organizing and managing data was a definite improvement over the manual system, and it served a useful role in data processing for more than two decades, a very long time in the computer era. Nonetheless, many problems and limitations became evident in this approach. A critique of the file system method serves two major purposes (Ghemawat et al., 2003):
•	Understanding the shortcomings of the file system helps you understand the development of modern databases.
•	Many of these problems are not unique to file systems. Although database technology makes it easy to avoid such problems, failing to understand them is likely to lead to their duplication in a database environment.
The following problems associated with file systems, whether created by DP specialists or through a series of spreadsheets, severely challenge the types of information that can be created from the data as well as the accuracy of the information (Ovsiannikov et al., 2013):
•	Lengthy development times: The first and most glaring problem with the file system approach is that even the simplest data-retrieval task requires extensive programming. With the older file systems, programmers had to specify what must be done and how it was to be done. As you will learn in the following chapters, modern databases use a nonprocedural data manipulation language that allows the user to specify what must be done without specifying how.
•	Difficulty of getting quick answers: Because programs must be written to produce even the simplest reports, ad hoc queries are impossible. DP specialists who work with mature file systems are often besieged with requests for additional reports and are frequently forced to say that the report will be ready "next week" or even "next month." If you need the information now, getting it next week will not serve your needs (McKusick et al., 1984).
•	Complex system administration: System administration becomes more difficult as the number of files in the system grows. Even a simple file system with a few files requires the creation and maintenance of several file management programs (each file must have its own programs that allow the user to add, modify, and delete records; list the file contents; and generate reports). Because ad hoc queries are not possible, the file reporting programs can multiply quickly. The problem is compounded by the fact that each department in the organization "owns" its data by creating its own files.
•	Lack of security and limited data sharing: Another fault of a file system data repository is its lack of security and limited data-sharing capability. Data sharing and security are closely related: sharing data among multiple, geographically dispersed users introduces many security risks. Most spreadsheet programs provide rudimentary security options, but they are seldom used, and even when they are, they are insufficient for robust data sharing among users. Because security and data-sharing features are difficult to program, they are often omitted from data management and reporting programs developed in a file system environment. Such features include effective password protection, the ability to lock out parts of files or parts of the system itself, and other measures designed to safeguard data confidentiality. Even when an attempt is made to improve system and data security, the security devices tend to be limited in scope and effectiveness.
•	Extensive programming: Making changes to an existing file structure can be difficult in a file system environment. For example, changing just one field in the original CUSTOMER file would require a program that (Ergüzen & Ünver, 2018):
1. Reads a record from the original file.
2. Transforms the original data to conform to the new structure's storage requirements.
3. Writes the transformed data into the new file structure.
4. Repeats steps 1 through 3 for every record in the original file.
In fact, any change to a file's structure, no matter how minor, forces modifications in all of the programs that use the data in that file. Modifications are likely to produce errors (bugs), and additional time must be spent using a debugging process to find those errors. Those limitations, in turn, lead to structural and data dependence problems (Alange & Mathur, 2019).
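The four conversion steps above can be sketched as follows. This is a hypothetical example: the old record layout holds only a name and a phone number, and the new layout adds a date-of-birth field.

```python
# Old structure: (name, phone). New structure adds a date-of-birth field.
old_file = [
    ("Alfred A. Ramas", "615-844-2573"),
    ("Leona K. Dunne", "713-894-1238"),
]

new_file = []
for record in old_file:              # Step 4: repeat for every record
    name, phone = record             # Step 1: read a record from the old file
    converted = (name, phone, None)  # Step 2: transform to the new layout
    new_file.append(converted)       # Step 3: write into the new structure

print(new_file[0])  # -> ('Alfred A. Ramas', '615-844-2573', None)
```

Note that this throwaway conversion program exists only because the file's structure is hard-coded into every application. Under a DBMS, a comparable structural change can usually be made with a single schema alteration rather than a conversion program plus edits to every application.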
1.7.1. Structural and Data Dependence

A file system exhibits structural dependence, which means that access to a file depends on its structure. For example, adding a customer date-of-birth field to the CUSTOMER file shown in Figure 1.3 would require the four steps described in the previous section. Given this change, none of the previous programs will work with the new CUSTOMER file structure (Tatebe et al., 2010; Shafer et al., 2010). Consequently, all of the file system's programs must be modified to conform to the new file structure. In short, the file system's programs exhibit structural dependence because they are affected by changes to the file structure. Structural independence, by contrast, exists when the file structure can be changed without affecting the application's ability to access the data (Eck & Schaefer, 2011).

Even changes in the characteristics of data, such as converting an integer field to a decimal field, require changes in all the programs that access the file. Because all data access programs are subject to change when any of the file's data storage characteristics change (that is, when the data type changes), the file system is said to exhibit data dependence. Conversely, data independence exists when the data storage characteristics can be changed without affecting the program's ability to access the data (Veeraiah & Rao, 2020).

The practical significance of data dependence is the difference between the logical data format (how the human views the data) and the physical data format (how the computer must work with the data). Any program that accesses a file system's file must tell the computer not only what to do but also how to do it. Consequently, each program must contain lines that specify the opening of a specific file type, along with its record specification and field definitions. Data dependence makes the file system extremely cumbersome from the point of view of a programmer and database manager (Jin et al., 2012).
1.7.2. Data Redundancy

The file system's structure makes it difficult to combine data from multiple sources, and its lack of security renders the file system vulnerable to security breaches. The organizational structure promotes the storage of the same basic data in different locations. (Database professionals use the term "islands of information" for such scattered data locations.) (Mahmoud et al., 2018). The use of spreadsheets to store data further adds to the dispersion. Through the data management and reporting programs created by DP specialists, the entire sales team would share access to a single SALES data file in the file system. With spreadsheets, each member of the sales team can produce his or her own copy of the sales data (Magoutis et al., 2002). Because data stored in different locations will probably not be updated consistently, the islands of information often contain different versions of the same data. For example, in Figures 1.3 and 1.4, the agent names and phone numbers occur in both the CUSTOMER and the AGENT files. You need only one correct copy of the agent names and phone numbers; having them occur in more than one place produces data redundancy. Data redundancy exists when the same data are stored unnecessarily in different places (Lin et al., 2012). Uncontrolled data redundancy sets the stage for (McKusick & Quinlan, 2009):
•	Poor data security: Having multiple copies of the data increases the chances that one of them will be susceptible to unauthorized access. Organizations invest significant amounts of time, effort, and money to ensure that their data are used properly. The issues and techniques associated with data security are explored in Chapter 15, Database Administration and Security.
•	Data inconsistency: Data inconsistency exists when different and conflicting versions of the same data appear in different places. For example, suppose you change an agent's phone number or address in the AGENT file. If you forget to make the corresponding changes in the CUSTOMER file, the two files contain different data for the same agent, and reports will yield conflicting results depending on which version of the data is used. Data entry errors are also more likely to occur when complex entries (such as 10-digit phone numbers) are made in several different files or recur frequently in one or more files. In fact, the third record in the CUSTOMER file contains a transposed digit in the agent's phone number, as shown in Figure 1.3 (615-882-2144 rather than 615-882-1244) (Kim et al., 2021). It is even possible to enter a nonexistent sales agent's name and phone number into the CUSTOMER file, but customers are not likely to be impressed if the insurance company supplies the contact information of a nonexistent agent. And should a nonexistent agent accrue bonuses and benefits, according to the personnel department? In fact, a data entry error such as an incorrectly spelled name or an incorrect phone number yields the same kind of data integrity problem (Menon et al., 2003).
•	Data anomalies: The dictionary defines an anomaly as "an irregularity." Ideally, a change to a field value should be made in only one place. Data redundancy, however, fosters an abnormal condition by forcing field value changes in many different locations. Look at the CUSTOMER file in Figure 1.3. If agent Leah F. Hahn marries and moves, her name, address, and phone number are likely to change. Instead of making a single name and/or phone and/or address change in one file (AGENT), you must make the change each time that agent's name, phone number, and address occur in the CUSTOMER file. You could be faced with hundreds of corrections, one for each of the customers served by that agent! The same problem occurs when an agent decides to quit: each customer served by that agent must be assigned a new agent. To maintain data integrity, every change in any field value must be made correctly in many places. A data anomaly develops when not all of the required changes to the redundant data are completed successfully. The data anomalies found in Figure 1.3 are commonly defined as follows (Welch et al., 2008):
•	Update anomalies: If agent Leah F. Hahn gets a new phone number, the change must be entered in each CUSTOMER file record in which Ms. Hahn's phone number appears. In this case, only three changes are required. In a large file system, such a change might affect hundreds or even thousands of records. Clearly, the potential for data inconsistencies is great.
•	Insertion anomalies: If only the CUSTOMER file existed, then to add a new agent you would have to create a dummy customer record to reflect the new agent's addition. Again, the potential for data inconsistencies is great.
•	Deletion anomalies: If you delete the customers Amy B. O'Brian, George Williams, and Olette K. Smith, you will also delete John T. Okon's agent data. Clearly, this is not desirable (Maneas & Schroeder, 2018).
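The update anomaly can be demonstrated in a few lines. The sketch below assumes a hypothetical flat, CUSTOMER-style list in which every customer row redundantly repeats the agent's phone number:

```python
# Each customer row redundantly carries the agent's phone number.
customers = [
    {"c_name": "Amy B. O'Brian",  "a_name": "Leah F. Hahn", "a_phone": "615-882-1244"},
    {"c_name": "James G. Brown",  "a_name": "Leah F. Hahn", "a_phone": "615-882-1244"},
    {"c_name": "George Williams", "a_name": "Leah F. Hahn", "a_phone": "615-882-1244"},
]

# When the agent's number changes, EVERY row must be touched; missing
# even one row leaves inconsistent copies of the same fact (an anomaly).
changed = 0
for row in customers:
    if row["a_name"] == "Leah F. Hahn":
        row["a_phone"] = "615-882-9999"
        changed += 1

print(changed)  # -> 3 (one update per redundant copy)
```

In a properly designed database, the phone number would live in a single AGENT row, so the change would be made exactly once.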
1.7.3. Lack of Design and Data-Modeling Skills
A new problem that has emerged with the use of personal productivity tools (such as spreadsheets and desktop databases) is that users generally lack proper data-modeling and database design skills. People naturally have an integrated view of the data in their environment. Consider a student's class schedule, for instance (Mukhopadhyay et al., 2014). The schedule is likely to contain the student's identification number and name, the class code, class description, class credits, the name of the instructor teaching the class, the class meeting days and times, and the classroom number. In the student's mind, these data items compose a single unit. If a student organization wanted to keep track of all of its members' schedules, a member might create a spreadsheet to track the schedule data. Even if the member ventured into the realm of desktop databases, he or she is likely to create a structure consisting of a single table that closely mimics the schedule's layout. As you will see in later chapters, cramming this much data into a single flat-file table structure is poor data design that results in substantial duplication of several data items (Liao et al., 2010; Lu et al., 2017).
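The duplication that a single flat table produces can be made concrete with a small sketch; the schedule columns and values below are hypothetical, invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# One flat table that mimics the printed schedule (poor design):
# class details and the instructor are repeated for every student enrolled.
con.execute("""CREATE TABLE schedule (
    stu_id TEXT, class_code TEXT, class_descr TEXT,
    credits INTEGER, instructor TEXT, room TEXT)""")
con.executemany("INSERT INTO schedule VALUES (?, ?, ?, ?, ?, ?)", [
    ("S1", "CS101", "Intro to Databases", 3, "Dr. Lee", "KLR 209"),
    ("S2", "CS101", "Intro to Databases", 3, "Dr. Lee", "KLR 209"),
    ("S3", "CS101", "Intro to Databases", 3, "Dr. Lee", "KLR 209"),
])

# The same class facts are stored once per enrolled student instead of once:
rows = con.execute(
    "SELECT COUNT(*) FROM schedule WHERE class_code = 'CS101'").fetchone()[0]
distinct = con.execute(
    "SELECT COUNT(DISTINCT class_descr) FROM schedule").fetchone()[0]
print(rows, distinct)  # 3 physical copies of 1 logical class description
```

A better design would split the data into STUDENT, CLASS, and ENROLLMENT tables so each fact is stored exactly once.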
Data-modeling skills are also a vital part of the design process. It is critical to document the design properly. Design documentation is needed to facilitate communication among the database designer, the end user, and the programmer. Data modeling, discussed later in this book, is the most common means of documenting database designs. Using a consistent data-modeling technique ensures that the data model fulfills its role of easing communication among the designer, the user, and the programmer (Wang et al., 2014). The data model also becomes a valuable resource when the database must be maintained or modified as business requirements change. In practice, final data designs are seldom documented, let alone documented with a consistent data-modeling technique. On a more positive note, by reading this book you are acquiring the type of training needed to develop the data-modeling and database design skills required to create a database that guarantees consistent data, enforces integrity, and provides a stable and flexible platform for delivering timely, accurate information to users (Salunkhe et al., 2016).
1.8. DATABASE SYSTEMS
Because of the problems associated with file systems, the DBMS is a much better choice. Unlike the file system, with its many separate and unrelated files, the DBMS consists of logically related data stored in a single logical data repository. (The word "logical" signals that the repository's contents may be physically distributed among multiple data-storage facilities and/or locations, even though the database appears to the end user as a single unit) (Paton & Diaz, 1999). Because the database's data repository is a single logical unit, the database represents a major change in the way end-user data are stored, accessed, and managed. The DBMS, shown in Figure 1.6, provides numerous advantages over the file-system management shown in Figure 1.5 by eliminating most of the file system's data inconsistency, data anomaly, data dependence, and structural dependence problems. Better yet, current DBMS software stores not only the data structures, but also the relationships between those structures and the access paths to them, all in a central location. The current generation of DBMS software also defines, stores, and manages all required access paths to those components (Güting, 1994).
Figure 1.6. Comparing and contrasting database and file management organizations. Source: https://slideplayer.com/slide/9384902/.
Keep in mind that the DBMS is just one of several crucial components of a database system. The DBMS may even be described as the database system's heart. However, just as it takes more than a heart to make a human being function, it takes more than a DBMS to make a database system function. In the following section, you'll learn what a database system is, what its components are, and how the DBMS fits into the database system picture (DeWitt & Gray, 1992).
1.8.1. The Database System Environment
The term database system refers to an organization of components that define and regulate the collection, storage, management, and use of data within a database environment. From a general management point of view, the database system is composed of the five major parts shown in Figure 1.7: hardware, software, people, procedures, and data (Connoly et al., 2002).
Now let us look at the five components in Figure 1.7 in more detail (Zaniolo et al., 1997):
• Hardware: Hardware refers to all of the system's physical devices, such as computers (personal computers, workstations, servers, and supercomputers), storage devices, printers, network devices (hubs, switches, routers, fiber optics), and other devices (automated teller machines (ATMs), ID readers, and so on) (Bonnet et al., 2001).
Figure 1.7. The database system’s surroundings. Source: https://www.slideshare.net/yhen06/database-system-environmentppt-14454678.
• Software: Although the DBMS is the most readily identified piece of software, the database system requires three types of software to function properly: operating system software, the DBMS, and application programs and utilities.
• Operating System Software: Manages all hardware components and makes it possible for all other software to run on the computers. Examples of operating systems include Microsoft Windows, Linux, Mac OS, UNIX, and VMS.
• DBMS Software: Manages the database within the database system. Microsoft's SQL Server, Oracle Corporation's Oracle, Sun's MySQL, and IBM's DB2 are examples of DBMS software (Abadi et al., 2013).
• Application Programs and Utility Software: Used to access and manipulate data in the DBMS and to manage the computing environment in which data access and manipulation take place. Application programs are most commonly used to access data in the database to generate reports, tabulations, and other information to facilitate decision making. Utilities are software tools used to help manage the computer components of the database system. For example, all of the major DBMS vendors now provide graphical user interfaces (GUIs) to help create database structures, control database security, and monitor database performance (Özsu & Valduriez, 1996).
• People: This component includes all users of the database system. On the basis of primary job functions, five types of users can be distinguished in a database system: system administrators, database administrators (DBAs), database designers, system analysts and programmers, and end users. Each user type performs distinct and complementary functions, as outlined below.
• System Administrators: Oversee the database system's general operations.
• Database Administrators (DBAs): Manage the DBMS and ensure that the database is functioning properly. The DBA's role is important enough to warrant separate coverage in Database Administration and Security (Silberschatz et al., 1991).
• Database Designers: Design the database structure. They are, in effect, the database architects. If the database design is poor, even the best application programmers and the most dedicated DBAs cannot produce a useful database environment. Because organizations strive to get the most out of their data, the database designer's job description has expanded to cover new dimensions and duties.
• System Analysts and Programmers: Design and implement the application programs. They design and create the data entry screens, reports, and procedures through which end users access and manipulate the database's data (Özsu & Valduriez, 1999).
• End Users: The people who use the application programs to run the organization's daily operations. For example, salesclerks, supervisors, managers, and executives are all classified as end users. High-level end users employ the information obtained from the database to make tactical and strategic business decisions.
• Procedures: Procedures are the instructions and rules that govern the design and use of the database system. Procedures are a critical, although occasionally forgotten, component of the system. Procedures play an important role in a company because they enforce the standards by which business is conducted within the organization and with its customers. Procedures are also used to ensure that there is an organized way to monitor and audit both the data that enter the database and the information that is generated through the use of those data (Tamura & Yokoya, 1984).
• Data: The word "data" covers the collection of facts stored in the database. Because data are the raw material from which information is generated, determining which data to enter into the database and how to organize those data is a vital part of the database designer's job.
A database system adds a new dimension to an organization's management structure. The complexity of this managerial structure depends on the organization's size, its functions, and its corporate culture. Therefore, database systems can be created and managed at different levels of complexity and with varying adherence to precise standards. Compare, for example, a local movie rental system with a state pension claims system (Kießling, 2002). The movie rental system may be managed by two people, the hardware is probably a single PC, the procedures are probably simple, and the data volume tends to be low. The state pension claims system is likely to have at least one systems administrator, several full-time DBAs, and many designers and programmers; the hardware is probably distributed across many locations in the United States; the procedures are likely to be numerous, complex, and rigorous; and the data volume tends to be high (Abadi et al., 2009).
In addition to the different levels of database system complexity, management must also consider another important fact: database systems must be cost-effective as well as tactically and strategically effective. Producing a million-dollar solution to a thousand-dollar problem is hardly a good example of database system selection, database design, or database administration. Finally, the database technology already in use is likely to affect the choice of DBMS (Bitton et al., 1983).
1.8.2. DBMS Functions
A DBMS performs several important functions that guarantee the integrity and consistency of the data in the database. Most of those functions are transparent to end users and can be achieved only through the use of a DBMS (Florescu & Kossmann, 2009). They include data dictionary management, data storage management, data transformation and presentation, security management, multiuser access control, backup and recovery management, data integrity management, database access languages and application programming interfaces, and database communication interfaces. The following sections discuss each of those functions in detail (Kifer et al., 2006).
• Data Dictionary Management: The DBMS uses a data dictionary to store definitions of the data elements and their relationships (metadata). The DBMS uses the data dictionary to look up the required data component structures and relationships for every program that accesses data in the database, which relieves you from having to code such complex relationships in each program. Additionally, any changes made to a data structure are automatically recorded in the data dictionary, thereby freeing you from having to modify all of the programs that access the changed structure. In other words, the DBMS provides data abstraction, and it removes structural and data dependence from the system. Figure 1.8 illustrates the data definition for the Customer table in Microsoft SQL Server Express (Rabitti et al., 1991).
• Data Storage Management: The DBMS creates and manages the complex structures required for data storage, thus relieving you of the difficult task of defining and programming the physical data characteristics.
Figure 1.8. Using Microsoft SQL Server Express to visualize information. Source: https://docs.microsoft.com/en-us/sql/relational-databases/graphs/sqlgraph-architecture?view=sql-server-ver16.
A modern DBMS provides storage not only for the data, but also for related data entry forms or screen definitions, report definitions, data validation rules, procedural code, and structures to handle video and picture formats, among other things. Data storage management is also important for database performance tuning (Jagadish et al., 2007). Performance tuning refers to the activities that make the database perform more efficiently in terms of both storage and access speed. Although the database appears to the user as a single data-storage unit, the DBMS actually stores the data in multiple physical data files (see Figure 1.9). Such data files may even be stored on different storage media. Therefore, the DBMS doesn't have to wait for one disk request to finish before moving on to the next; in other words, the DBMS can handle multiple database requests concurrently. You'll learn about data storage management and performance tuning in Chapter 11, Database Performance Tuning and Query Optimization (Chaudhuri & Narasayya, 2007).
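The data dictionary lookup described earlier can be seen in miniature through SQLite's built-in catalog; a full DBMS such as SQL Server exposes the same idea through its system views. The table here is a hypothetical illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (cus_id INTEGER PRIMARY KEY, cus_name TEXT)")

# The engine records the table definition in its own catalog (metadata),
# so programs can look data structures up instead of hard-coding them.
cols = [row[1] for row in con.execute("PRAGMA table_info(customer)")]
print(cols)  # ['cus_id', 'cus_name']
```

Because the structure lives in the dictionary, a program that queries it automatically sees any later schema change without being rewritten.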
• Data Transformation and Presentation: The DBMS transforms entered data to conform to the required data structures. The DBMS relieves you of the chore of distinguishing between the logical data format and the physical data format; it does that for you. That is, the DBMS formats the physically retrieved data to make it conform to the user's logical expectations. For example, imagine an enterprise database used by a multinational company. In England, an end user might enter a date such as July 11, 2010 as "11/07/2010." In the United States, the same date would be entered as "07/11/2010." Regardless of the display format, the DBMS must manage the date in the proper format for each country (Thomson & Abadi, 2010).
• Security Management: The DBMS creates a security system that enforces user security and data privacy. Security rules determine which users can access the database, which data items each user can access, and which data operations (read, add, delete, or modify) each user can perform. This is especially important in multiuser database systems. Database Administration and Security examines data security and privacy issues in greater detail. All database users may be authenticated to the DBMS through a username and password or through biometric authentication such as a fingerprint scan. The DBMS uses this information to assign access privileges to various database components such as queries and reports (Jarke & Koch, 1984).
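The date-format example above can be illustrated directly: one logical date, stored once, is rendered differently for UK and US users. This is only a sketch of the presentation idea, not the mechanism any particular DBMS uses internally:

```python
from datetime import date

# One logical date as stored by the database engine...
stored = date(2010, 7, 11)

# ...presented according to each locale's expected display format.
uk = stored.strftime("%d/%m/%Y")  # England: day first
us = stored.strftime("%m/%d/%Y")  # United States: month first
print(uk, us)  # 11/07/2010 07/11/2010
```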
Figure 1.9. Using oracle to demonstrate data storage management. Source: https://docs.oracle.com/cd/E11882_01/server.112/e10897/storage.htm.
• Multiuser Access Control: To provide data integrity and data consistency, the DBMS uses sophisticated algorithms to ensure that multiple users can access the database concurrently without compromising the integrity of the database. The chapter on transaction management and concurrency control covers the details of multiuser access control.
• Backup and Recovery Management: The DBMS provides backup and data recovery to ensure data safety and integrity. Most DBMS platforms include capabilities that allow the DBA to perform both routine and special backup and restore procedures. Recovery management deals mainly with the database's recovery after a failure, such as a power outage or a bad disk sector. This capability is critical to preserving the database's integrity (Kao & Garcia-Molina, 1994).
• Data Integrity Management: The DBMS promotes and enforces integrity rules, thus minimizing data redundancy and maximizing data consistency. The relationships stored in the data dictionary are used to enforce data integrity. Ensuring data integrity is especially important in transaction-oriented database systems (Abadi et al., 2006).
• Database Access Languages and Application Programming Interfaces (APIs): The DBMS provides data access through a query language. A query language is a nonprocedural language, one that lets the user specify what must be done without having to specify how it is to be done. Structured Query Language (SQL) is the de facto query language and data access standard supported by the majority of DBMS vendors. The use of SQL is covered in the introduction to Structured Query Language (SQL) and in Chapter 8, Advanced SQL. The DBMS also provides APIs to procedural languages such as COBOL, C, Java, Visual Basic .NET, and C#.
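As a minimal illustration of a nonprocedural query (run here through SQLite rather than a full multiuser DBMS, with hypothetical table and column names), the SQL statement states what to retrieve; the engine decides how:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE agent (agent_name TEXT, city TEXT)")
con.executemany("INSERT INTO agent VALUES (?, ?)", [
    ("Leah F. Hahn", "Nashville"),
    ("John T. Okon", "Memphis"),
])

# Declarative: we state the condition; the engine chooses the access path.
names = [r[0] for r in con.execute(
    "SELECT agent_name FROM agent WHERE city = 'Memphis'")]
print(names)  # ['John T. Okon']
```

The same SELECT would run unchanged whether the engine scans the table or uses an index, which is precisely the "what, not how" property of a query language.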
The DBMS also provides database administration utilities that the DBA and database designer can use to create, implement, monitor, and maintain the database (Stefanidis et al., 2011).
• Database Communication Interfaces: Current-generation DBMSs accept end-user requests via multiple, different network environments. For example, the DBMS might provide access to the database via the Web, using Web browsers such as Mozilla Firefox or Microsoft Internet Explorer. In this environment, communications can be accomplished in several ways:
• End users can generate answers to queries by filling in screen forms through their preferred Web browser.
• The DBMS can automatically publish predefined reports on a Web site on a regular schedule.
• The DBMS can connect to third-party systems to distribute information via e-mail or other productivity applications (Ramakrishnan & Ullman, 1995).
REFERENCES
1. Abadi, D. J., Boncz, P. A., & Harizopoulos, S., (2009). Column-oriented database systems. Proceedings of the VLDB Endowment, 2(2), 1664–1665.
2. Abadi, D., Boncz, P., Harizopoulos, S., Idreos, S., & Madden, S., (2013). The design and implementation of modern column-oriented database systems. Foundations and Trends® in Databases, 5(3), 197–280.
3. Abadi, D., Madden, S., & Ferreira, M., (2006). Integrating compression and execution in column-oriented database systems. In: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (Vol. 1, pp. 671–682).
4. Adde, G., Chan, B., Duellmann, D., Espinal, X., Fiorot, A., Iven, J., & Sindrilaru, E. A., (2015). Latest evolution of EOS filesystem. In: Journal of Physics: Conference Series (Vol. 608, No. 1, p. 012009). IOP Publishing.
5. Alange, N., & Mathur, A., (2019). Small sized file storage problems in Hadoop distributed file system. In: 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT) (Vol. 1, pp. 1202–1206). IEEE.
6. Batini, C., & Scannapieco, M., (2016). Data and Information Quality (Vol. 1, pp. 2–9). Cham, Switzerland: Springer International Publishing.
7. Becerik-Gerber, B., Siddiqui, M. K., Brilakis, I., El-Anwar, O., El-Gohary, N., Mahfouz, T., & Kandil, A. A., (2014). Civil engineering grand challenges: Opportunities for data sensing, information analysis, and knowledge discovery. J. Comput. Civ. Eng., 28(4), 04014013.
8. Beets, G. L., Figueiredo, N. L., Habr-Gama, A., & Van De, V. C. J. H., (2015). A new paradigm for rectal cancer: Organ preservation: Introducing the international watch & wait database (IWWD). European Journal of Surgical Oncology, 41(12), 1562–1564.
9. Bell, C., & Badanjak, S., (2019). Introducing PA-X: A new peace agreement database and dataset. Journal of Peace Research, 56(3), 452–466.
10. Bissett, A., Fitzgerald, A., Meintjes, T., Mele, P. M., Reith, F., Dennis, P. G., & Young, A., (2016). Introducing BASE: The biomes of Australian soil environments soil microbial diversity database. Gigascience, 5(1), s13742–016.
11. Bitton, D., DeWitt, D. J., & Turbyfill, C., (1983). Benchmarking Database Systems: A Systematic Approach (Vol. 1, pp. 3–9). University of Wisconsin-Madison Department of Computer Sciences.
12. Blomer, J., Buncic, P., Meusel, R., Ganis, G., Sfiligoi, I., & Thain, D., (2015). The evolution of global scale filesystems for scientific software distribution. Computing in Science & Engineering, 17(6), 61–71.
13. Bonnet, P., Gehrke, J., & Seshadri, P., (2001). Towards sensor database systems. In: International Conference on Mobile Data Management (Vol. 1, pp. 3–14). Springer, Berlin, Heidelberg.
14. Broatch, J. E., Dietrich, S., & Goelman, D., (2019). Introducing data science techniques by connecting database concepts and dplyr. Journal of Statistics Education, 27(3), 147–153.
15. Changnon, S. A., & Kunkel, K. E., (1999). Rapidly expanding uses of climate data and information in agriculture and water resources: Causes and characteristics of new applications. Bulletin of the American Meteorological Society, 80(5), 821–830.
16. Chaudhuri, S., & Narasayya, V., (2007). Self-tuning database systems: A decade of progress. In: Proceedings of the 33rd International Conference on Very Large Data Bases (Vol. 1, pp. 3–14).
17. Chen, J. M., Norman, J. B., & Nam, Y., (2021). Broadening the stimulus set: Introducing the American multiracial faces database. Behavior Research Methods, 53(1), 371–389.
18. Chen, M., Ebert, D., Hagen, H., Laramee, R. S., Van, L. R., Ma, K. L., & Silver, D., (2008). Data, information, and knowledge in visualization. IEEE Computer Graphics and Applications, 29(1), 12–19.
19. Cheng, J., Greiner, R., Kelly, J., Bell, D., & Liu, W., (2002). Learning Bayesian networks from data: An information-theory based approach. Artificial Intelligence, 137(1, 2), 43–90.
20. Cole, J. R., Chai, B., Farris, R. J., Wang, Q., Kulam-Syed-Mohideen, A. S., McGarrell, D. M., & Tiedje, J. M., (2007). The ribosomal database project (RDP-II): Introducing myRDP space and quality controlled public data. Nucleic Acids Research, 35(suppl_1), D169–D172.
21. Connoly, T., Begg, C., & Strachan, A., (1996, 2002). Database Systems (Vol. 1, pp. 2–8). Addison-Wesley.
22. Cox, A., Doell, R. R., & Dalrymple, G. B., (1964). Reversals of the earth's magnetic field: Recent paleomagnetic and geochronologic data provide information on time and frequency of field reversals. Science, 144(3626), 1537–1543.
23. Degrandi, T. M., Barcellos, S. A., Costa, A. L., Garnero, A. D., Hass, I., & Gunski, R. J., (2020). Introducing the bird chromosome database: An overview of cytogenetic studies in birds. Cytogenetic and Genome Research, 160(4), 199–205.
24. Destaillats, H., Maddalena, R. L., Singer, B. C., Hodgson, A. T., & McKone, T. E., (2008). Indoor pollutants emitted by office equipment: A review of reported data and information needs. Atmospheric Environment, 42(7), 1371–1388.
25. DeWitt, D., & Gray, J., (1992). Parallel database systems: The future of high performance database systems. Communications of the ACM, 35(6), 85–98.
26. Eck, O., & Schaefer, D., (2011). A semantic file system for integrated product data management. Advanced Engineering Informatics, 25(2), 177–184.
27. El Haddad, K., Torre, I., Gilmartin, E., Çakmak, H., Dupont, S., Dutoit, T., & Campbell, N., (2017). Introducing amuS: The amused speech database. In: International Conference on Statistical Language and Speech Processing (Vol. 1, pp. 229–240). Springer, Cham.
28. Ergüzen, A., & Ünver, M., (2018). Developing a file system structure to solve healthy big data storage and archiving problems using a distributed file system. Applied Sciences, 8(6), 913.
29. Florescu, D., & Kossmann, D., (2009). Rethinking cost and performance of database systems. ACM SIGMOD Record, 38(1), 43–48.
30. Freilich, J. D., Chermak, S. M., Belli, R., Gruenewald, J., & Parkin, W. S., (2014). Introducing the United States extremis crime database (ECDB). Terrorism and Political Violence, 26(2), 372–384.
31. Gavazzi, G., Boselli, A., Donati, A., Franzetti, P., & Scodeggio, M., (2003). Introducing GOLDMine: A new galaxy database on the WEB. Astronomy & Astrophysics, 400(2), 451–455.
32. Gerardi, F. F., Reichert, S., & Aragon, C. C., (2021). TuLeD (Tupían lexical database): Introducing a database of a South American language family. Language Resources and Evaluation, 55(4), 997–1015.
33. Ghemawat, S., Gobioff, H., & Leung, S. T., (2003). The google file system. In: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (Vol. 1, pp. 29–43).
34. Grimm, E. C., Bradshaw, R. H., Brewer, S., Flantua, S., Giesecke, T., Lézine, A. M., & Williams, Jr. J. W., (2013). Databases and Their Application, 1, 3–7.
35. Güting, R. H., (1994). An introduction to spatial database systems. The VLDB Journal, 3(4), 357–399.
36. Heidemann, J. S., & Popek, G. J., (1994). File-system development with stackable layers. ACM Transactions on Computer Systems (TOCS), 12(1), 58–89.
37. Herodotou, H., (2016). Towards a distributed multi-tier file system for cluster computing. In: 2016 IEEE 32nd International Conference on Data Engineering Workshops (ICDEW) (Vol. 1, pp. 131–134). IEEE.
38. Hübner, D. C., (2016). The 'national decisions' database (Dec.Nat): Introducing a database on national courts' interactions with European law. European Union Politics, 17(2), 324–339.
39. Jagadish, H. V., Chapman, A., Elkiss, A., Jayapandian, M., Li, Y., Nandi, A., & Yu, C., (2007). Making database systems usable. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data (Vol. 1, pp. 13–24).
40. Jarke, M., & Koch, J., (1984). Query optimization in database systems. ACM Computing Surveys (CSUR), 16(2), 111–152.
41. Jin, S., Yang, S., Zhu, X., & Yin, H., (2012). Design of a trusted file system based on Hadoop. In: International Conference on Trustworthy Computing and Services (Vol. 1, pp. 673–680). Springer, Berlin, Heidelberg.
42. Johnson, J. E., & Laing, W. A., (1996). Overview of the spiralog file system. Digital Technical Journal, 8, 5–14.
43. Johnson, J., (2021). Introducing the military mutinies and defections database (MMDD), 1945–2017. Journal of Peace Research, 58(6), 1311–1319.
44. Joshi, M., & Darby, J., (2013). Introducing the peace accords matrix (PAM): A database of comprehensive peace agreements and their implementation, 1989–2007. Peacebuilding, 1(2), 256–274.
45. Jukic, N., Vrbsky, S., Nestorov, S., & Sharma, A., (2014). Database Systems: Introduction to Databases and Data Warehouses (Vol. 1, p. 400). Pearson.
46. Kakoulli, E., & Herodotou, H., (2017). OctopusFS: A distributed file system with tiered storage management. In: Proceedings of the 2017 ACM International Conference on Management of Data (Vol. 1, pp. 65–78).
47. Kam, C. L. H., & Matthewson, L., (2017). Introducing the infant bookreading database (IBDb). Journal of Child Language, 44(6), 1289–1308.
48. Kao, B., & Garcia-Molina, H., (1994). An overview of real-time database systems. Real Time Computing, 261–282.
49. Karp, P. D., Riley, M., Saier, M., Paulsen, I. T., Paley, S. M., & Pellegrini-Toole, A., (2000). The EcoCyc and MetaCyc databases. Nucleic Acids Research, 28(1), 56–59.
50. Kießling, W., (2002). Foundations of preferences in database systems. In: VLDB'02: Proceedings of the 28th International Conference on Very Large Databases (Vol. 1, pp. 311–322). Morgan Kaufmann.
51. Kifer, M., Bernstein, A. J., & Lewis, P. M., (2006). Database Systems: An Application-Oriented Approach (Vol. 1, pp. 4–8). Pearson/Addison-Wesley.
52. Kim, J., Jang, I., Reda, W., Im, J., Canini, M., Kostić, D., & Witchel, E., (2021). LineFS: Efficient SmartNIC offload of a distributed file system with pipeline parallelism. In: Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles (Vol. 1, pp. 756–771).
53. Kim, O. S., Cho, Y. J., Lee, K., Yoon, S. H., Kim, M., Na, H., & Chun, J., (2012). Introducing EzTaxon-e: A prokaryotic 16S rRNA gene sequence database with phylotypes that represent uncultured species. International Journal of Systematic and Evolutionary Microbiology, 62(Pt_3), 716–721.
54. Kodera, Y., Yoshida, K., Kumamaru, H., Kakeji, Y., Hiki, N., Etoh, T., & Konno, H., (2019). Introducing laparoscopic total gastrectomy for gastric cancer in general practice: A retrospective cohort study based on a nationwide registry database in Japan. Gastric Cancer, 22(1), 202–213.
55. Kovach, K. A., & Cathcart, Jr. C. E., (1999). Human resource information systems (HRIS): Providing business with rapid data access, information exchange and strategic advantage. Public Personnel Management, 28(2), 275–282.
56. LaFree, G., & Dugan, L., (2007). Introducing the global terrorism database. Terrorism and Political Violence, 19(2), 181–204.
57. Liao, H., Han, J., & Fang, J., (2010). Multi-dimensional index on Hadoop distributed file system. In: 2010 IEEE Fifth International Conference on Networking, Architecture, and Storage (Vol. 1, pp. 240–249). IEEE.
58. Lin, H. Y., Shen, S. T., Tzeng, W. G., & Lin, B. S. P., (2012). Toward data confidentiality via integrating hybrid encryption schemes and Hadoop distributed file system. In: 2012 IEEE 26th International Conference on Advanced Information Networking and Applications (Vol. 1, pp. 740–747). IEEE.
59. Lu, Y., Shu, J., Chen, Y., & Li, T., (2017). Octopus: An RDMA-enabled distributed persistent memory file system. In: 2017 USENIX Annual Technical Conference (USENIX ATC 17) (Vol. 1, pp. 773–785).
60. Madnick, S. E., Wang, R. Y., Lee, Y. W., & Zhu, H., (2009). Overview and framework for data and information quality research. Journal of Data and Information Quality (JDIQ), 1(1), 1–22.
61. Magoutis, K., Addetia, S., Fedorova, A., Seltzer, M. I., Chase, J. S., Gallatin, A. J., & Gabber, E., (2002). Structure and performance of the direct access file system. Management, 21, 31.
62. Mahmoud, H., Hegazy, A., & Khafagy, M. H., (2018). An approach for big data security based on Hadoop distributed file system. In: 2018 International Conference on Innovative Trends in Computer Engineering (ITCE) (Vol. 1, pp. 109–114). IEEE.
63. Maneas, S., & Schroeder, B., (2018). The evolution of the Hadoop distributed file system. In: 2018 32nd International Conference on Advanced Information Networking and Applications Workshops (WAINA) (Vol. 1, pp. 67–74). IEEE.
64. Martinikorena, I., Cabeza, R., Villanueva, A., & Porta, S., (2018). Introducing i2head database. In: Proceedings of the 7th Workshop on Pervasive Eye Tracking and Mobile Eye-Based Interaction (Vol. 1, pp. 1–7).
65. McKelvey, R. D., & Ordeshook, P. C., (1985). Elections with limited information: A fulfilled expectations model using contemporaneous poll and endorsement data as information sources. Journal of Economic Theory, 36(1), 55–85.
66. McKusick, M. K., & Quinlan, S., (2009). GFS: Evolution on fast-forward: A discussion between Kirk McKusick and Sean Quinlan about the origin and evolution of the google file system. Queue, 7(7), 10–20.
67. McKusick, M. K., Joy, W. N., Leffler, S. J., & Fabry, R. S., (1984). A fast file system for UNIX. ACM Transactions on Computer Systems (TOCS), 2(3), 181–197.
68. Menon, J., Pease, D. A., Rees, R., Duyanovich, L., & Hillsberg, B., (2003). IBM storage tank—A heterogeneous scalable SAN file system. IBM Systems Journal, 42(2), 250–267.
69. Merceedi, K. J., & Sabry, N. A., (2021). A comprehensive survey for Hadoop distributed file system. Asian Journal of Research in Computer Science, 1, 4–7.
70. Mukhopadhyay, D., Agrawal, C., Maru, D., Yedale, P., & Gadekar, P., (2014). Addressing name node scalability issue in Hadoop distributed file system using cache approach. In: 2014 International Conference on Information Technology (Vol. 1, pp. 321–326). IEEE.
71. Oldfield, R., & Kotz, D., (2001). Armada: A parallel file system for computational grids. In: Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid (Vol. 1, pp. 194–201). IEEE.
72. Ovsiannikov, M., Rus, S., Reeves, D., Sutter, P., Rao, S., & Kelly, J., (2013). The Quantcast file system. Proceedings of the VLDB Endowment, 6(11), 1092–1101.
73. Özsu, M. T., & Valduriez, P., (1996). Distributed and parallel database systems. ACM Computing Surveys (CSUR), 28(1), 125–128.
74. Özsu, M. T., & Valduriez, P., (1999). Principles of Distributed Database Systems (Vol. 2, pp. 1–5). Englewood Cliffs: Prentice Hall.
75. Pal, A., & Memon, N., (2009). The evolution of file carving. IEEE Signal Processing Magazine, 26(2), 59–71.
76. Paton, N. W., & Diaz, O., (1999). Active database systems. ACM Computing Surveys (CSUR), 31(1), 63–103.
77. Prabhakaran, V., Arpaci-Dusseau, A. C., & Arpaci-Dusseau, R. H., (2005). Analysis and evolution of journaling file systems. In: USENIX Annual Technical Conference, General Track (Vol. 194, pp. 196–215).
78. Rabitti, F., Bertino, E., Kim, W., & Woelk, D., (1991). A model of authorization for next-generation database systems.
ACM Transactions on Database Systems (TODS), 16(1), 88–131. 79. Ramakrishnan, R., & Ullman, J. D., (1995). A survey of deductive database systems. The Journal of Logic Programming, 23(2), 125–149.
Introduction to Database Systems
41
80. Ramesh, D., Patidar, N., Kumar, G., & Vunnam, T., (2016). Evolution and analysis of distributed file systems in cloud storage: Analytical survey. In: 2016 International Conference on Computing, Communication and Automation (ICCCA) (Vol. 1, pp. 753–758). IEEE. 81. Rapps, S., & Weyuker, E. J., (1985). Selecting software test data using data flow information. IEEE Transactions on Software Engineering, (4), 367–375. 82. Rosselló-Móra, R., Trujillo, M. E., & Sutcliffe, I. C., (2017). Introducing a digital protologue: A timely move towards a databasedriven systematics of archaea and bacteria. Antonie Van Leeuwenhoek, 110(4), 455–456. 83. Salunkhe, R., Kadam, A. D., Jayakumar, N., & Thakore, D., (2016). In search of a scalable file system state-of-the-art file systems review and map view of new scalable file system. In: 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT) (Vol. 1, pp. 364–371). IEEE. 84. Saurabh, A., & Parikh, S. M., (2021). Evolution of distributed file system and Hadoop: A mathematical appraisal. Recent Advances in Mathematical Research and Computer Science, 2, 105–112. 85. Schultze, U., & Avital, M., (2011). Designing interviews to generate rich data for information systems research. Information and Organization, 21(1), 1–16. 86. Shafer, J., Rixner, S., & Cox, A. L., (2010). The Hadoop distributed filesystem: Balancing portability and performance. In: 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS) (Vol. 1, pp. 122–133). IEEE. 87. Sidorov, J., Shull, R., Tomcavage, J., Girolami, S., Lawton, N., & Harris, R., (2002). Does diabetes disease management save money and improve outcomes? A report of simultaneous short-term savings and quality improvement associated with a health maintenance organization–sponsored disease management program among patients fulfilling health employer data and information set criteria. Diabetes Care, 25(4), 684–689. 88. Sigmund, M., (2006). 
Introducing the database exam stress for speech under stress. In: Proceedings of the 7th Nordic Signal Processing Symposium-NORSIG 2006 (Vol. 1, pp. 290–293). IEEE.
42
The Creation and Management of Database Systems
89. Silberschatz, A., Stonebraker, M., & Ullman, J., (1991). Database systems: Achievements and opportunities. Communications of the ACM, 34(10), 110–120. 90. Sivaraman, E., & Manickachezian, R., (2014). High performance and fault tolerant distributed file system for big data storage and processing using Hadoop. In: 2014 International Conference on Intelligent Computing Applications (Vol. 1, pp. 32–36). IEEE. 91. Smith, K. A., & Seltzer, M. I., (1997). File system aging—increasing the relevance of file system benchmarks. In: Proceedings of the 1997 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (Vol. 1, pp. 203–213). 92. Stefanidis, K., Koutrika, G., & Pitoura, E., (2011). A survey on representation, composition and application of preferences in database systems. ACM Transactions on Database Systems (TODS), 36(3), 1–45. 93. Stowe, L. L., Ignatov, A. M., & Singh, R. R., (1997). Development, validation, and potential enhancements to the second‐generation operational aerosol product at the national environmental satellite, data, and information service of the national oceanic and atmospheric administration. Journal of Geophysical Research: Atmospheres, 102(D14), 16923–16934. 94. Tamura, H., & Yokoya, N., (1984). Image database systems: A survey. Pattern Recognition, 17(1), 29–43. 95. Tatebe, O., Hiraga, K., & Soda, N., (2010). Gfarm grid file system. New Generation Computing, 28(3), 257–275. 96. Thomson, A., & Abadi, D. J., (2010). The case for determinism in database systems. Proceedings of the VLDB Endowment, 3(1, 2), 70– 80. 97. Thomson, G. H., (1996). The DIPPR® databases. International Journal of Thermophysics, 17(1), 223–232. 98. Tiirikka, T., & Moilanen, J. S., (2015). Human chromosome Y and haplogroups; introducing YDHS database. Clinical and Translational Medicine, 4(1), 1–9. 99. Tolar, B., Joseph, L. A., Schroeder, M. N., Stroika, S., Ribot, E. M., Hise, K. B., & Gerner-Smidt, P., (2019). 
An overview of PulseNet USA databases. Foodborne Pathogens and Disease, 16(7), 457–462.
Introduction to Database Systems
43
100. Urvoy, M., Barkowsky, M., Cousseau, R., Koudota, Y., Ricorde, V., Le Callet, P., & Garcia, N., (2012). NAMA3DS1-COSPAD1: Subjective video quality assessment database on coding conditions introducing freely available high quality 3D stereoscopic sequences. In: 2012 Fourth International Workshop on Quality of Multimedia Experience (Vol. 1, pp. 109–114). IEEE. 101. Valduriez, P., (1993). Parallel database systems: Open problems and new issues. Distributed and Parallel Databases, 1(2), 137–165. 102. Van, R. C. J., (1977). A theoretical basis for the use of co‐occurrence data in information retrieval. Journal of Documentation, 1, 3–9. 103. Veeraiah, D., & Rao, J. N., (2020). An efficient data duplication system based on Hadoop distributed file system. In: 2020 International Conference on Inventive Computation Technologies (ICICT) (Vol. 1, pp. 197–200). IEEE. 104. Wang, L., Ma, Y., Zomaya, A. Y., Ranjan, R., & Chen, D., (2014). A parallel file system with application-aware data layout policies for massive remote sensing image processing in digital earth. IEEE Transactions on Parallel and Distributed Systems, 26(6), 1497–1508. 105. Welch, B., Unangst, M., Abbasi, Z., Gibson, G. A., Mueller, B., Small, J., & Zhou, B., (2008). Scalable performance of the panasas parallel file system. In: FAST (Vol. 8, pp. 1–17). 106. Yoon, S. H., Ha, S. M., Kwon, S., Lim, J., Kim, Y., Seo, H., & Chun, J., (2017). Introducing EzBioCloud: A taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies. International Journal of Systematic and Evolutionary Microbiology, 67(5), 1613. 107. Zaniolo, C., Ceri, S., Faloutsos, C., Snodgrass, R. T., Subrahmanian, V. S., & Zicari, R., (1997). Advanced Database Systems (Vol. 1, pp. 4–9). Morgan Kaufmann. 108. Zhang, J., Shu, J., & Lu, Y., (2016). {ParaFS}: A {log-structured} file system to exploit the internal parallelism of flash devices. In: 2016 USENIX Annual Technical Conference (USENIX ATC 16) (Vol. 1, pp. 87–100). 109. 
Zhulin, I. B., (2015). Databases for microbiologists. Journal of Bacteriology, 197(15), 2458–2467. 110. Zins, C., (2007). Conceptual approaches for defining data, information, and knowledge. Journal of the American Society for Information Science and Technology, 58(4), 479–493.
CHAPTER 2

DATA MODELS
CONTENTS
2.1. Introduction
2.2. Importance of Data Models
2.3. Data Model Basic Building Blocks
2.4. Business Rules
2.5. The Evolution of Data Models
References
2.1. INTRODUCTION
When designing a database, the primary emphasis is on the data and the structures that will be used to store it for end users. The first phase in building a database, known as data modeling, is the process of creating a specific data representation for a particular problem domain. (A problem domain is a clearly defined area within the real-world environment that is to be systematically addressed; it has a well-defined scope and boundaries) (Mullahy, 1986). A data model is a relatively simple representation, often graphical, of the more intricate data structures that exist in the real world. Any model is, in its most fundamental sense, an abstraction of a more complicated real-world event or object. The primary purpose of a model is to improve our ability to understand the complexities of the environment it describes. Within the context of a database system, a data model represents data structures together with their attributes, relationships, constraints, transformations, and other constructs that support a specific problem domain (Peckham & Maryanski, 1988).

Data modeling is an iterative, progressive process. You begin with a basic understanding of the problem domain, and as your knowledge of the problem grows, so does the level of detail in the data model you create. Done correctly, the resulting data model serves as a "blueprint" containing all the instructions needed to build a database that satisfies end-user requirements. The blueprint is both narrative and graphical: it contains not only text descriptions written in plain, unambiguous language, but also clear and useful diagrams that depict the main data elements (Brodie, 1984; Blundell et al., 2002).
In the past, database designers relied on good judgment to help them develop a sound data model. Unfortunately, good judgment is often in the eye of the beholder, and it usually develops only after much practice and many mistakes. For example, if each member of a class were assigned to design a data model for a video shop, it is quite likely that each of them would come up with a different model for the business (Bond, 2002; Hellerstein & Mendelsohn, 1993). Which one would be correct? The straightforward answer is "the one that meets all the end users' requirements," and there may be more than one correct choice! Fortunately, the potential for errors in database modeling has been greatly reduced because database designers and administrators (DBAs) can draw on well-established data-modeling constructs and powerful data-modeling tools. In the following sections, you will discover how various degrees of data abstraction help ease data modeling and how current data models can be used to represent real-world information (Elhorst, 2014; Bachman & Daya, 1977).
2.2. IMPORTANCE OF DATA MODELS
Data models facilitate communication among the designer, the application programmer, and the end user. A well-developed data model can even foster a better understanding of the organization for which the database design is created. In short, data models are a communication tool. One client's reaction to this important aspect of data modeling sums it up: "I created this business, I worked with this business for a long time, and this is the first time I've really understood how all the pieces fit together" (Blundell & Bond, 1998).

The importance of data modeling cannot be overstated. Data constitute the most basic information building blocks in a system. Applications are created to manage data and to help transform data into information. But different people view data in different ways. Compare the (data) view of a company manager with that of the company's clerk. Although both the manager and the clerk work for the same organization, the manager is much more likely than the clerk to have an enterprise-wide view of the company's data (Aubry et al., 2017).

Even different managers view data differently. A general manager, for instance, is likely to take a broad view of the data, because he or she must be able to tie the firm's divisions into a single (database) view. A purchasing manager and the firm's inventory manager, in contrast, have much more restricted views of the data. In effect, each manager works with a subset of the company's data. The inventory manager is more concerned with inventory levels, while the purchasing manager is more concerned with item costs and with relationships with the suppliers of those items (Tekieh & Raahemi, 2015).
Application programmers see data yet another way: they are concerned with data location, formatting, and specific reporting requirements. In essence, application programmers translate company policies and procedures from a variety of sources into appropriate interfaces, reports, and query screens. The old parable of the blind people and the elephant often applies to data users and producers: the blind person who touched the elephant's trunk formed a very different picture of the animal than the blind person who felt its leg or tail. What is needed is a single view of the whole elephant. Likewise, a house is not a random collection of rooms; anyone who intends to build a house should first have the overall view provided by blueprints. Similarly, a sound data environment requires an overall database blueprint based on an appropriate data model (Wooldridge, 2005).

When a good database blueprint is available, it does not matter that an application programmer's view of the data differs from that of the manager and/or the end user. When no good database blueprint exists, however, problems are likely to arise. For instance, an inventory management program and an order entry system that use conflicting product-numbering schemes can cost a company a significant amount of money (Brown & Lemmon, 2007). Keep in mind that a house blueprint is an abstraction; you cannot live in the blueprint. Similarly, a data model is an abstraction; you cannot draw the required data out of the data model itself. Just as you are unlikely to build a good house without a blueprint, you are equally unlikely to create a good database without first designing an appropriate data model (Feeley & Silman, 2010).
2.3. DATA MODEL BASIC BUILDING BLOCKS
Entities, attributes, relationships, and constraints are the basic building blocks of all data models. An entity is anything (a person, a place, a thing, or an event) about which data are to be collected and stored. An entity represents a particular type of object in the real world. Because an entity represents a particular type of object, entities are "distinguishable"; that is, each entity occurrence is unique and distinct. For example, a CUSTOMER entity would have many distinguishable customer occurrences, such as John Smith, Pedro Dinamita, Tom Strickland, and so on (Kiviet, 1995).
Entities may be physical objects such as customers or products, but they may also be abstractions such as flight routes or musical events. An attribute is a characteristic of an entity. For example, a CUSTOMER entity would be described by attributes such as customer last name, customer first name, customer phone, customer address, and customer credit limit. Attributes are the equivalent of fields in file systems (Chang et al., 2017).

A relationship describes an association among entities. For example, a relationship exists between customers and agents that can be described as follows: an agent can serve many customers, and each customer may be served by one agent. Data models use three types of relationships: one-to-many, many-to-many, and one-to-one. Database designers usually use the shorthand notations 1:M or 1..*, M:N or *..*, and 1:1 or 1..1. (Although the M:N notation is the most common label for the many-to-many relationship, the label M:M may also be used.) The following examples illustrate the distinctions among the three (Sudan, 2005):

• One-to-many (1:M or 1..*) relationship. A painter creates many different paintings, but each painting is created by only one painter. Thus the painter (the "one") is related to the paintings (the "many"). Database designers therefore label the relationship "PAINTER paints PAINTING" as 1:M. (Note that entity names are often capitalized by convention, to make them stand out.) Similarly, a customer (the "one") may generate many invoices, but each invoice (the "many") is generated by only one customer. The relationship "CUSTOMER generates INVOICE" would also be labeled 1:M (Monakova et al., 2009).
• Many-to-many (M:N or *..*) relationship. An employee may learn many job skills, and each job skill may be learned by many employees. Database designers label the relationship "EMPLOYEE learns SKILL" as M:N. Similarly, a student can take many classes, and each class can be taken by many students, yielding the M:N label for the relationship described by "STUDENT takes CLASS."
• One-to-one (1:1 or 1..1) relationship. Under its current structure, each of a retail company's stores may be managed by a single employee, and each store manager, who is also an employee, manages only one store. Therefore, the relationship "EMPLOYEE manages STORE" is labeled 1:1.

The preceding discussion identified each relationship in both directions; that is, relationships are bidirectional:

• One CUSTOMER may generate many INVOICES.
• Each of the many INVOICES is generated by only one CUSTOMER.

A constraint is a restriction placed on the data. Constraints are important because they help to ensure data integrity (Falk & Bassett, 2017). Constraints are normally expressed in the form of rules. For example (Makarova et al., 2013):

• An employee's salary must have a value between 6,000 and 350,000 dollars.
• A student's GPA must be between 0.00 and 4.00.
• Each class must be taught by one and only one instructor.

How do you properly identify entities, attributes, relationships, and constraints? The first step is to clearly identify the business rules for the problem domain you are modeling (Moonen et al., 2005).
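The constraint examples above map directly onto declarative rules in a relational schema. The following minimal sketch, for illustration only (the table and column names are invented; only the salary and GPA bounds come from the rules above), shows how such constraints can be declared so the DBMS itself enforces them, using Python's standard sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# EMPLOYEE entity: the salary rule becomes a CHECK constraint.
con.execute("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        emp_name   TEXT NOT NULL,
        emp_salary REAL CHECK (emp_salary BETWEEN 6000 AND 350000)
    )
""")
# STUDENT entity: the GPA rule works the same way.
con.execute("""
    CREATE TABLE student (
        stu_id  INTEGER PRIMARY KEY,
        stu_gpa REAL CHECK (stu_gpa BETWEEN 0.00 AND 4.00)
    )
""")

# A valid row is accepted; a row violating the salary rule is rejected by the DBMS.
con.execute("INSERT INTO employee VALUES (1, 'John Smith', 52000)")
try:
    con.execute("INSERT INTO employee VALUES (2, 'Pedro Dinamita', 500)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

The point of declaring the rules in the schema, rather than in application code, is that every program touching the data is then subject to the same constraints.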
2.4. BUSINESS RULES
When database designers decide which entities, attributes, and relationships to use in building a data model, they might begin by gaining a thorough understanding of what types of data exist in the organization, how the data are used, and in what time frames they are used. But such data and information do not, by themselves, yield the required understanding of the total business (Nemuraite et al., 2010). From a database point of view, the collection of data becomes meaningful only when it reflects properly defined business rules. A business rule is a brief, precise, and unambiguous description of a policy, procedure, or principle within a specific organization. In a sense, business rules are misnamed: they apply to any organization (a large or small business, a government unit, a religious group, or a research laboratory) that stores and uses data to generate information (Whitby et al., 2007).

Business rules, derived from a detailed description of an organization's operations, help to create and enforce actions within that organization's environment. Business rules must be put in writing and updated to reflect any change in the organization's operating environment. Properly written business rules are used to define entities, attributes, relationships, and constraints. You will see business rules at work whenever you encounter statements such as "an agent can serve many customers, and each customer can be served by only one agent." Business rules are used extensively throughout this book, particularly in the chapters on data modeling and database design (Araujo‐Pradere, 2009).

To be effective, business rules must be easy to understand and widely disseminated, so that everyone in the organization shares a common interpretation of them. Using simple language, business rules describe the main and distinguishing characteristics of the data as viewed by the company. Here are some examples of business rules (Pulparambil et al., 2017):

• A customer may generate many invoices.
• Each invoice is generated by only one customer.
• A training session cannot be scheduled for fewer than 10 employees or for more than 30 employees.

These business rules establish entities, relationships, and constraints. For example, the first two business rules establish two entities (CUSTOMER and INVOICE) and a 1:M relationship between them. The third business rule establishes a constraint (no fewer than 10 people and no more than 30 people), two entities (EMPLOYEE and TRAINING), and a relationship between them (Bajec & Krisper, 2005).
2.4.1. Discovering Business Rules
The main sources of business rules are company managers, policy makers, department managers, and written documentation such as a company's procedures, standards, and operations manuals. A faster and more direct source of business rules is direct interviews with end users (Taveter & Wagner, 2001). Unfortunately, because perceptions differ, end users are sometimes a less reliable source when it comes to specifying business rules. For example, a maintenance department mechanic might believe that any mechanic can initiate a maintenance procedure, when in fact only mechanics with inspection authorization can do so. Although such a distinction might seem trivial, it can have major legal consequences (Knolmayer et al., 2000).

Although end users are crucial contributors to the development of business rules, it pays to verify end-user perceptions. Interviews with several people who perform the same job can yield very different perceptions of what the job entails. While such a discovery may point to "management problems," that diagnosis does not help the database designer. The designer's job is to reconcile such differences and verify the results to ensure that the business rules are appropriate and accurate (Herbst et al., 1994).

The process of identifying and documenting business rules is essential to database design for several reasons (Wan-Kadir & Loucopoulos, 2004):

• They help to standardize the company's view of data.
• They can serve as a communication tool between users and designers.
• They allow the designer to understand the nature, role, and scope of the data.
• They allow the designer to understand business processes.
• They allow the designer to develop an accurate data model and to define appropriate relationship participation rules and constraints.

Of course, not all business rules can be modeled. For example, a business rule such as "no pilot may fly more than 10 hours within any 24-hour period" cannot be modeled in the database itself. Such a business rule can, however, be enforced by application software (Zur Muehlen & Indulska, 2010).
2.4.2. Translating Business Rules into Data Model Components
Business rules set the stage for the proper identification of entities, attributes, relationships, and constraints. In the real world, names are used to identify objects, and if the business environment wants to keep track of those objects, there will be specific business rules for them. As a general rule, a noun in a business rule will translate into an entity in the model, and a verb (active or passive) that associates nouns will translate into a relationship among the entities (Rosenberg & Dustdar, 2005). For example, the business rule "a customer may generate many invoices" contains two nouns (customer and invoices) and a verb (generate) that associates them. From this business rule, you can deduce the following (Van Eijndhoven et al., 2008):

• Customer and invoice are objects of interest in the environment and should be represented by entities.
• There is a "generate" relationship between customer and invoice.

When identifying the relationship type, remember that relationships are bidirectional; that is, they go both ways. For example, the business rule "a customer may generate many invoices" is complemented by the business rule "each invoice is generated by only one customer." In this case, the relationship is one-to-many (1:M). Customer is the "one" side, and invoice is the "many" side (Charfi & Mezini, 2004). As a general rule, to properly identify the relationship type, you should ask two basic questions (Kardasis & Loucopoulos, 2004):

• How many instances of B are related to one instance of A?
• How many instances of A are related to one instance of B?

For example, you can assess the relationship between student and class by asking two questions (Rosca et al., 1997):

• In how many classes can one student enroll? The answer: many classes.
• How many students can enroll in one class? The answer: many students.

Therefore, the student-class relationship is many-to-many (M:N). You will have many opportunities to determine the relationships between entities as you proceed through this book, and the process will soon become second nature to you (Herbst, 1996).
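The CUSTOMER–INVOICE (1:M) and STUDENT–CLASS (M:N) relationships just discussed can be sketched as a relational schema. This is an illustrative sketch only (all table and column names are invented); the M:N relationship is implemented, as is conventional in relational design, with a junction table that decomposes it into two 1:M relationships:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")

# 1:M -- the "many" side (INVOICE) carries a foreign key to the "one" side (CUSTOMER).
con.execute("CREATE TABLE customer (cus_id INTEGER PRIMARY KEY, cus_name TEXT)")
con.execute("""
    CREATE TABLE invoice (
        inv_id INTEGER PRIMARY KEY,
        cus_id INTEGER NOT NULL REFERENCES customer (cus_id)
    )
""")

# M:N -- implemented with a junction table holding two 1:M relationships.
con.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE class   (cls_id INTEGER PRIMARY KEY)")
con.execute("""
    CREATE TABLE enrollment (
        stu_id INTEGER REFERENCES student (stu_id),
        cls_id INTEGER REFERENCES class (cls_id),
        PRIMARY KEY (stu_id, cls_id)
    )
""")

con.execute("INSERT INTO customer VALUES (1, 'John Smith')")
# One customer, many invoices:
con.executemany("INSERT INTO invoice VALUES (?, ?)", [(10, 1), (11, 1)])
print(con.execute("SELECT COUNT(*) FROM invoice WHERE cus_id = 1").fetchone()[0])  # -> 2
```

The foreign-key declarations are the schema-level expression of the business rules: they prevent, for example, an invoice that is not generated by any existing customer.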
2.4.3. Naming Conventions
In the process of translating business rules into data model components, you identify entities, attributes, relationships, and constraints. That identification process includes naming each object in a way that makes it distinguishable from other objects in the problem domain. Therefore, it is essential that you pay special attention to how you name the objects you discover (Demuth et al., 2001). Entity names should be descriptive of the objects in the business environment and should use terminology that is familiar to the users. Attribute names should be descriptive of the data they represent. It is also good practice to prefix the name of an attribute with the name of the entity (or an abbreviation of it) in which it occurs. For example, in the CUSTOMER entity, the customer's credit limit could be named CUS_CREDIT_LIMIT (Graml et al., 2007). The CUS prefix indicates that the attribute is descriptive of the CUSTOMER entity, while CREDIT_LIMIT makes the content of the attribute easy to recognize. This practice becomes especially important in a later section, where common attributes are used to express relationships between entities. The use of proper naming conventions improves the data model's ability to facilitate communication among the designer, the application programmer, and the end user. In fact, a consistent naming convention can go a long way toward making your model self-documenting (Grosof et al., 1999).
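The prefixing convention can be seen in a small sketch (the attribute names below are hypothetical examples in the spirit of CUS_CREDIT_LIMIT, not taken from a real schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Each attribute name is prefixed with an abbreviation of its entity (CUS_ for
# CUSTOMER), so a column such as cus_credit_limit remains self-describing even
# when it appears alongside columns of other tables in a join.
con.execute("""
    CREATE TABLE customer (
        cus_code         INTEGER PRIMARY KEY,
        cus_lname        TEXT,
        cus_fname        TEXT,
        cus_credit_limit REAL
    )
""")
cols = [row[1] for row in con.execute("PRAGMA table_info(customer)")]
print(cols)  # -> ['cus_code', 'cus_lname', 'cus_fname', 'cus_credit_limit']
```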
2.5. THE EVOLUTION OF DATA MODELS
The quest for better data management has led to several data models that attempt to resolve the file system's critical shortcomings. These models represent different schools of thought as to what a database is, what it should do, what types of structures it should employ, and how those structures should be implemented in technology. Perhaps confusingly, these models, like the graphical models we have been examining, are also called data models (Navathe, 1992). In this section, we discuss the principal data models in roughly chronological order. As you will see, many of the "latest" database concepts and structures bear a strong resemblance to some of the "old" data model concepts and structures. The evolution of the major data models is shown in Table 2.1 (Fry & Sibley, 1976).

Table 2.1. Major Data Models Have Changed Over Time
2.5.1. Hierarchical and Network Models
The hierarchical model was developed in the 1960s to manage large amounts of data for complex manufacturing projects such as the Apollo rocket that landed on the moon in 1969. Its basic logical structure is represented by an upside-down tree. The hierarchical structure contains levels, or segments. A segment is the equivalent of a record type in a file system. Within the hierarchy, a higher layer is perceived as the parent of the segment directly beneath it, which is called the child. The hierarchical model depicts a set of one-to-many (1:M) relationships between a parent and its children segments. (Each parent can have many children, but each child has only one parent) (Zinner et al., 2006).
The schema would be the database supervisor’s conceptual arrangement of the whole database. • The subschema specifies the “visible” area of the database even by application applications. • create the necessary information from the database’s data • A language for data management (DML), specifies the context wherein data may be handled and database information can be worked with. • A schema data description language (DDL) is a programming language that allows a DBA to create schema elements. The network approach became too unwieldy as information demands rose and more complex databases and services were needed. The absence of ad hoc querying placed a lot of pressure on developers to provide the code needed to create even the most basic reports. However, although old
56
The Creation and Management of Database Systems
databases provided a measure of data independence, any fundamental change to the database structure could still wreak havoc on the application programs that drew data from it. Because of these drawbacks, the hierarchical and network models were largely supplanted by the relational model in the 1980s (Stearns, 1983).
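The parent-child navigation that the hierarchical model imposes can be sketched in a few lines of Python. This is an illustrative sketch only: the PROJECT → ASSEMBLY → PART segment names are invented, not taken from the text. Each child is reachable only through traversal from the root, which is why, without an ad hoc query language, even simple reports required custom navigation code:

```python
# A hierarchical "database" as an upside-down tree: each segment has one
# parent and possibly many children (a 1:M relationship at every level).
tree = {
    "segment": "PROJECT-Apollo",
    "children": [
        {"segment": "ASSEMBLY-Engine",
         "children": [{"segment": "PART-Turbopump", "children": []},
                      {"segment": "PART-Nozzle", "children": []}]},
        {"segment": "ASSEMBLY-Guidance",
         "children": [{"segment": "PART-Gyroscope", "children": []}]},
    ],
}

def find_segment(node, name):
    """Locate a segment by walking the tree from the root -- there is no
    declarative query language here, only procedural navigation."""
    if node["segment"] == name:
        return node
    for child in node["children"]:
        hit = find_segment(child, name)
        if hit is not None:
            return hit
    return None

print(find_segment(tree, "PART-Gyroscope")["segment"])  # PART-Gyroscope
```

Every retrieval must spell out *how* to reach the data; the relational model described next removes exactly that burden.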
2.5.2. The Relational Model

E. F. Codd (of IBM) first proposed the relational model in 1970 in his landmark paper "A Relational Model of Data for Large Shared Data Banks." Both users and designers hailed the relational model as a major breakthrough. To use an analogy, the relational model produced an "automatic transmission" database to replace the "standard transmission" databases that preceded it. Its conceptual simplicity set the stage for a genuine database revolution (Pinzger et al., 2005).

The relational model's foundation is a mathematical construct known as a relation. To avoid the complexity of abstract mathematical theory, you can think of a relation (also called a table) as a matrix composed of intersecting rows and columns. Each row in a relation is called a tuple, and each column represents an attribute. The relational model also describes a precise set of data manipulation constructs based on advanced mathematical concepts (D'Ambros et al., 2008).

In 1970, Codd's work was considered ingenious but impractical. The relational model's conceptual simplicity was bought at the expense of computer overhead; computers at that time simply lacked the power to implement it. Fortunately, both computer power and operating system efficiency increased dramatically, and, better still, the cost of computers fell rapidly as their power grew. Today even personal computers can run sophisticated relational database software such as Oracle, DB2, Microsoft SQL Server, and MySQL, as well as other mainstream relational products, at a fraction of the cost of their mainframe predecessors (Blundell et al., 1999).

The relational model is implemented through a sophisticated relational database management system (RDBMS).
The RDBMS performs the same basic functions as the hierarchical and network DBMSs, plus a host of other functions that make the relational model easier to understand and implement. Arguably the most important advantage of the RDBMS is its ability to hide the complexities of the relational model from the user: the RDBMS manages all of the physical details, while the user sees the
relational database as a collection of tables in which data is stored. The user can manipulate and query the data in a logical and intuitive manner (Manegold et al., 2009). Tables are related to one another through a shared attribute (a value that appears in a column of each table). For instance, the CUSTOMER table in Figure 2.1 contains a sales agent's number that is also found in the AGENT table (Yoder & Yang, 2000).
Figure 2.1. Creating connections between relational tables. Source: https://launchschool.com/books/sql/read/table_relationships.
Even though the customer data is stored in one table and the sales agent data in another, the common AGENT_CODE attribute relates each customer to his or her sales agent. For instance, you can easily determine that customer Dunne's agent is Alex Alby because the AGENT_CODE value 501 in the CUSTOMER table matches the AGENT_CODE value 501 in the AGENT table (Elith et al., 2010). Although the tables are independent of one another, the data in them can easily be linked. The relational model thus provides a controlled amount of redundancy that eliminates most of the inconsistencies found in file systems (Schilthuizen & Davison, 2005). In a relational schema, the relationship type (1:1, 1:M, or M:N) is often shown, as in Figure 2.2. A relational diagram depicts the
entities, the attributes within those entities, and the relationships among the entities in a database (Staub et al., 2010). The relational diagram in Figure 2.2 shows the linking fields (in this case, AGENT_CODE) and the relationship type, 1:M. The ∞ (infinity) symbol is used to indicate the "many" side in Microsoft Access, the database software used to create Figure 2.2. In this example, CUSTOMER represents the "many" side, because an AGENT can have many CUSTOMERs; AGENT represents the "1" side, because each CUSTOMER has only one AGENT (Andreozzi et al., 2008).
Figure 2.2. A diagram that shows how things are connected. Source: https://www.nsf.gov/news/mmg/mmg_disp.jsp?med_id=79315.
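The table-linking idea behind Figures 2.1 and 2.2 can be tried out directly. The sketch below, using Python's built-in sqlite3 module, builds minimal CUSTOMER and AGENT tables. The sample values (customer Dunne, agent Alex Alby, AGENT_CODE 501) follow the chapter's example, but the exact column layouts are illustrative assumptions, not the figures' full schemas:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two independent tables that share the AGENT_CODE attribute.
cur.execute("CREATE TABLE AGENT (AGENT_CODE INTEGER PRIMARY KEY, AGENT_NAME TEXT)")
cur.execute("CREATE TABLE CUSTOMER (CUS_NAME TEXT, AGENT_CODE INTEGER)")
cur.execute("INSERT INTO AGENT VALUES (501, 'Alex Alby')")
cur.execute("INSERT INTO CUSTOMER VALUES ('Dunne', 501)")

# The common attribute lets the RDBMS relate rows across the tables:
# which agent serves customer Dunne?
cur.execute("""
    SELECT AGENT.AGENT_NAME
    FROM CUSTOMER JOIN AGENT ON CUSTOMER.AGENT_CODE = AGENT.AGENT_CODE
    WHERE CUSTOMER.CUS_NAME = 'Dunne'
""")
print(cur.fetchone()[0])  # Alex Alby
```

Note that neither table "knows" about the other; the relationship exists only through the matching AGENT_CODE values, exactly as the text describes.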
A relational table stores a collection of related entities. In this respect, a relational table resembles a file. There is one crucial difference, however: because it is a purely logical structure, a table yields complete data and structural independence. How the data are physically stored in the database is of no concern to the user or the designer; what matters is how the data are perceived. This characteristic of the relational model, explored in later chapters, became the catalyst for a genuine database revolution (Schönrich & Binney, 2009). The relational model's dominance is also due to its powerful and flexible query language. The query language used by most
relational database software is Structured Query Language (SQL), which allows the user to specify what must be done without specifying how. The RDBMS uses SQL to translate user queries into instructions for retrieving the requested data. SQL makes it possible to retrieve data with far less effort than any other database or file environment. From the end user's point of view, any SQL-based relational database application involves three parts: a user interface, a set of tables, and the SQL "engine." Each of these components is described below (Manel et al., 2001):
•
The User Interface Allows the User to Interact with the Data: Essentially, the interface lets the end user use the data (by auto-generating SQL code). Each interface is the product of the software vendor's idea of meaningful interaction with the data. You can also design your own customized interfaces with the help of application generators, which are now commonplace in the database software arena.
•	A Collection of Tables Stored in the Database: In a relational database, all data are perceived to be stored in tables. The tables simply "present" the data to the end user in an easily understood way. Each table is independent of the others; rows in different tables are related through common values in common attributes.
•	SQL Engine: Largely hidden from the end user, the SQL engine executes all queries, or data retrieval requests. Keep in mind that the SQL engine is part of the database management system (DBMS). The end user uses SQL to create tables and to retrieve and manage data. All user requests are processed by the SQL engine behind the scenes, without the end user's awareness. Hence, SQL is said to be a declarative language that tells what must be done but not how. Because the RDBMS performs these behind-the-scenes tasks, it is not necessary to focus on the physical aspects of the database; the following chapters instead concentrate on the logical side of the relational model and its design (Blaschka et al., 1999).
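SQL's declarative character can be illustrated with a small sketch (again via Python's sqlite3 module; the PRODUCT table and its sample rows are invented for the demonstration). The query names only the desired result; whether the engine scans the table or uses an index is the engine's own decision, so changing the physical access structures leaves the query text untouched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE PRODUCT (P_CODE TEXT, P_DESCRIPT TEXT, P_PRICE REAL)")
cur.executemany("INSERT INTO PRODUCT VALUES (?, ?, ?)",
                [("11QER/31", "Power painter", 109.99),
                 ("13-Q2/P2", "7.25-in. pwr. saw", 14.99)])

# Declarative: says WHAT is wanted, never HOW to fetch it.
QUERY = "SELECT P_DESCRIPT FROM PRODUCT WHERE P_PRICE < 50"

before = cur.execute(QUERY).fetchall()   # engine chooses a full table scan
cur.execute("CREATE INDEX idx_price ON PRODUCT (P_PRICE)")
after = cur.execute(QUERY).fetchall()    # engine may now use the index instead

# Only the engine's access path changed; the query and its result did not.
print(before == after)  # True
```

This is the "automatic transmission" analogy in miniature: the physical gear-shifting is the SQL engine's job, not the user's.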
2.5.3. The Entity-Relationship Model

The conceptual simplicity of relational database technology triggered the demand for RDBMSs. As a result of rapidly increasing transactional and
information needs, increasingly complex database applications had to be created, which in turn required more effective database design tools. (Building a skyscraper, for example, requires more detailed design activities than building a doghouse) (Harrison, 2015).

Complex design activities require conceptual simplicity to be effective. Although the relational model was a vast improvement over the hierarchical and network models, it still lacked the features that would make it an effective database design tool. Because it is easier to examine structures graphically than to describe them in text, database designers prefer a graphical tool in which entities and their relationships can be pictured. Thus, the entity relationship (ER) model, or ERM, became a widely accepted database design standard (Blaschka et al., 1999).

Peter Chen first introduced the ER model in 1976; it was a graphical representation of entities and their relationships in a database structure that quickly became popular as a complement to the relational model's concepts. The relational model and the ERM combined to provide the foundation for tightly structured database design. ER models are depicted with entity-relationship diagrams (ERDs), a visual representation of the data structures (Masud et al., 2010). The ER model is based on the following components (Liu et al., 2011):
•
Entity: As discussed earlier in this chapter, an entity is anything about which data are to be collected and stored. In the ERD, an entity is represented by a rectangle, also known as an entity box. The entity's name, a noun, is written in the center of the rectangle. Entity names are usually written in capital letters and in singular form: PAINTER rather than PAINTERS, and EMPLOYEE rather than EMPLOYEES. When the ERD is used with a relational model, each entity maps to a table. In the ER model, each row in a table is referred to as an entity instance or entity occurrence. Each entity is described by a set of attributes that capture the entity's particular characteristics. For example, contact information, a last name, and a first name would be attributes of the entity EMPLOYEE.
•	Relationships: Relationships describe associations among data. Most relationships involve two or more entities. Three types of relationships were introduced earlier in the chapter when the basic data-modeling building blocks were discussed: one-to-many (1:M), many-to-many (M:N), and one-to-one (1:1). The ER model uses the term connectivity to label these relationship types. The name of a relationship is usually an active or passive verb. For example, a PAINTER paints many PAINTINGs, an EMPLOYEE learns many SKILLs, and an EMPLOYEE manages a STORE.

Figure 2.3 uses two ER notations to depict the same relationships: the original Chen notation and the more recent Crow's Foot notation (Schwartz & Schäffer, 2017). The left side of the figure shows the Chen notation, based on Peter Chen's landmark paper. In this notation, the connectivity is written next to each entity box. A diamond represents a relationship and is connected to the related entities by relationship lines; the name of the relationship is written inside the diamond. The right side of Figure 2.3 shows the Crow's Foot notation, named after the three-pronged symbol used to represent the "many" side of a relationship. In the basic Crow's Foot ERD of Figure 2.3, the connectivity is represented by symbols: a short line segment represents the "1," and the three-pronged crow's foot represents the "M." In this notation, the relationship name is written above the relationship line (Herrmannsdoerfer et al., 2010).

In Figure 2.3, the entities and relationships are shown horizontally, but they may also be oriented vertically. The entities' location and the order in which they appear are immaterial; just remember to read a 1:M relationship from the "1" side to the "M" side.
In this book, the Crow's Foot notation is used as the design standard. Where appropriate, however, the Chen notation is used to illustrate some ER modeling concepts. The Crow's Foot notation is supported by most data modeling software; the Crow's Foot diagrams in the following chapters were produced with Microsoft Visio Professional (Bolker et al., 2009).
Figure 2.3. The Chen and Crow's Foot notations. Source: http://www.myreadingroom.co.in/notes-and-studymaterial/65dbms/471-the-entity-relationship-model.html.
Because of its exceptional visual simplicity, the ER model has become the dominant database modeling and design tool. Nevertheless, as the data environment continues to evolve, the search for better data models continues as well (Moser et al., 2011).
2.5.4. The Object-Oriented (OO) Model

As real-world problems grew more complex, the need for a data model that more closely represented the real world became apparent. In the object-oriented data model (OODM), both data and their relationships are contained in a single structure known as an object. The OODM is the basis for the object-oriented database management system (OODBMS) (McLennan & Taylor, 1980).

An OODM reflects a fundamentally different way to define and use entities. Like the relational model's entity, an object is described by its factual content. Unlike an entity, however, an object also includes information about the relationships between the facts within the object, as well as information about its relationships with other objects. Therefore, the facts within the object are given greater meaning. Because "semantics" refers to meaning, the OODM is said to be a semantic data model (Warren et al., 2008).
Subsequent OODM development allowed an object also to contain all of the operations that can be performed on it, such as changing its data values, finding a specific data value, and printing data values. Because objects include data, various types of relationships, and operational procedures, they become self-contained units, which makes them a basic building block for autonomous structures (Calvet Liñán & Juan Pérez, 2015). The OO data model is based on the following components (Wu et al., 2013):
•
An object is an abstraction of a real-world entity. In general terms, an object may be considered equivalent to an entity in the ER model; more precisely, an object represents only one occurrence of an entity. (The object's semantic content is defined through several of the items in this list.)
•	Attributes describe the properties of an object. For example, a PERSON object includes attributes such as a name, a Social Security number, and a date of birth.
•	Objects that share similar characteristics are grouped in classes. A class is a collection of similar objects with shared structure (attributes) and behavior (methods). In a general sense, a class resembles the entity set in the ER model. However, a class differs from an entity set in that it contains a set of procedures known as methods. A class's methods represent real-world actions such as finding a PERSON's name, changing a PERSON's name, or printing a PERSON's address. In other words, methods are the equivalent of procedures in traditional programming languages; in OO terms, methods define an object's behavior (Robins et al., 1995).
•	Classes are organized in a class hierarchy. The class hierarchy resembles an upside-down tree in which each class has only one parent. For example, the CUSTOMER class and the EMPLOYEE class share a common parent, the PERSON class. (Note how similar this is to the hierarchical data model.)
•	Inheritance is the ability of an object within the class hierarchy to inherit the attributes and methods of the classes above it. For example, the classes CUSTOMER and EMPLOYEE can be created as subclasses of the class PERSON; in that case, CUSTOMER and EMPLOYEE inherit all attributes and methods from PERSON.

Object-oriented data models are often depicted using Unified Modeling Language (UML) class diagrams. The Unified Modeling Language
(UML) is an OO-based language that provides a set of diagrams and symbols for graphically modeling a system. Within the broader UML object-oriented modeling language, the class diagram is used to represent data and their relationships. For a more detailed explanation of UML, see Sauter (2007) and Hansen & Martins (1996).

To illustrate the main characteristics of the OO data model, consider a simple invoicing problem. In this case, customers generate invoices, each invoice contains one or more lines, and each line represents an item purchased by a customer. Figure 2.4 shows the object representation, the corresponding UML class diagram, and the ER model for this simple invoicing problem. The object box provides an easy way to depict a single object occurrence (Lynch, 2007).
Figure 2.4. OO, UML, and ER models are compared. Source: http://www.differencebetween.info/difference-between-uml-and-erd.
As you examine Figure 2.4, note the following (Manatschal, 2004):
•	The object representation of the INVOICE includes all related objects within the same object box. Note that the connectivities (1 and M) indicate the related objects' relationships to the INVOICE. For example, the 1 next to the CUSTOMER object indicates that each INVOICE is generated by only one CUSTOMER, while the M next to the LINE object indicates that each INVOICE contains many LINEs.
•	The UML class diagram depicts this simple invoicing problem with three classes (CUSTOMER, INVOICE, and LINE) and two relationships. The 1..1, 0..*, and 1..* symbols represent the relationship connectivities, and the relationships are labeled at both ends to indicate the different "roles" that the objects play in each relationship (Ingram & Mahler, 2013).
•	The ER model depicts the same invoicing problem with three entities and two relationships.

The advantages of the OODM are felt in every area, from system modeling to programming. The added semantics of the OODM allow a richer description of complex objects, which in turn enabled applications to support increasingly complex structures in novel ways. As you will see in the following section, these adaptive developments also influenced the relational data model.
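The OODM ideas described above, attributes, methods, a class hierarchy with single parents, inheritance, and objects that bundle data with their relationships, map naturally onto an object-oriented language. The sketch below is an illustrative Python rendering of the chapter's examples (the class names follow the text; the attribute and method bodies are invented for the demonstration):

```python
class Person:
    """Parent class in the hierarchy: attributes and methods shared below."""
    def __init__(self, name):
        self.name = name

    def find_name(self):          # a method models a real-world action
        return self.name

class Customer(Person):
    """Subclass: inherits Person's attributes and methods."""
    def __init__(self, name):
        super().__init__(name)
        self.invoices = []        # one CUSTOMER generates many INVOICEs (1:M)

class Line:
    def __init__(self, product, price):
        self.product, self.price = product, price

class Invoice:
    def __init__(self, customer, number):
        self.customer = customer  # each INVOICE belongs to one CUSTOMER (1)
        self.number = number
        self.lines = []           # each INVOICE contains many LINEs (M)
        customer.invoices.append(self)

    def total(self):
        return sum(line.price for line in self.lines)

cust = Customer("Dunne")          # Customer inherits find_name() from Person
inv = Invoice(cust, 1001)
inv.lines.append(Line("Hammer", 9.95))
inv.lines.append(Line("Saw", 14.99))
print(cust.find_name(), round(inv.total(), 2))  # Dunne 24.94
```

Note how the INVOICE object carries its relationships (to its CUSTOMER and its LINEs) and its behavior (the total() method) inside one structure, which is exactly what distinguishes an object from a relational entity.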
REFERENCES
1.	Andreozzi, S., Burke, S., Field, L., & Konya, B., (2008). Towards GLUE 2: Evolution of the computing element information model. In: Journal of Physics: Conference Series (Vol. 119, No. 6, p. 062009). IOP Publishing.
2.	Araujo‐Pradere, E. A., (2009). Transitioning space weather models into operations: The basic building blocks. Space Weather, 7(10), 33–40.
3.	Aubry, K. B., Raley, C. M., & McKelvey, K. S., (2017). The importance of data quality for generating reliable distribution models for rare, elusive, and cryptic species. PLoS One, 12(6), e0179152.
4.	Bachman, C. W., & Daya, M., (1977). The role concept in data models. In: Proceedings of the Third International Conference on Very Large Data Bases (Vol. 1, pp. 464–476).
5.	Bajec, M., & Krisper, M., (2005). A methodology and tool support for managing business rules in organizations. Information Systems, 30(6), 423–443.
6.	Blaschka, M., Sapia, C., & Höfling, G., (1999). On schema evolution in multidimensional databases. In: International Conference on Data Warehousing and Knowledge Discovery (Vol. 1, pp. 153–164). Springer, Berlin, Heidelberg.
7.	Blundell, R., & Bond, S., (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics, 87(1), 115–143.
8.	Blundell, R., Griffith, R., & Van, R. J., (1999). Market share, market value and innovation in a panel of British manufacturing firms. The Review of Economic Studies, 66(3), 529–554.
9.	Blundell, R., Griffith, R., & Windmeijer, F., (2002). Individual effects and dynamics in count data models. Journal of Econometrics, 108(1), 113–131.
10.	Bolker, B. M., Brooks, M. E., Clark, C. J., Geange, S. W., Poulsen, J. R., Stevens, M. H. H., & White, J. S. S., (2009). Generalized linear mixed models: A practical guide for ecology and evolution. Trends in Ecology & Evolution, 24(3), 127–135.
11.	Bond, S. R., (2002). Dynamic panel data models: A guide to micro data methods and practice. Portuguese Economic Journal, 1(2), 141–162.
12.	Brodie, M. L., (1984). On the development of data models. In: On Conceptual Modeling (Vol. 1, pp. 19–47). Springer, New York, NY.
13.	Brown, J. M., & Lemmon, A. R., (2007). The importance of data partitioning and the utility of Bayes factors in Bayesian phylogenetics. Systematic Biology, 56(4), 643–655.
14.	Calvet, L. L., & Juan, P. Á. A., (2015). Educational data mining and learning analytics: Differences, similarities, and time evolution. International Journal of Educational Technology in Higher Education, 12(3), 98–112.
15.	Chang, K. C., Dutta, S., Mirams, G. R., Beattie, K. A., Sheng, J., Tran, P. N., & Li, Z., (2017). Uncertainty quantification reveals the importance of data variability and experimental design considerations for in silico proarrhythmia risk assessment. Frontiers in Physiology, 1, 917.
16.	Charfi, A., & Mezini, M., (2004). Hybrid web service composition: Business processes meet business rules. In: Proceedings of the 2nd International Conference on Service Oriented Computing (Vol. 1, pp. 30–38).
17.	Collins, A. G., Schuchert, P., Marques, A. C., Jankowski, T., Medina, M., & Schierwater, B., (2006). Medusozoan phylogeny and character evolution clarified by new large and small subunit rDNA data and an assessment of the utility of phylogenetic mixture models. Systematic Biology, 55(1), 97–115.
18.	D'Ambros, M., Gall, H., Lanza, M., & Pinzger, M., (2008). Analyzing software repositories to understand software evolution. In: Software Evolution (Vol. 1, pp. 37–67). Springer, Berlin, Heidelberg.
19.	Demuth, B., Hussmann, H., & Loecher, S., (2001). OCL as a specification language for business rules in database applications. In: International Conference on the Unified Modeling Language (Vol. 1, pp. 104–117). Springer, Berlin, Heidelberg.
20.	Elhorst, J. P., (2014). Spatial panel data models. In: Spatial Econometrics (Vol. 1, pp. 37–93). Springer, Berlin, Heidelberg.
21.	Elith, J., Kearney, M., & Phillips, S., (2010). The art of modelling range‐shifting species. Methods in Ecology and Evolution, 1(4), 330–342.
22.	Falk, E. B., & Bassett, D. S., (2017). Brain and social networks: Fundamental building blocks of human experience. Trends in Cognitive Sciences, 21(9), 674–690.
23.	Feeley, K. J., & Silman, M. R., (2010). Modelling the responses of Andean and Amazonian plant species to climate change: The effects of georeferencing errors and the importance of data filtering. Journal of Biogeography, 37(4), 733–740.
24.	Fry, J. P., & Sibley, E. H., (1976). Evolution of data-base management systems. ACM Computing Surveys (CSUR), 8(1), 7–42.
25.	Graml, T., Bracht, R., & Spies, M., (2007). Patterns of business rules to enable agile business processes. In: 11th IEEE International Enterprise Distributed Object Computing Conference (EDOC 2007) (Vol. 1, pp. 365–365). IEEE.
26.	Grosof, B. N., Labrou, Y., & Chan, H. Y., (1999). A declarative approach to business rules in contracts: Courteous logic programs in XML. In: Proceedings of the 1st ACM Conference on Electronic Commerce (Vol. 1, pp. 68–77).
27.	Ha, D., & Schmidhuber, J., (2018). Recurrent world models facilitate policy evolution. Advances in Neural Information Processing Systems, 1, 31.
28.	Hansen, T. F., & Martins, E. P., (1996). Translating between microevolutionary process and macroevolutionary patterns: The correlation structure of interspecific data. Evolution, 50(4), 1404–1417.
29.	Harrison, X. A., (2015). A comparison of observation-level random effect and beta-binomial models for modelling overdispersion in binomial data in ecology & evolution. PeerJ, 3, 4.
30.	Hellerstein, D., & Mendelsohn, R., (1993). A theoretical foundation for count data models. American Journal of Agricultural Economics, 75(3), 604–611.
31.	Herbst, H., (1996). Business rules in systems analysis: A meta-model and repository system. Information Systems, 21(2), 147–166.
32.	Herbst, H., Knolmayer, G., Myrach, T., & Schlesinger, M., (1994). The specification of business rules: A comparison of selected methodologies. In: Methods and Associated Tools for the Information Systems Life Cycle (Vol. 1, pp. 29–46).
33.	Herrmannsdoerfer, M., Vermolen, S. D., & Wachsmuth, G., (2010). An extensive catalog of operators for the coupled evolution of metamodels and models. In: International Conference on Software Language Engineering (Vol. 1, pp. 163–182). Springer, Berlin, Heidelberg.
34.	Ingram, T., & Mahler, D. L., (2013). SURFACE: Detecting convergent evolution from comparative data by fitting Ornstein‐Uhlenbeck models with stepwise Akaike information criterion. Methods in Ecology and Evolution, 4(5), 416–425.
35.	Kardasis, P., & Loucopoulos, P., (2004). Expressing and organizing business rules. Information and Software Technology, 46(11), 701–718.
36.	Kiviet, J. F., (1995). On bias, inconsistency, and efficiency of various estimators in dynamic panel data models. Journal of Econometrics, 68(1), 53–78.
37.	Knolmayer, G., Endl, R., & Pfahrer, M., (2000). Modeling processes and workflows by business rules. In: Business Process Management (Vol. 1, pp. 16–29). Springer, Berlin, Heidelberg.
38.	Kumar, K., & Van, H. J., (2000). ERP experiences and evolution. Communications of the ACM, 43(4), 22–22.
39.	Liu, C., Mao, Y., Van, D. M. J., & Fernandez, M., (2011). Cloud resource orchestration: A data-centric approach. In: Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR) (Vol. 1, pp. 1–8).
40.	Lynch, V. J., (2007). Inventing an arsenal: Adaptive evolution and neofunctionalization of snake venom phospholipase A2 genes. BMC Evolutionary Biology, 7(1), 1–14.
41.	Makarova, K. S., Wolf, Y. I., & Koonin, E. V., (2013). The basic building blocks and evolution of CRISPR–Cas systems. Biochemical Society Transactions, 41(6), 1392–1400.
42.	Manatschal, G., (2004). New models for evolution of magma-poor rifted margins based on a review of data and concepts from west Iberia and the Alps. International Journal of Earth Sciences, 93(3), 432–466.
43.	Manegold, S., Kersten, M. L., & Boncz, P., (2009). Database architecture evolution: Mammals flourished long before dinosaurs became extinct. Proceedings of the VLDB Endowment, 2(2), 1648–1653.
44.	Manel, S., Williams, H. C., & Ormerod, S. J., (2001). Evaluating presence–absence models in ecology: The need to account for prevalence. Journal of Applied Ecology, 38(5), 921–931.
45.	Masud, M. M., Chen, Q., Khan, L., Aggarwal, C., Gao, J., Han, J., & Thuraisingham, B., (2010). Addressing concept-evolution in concept-drifting data streams. In: 2010 IEEE International Conference on Data Mining (Vol. 1, pp. 929–934). IEEE.
46.	McLennan, S. M., & Taylor, S. R., (1980). Th and U in sedimentary rocks: Crustal evolution and sedimentary recycling. Nature, 285(5767), 621–624.
47.	Monakova, G., Kopp, O., Leymann, F., Moser, S., & Schäfers, K., (2009). Verifying business rules using an SMT solver for BPEL processes. Business Process, Services–Computing and Intelligent Service Management, 1, 2–8.
48.	Moonen, N. N., Flood, A. H., Fernández, J. M., & Stoddart, J. F., (2005). Towards a rational design of molecular switches and sensors from their basic building blocks. Molecular Machines, 1, 99–132.
49.	Moser, T., Mordinyi, R., Winkler, D., Melik-Merkumians, M., & Biffl, S., (2011). Efficient automation systems engineering process support based on semantic integration of engineering knowledge. In: ETFA2011 (Vol. 1, pp. 1–8). IEEE.
50.	Mullahy, J., (1986). Specification and testing of some modified count data models. Journal of Econometrics, 33(3), 341–365.
51.	Navathe, S. B., (1992). Evolution of data modeling for databases. Communications of the ACM, 35(9), 112–123.
52.	Nemuraite, L., Skersys, T., Sukys, A., Sinkevicius, E., & Ablonskis, L., (2010). VETIS tool for editing and transforming SBVR business vocabularies and business rules into UML&OCL models. In: 16th International Conference on Information and Software Technologies, Kaunas: Kaunas University of Technology (Vol. 1, pp. 377–384).
53.	Peckham, J., & Maryanski, F., (1988). Semantic data models. ACM Computing Surveys (CSUR), 20(3), 153–189.
54.	Pinzger, M., Gall, H., Fischer, M., & Lanza, M., (2005). Visualizing multiple evolution metrics. In: Proceedings of the 2005 ACM Symposium on Software Visualization (Vol. 1, pp. 67–75).
55.	Pulparambil, S., Baghdadi, Y., Al-Hamdani, A., & Al-Badawi, M., (2017). Exploring the main building blocks of SOA method: SOA maturity model perspective. Service Oriented Computing and Applications, 11(2), 217–232.
56.	Robins, J. M., Rotnitzky, A., & Zhao, L. P., (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90(429), 106–121.
57.	Rosca, D., Greenspan, S., Feblowitz, M., & Wild, C., (1997). A decision making methodology in support of the business rules lifecycle. In: Proceedings of ISRE'97: 3rd IEEE International Symposium on Requirements Engineering (Vol. 1, pp. 236–246). IEEE.
58.	Rosenberg, F., & Dustdar, S., (2005). Business rules integration in BPEL-a service-oriented approach. In: Seventh IEEE International Conference on E-Commerce Technology (CEC'05) (Vol. 1, pp. 476–479). IEEE.
59.	Sauter, T., (2007). The continuing evolution of integration in manufacturing automation. IEEE Industrial Electronics Magazine, 1(1), 10–19.
60.	Sauter, T., (2010). The three generations of field-level networks—Evolution and compatibility issues. IEEE Transactions on Industrial Electronics, 57(11), 3585–3595.
61.	Schilthuizen, M., & Davison, A., (2005). The convoluted evolution of snail chirality. Naturwissenschaften, 92(11), 504–515.
62.	Schönrich, R., & Binney, J., (2009). Chemical evolution with radial mixing. Monthly Notices of the Royal Astronomical Society, 396(1), 203–222.
63.	Schwartz, R., & Schäffer, A. A., (2017). The evolution of tumor phylogenetics: Principles and practice. Nature Reviews Genetics, 18(4), 213–229.
64.	Staub, R. B., E Souza, G. D. S., & Tabak, B. M., (2010). Evolution of bank efficiency in Brazil: A DEA approach. European Journal of Operational Research, 202(1), 204–213.
65.	Stearns, S. C., (1983). A natural experiment in life-history evolution: Field data on the introduction of mosquitofish (Gambusia affinis) to Hawaii. Evolution, 1, 601–617.
66.	Sudan, R., (2005). The basic building blocks of e-government. E-Development: From Excitement to Effectiveness, 1, 79–100.
67.	Taveter, K., & Wagner, G., (2001). Agent-oriented enterprise modeling based on business rules. In: International Conference on Conceptual Modeling (Vol. 1, pp. 527–540). Springer, Berlin, Heidelberg.
68.	Tekieh, M. H., & Raahemi, B., (2015). Importance of data mining in healthcare: A survey. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015 (Vol. 1, pp. 1057–1062).
69.	Van, E. T., Iacob, M. E., & Ponisio, M. L., (2008). Achieving business process flexibility with business rules. In: 2008 12th International IEEE Enterprise Distributed Object Computing Conference (Vol. 1, pp. 95–104). IEEE.
70.	Wan-Kadir, W. M., & Loucopoulos, P., (2004). Relating evolving business rules to software design. Journal of Systems Architecture, 50(7), 367–382.
71.	Warren, D. L., Glor, R. E., & Turelli, M., (2008). Environmental niche equivalency versus conservatism: Quantitative approaches to niche evolution. Evolution: International Journal of Organic Evolution, 62(11), 2868–2883.
72.	Whitby, M., Pessoa-Silva, C. L., McLaws, M. L., Allegranzi, B., Sax, H., Larson, E., & Pittet, D., (2007). Behavioral considerations for hand hygiene practices: The basic building blocks. Journal of Hospital Infection, 65(1), 1–8.
73.	Wooldridge, J. M., (2005). Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity. Journal of Applied Econometrics, 20(1), 39–54.
74.	Wu, X., Zhu, X., Wu, G. Q., & Ding, W., (2013). Data mining with big data. IEEE Transactions on Knowledge and Data Engineering, 26(1), 97–107.
75.	Yoder, A. D., & Yang, Z., (2000). Estimation of primate speciation dates using local molecular clocks. Molecular Biology and Evolution, 17(7), 1081–1090.
76.	Zinner, E., Nittler, L. R., Gallino, R., Karakas, A. I., Lugaro, M., Straniero, O., & Lattanzio, J. C., (2006). Silicon and carbon isotopic ratios in AGB stars: SiC grain data, models, and the galactic evolution of the Si isotopes. The Astrophysical Journal, 650(1), 350.
77.	Zur, M. M., & Indulska, M., (2010). Modeling languages for business processes and business rules: A representational analysis. Information Systems, 35(4), 379–390.
CHAPTER 3

DATABASE ENVIRONMENT
CONTENTS
3.1. Introduction
3.2. Three-Level ANSI-SPARC Architecture
3.3. Database Languages
3.4. Conceptual Modeling and Data Models
3.5. Functions of a DBMS
3.6. Components of a DBMS
References
3.1. INTRODUCTION

The primary objective of a database system is to present users with an abstract view of data, concealing the details of how it is stored and managed. The design of a database should therefore begin with an abstract and general description of the organization's information needs, which the database will reflect. In this chapter, the term "organization" is used loosely to mean the whole organization or part of it. In the Dream Home case study, for instance, we may be interested in modeling (Philip et al., 1992):

• the 'real-world' entities Rental Property, Staff, Client, and Private Owner;
• the attributes that describe properties or features of each entity (for instance, Staff has a position, a name, and a salary) (Yannakoudakis et al., 1999); and
• the relationships that exist between these entities (for instance, Staff Controls Rental Property) (Pieterse & Olivier, 2012).

In addition, since a database is a resource shared among many users, each user may require a different view of the data held in the database. To satisfy these needs, the architecture of most commercial database management systems (DBMSs) is based on the ANSI-SPARC design. In this chapter we discuss various functional and architectural characteristics of DBMSs (Zdonik & Wegner, 1988).
3.2. THREE-LEVEL ANSI-SPARC ARCHITECTURE

An early proposal for standard terminology and a general architecture for database systems was produced in 1971 by the Data Base Task Group (DBTG), appointed by the Conference on Data Systems and Languages (CODASYL, 1971). The DBTG recognized the need for a two-level approach, with a system view called the schema and user views called subschemas. The Standards Planning and Requirements Committee (SPARC) of the American National Standards Institute (ANSI), ANSI/X3/SPARC, produced similar terminology and architecture in 1975 (ANSI, 1975) (Sy et al., 2018). ANSI-SPARC recognized the need for a three-level approach with a system catalog. These proposals resembled ones made some years earlier by the IBM user groups Guide and Share, and concentrated on the need for an implementation-independent layer to insulate programs from underlying representation issues (Share/Guide, 1970). Although the ANSI-SPARC model did not become a standard, it still provides a basis for understanding some of the functionality of a DBMS (Grefen et al., 2003). For our purposes, the essential point of these and later reports is the identification of three levels of abstraction, that is, three distinct levels at which data items can be described. As shown in Figure 3.1, the levels form a three-level architecture comprising an external, a conceptual, and an internal level. The external level is the way users perceive the data. The internal level is the way the DBMS and the operating system perceive the data, and where the data is actually stored using data structures and file organizations (Grefen & Angelov, 2001).
Figure 3.1. The ANSI-SPARC three-level architecture. Source: https://www.geeksforgeeks.org/the-three-level-ansi-sparc-architecture/.
Between the external and internal levels, the conceptual level provides both the mapping and the desired independence. The purpose of the three-level architecture is to separate each user's view of the database from the way the database is physically represented. There are several reasons why this separation is desirable (Kemp et al., 2002):

• Each user should be able to access the same data, but have a different customized view of it. Each user should be able to change the way he or she views the data without affecting other users.
• Users should not have to deal directly with physical database storage details, such as indexing or hashing (see Appendix C). In other words, a user's interaction with the database should be independent of storage considerations.
• The Database Administrator (DBA) should be able to change the database storage structures without affecting the users' views.
• The internal structure of the database should be unaffected by changes to the physical aspects of storage, such as the changeover to a new storage device.
• The DBA should be able to change the conceptual structure of the database without affecting all users (Grefen et al., 2002).
3.2.1. External Level

The external level consists of a number of different external views of the database. Each user sees a representation of the 'real world' that is familiar to that user. An external view contains only those entities, attributes, and relationships in the 'real world' that the user is interested in. Other entities, attributes, or relationships that are not of interest may be represented in the database, but the user will be unaware of them (Habela et al., 2006). In addition, different views may represent the same data differently. For instance, one user may view dates in the form (day, month, year), while another may view dates as (year, month, day). Some views may include derived or calculated data: data not actually stored in the database as such, but created when needed (Aier & Winter, 2009). In the Dream Home case study, for instance, we may wish to view the age of a member of staff. However, it is unlikely that ages would be stored, as this data would have to be updated daily. Instead, the member of staff's date of birth would be stored, and the DBMS would calculate the age when it is referenced (Samos et al., 1998).
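The derived-data idea above — storing a date of birth and computing age on demand — can be sketched with a SQL view. This is an illustrative sketch using Python's built-in sqlite3 module; the table and column names are simplified stand-ins for the Dream Home schema, not definitions taken from the text.

```python
import sqlite3

# Hypothetical Staff table: age is NOT stored, only the date of birth.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, lName TEXT, DOB TEXT)")
conn.execute("INSERT INTO Staff VALUES ('SL21', 'White', '1965-10-01')")

# An external view derives age on demand from DOB, so nothing needs
# daily updating; the DBMS recomputes the value each time it is queried.
conn.execute("""
    CREATE VIEW StaffWithAge AS
    SELECT staffNo, lName,
           CAST((julianday('now') - julianday(DOB)) / 365.25 AS INTEGER) AS age
    FROM Staff
""")

row = conn.execute("SELECT staffNo, lName, age FROM StaffWithAge").fetchone()
print(row)  # age is computed at query time
```

Deleting the view and recreating it with a different formula would change every user's derived age without touching the stored data.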
3.2.2. Conceptual Level

The conceptual level is the middle level of the three-level architecture. This level contains the logical structure of the entire database as seen by the DBA. It is a complete view of the organization's data requirements that is independent of any storage considerations. The conceptual level represents (CLASS, 2001):

• all entities, their attributes, and their relationships;
• the constraints on the data;
• semantic information about the data; and
• security and integrity information.

The conceptual level supports each external view, in that any data available to a user must be contained in, or derivable from, the conceptual level. However, this level must not contain any storage-dependent details. For instance, the description of an entity should contain only the data types of its attributes (such as integer, real, or character) and their length (such as the maximum number of digits or characters), but not any storage considerations, such as the number of bytes occupied (Okayama et al., 1998).
3.2.3. Internal Level

The internal level covers the physical implementation of the database to achieve optimal runtime performance and storage space utilization. It covers the data structures and file organizations used to store data on storage devices. It interfaces with the operating system's access methods (file management techniques for storing and retrieving data records) to place the data on the storage devices, build the indexes, retrieve the data, and so on. The internal level is concerned with such things as (Biller, 1982):

• storage space allocation for data and indexes;
• record descriptions for storage (with stored sizes for data items);
• record placement; and
• techniques for data compression and encryption.

Below the internal level there is a physical level that may be managed by the operating system under the direction of the DBMS. However, the functions of the DBMS and the operating system at the physical level are not clear-cut and vary from system to system. Some DBMSs take advantage of many of the operating system's access methods, while others use only the most basic ones and create their own file organizations. The physical level below the DBMS consists of items known only to the operating system, such as exactly how the sequencing is implemented and whether the fields of internal records are stored as contiguous bytes on the disk (Li & Wang, 2007).
3.2.4. Mappings, Schemas, and Instances

The overall description of a database is called the database schema. There are three different types of schema in the database, defined according to the levels of abstraction of the three-level architecture shown in Figure 3.1. At the highest level, multiple external schemas (also called subschemas) correspond to different views of the data (de Brock, 2018). At the conceptual level is the conceptual schema, which describes all entities, attributes, and relationships, together with integrity constraints. At the lowest level of abstraction is the internal schema, a complete description of the internal model, containing the definitions of stored records, the methods of representation, the data fields, and the indexes and storage structures used. There is one conceptual schema and one internal schema per database (Fischer et al., 2010). The DBMS is responsible for mapping between these three types of schema. It must also check the schemas for consistency; in other words, every external schema must be derivable from the conceptual schema, and the conceptual schema must be used to map between the external and internal schemas. The conceptual/internal mapping relates the conceptual schema to the internal schema (Koonce, 1995). This enables the DBMS to find the actual record or combination of records in physical storage that constitute a logical record in the conceptual schema, together with any constraints to be enforced on the operations for that logical record. It also allows any differences in entity names, attribute names, data types, attribute order, and so on to be resolved. Finally, the external/conceptual mapping relates each external schema to the conceptual schema. The DBMS can then map names in the user's view onto the relevant part of the conceptual schema (van Bommel et al., 1994).
Figure 3.2. The differences between the three levels. Source: https://slideplayer.com/slide/14504100/.
Figure 3.2 illustrates the differences between the three levels. It shows two different external views of staff details: one consisting of a staff number (sNo), first name (fName), last name (lName), age, and salary; and a second consisting of a staff number (staffNo), last name (lName), and the number of the branch the member of staff works at (branchNo) (Rossiter & Heather, 2005). These external views are merged into one conceptual view. The main difference in this merging process is that the age field has been changed into a date-of-birth field, DOB. The DBMS maintains the external/conceptual mapping; for example, it maps the sNo field of the first external view to the staffNo field of the conceptual record. The conceptual level is then mapped to the internal level, which contains a physical description of the structure for the conceptual record (Li et al., 2009). At this level, the structure is described using a high-level language. The structure contains a pointer, next, which allows the list of staff records to be physically chained together. Note that the order of fields at the internal level is different from that at the conceptual level. Again, the DBMS maintains the conceptual/internal mapping. It is important to distinguish between the description of the database and the database itself. The description of the database is the database schema (Flender, 2010). The schema is specified during the database design process and is not expected to change frequently. However, the actual data in the database may change often; for example, it changes every time we insert details of a new member of staff or a new property. The data in the database at any particular point in time is called a database instance. Therefore, many database instances can correspond to the same database schema. The schema is sometimes called the intension of the database, while an instance is called an extension (or state) of the database (Cooper, 1995).
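The external/conceptual mapping described above can be imitated with SQL views, which rename and subset the columns of the conceptual schema for each user. A minimal sketch using Python's built-in sqlite3 module; the view names and sample data are hypothetical, and a real DBMS maintains these mappings internally rather than through user-defined views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one schema holding all staff attributes.
conn.execute("""CREATE TABLE Staff (
    staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT,
    DOB TEXT, salary REAL, branchNo TEXT)""")
conn.execute("INSERT INTO Staff VALUES ('SG37','Ann','Beech','1960-11-10',12000,'B003')")

# External view 1: renames staffNo to sNo, a subset of the conceptual record.
conn.execute("""CREATE VIEW StaffView1 AS
    SELECT staffNo AS sNo, fName, lName, salary FROM Staff""")

# External view 2: a different subset of the same conceptual record.
conn.execute("""CREATE VIEW StaffView2 AS
    SELECT staffNo, lName, branchNo FROM Staff""")

v1 = conn.execute("SELECT sNo, lName, salary FROM StaffView1").fetchone()
v2 = conn.execute("SELECT staffNo, branchNo FROM StaffView2").fetchone()
print(v1, v2)
```

Both views are derived from the single conceptual schema, and the name sNo in the first view is resolved to staffNo by the mapping.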
3.2.5. Data Independence

A major objective of the three-level architecture is to provide data independence, which means that upper levels are unaffected by changes to lower levels. There are two kinds of data independence: logical and physical (Samos et al., 1998).
Figure 3.3. The ANSI-SPARC three-level architecture and data independence. Source: https://www.researchgate.net/figure/Schemas-data-independence-in-ANSI-SPARC-three-level-architecture_fig2_326468693.
Logical data independence refers to the immunity of the external schemas to changes in the conceptual schema. It should be possible to make changes to the conceptual schema, such as the addition or removal of entities, attributes, or relationships, without having to change existing external schemas or rewrite application programs. Clearly, the users for whom the changes have been made need to be aware of them, but what is important is that other users should not be (Shahzad, 2007). Physical data independence refers to the immunity of the conceptual schema to changes in the internal schema. It should be possible to make changes to the internal schema, such as using different file organizations or storage structures, using different storage devices, or modifying indexes or hashing algorithms, without having to change the conceptual or external schemas. From the users' point of view, the only effect that may be noticed is a change in performance, as this is the only aspect that has changed. Indeed, deterioration in performance is the most common reason for internal schema changes. Figure 3.3 shows where each type of data independence occurs in relation to the three-level architecture (Ligêza, 2006). Although the two-stage mapping of the ANSI-SPARC architecture may be inefficient, the increased data independence it provides is well worth the trade-off. The ANSI-SPARC model does, however, also allow external schemas to be mapped directly onto the internal schema, bypassing the conceptual schema. This results in a more efficient mapping, but at the cost of reduced data independence: whenever the internal schema changes, the external schemas, and any application programs dependent on them, may also have to change, because the external schemas now depend directly on the internal schema (Neuhold & Olnhoff, 1981).
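Logical data independence can be demonstrated in miniature with a view acting as an external schema: an application written purely against the view is untouched when a new attribute is added to the conceptual schema. A sketch with Python's built-in sqlite3 module; the table, view, and function names are illustrative, not from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, lName TEXT, salary REAL)")
conn.execute("INSERT INTO Staff VALUES ('SL21', 'White', 30000)")

# External schema used by a hypothetical application: only staffNo and lName.
conn.execute("CREATE VIEW StaffNames AS SELECT staffNo, lName FROM Staff")

def app_query():
    # The application is written purely against the external view.
    return conn.execute("SELECT staffNo, lName FROM StaffNames").fetchall()

before = app_query()

# The conceptual schema changes: a new attribute is added to Staff.
conn.execute("ALTER TABLE Staff ADD COLUMN branchNo TEXT")

after = app_query()  # the view, and hence the application, is unaffected
print(before == after)
```

The same query returns the same result before and after the schema change, which is exactly the immunity that logical data independence promises.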
3.3. DATABASE LANGUAGES

A data sublanguage consists of two parts: a Data Definition Language (DDL) and a Data Manipulation Language (DML). The DDL is used to specify the database schema, and the DML is used both to read and to update the database. These languages are called data sublanguages because they do not include constructs for all computing needs, such as conditional or iterative statements, which are provided by high-level programming languages. Many DBMSs have a facility for embedding the sublanguage in a high-level programming language such as Fortran, COBOL, Ada, Pascal, C, C++, or Java; in this case, the high-level language is sometimes referred to as the host language (Vossen, 1991). To compile the embedded file, the data sublanguage commands in the host-language program are first replaced by function calls. The preprocessed file is then compiled, placed in an object module, linked with a DBMS-specific library containing the replaced functions, and executed when required. Most data sublanguages also provide non-embedded, or interactive, commands that can be input directly from a terminal (Bidoit, 1991).
3.3.1. The Data Definition Language (DDL)

The database schema is specified by a set of definitions expressed by means of a special language known as a DDL. The DDL is used to define a schema or to modify an existing one. It cannot be used to manipulate data. The result of the compilation of the DDL statements is a set of tables stored in special files collectively called the system catalog. The system catalog integrates metadata, that is, data that describes the objects in the database, and makes it easier to access and manipulate those objects (Ramamohanarao & Harland, 1994). The metadata contains definitions of records, data items, and other objects that are of interest to users or are required by the DBMS. The DBMS normally consults the system catalog before the actual data is accessed in the database. The terms data dictionary and data directory are also used to describe the system catalog, although the term 'data dictionary' usually refers to a more general software system than a catalog for a DBMS (Chen et al., 1990). In theory, we could identify different DDLs for each schema in the three-level architecture, namely a DDL for the external schemas, a DDL for the conceptual schema, and a DDL for the internal schema. In practice, however, there is one comprehensive DDL that allows specification of at least the conceptual and external schemas (Bach & Werner, 2014).
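SQLite makes the relationship between DDL and the system catalog easy to observe: compiled CREATE statements are recorded as metadata in sqlite_master, that system's catalog table. A small sketch using Python's built-in sqlite3 module; the Branch table is an assumed example, not the book's exact definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema (these statements cannot manipulate data).
conn.execute("""CREATE TABLE Branch (
    branchNo TEXT PRIMARY KEY,
    street   TEXT,
    city     TEXT)""")
conn.execute("CREATE INDEX idx_branch_city ON Branch(city)")

# The compiled definitions land in SQLite's system catalog, sqlite_master,
# which stores metadata describing the objects in the database.
# (SQLite's internal auto-indexes are filtered out for clarity.)
catalog = conn.execute(
    """SELECT type, name FROM sqlite_master
       WHERE name NOT LIKE 'sqlite_%' ORDER BY name""").fetchall()
print(catalog)
```

Like any DBMS consulting its catalog, a tool can discover the schema here without ever touching the stored data.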
3.3.2. The Data Manipulation Language (DML)

Data manipulation operations usually include the following (Liu, 1999):

• insertion of new data into the database;
• modification of data stored in the database;
• retrieval of data contained in the database; and
• deletion of data from the database.

Therefore, one of the main functions of the DBMS is to support a data manipulation language in which the user can construct statements that will cause such data manipulation to occur. Data manipulation applies to the external, conceptual, and internal levels. At the internal level, however, we must define rather complex low-level procedures that allow efficient access to data. In contrast, at higher levels, emphasis is placed on ease of use, and effort is directed at providing efficient user interaction with the system (Kanellakis, 1995). The part of a DML that involves data retrieval is called a query language. A query language is a high-level special-purpose language used to satisfy diverse requests for the retrieval of data held in the database. The word 'query' is therefore reserved to denote a retrieval statement expressed in a query language. Although technically incorrect, the terms 'query language' and 'DML' are commonly used interchangeably. DMLs are distinguished by their underlying retrieval constructs (Atkinson & Buneman, 1987). We can distinguish between two types of DML: procedural and non-procedural. The prime difference between them is that procedural languages specify how the output of a DML statement is to be obtained, while non-procedural DMLs describe only what output is to be obtained. Typically, procedural languages treat records individually, whereas non-procedural languages operate on sets of records (Chandra, 1981).
3.3.2.1. Procedural DMLs

With a procedural DML, the user, or more normally the programmer, specifies what data is needed and how to obtain it. This means that the user must express all the data access operations that are to be used by calling appropriate procedures to obtain the information required. Typically, such a procedural DML retrieves a record, processes it and, based on the results, retrieves another record that is processed similarly, and so on. This process of retrievals continues until all the data requested has been gathered. Procedural DMLs are usually embedded in a high-level programming language that contains constructs to facilitate iteration and handle navigational logic. Most hierarchical and network DMLs are procedural (Chen & Zaniolo, 1999).
3.3.2.2. Non-Procedural DMLs

Non-procedural DMLs allow the required data to be specified in a single retrieval or update statement. With non-procedural DMLs, the user specifies what data is required without specifying how it is to be obtained. The DBMS translates a DML statement into one or more procedures that manipulate the required sets of records. This frees the user from having to know how data structures are internally implemented and what algorithms are required to retrieve and possibly transform the data, thereby providing users with a considerable degree of data independence. Non-procedural languages are also called declarative languages. Relational DBMSs usually include some form of non-procedural data manipulation language, typically Structured Query Language (SQL) or Query-By-Example (QBE). Non-procedural DMLs are normally easier to learn and use than procedural DMLs, as less work is done by the user and more by the DBMS (Date, 1984).
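The procedural/non-procedural contrast above can be made concrete: a record-at-a-time loop spells out how to find the answer, while a single declarative statement only says what is wanted. A sketch with Python's built-in sqlite3 module; the Staff data and the salary threshold are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, salary REAL)")
conn.executemany("INSERT INTO Staff VALUES (?, ?)",
                 [("SL21", 30000), ("SG37", 12000), ("SG14", 18000)])

# Procedural style: the program fetches and inspects one record at a time,
# spelling out HOW the answer is obtained.
high_paid_procedural = []
for staff_no, salary in conn.execute("SELECT staffNo, salary FROM Staff"):
    if salary > 15000:                       # record-at-a-time processing
        high_paid_procedural.append(staff_no)

# Non-procedural (declarative) style: one statement says WHAT is wanted,
# and the DBMS decides how to retrieve the set of records.
high_paid_declarative = [r[0] for r in conn.execute(
    "SELECT staffNo FROM Staff WHERE salary > 15000")]

print(sorted(high_paid_procedural) == sorted(high_paid_declarative))
```

Both styles return the same set of staff numbers; the difference lies entirely in who does the work of navigating the data, the program or the DBMS.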
3.3.3. Fourth-Generation Languages (4GLs)

There is no consensus about what constitutes a fourth-generation language; it is in essence a shorthand programming language. An operation that requires hundreds of lines in a third-generation language (3GL), such as COBOL, generally requires significantly fewer lines in a fourth-generation language (4GL) (Stemple et al., 1992). Compared with a 3GL, which is procedural, a 4GL is non-procedural: the user defines what is to be done, not how. A 4GL is expected to rely largely on much higher-level components known as fourth-generation tools. The user does not define the steps that a program needs to perform a task, but instead defines parameters for the tools that use them to generate an application program. It is claimed that 4GLs can improve productivity by a factor of ten, at the cost of limiting the types of problem that can be tackled. Fourth-generation languages encompass (Ceri et al., 1991):

• presentation languages, such as query languages and report generators;
• specialty languages, such as spreadsheets and database languages;
• application generators that define, insert, update, and retrieve data from the database to build applications; and
• very high-level languages that are used to generate application code.

SQL and QBE, mentioned previously, are examples of 4GLs. We now briefly discuss some of the other types of 4GL.
3.3.3.1. Forms Generators

A forms generator is an interactive facility for rapidly creating data input and display layouts for screen forms. The forms generator allows the user to define what the screen is to look like, what information is to be displayed, and where on the screen it is to be displayed. It may also allow the definition of colors for screen elements and other characteristics, such as bold, underline, blinking, reverse video, and so on. The better forms generators allow the creation of derived attributes, perhaps using arithmetic operators or aggregates, and the specification of validation checks for data input (Tresch & Scholl, 1994).
3.3.3.2. Report Generators

A report generator is a facility for creating reports from data stored in the database. It is similar to a query language in that it allows the user to ask questions of the database and retrieve information from it for a report. However, in the case of a report generator, we have much greater control over what the output looks like. We can either let the report generator automatically determine how the output should look or create our own customized output reports using special report-generator command instructions (Piatetsky-Shapiro & Jakobson, 1987). There are two main types of report generator: language-oriented and visually oriented. In the first case, we enter a command in a sublanguage to define what data is to be included in the report and how the report is to be laid out. In the second case, we use a facility similar to a forms generator to define the same information (Romei & Turini, 2011).
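A language-oriented report in miniature: a query selects and summarizes the data, and explicit formatting commands control the layout. This is only a sketch in Python with the built-in sqlite3 module; a real report generator would supply the layout commands as part of its own sublanguage, and the staff data here is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, branchNo TEXT, salary REAL)")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?)",
                 [("SL21", "B005", 30000), ("SG37", "B003", 12000),
                  ("SG14", "B003", 18000)])

# The "query" part: select and summarize the data for the report.
rows = conn.execute("""SELECT branchNo, COUNT(*), SUM(salary)
                       FROM Staff GROUP BY branchNo
                       ORDER BY branchNo""").fetchall()

# The "layout" part: explicit formatting decides how the output looks.
report_lines = [f"{'Branch':<8}{'Staff':>6}{'Total salary':>14}"]
for branch, n, total in rows:
    report_lines.append(f"{branch:<8}{n:>6}{total:>14,.0f}")
report = "\n".join(report_lines)
print(report)
```

Separating the retrieval step from the layout step mirrors the division of labor inside a report generator.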
3.3.3.3. Graphics Generators

A graphics generator is a facility for retrieving data from the database and displaying it as a graph showing trends and relationships in the data. Typically, it allows the user to create bar charts, pie charts, line charts, scatter charts, and other types of graph (Solovyev & Polyakov, 2013).
3.3.3.4. Application Generators

An application generator is a facility for producing a program that interfaces with the database. The use of an application generator can reduce the time it takes to design an entire software application. Application generators typically consist of prewritten modules that comprise the fundamental functions that most programs use. These modules, usually coded in a high-level language, constitute a 'library' of functions to choose from. The user specifies what the program is supposed to do; the application generator determines how to perform the tasks (Decker, 1998).
3.4. CONCEPTUAL MODELING AND DATA MODELS

As we have already mentioned, a schema is written using a data definition language. In fact, it is written in the DDL of a particular DBMS. This kind of language is too low-level to describe the data requirements of an organization in a way that is readily understandable by a variety of users. What we require is a higher-level description of the schema: that is, a data model. A model is a representation of real-world objects and events, and their associations (Mathiske et al., 1995; Overmyer et al., 2001). It is an abstraction that concentrates on the essential, inherent aspects of an organization and ignores the accidental properties. A data model represents the organization itself. It should provide the basic concepts and notations that will allow database designers and end-users to communicate their understanding of the organizational data unambiguously and accurately. A data model can be thought of as comprising three components (Wang & Zaniolo, 1999):

• a structural part, consisting of a set of rules according to which databases can be constructed;
• a manipulative part, defining the types of operation that are allowed on the data (for example, operations for retrieving or updating data in the database and for changing the structure of the database); and
• a set of integrity constraints, which ensures that the data is accurate.

The purpose of a data model is to represent data and to make the data understandable. If it does this, then it can easily be used to design a database. To reflect the ANSI-SPARC architecture introduced in Section 3.2, we can identify three related data models (Andries & Engels, 1994):

• an external data model, to represent each user's view of the organization, sometimes called the Universe of Discourse (UoD);
• a conceptual data model, to represent the logical view that is independent of the DBMS; and
• an internal data model, to represent the conceptual schema in a way that can be understood by the DBMS.

Many data models have been proposed in the research literature. They fall into three broad categories: object-based, record-based, and physical data models. The first two are used to describe data at the external and conceptual levels, while the third is used to describe data at the internal level (Trinder, 1990).
3.4.1. Object-Based Data Models

Object-based data models use concepts such as entities, attributes, and relationships. An entity is a distinct object (a person, place, thing, concept, event) that is to be represented in the database. An attribute is a property that describes some aspect of the object that we wish to record, and a relationship is an association between entities. Some of the more common types of object-based data model are (Sutton & Small, 1995):

• Semantic;
• Entity–Relationship;
• Object-Oriented; and
• Functional.

The Entity–Relationship (ER) model has emerged as one of the main techniques for database design and forms the basis for the database design methodology used in this chapter. The object-oriented data model (OODM) extends the definition of an entity to include not only the attributes that describe the state of the object but also the actions associated with it, that is, its behavior. The object is said to encapsulate both state and behavior (Chorafas, 1986).
3.4.2. Record-Based Data Models

In a record-based model, the database consists of a number of fixed-format records, possibly of differing types. Each record type defines a fixed number of fields, each typically of a fixed length. There are three principal types of record-based logical data model: the relational data model, the network data model, and the hierarchical data model. The hierarchical and network data models were developed almost a decade before the relational data model, so their links to traditional file-processing concepts are more evident (Bédard & Larrivée, 2008).
3.4.2.1. Relational Data Model

The relational data model is based on the mathematical concept of a relation. In the relational model, data and relationships are represented as tables, each of which has a number of columns with a unique name. Figure 3.4 shows a sample relational schema for part of the Dream Home case study, with branch and staff details (Wang & Brooks, 2007). For example, the first table shows that employee John White is a manager with a salary of £30,000, who works at branch (branchNo) B005, which, from the second table, is at 22 Deer Rd in London. It is important to note that there is a relationship between Staff and Branch: a branch office has staff. However, there is no explicit link between the two tables; it is only by knowing that the attribute branchNo in the Staff relation is the same as the branchNo in the Branch relation that we can establish that a relationship exists (May et al., 1997). Note that the relational data model requires only that the database be perceived by the user as tables (Robinson, 2011).
Figure 3.4. This is an example of a relational schema. Source: https://www.researchgate.net/figure/Source-relational-schema-LIBRARY-REL_fig2_264458232.
Figure 3.5. This is an example of a network schema. Source: https://creately.com/blog/examples/network-diagram-templates-creately/.
However, this perception applies only to the logical structure of the database, that is, the external and conceptual levels of the ANSI-SPARC architecture. It does not apply to the physical structure of the database, which can be implemented using a variety of storage structures (Roussopoulos & Karagiannis, 2009).
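The point that the relationship between Staff and Branch exists only through matching branchNo values can be shown with a join. A sketch using Python's built-in sqlite3 module, with the data taken from the Figure 3.4 example (John White, branch B005 at 22 Deer Rd, London); the exact column set is simplified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, street TEXT, city TEXT)")
conn.execute("""CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, lName TEXT,
                                    position TEXT, salary REAL, branchNo TEXT)""")
conn.execute("INSERT INTO Branch VALUES ('B005', '22 Deer Rd', 'London')")
conn.execute("INSERT INTO Staff VALUES ('SL21', 'White', 'Manager', 30000, 'B005')")

# No explicit link is stored between the tables; the relationship is
# established purely by matching branchNo values at query time.
row = conn.execute("""SELECT s.lName, s.position, b.street, b.city
                      FROM Staff s JOIN Branch b ON s.branchNo = b.branchNo
                      WHERE s.staffNo = 'SL21'""").fetchone()
print(row)  # ('White', 'Manager', '22 Deer Rd', 'London')
```

The join operates entirely on the logical (tabular) view; how either table is physically stored is irrelevant to the query.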
3.4.2.2. Network Data Model

In the network model, data is represented as collections of records, and relationships are represented by sets. Compared with the relational model, relationships are explicitly modeled by the sets, which are implemented as pointers (Montevechi et al., 2010). The records are organized as generalized graph structures, with records appearing as nodes (also called segments) and sets as edges in the graph. Figure 3.5 illustrates an instance of a network schema for the same data set shown in Figure 3.4. The most popular network DBMS is Computer Associates' IDMS/R (Badia, 2002).
3.4.2.3. Hierarchical Data Model

The hierarchical model is a restricted type of network model. Again, data is represented as collections of records, and relationships are represented by sets. However, the hierarchical model allows a node to have only one parent. A hierarchical model can be represented as a tree graph, with records appearing as nodes (also called segments) and sets as edges. Figure 3.6 illustrates an instance of a hierarchical schema for the data shown in Figure 3.4. The main hierarchical DBMS is IBM's IMS, although it also provides non-hierarchical features (Calvanese et al., 2009). Record-based (logical) data models are used to specify the overall structure of the database and a higher-level description of the implementation. Their main drawback is that they do not provide adequate facilities for explicitly specifying constraints on the data, whereas object-based data models lack the means of logical structure specification but provide more semantic substance by allowing the user to specify constraints on the data. The majority of modern commercial systems are based on the relational paradigm, whereas the early database systems were based on either the network or hierarchical data models (Ram & Liu, 2006).
Figure 3.6. An example of a hierarchical schema. Source: https://www.geeksforgeeks.org/difference-between-hierarchical-and-relational-data-model/.
The hierarchical and network models require the user to have knowledge of the physical database being accessed, whereas the relational model provides a substantial amount of data independence. Hence, relational systems adopt a declarative approach to database processing (that is, they specify what data is to be retrieved), while hierarchical and network systems adopt a navigational approach (that is, they specify how the data is to be retrieved) (Borgida, 1986).
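The navigational-versus-declarative distinction can be caricatured in a few lines of code. This is only an illustrative sketch; the record layout, branch identifiers, and staff names below are invented for the example.

```python
# Toy contrast between navigational and declarative data access.
# The record structure and data are invented for illustration.

# Hierarchical-style data: each branch record "owns" its staff records.
branches = {
    "B005": {"city": "London",
             "staff": [{"name": "John White", "salary": 30000},
                       {"name": "Ann Beech", "salary": 12000}]},
    "B007": {"city": "Aberdeen",
             "staff": [{"name": "Susan Brand", "salary": 24000}]},
}

# Navigational (hierarchical/network style): say HOW to reach the data,
# following parent-to-child links from the root.
def staff_in_city_navigational(city):
    result = []
    for branch in branches.values():      # walk the root's children
        if branch["city"] == city:        # descend into the matching branch
            for rec in branch["staff"]:   # then walk its staff segment
                result.append(rec["name"])
    return result

# Declarative (relational style): say WHAT is wanted; flatten the data
# into rows and filter, leaving the access path to the system.
rows = [{"branch": b, "city": v["city"], **s}
        for b, v in branches.items() for s in v["staff"]]
staff_in_city_declarative = [r["name"] for r in rows if r["city"] == "London"]

print(staff_in_city_navigational("London"))
print(staff_in_city_declarative)
```

Both approaches return the same staff, but only the first encodes the traversal path in the program itself, which is what ties navigational systems to the physical database structure.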
Database Environment
3.4.3. Physical Data Models

Physical data models describe how data is stored in the computer, representing information such as record structures, record orderings, and access paths. There are far fewer physical data models than logical data models; the most common are the unifying model and the frame memory (Boehnlein & Ulbrich-vom Ende, 1999).
3.4.4. Conceptual Modeling

From an examination of the three-level architecture, we see that the conceptual schema is the ‘heart’ of the database. It supports all the external views and is, in turn, supported by the internal schema. However, the internal schema is merely the physical implementation of the conceptual schema. The conceptual schema should be a complete and accurate representation of the data requirements of the enterprise. If it is not, some information about the enterprise will be missing or incorrectly represented, and we will have difficulty fully implementing one or more of the external views (Jarke & Quix, 2017). Conceptual modeling, also called conceptual database design, is the process of constructing a model of the information used in an enterprise that is independent of implementation details, such as the target DBMS, application programs, programming languages, or any other physical considerations (Ceri et al., 2002). The resulting model is termed a conceptual data model. Conceptual models are sometimes called logical models in the literature. In this book, however, we distinguish between the two: the conceptual model is independent of all implementation details, whereas the logical model assumes knowledge of the underlying data model of the target DBMS (Delcambre et al., 2018).
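The implementation independence described above can be made concrete: a conceptual model records only entities, attributes, and relationships, with no commitment to any DBMS. The sketch below is illustrative; the entity names, attributes, and cardinality notation are invented for the example.

```python
from dataclasses import dataclass, field

# A conceptual model captures entities, attributes, and relationships only;
# nothing here commits to a target DBMS, storage structure, or language.
@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)

@dataclass
class Relationship:
    name: str
    participants: tuple   # names of the participating entities
    cardinality: str      # e.g. "1:N" or "N:1"

# An invented fragment of an enterprise's data requirements:
staff = Entity("Staff", ["staffNo", "name", "salary"])
branch = Entity("Branch", ["branchNo", "city"])
works_at = Relationship("WorksAt", ("Staff", "Branch"), "N:1")

# A later *logical* design step would map these descriptions onto the
# data model of a chosen DBMS (relations, record types, and so on).
print(staff.attributes)
print(works_at.cardinality)
```

The point of the separation is that the objects above survive unchanged if the target DBMS later changes from, say, a relational system to a different one; only the logical mapping step is redone.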
3.5. FUNCTIONS OF A DBMS

In this section we examine the types of function and service that we would expect a DBMS to provide. Codd (1982) lists eight functions that every full-scale DBMS should provide, and we add two more that can reasonably be expected to be available (Jaedicke & Mitschang, 1998).
•	Data Storage, Update, and Retrieval: A DBMS must allow users to store, retrieve, and update data in the database. This is the most basic function of a DBMS, and it goes without saying that in providing it the DBMS should hide the internal physical details of the implementation (such as the storage structures and file organization) from the user (He et al., 2005).
•	A Client-Accessible Catalog: A DBMS must furnish a catalog that holds descriptions of data items and is accessible to users. A key feature of the ANSI-SPARC architecture is the recognition of an integrated system catalog that acts as a repository of information about schemas, users, applications, and so on; the catalog is expected to be accessible to users as well as to the DBMS (Stonebraker et al., 1993). A system catalog, often called a data dictionary, is the repository of information describing the data in the database; in other words, it is the “data about the data,” or metadata. The amount of information held in the catalog and the way the information is used vary from one DBMS to another. Typically, the system catalog stores (Dogac et al., 1994):
	–	names, types, and sizes of data items;
	–	names of relationships;
	–	integrity constraints on the data;
	–	names of authorized users who have access to the data;
	–	the data items that each user can access and the types of access allowed, such as insert, update, delete, or read access;
	–	external, conceptual, and internal schemas, and the mappings between the schemas;
	–	usage statistics, such as the frequencies of database transactions and counts of accesses to database objects.
The DBMS catalog is one of the fundamental components of the system; several of the software components that we describe in the next section rely on the system catalog for their information. Some benefits of a system catalog are (Arens et al., 2005):
	–	information about data can be collected and stored centrally, which helps to maintain control over the data as a resource;
	–	the meaning of data can be defined, which helps other users understand the purpose of the data;
	–	communication is simplified, since exact meanings are stored; the catalog may also identify the user or users who own or access the data;
	–	redundancy and inconsistencies can be identified more easily because the data is centralized (Zalipynis, 2020);
	–	changes to the database can be recorded;
	–	because the catalog records every data item, all its relationships, and all its users, the impact of a change can be determined before it is implemented;
	–	security can be enforced;
	–	integrity can be ensured; and
	–	audit information can be provided.
Some writers distinguish between a system catalog and a data directory, where a data directory holds information about where data is stored and how it is stored. The ISO has adopted the Information Resource Dictionary System (IRDS) as a standard for data dictionaries (ISO, 1990, 1993). IRDS is a software tool that can be used to control and document an organization’s information resources; it defines the tables that make up the data dictionary and the operations that can be used to access them. In this book we use the term “system catalog” to refer to all repository information. Other types of statistical information stored in the system catalog to assist query optimization are discussed later (Ng & Muntz, 1999).
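For instance, relational DBMSs expose their catalog as ordinary queryable tables. A minimal sketch using SQLite, whose catalog is the `sqlite_master` table (the `staff` table and its columns are invented for this example):

```python
import sqlite3

# Create a throwaway in-memory database with one table, then read the
# table's description back out of the system catalog, not from the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT, salary REAL)")

# sqlite_master is SQLite's system catalog: one row of metadata per
# schema object, including the SQL text that defined it.
for obj_type, name, sql in conn.execute(
        "SELECT type, name, sql FROM sqlite_master WHERE type = 'table'"):
    print(obj_type, name)
    print(sql)

# Column-level metadata ("data about the data") via PRAGMA table_info:
# each row describes one column (cid, name, type, notnull, default, pk).
cols = [row[1] for row in conn.execute("PRAGMA table_info(staff)")]
print(cols)   # ['staff_no', 'name', 'salary']
conn.close()
```

Note that both the DBMS itself and any user with catalog access can run these queries, which is exactly the dual accessibility the ANSI-SPARC architecture calls for.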
•	Transaction Support: A DBMS must furnish a mechanism that ensures either all the updates corresponding to a given transaction are made or none of them is made. A transaction is a series of actions, carried out by a single user or application program, that accesses or changes the contents of the database. In the context of the DreamHome case study, simple transactions would include adding a new member of staff to the database, updating the salary of a member of staff, or deleting a property from the register (Carey et al., 1988).
Figure 3.7. The lost update. Source: https://www.educba.com/hierarchical-database-model/.
A more complicated example is deleting a member of staff from the database and reassigning the properties that he or she managed to another member of staff. In this case, several changes have to be made to the database. If the transaction fails partway through, perhaps because of a computer crash, the database will be in an inconsistent state, with some changes made and others not. Consequently, the changes that have been made would have to be undone to return the database to a consistent state (Nidzwetzki & Güting, 2015).
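This all-or-nothing behavior can be sketched with SQLite’s transaction support. The `staff` and `property` tables, staff numbers, and property numbers below are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (staff_no TEXT PRIMARY KEY);
    CREATE TABLE property (prop_no TEXT PRIMARY KEY,
                           managed_by TEXT NOT NULL);
    INSERT INTO staff VALUES ('SG14'), ('SG37');
    INSERT INTO property VALUES ('PA14', 'SG14'), ('PL94', 'SG14');
""")

def delete_staff(conn, old, new):
    """Reassign old's properties to new, then delete old -- atomically."""
    try:
        # "with conn" opens a transaction that commits on success
        # and rolls back automatically if any statement fails.
        with conn:
            conn.execute(
                "UPDATE property SET managed_by = ? WHERE managed_by = ?",
                (new, old))
            conn.execute("DELETE FROM staff WHERE staff_no = ?", (old,))
    except sqlite3.Error:
        pass  # rollback already performed; database left consistent

delete_staff(conn, "SG14", "SG37")
print(conn.execute("SELECT managed_by FROM property").fetchall())
# both properties are now managed by SG37
```

If the process crashed between the UPDATE and the DELETE, the DBMS’s recovery mechanism would undo the partial work on restart, which is the consistency guarantee the text describes.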
•	Concurrency Control Services: A DBMS must furnish a mechanism to ensure that the database is updated correctly when multiple users are updating it concurrently. One of the main objectives in using a DBMS is to enable many users to access shared data concurrently (Chakravarthy, 1989). Concurrent access is relatively straightforward when all the users are only reading data, as there is no way they can interfere with one another. However, when two or more users are accessing the database simultaneously and at least one of them is updating data, there may be interference that results in inconsistencies. Consider two transactions T1 and T2 that execute concurrently, as illustrated in Figure 3.7 (Khuan et al., 2008). T1 withdraws £10 from an account (with balance balx, initially £100) and T2 deposits £100 into the same account. If these transactions were executed serially, one after the other with no interleaving, the final balance would be
£190 whichever transaction was performed first. In this example, however, transactions T1 and T2 start at nearly the same time, and both read the balance as £100. T2 then increases balx by £100 to £200 and stores the update in the database. Meanwhile, transaction T1 decrements its copy of balx by £10 to £90 and stores this value in the database, overwriting the previous update and thereby “losing” £100. The DBMS must ensure that interference of this kind cannot occur when multiple users access the database simultaneously (Linnemann et al., 1988).
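The interleaving of Figure 3.7 can be replayed deterministically in a few lines; the variable names and step order are contrived to mirror the figure.

```python
# Deterministic replay of the lost-update interleaving from Figure 3.7.
balx = 100  # shared account balance

# Both transactions read the balance before either writes.
t1_copy = balx          # T1 reads 100
t2_copy = balx          # T2 reads 100
balx = t2_copy + 100    # T2 writes 200
balx = t1_copy - 10     # T1 writes 90, overwriting T2's update
print(balx)             # 90 -- T2's deposit of £100 is lost

# With the transactions serialized (e.g. by locking the record so T1
# cannot read until T2 commits), each sees the other's update:
balx = 100
balx = balx + 100       # T2 runs to completion first
balx = balx - 10        # then T1
print(balx)             # 190 -- the correct final balance
```

A real DBMS achieves the second outcome with its concurrency control component (locking or an equivalent protocol) rather than by forcing strictly serial execution.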
•	Recovery Services: A DBMS must furnish a mechanism for recovering the database in the event that it is damaged. When discussing transaction support, we noted that if a transaction fails, the database has to be returned to a consistent state. The failure may be the result of a system crash, a media failure, or a hardware or software error that causes the DBMS to stop; alternatively, the user may detect an error and abort the transaction before it completes. In all these cases, the DBMS must provide a mechanism to restore the database to a consistent state (Zlatanova, 2006).
•	Authorization Services: A DBMS must furnish a mechanism to ensure that only authorized users can access the database. It is not difficult to envisage instances where we would want to restrict access to some of the stored data. For example, we may want only branch managers to see salary-related information for staff and to prevent all other users from seeing this data. We may also want to protect the database from unauthorized access altogether. The term security refers to the protection of the database against unauthorized access, whether intentional or accidental, and we expect the DBMS to provide mechanisms that keep the data secure (Isaac & Harikumar, 2016).
•	Support for Data Communication: A DBMS must be capable of integrating with communication software. Most database users access the database from workstations. Sometimes these workstations are connected directly to the computer hosting the DBMS; in other cases, remote workstations communicate with the host machine over a network. In either case, the DBMS receives requests as communications messages and responds in the same way (Härder et al., 1987). All such transfers are handled by a Data Communication Manager (DCM). Although the DCM is not part of the DBMS, the DBMS must be capable of being integrated with a variety of DCMs if the system is to be
economically successful. Even DBMSs for personal computers should be capable of running on a local area network so that one centralized database can be established for users to share, rather than a series of disparate databases, one for each user. This does not imply that the database has to be distributed across the network; rather, users should be able to access a centralized database from remote locations (Burns et al., 1986).
•	Integrity Services: A DBMS must furnish a means to ensure that both the data in the database and changes to the data follow certain rules. Database integrity refers to the correctness and consistency of stored data and is another form of database protection. While integrity is related to security, it has wider implications: it is concerned with the quality of the data itself. Integrity is usually expressed in terms of constraints, which are consistency rules that the database is not permitted to violate (Lohman et al., 1991). For example, we may want to limit the number of properties that any one member of staff can manage at a given time. In that case, we would want the DBMS to check, when we assign a property to a member of staff, that the limit would not be exceeded, and to prevent the assignment from taking place if it would. In addition to these eight functions, a DBMS might reasonably be expected to provide the following two (Jovanov et al., 1992).
•	Services to Promote Data Independence: A DBMS must include facilities to support the independence of programs from the actual structure of the database. Data independence is normally achieved through a view or subschema mechanism.
Physical data independence is easier to achieve: there are usually several types of change that can be made to the physical characteristics of the database without affecting the views (Joo et al., 2006). However, complete logical data independence is more difficult to achieve. It is generally
possible to add a new entity, attribute, or relationship, but not to remove them. In some systems, any change to an existing component of the logical structure is prohibited altogether (Ge & Zdonik, 2007).
•	Utility Services: A DBMS should provide a set of utility services. Utility programs help the DBA administer the database effectively. Some utilities work at the external level and consequently can be produced by the DBA; others work at the internal level and can be provided only by the DBMS vendor. Examples of the latter include:
	–	import and export facilities, to load the database from flat files and to unload the database to flat files;
	–	monitoring facilities, to monitor database usage and operation;
	–	statistical analysis programs, to examine performance or usage statistics;
	–	index reorganization facilities, to reorganize indexes and their overflows; and
	–	garbage collection and reallocation, to remove deleted records physically from the storage devices, to consolidate the space released, and to reallocate it where it is needed (Gravano et al., 2001).
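Returning to the Integrity Services function above, the property-limit constraint can be sketched declaratively so that the DBMS, not the application, rejects violating updates. The tables, the cap of two properties, and the staff numbers below are invented for this sketch, which uses a SQLite trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE property (prop_no TEXT PRIMARY KEY,
                           managed_by TEXT NOT NULL);
    -- Integrity constraint (invented cap): no member of staff may
    -- manage more than 2 properties at a time.
    CREATE TRIGGER property_cap BEFORE INSERT ON property
    WHEN (SELECT COUNT(*) FROM property
          WHERE managed_by = NEW.managed_by) >= 2
    BEGIN
        SELECT RAISE(ABORT, 'staff member already manages 2 properties');
    END;
""")

conn.execute("INSERT INTO property VALUES ('PA14', 'SG14')")
conn.execute("INSERT INTO property VALUES ('PL94', 'SG14')")
try:
    # Third assignment to the same member of staff violates the constraint.
    conn.execute("INSERT INTO property VALUES ('PG21', 'SG14')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Because the rule lives in the schema, every application program that touches the table is held to it, which is the point of having the DBMS enforce integrity centrally.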
3.6. COMPONENTS OF A DBMS

DBMSs are highly complex and sophisticated pieces of software that aim to provide the services discussed in the previous section. It is not possible to generalize the component structure of a DBMS, as it varies greatly from system to system. However, when trying to understand database systems, it is useful to attempt to view the components and their relationships. In this section we present one possible architecture for a DBMS (Chakravarthy, 1989). A DBMS is partitioned into several software components, each of which performs a specific function. As mentioned earlier, the underlying operating system supports some of these functions. However, the operating system provides only basic services, and the DBMS must be built on top
of that. Consequently, the design of a DBMS must take account of the interface between the DBMS and the operating system. Figure 3.8 shows the principal software components in a DBMS environment and how the DBMS interfaces with other software components, such as user queries and access methods (file management techniques for storing and retrieving data records). File organizations and access methods are discussed later. More comprehensive treatments are given by Teorey and Fry (1982), Weiderhold (1983), Barnes and Smith (1987), and Ullman (1988) (Snodgrass, 1992).
Figure 3.8. Main elements of a DBMS. Source: https://www.techtarget.com/searchdatamanagement/definition/database-management-system.
Figure 3.9. Elements of a database manager. Source: https://corporatefinanceinstitute.com/resources/knowledge/data-analysis/database/.
The following components are shown in Figure 3.8 (Hellerstein et al., 2007):
•	Query processor: This is a major DBMS component that transforms queries into a series of low-level instructions directed to the database manager (DM).
•	Database manager (DM): The DM interfaces with the application programs and queries submitted by users. The DM accepts queries and examines the external and conceptual schemas to determine what conceptual records are required to satisfy the request. The DM then places a call to the file manager to perform the request. The components of the DM are shown in Figure 3.9 (Härder et al., 1987).
•	File manager: The file manager manipulates the underlying storage files and manages the allocation of storage space on disk. It establishes and maintains the list of structures and indexes defined in the internal schema. If hashed files are used, it calls on the hashing functions to generate record addresses. However, the file manager does not directly manage the physical input and output of data; rather, it passes the requests on to the appropriate access methods, which either read data from or write data into the system buffer (or cache) (Arulraj & Pavlo, 2017).
•	DML preprocessor: This module converts DML statements embedded in an application program into standard function calls in the host language. The DML preprocessor must interact with the query processor to generate the appropriate code.
•	DDL compiler: The DDL compiler converts DDL statements into a set of tables containing metadata. These tables are then stored in the system catalog, while control information is stored in the headers of the data files (Zhang et al., 2018).
•	Catalog manager: The catalog manager manages access to, and maintains, the system catalog. The system catalog is accessed by most DBMS components.
The major software components of the database manager are the following:
•	Authorization control: This module confirms whether the user has the necessary authorization to carry out the required operation (Delis & Roussopoulos, 1993).
•	Command processor: Once the system has confirmed that the user has authority to carry out the operation, control is passed to the command processor.
•	Integrity checker: For an operation that changes the database, the integrity checker verifies that the requested operation satisfies all necessary integrity constraints (such as key constraints).
•	Query optimizer: This module determines an optimal strategy for executing the query (Batory & Thomas, 1997).
•	Transaction manager: This module performs the required processing of the operations it receives from transactions.
•	Scheduler: This module is responsible for ensuring that concurrent operations on the database proceed without conflicting with one another. It
determines the order in which transaction operations are executed (Aref & Samet, 1991).
•	Recovery manager: This module ensures that the database remains in a consistent state in the presence of failures. It is responsible for transaction commit and abort.
•	Buffer manager: This module is responsible for the transfer of data between main memory and secondary storage devices, such as disks and tapes. The recovery manager and the buffer manager are sometimes referred to collectively as the data manager. The buffer manager is also known as the cache manager (Rochwerger et al., 2009).
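The division of labor among these components can be caricatured as a pipeline from user query to retrieved records. Every function below is an invented placeholder, not a real DBMS interface; the sketch only shows the order in which control passes between the components of Figures 3.8 and 3.9.

```python
# Caricature of the flow through the components of Figures 3.8 and 3.9.
# All functions, table names, and users here are invented placeholders.

def authorization_control(user, query):
    """Check that the user may run the query at all."""
    if user not in {"dba", "manager"}:
        raise PermissionError("not authorized")
    return query

def query_processor(query):
    """Translate the query into low-level operations for the DM."""
    return {"op": "scan", "table": query["table"], "pred": query["pred"]}

def query_optimizer(plan):
    """Pick an execution strategy for the plan."""
    plan["strategy"] = "index" if plan["pred"] else "full-scan"
    return plan

def database_manager(plan, storage):
    """Execute the plan; the file and buffer managers would sit below."""
    return [row for row in storage[plan["table"]] if plan["pred"](row)]

storage = {"staff": [{"name": "Ann", "salary": 12000},
                     {"name": "John", "salary": 30000}]}
query = {"table": "staff", "pred": lambda r: r["salary"] > 20000}

plan = query_optimizer(query_processor(authorization_control("manager", query)))
print(database_manager(plan, storage))
```

In a real system each stage is far richer (the optimizer consults catalog statistics, the DM calls the integrity checker and transaction manager, and so on), but the ordering of responsibilities is as sketched.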
REFERENCES
1.	Aier, S., & Winter, R., (2009). Virtual decoupling for IT/business alignment–conceptual foundations, architecture design and implementation example. Business & Information Systems Engineering, 1(2), 150–163.
2.	Andries, M., & Engels, G., (1994). Syntax and semantics of hybrid database languages. In: Graph Transformations in Computer Science (Vol. 1, pp. 19–36). Springer, Berlin, Heidelberg.
3.	Aref, W. G., & Samet, H., (1991). Extending a DBMS with spatial operations. In: Symposium on Spatial Databases (Vol. 1, pp. 297–318). Springer, Berlin, Heidelberg.
4.	Arens, C., Stoter, J., & Van, O. P., (2005). Modeling 3D spatial objects in a geo-DBMS using a 3D primitive. Computers & Geosciences, 31(2), 165–177.
5.	Arulraj, J., & Pavlo, A., (2017). How to build a non-volatile memory database management system. In: Proceedings of the 2017 ACM International Conference on Management of Data (Vol. 1, pp. 1753–1758).
6.	Atkinson, M. P., & Buneman, O. P., (1987). Types and persistence in database programming languages. ACM Computing Surveys (CSUR), 19(2), 105–170.
7.	Bach, M., & Werner, A., (2014). Standardization of NoSQL database languages. In: International Conference: Beyond Databases, Architectures and Structures (Vol. 1, pp. 50–60). Springer, Cham.
8.	Badia, A., (2002). Conceptual modeling for semistructured data. In: Web Information Systems Engineering Workshops, International Conference (Vol. 1, pp. 160–170). IEEE Computer Society.
9.	Batory, D., & Thomas, J., (1997). P2: A lightweight DBMS generator. Journal of Intelligent Information Systems, 9(2), 107–123.
10.	Bédard, Y., & Larrivée, S., (2008). Spatial database modeling with pictrogrammic languages. Encyclopedia of GIS, 1, 716–725.
11.	Bidoit, N., (1991). Negation in rule-based database languages: A survey. Theoretical Computer Science, 78(1), 3–83.
12.	Biller, H., (1982). On the architecture of a system integrating data base management and information retrieval. In: International Conference on Research and Development in Information Retrieval (Vol. 1, pp. 80–97). Springer, Berlin, Heidelberg.
13.	Boehnlein, M., & Ulbrich-vom Ende, A., (1999). Deriving initial data warehouse structures from the conceptual data models of the underlying operational information systems. In: Proceedings of the 2nd ACM International Workshop on Data Warehousing and OLAP (Vol. 1, pp. 15–21).
14.	Borgida, A., (1986). Conceptual modeling of information systems. On Knowledge Base Management Systems, 1, 461–469.
15.	Burns, T., Fong, E., Jefferson, D., Knox, R., Mark, L., Reedy, C., & Truszkowski, W., (1986). Reference model for DBMS standardization. SIGMOD Record, 15(1), 19–58.
16.	Calvanese, D., Giacomo, G. D., Lembo, D., Lenzerini, M., & Rosati, R., (2009). Conceptual modeling for data integration. In: Conceptual Modeling: Foundations and Applications (Vol. 1, pp. 173–197). Springer, Berlin, Heidelberg.
17.	Carey, M. J., DeWitt, D. J., Graefe, G., Haight, D. M., Richardson, J. E., Schuh, D. T., & Vandenberg, S. L., (1988). The EXODUS Extensible DBMS Project: An Overview, 1, 3–9.
18.	Ceri, S., Fraternali, P., & Matera, M., (2002). Conceptual modeling of data-intensive web applications. IEEE Internet Computing, 6(4), 20–30.
19.	Ceri, S., Tanca, L., & Zicari, R., (1991). Supporting interoperability between new database languages. In: [1991] Proceedings, Advanced Computer Technology, Reliable Systems and Applications (Vol. 1, pp. 273–281). IEEE.
20.	Chakravarthy, S., (1989). Rule management and evaluation: An active DBMS perspective. ACM SIGMOD Record, 18(3), 20–28.
21.	Chandra, A. K., (1981). Programming primitives for database languages. In: Proceedings of the 8th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Vol. 1, pp. 50–62).
22.	Chen, C. X., & Zaniolo, C., (1999). Universal temporal extensions for database languages. In: Proceedings 15th International Conference on Data Engineering (Cat. No. 99CB36337) (Vol. 1, pp. 428–437). IEEE.
23.	Chen, W., Kifer, M., & Warren, D. S., (1990). Hilog as a platform for database languages. In: Proceedings of the 2nd International Workshop on Database Programming Languages (Vol. 1, pp. 315–329).
24.	Chorafas, D. N., (1986). Fourth and Fifth Generation Programming Languages: Integrated Software, Database Languages, and Expert Systems (Vol. 1, pp. 5–9). McGraw-Hill, Inc.
25.	CLASS, A., (2001). To illustrate how this language can be used, Figure 6 shows the definition of the example. In: OOIS 2000: 6th International Conference on Object Oriented Information Systems (Vol. 1, p. 11). London, UK: Proceedings. Springer Science & Business Media.
26.	Cooper, R., (1995). Configuring database query languages. In: Interfaces to Database Systems (IDS94) (Vol. 1, pp. 3–21). Springer, London.
27.	Date, C. J., (1984). Some principles of good language design: With especial reference to the design of database languages. ACM SIGMOD Record, 14(3), 1–7.
28.	De Brock, B., (2018). Towards a theory about continuous requirements engineering for information systems. In: REFSQ Workshops (Vol. 1, pp. 3–7).
29.	Decker, S., (1998). On domain-specific declarative knowledge representation and database languages. In: KRDB-98—Proceedings of the 5th Workshop Knowledge Representation Meets Data Bases (Vol. 1, pp. 4–9). Seattle, WA.
30.	Delcambre, L. M., Liddle, S. W., Pastor, O., & Storey, V. C., (2018). A reference framework for conceptual modeling. In: International Conference on Conceptual Modeling (Vol. 1, pp. 27–42). Springer, Cham.
31.	Delis, A., & Roussopoulos, N., (1993). Performance comparison of three modern DBMS architectures. IEEE Transactions on Software Engineering, 19(2), 120–138.
32.	Dogac, A., Ozkan, C., Arpinar, B., Okay, T., & Evrendilek, C., (1994). METU object-oriented DBMS. In: Advances in Object-Oriented Database Systems (Vol. 1, pp. 327–354). Springer, Berlin, Heidelberg.
33.	Fischer, M., Link, M., Ortner, E., & Zeise, N., (2010). Servicebase management systems: A three-schema-architecture for service management. INFORMATIK 2010: Service Science–Neue Perspektiven für die Informatik. Band 1 (Vol. 1, pp. 3–9).
34.	Flender, C., (2010). A quantum interpretation of the view-update problem. In: Proceedings of the Twenty-First Australasian Conference on Database Technologies (Vol. 104, pp. 67–74).
35.	Ge, T., & Zdonik, S., (2007). Fast, secure encryption for indexing in a column-oriented DBMS. In: 2007 IEEE 23rd International Conference on Data Engineering (Vol. 1, pp. 676–685). IEEE.
36.	Gravano, L., Ipeirotis, P. G., Jagadish, H. V., Koudas, N., Muthukrishnan, S., Pietarinen, L., & Srivastava, D., (2001). Using q-grams in a DBMS for approximate string processing. IEEE Data Eng. Bull., 24(4), 28–34.
37.	Grefen, P. W. P. J., Ludwig, H., & Angelov, S., (2002). A Framework for E-Services: A Three-Level Approach Towards Process and Data Management (Vol. 1, pp. 02–07). Centre for Telematics and Information Technology, University of Twente.
38.	Grefen, P., & Angelov, S., (2001). A three-level process framework for contract-based dynamic service outsourcing. In: Proceedings of the 2nd International Colloquium on Petri Net Technologies for Modelling Communication Based Systems (Vol. 1, pp. 123–128). Berlin, Germany.
39.	Grefen, P., Ludwig, H., & Angelov, S., (2003). A three-level framework for process and data management of complex e-services. International Journal of Cooperative Information Systems, 12(04), 487–531.
40.	Habela, P., Stencel, K., & Subieta, K., (2006). Three-level object-oriented database architecture based on virtual updateable views. In: International Conference on Advances in Information Systems (Vol. 1, pp. 80–89). Springer, Berlin, Heidelberg.
41.	Härder, T., Meyer-Wegener, K., Mitschang, B., & Sikeler, A., (1987). PRIMA: A DBMS Prototype Supporting Engineering Applications, 1, 2–9.
42.	He, Z., Lee, B. S., & Snapp, R., (2005). Self-tuning cost modeling of user-defined functions in an object-relational DBMS. ACM Transactions on Database Systems (TODS), 30(3), 812–853.
43.	Hellerstein, J. M., Stonebraker, M., & Hamilton, J., (2007). Architecture of a database system. Foundations and Trends® in Databases, 1(2), 141–259.
44.	Isaac, J., & Harikumar, S., (2016). Logistic regression within DBMS. In: 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I) (Vol. 1, pp. 661–666). IEEE.
45.	Jaedicke, M., & Mitschang, B., (1998). On parallel processing of aggregate and scalar functions in object-relational DBMS. In: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data (Vol. 1, pp. 379–389).
46.	Jarke, M., & Quix, C., (2017). On warehouses, lakes, and spaces: The changing role of conceptual modeling for data integration. In: Conceptual Modeling Perspectives (Vol. 1, pp. 231–245). Springer, Cham.
47.	Joo, Y. J., Kim, J. Y., Lee, Y. I., Moon, K. K., & Park, S. H., (2006). Design and implementation of map databases for telematics and car navigation systems using an embedded DBMS. Spatial Information Research, 14(4), 379–389.
48.	Jovanov, E., Starcevic, D., Aleksic, T., & Stojkov, Z., (1992). Hardware implementation of some DBMS functions using SPR. In: Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences (Vol. 1, pp. 328–337). IEEE.
49.	Kanellakis, P., (1995). Constraint programming and database languages: A tutorial. In: Proceedings of the Fourteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Vol. 1, pp. 46–53).
50.	Kemp, G. J., Angelopoulos, N., & Gray, P. M., (2000). A schema-based approach to building a bioinformatics database federation. In: Proceedings IEEE International Symposium on Bio-Informatics and Biomedical Engineering (Vol. 1, pp. 13–20). IEEE.
51.	Kemp, G. J., Angelopoulos, N., & Gray, P. M., (2002). Architecture of a mediator for a bioinformatics database federation. IEEE Transactions on Information Technology in Biomedicine, 6(2), 116–122.
52.	Khuan, C. T., Abdul-Rahman, A., & Zlatanova, S., (2008). 3D solids and their management in DBMS. In: Advances in 3D Geoinformation Systems (Vol. 1, pp. 279–311). Springer, Berlin, Heidelberg.
53.	Koonce, D. A., (1995). Information model level integration for CIM systems: A unified database approach to concurrent engineering. Computers & Industrial Engineering, 29(1–4), 647–651.
54.	Li, Q., & Wang, L., (2007). Devising a semantic model for multimedia databases: Rationale, facilities, and applications. In: Third International Conference on Semantics, Knowledge and Grid (SKG 2007) (Vol. 1, pp. 14–19). IEEE.
55.	Li, Q., Li, N., Wang, L., & Sun, X., (2009). A new semantic model with applications in a multimedia database system. Concurrency and Computation: Practice and Experience, 21(5), 691–704.
56.	Ligêza, A., (2006). Principles of verification of rule-based systems. Logical Foundations for Rule-Based Systems, 1, 191–198.
57.	Linnemann, V., Küspert, K., Dadam, P., Pistor, P., Erbe, R., Kemper, A., & Wallrath, M., (1988). Design and implementation of an extensible database management system supporting user defined data types and functions. In: VLDB (Vol. 1, pp. 294–305).
58.	Liu, M., (1999). Deductive database languages: Problems and solutions. ACM Computing Surveys (CSUR), 31(1), 27–62.
59.	Lohman, G. M., Lindsay, B., Pirahesh, H., & Schiefer, K. B., (1991). Extensions to Starburst: Objects, types, functions, and rules. Communications of the ACM, 34(10), 94–109.
60.	Mathiske, B., Matthes, F., & Schmidt, J. W., (1995). Scaling database languages to higher-order distributed programming. In: Proceedings of the Fifth International Workshop on Database Programming Languages (Vol. 1, pp. 1–12).
61.	May, W., Ludäscher, B., & Lausen, G., (1997). Well-founded semantics for deductive object-oriented database languages. In: International Conference on Deductive and Object-Oriented Databases (pp. 320–336). Springer, Berlin, Heidelberg.
62.	Montevechi, J. A. B., Leal, F., De Pinho, A. F., Costa, R. F., De Oliveira, M. L. M., & Silva, A. L. F., (2010). Conceptual modeling in simulation projects by mean adapted IDEF: An application in a Brazilian tech company. In: Proceedings of the 2010 Winter Simulation Conference (Vol. 1, pp. 1624–1635). IEEE.
63.	Neuhold, E. J., & Olnhoff, T., (1981). Building data base management systems through formal specification. In: International Colloquium on the Formalization of Programming Concepts (Vol. 1, pp. 169–209). Springer, Berlin, Heidelberg.
64.	Ng, K. W., & Muntz, R. R., (1999). Parallelizing user-defined functions in distributed object-relational DBMS. In: Proceedings. IDEAS’99: International Database Engineering and Applications Symposium (Cat. No. PR00265) (Vol. 1, pp. 442–450). IEEE.
65.	Nidzwetzki, J. K., & Güting, R. H., (2015). Distributed SECONDO: A highly available and scalable system for spatial data processing. In: International Symposium on Spatial and Temporal Databases (Vol. 1, pp. 491–496). Springer, Cham.
66.	Okayama, T., Tamura, T., Gojobori, T., Tateno, Y., Ikeo, K., Miyazaki, S., & Sugawara, H., (1998). Formal design and implementation of an improved DDBJ DNA database with a new schema and object-oriented library. Bioinformatics (Oxford, England), 14(6), 472–478.
67.	Overmyer, S. P., Benoit, L., & Owen, R., (2001). Conceptual modeling through linguistic analysis using LIDA. In: Proceedings of the 23rd International Conference on Software Engineering: ICSE 2001 (Vol. 1, pp. 401–410). IEEE.
68.	Philip, S. Y., Heiss, H. U., Lee, S., & Chen, M. S., (1992). On workload characterization of relational database environments. IEEE Trans. Software Eng., 18(4), 347–355.
69.	Piatetsky-Shapiro, G., & Jakobson, G., (1987). An intermediate database language and its rule-based transformation to different database languages. Data & Knowledge Engineering, 2(1), 1–29.
70.	Pieterse, H., & Olivier, M., (2012). Data hiding techniques for database environments. In: IFIP International Conference on Digital Forensics (Vol. 1, pp. 289–301). Springer, Berlin, Heidelberg.
71.	Ram, S., & Liu, J., (2006). Understanding the semantics of data provenance to support active conceptual modeling. In: International Workshop on Active Conceptual Modeling of Learning (Vol. 1, pp. 17–29). Springer, Berlin, Heidelberg.
72.	Ramamohanarao, K., & Harland, J., (1994). An introduction to deductive database languages and systems. VLDB J., 3(2), 107–122.
73.	Robinson, S., (2011). Choosing the right model: Conceptual modeling for simulation. In: Proceedings of the 2011 Winter Simulation Conference (WSC) (Vol. 1, pp. 1423–1435). IEEE.
74.	Rochwerger, B., Breitgand, D., Levy, E., Galis, A., Nagin, K., Llorente, I. M., & Galán, F., (2009). The reservoir model and architecture for open federated cloud computing. IBM Journal of Research and Development, 53(4), 4–1.
75.	Romei, A., & Turini, F., (2011). Inductive database languages: Requirements and examples. Knowledge and Information Systems, 26(3), 351–384.
76.	Rossiter, B. N., & Heather, M. A., (2005). Conditions for interoperability. In: ICEIS (1) (Vol. 1, pp. 92–99).
77.	Roussopoulos, N., & Karagiannis, D., (2009). Conceptual modeling: Past, present and the continuum of the future.
67. Overmyer, S. P., Benoit, L., & Owen, R., (2001). Conceptual modeling through linguistic analysis using LIDA. In: Proceedings of the 23rd International Conference on Software Engineering: ICSE 2001 (Vol. 1, pp. 401–410). IEEE. 68. Philip, S. Y., Heiss, H. U., Lee, S., & Chen, M. S., (1992). On workload characterization of relational database environments. IEEE Trans. Software Eng., 18(4), 347–355. 69. Piatetsky-Shapiro, G., & Jakobson, G., (1987). An intermediate database language and its rule-based transformation to different database languages. Data & Knowledge Engineering, 2(1), 1–29. 70. Pieterse, H., & Olivier, M., (2012). Data hiding techniques for database environments. In: IFIP International Conference on Digital Forensics (Vol. 1, pp. 289–301). Springer, Berlin, Heidelberg. 71. Ram, S., & Liu, J., (2006). Understanding the semantics of data provenance to support active conceptual modeling. In: International Workshop on Active Conceptual Modeling of Learning (Vol. 1, pp. 17– 29). Springer, Berlin, Heidelberg. 72. Ramamohanarao, K., & Harland, J., (1994). An introduction to deductive database languages and systems. VLDB J., 3(2), 107–122. 73. Robinson, S., (2011). Choosing the right model: Conceptual modeling for simulation. In: Proceedings of the 2011 Winter Simulation Conference (WSC) (Vol. 1, pp. 1423–1435). IEEE. 74. Rochwerger, B., Breitgand, D., Levy, E., Galis, A., Nagin, K., Llorente, I. M., & Galán, F., (2009). The reservoir model and architecture for open federated cloud computing. IBM Journal of Research and Development, 53(4), 4–1. 75. Romei, A., & Turini, F., (2011). Inductive database languages: Requirements and examples. Knowledge and Information Systems, 26(3), 351–384. 76. Rossiter, B. N., & Heather, M. A., (2005). Conditions for interoperability. In: ICEIS (1) (Vol. 1, pp. 92–99). 77. Roussopoulos, N., & Karagiannis, D., (2009). Conceptual modeling: Past, present and the continuum of the future. 
In: Conceptual Modeling: Foundations and Applications (Vol. 1, pp. 139–152). Springer, Berlin, Heidelberg. 78. Samos, J., Saltor, F., & Sistac, J., (1998). Definition of derived classes in OODBs. In: Proceedings. IDEAS’98. International Database
Database Environment
79.
80.
81.
82.
83.
84.
85.
86.
87.
88. 89.
109
Engineering and Applications Symposium (Cat. No. 98EX156) (Vol. 1, pp. 150–158). IEEE. Samos, J., Saltor, F., Sistac, J., & Bardes, A., (1998). Database architecture for data warehousing: An evolutionary approach. In: International Conference on Database and Expert Systems Applications (Vol. 1, pp. 746–756). Springer, Berlin, Heidelberg. Shahzad, M. K., (2007). Version-manager: For vital simulating advantages in data warehouse. Computing and Information Systems, 11(2), 35. Snodgrass, R. T., (1992). Temporal databases. In: Theories and Methods of Spatio-Temporal Reasoning in Geographic Space (Vol. 1, pp. 22–64). Springer, Berlin, Heidelberg. Solovyev, V. D., & Polyakov, V. N., (2013). Database “languages of the world” and its application. State of the art. Computational Linguistics, 1, 3–9). Stemple, D., Sheard, T., & Fegaras, L., (1992). Linguistic reflection: A bridge from programming to database languages. In: Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences (Vol. 2, pp. 844–855). IEEE. Stonebraker, M., Agrawal, R., Dayal, U., Neuhold, E. J., & Reuter, A., (1993). DBMS research at a crossroads: The Vienna update. In: VLDB (Vol. 93, pp. 688–692). Sutton, D., & Small, C., (1995). Extending functional database languages to update completeness. In: British National Conference on Databases (Vol. 1, pp. 47–63). Springer, Berlin, Heidelberg. Sy, O., Duarte, D., & Dal, B. G., (2018). Ontologies and relational databases meta-schema design: A three-level unified lifecycle. In: 2018 5th International Conference on Control, Decision and Information Technologies (CoDIT) (Vol. 1, pp. 518–523). IEEE. Tresch, M., & Scholl, M. H., (1994). A classification of multi-database languages. In: Proceedings of 3rd International Conference on Parallel and Distributed Information Systems (Vol. 1, pp. 195–202). IEEE. Trinder, P., (1990). Referentially transparent database languages. In: Functional Programming (Vol. 1, pp. 142–156). Springer, London. Van, B. 
P., Kovacs, G., & Micsik, A., (1994). Transformation of database populations and operations from the conceptual to the internal level. Information Systems, 19(2), 175–191.
110
The Creation and Management of Database Systems
90. Vossen, G., (1991). Data Models, Database Languages and Database Management Systems (Vol. 1, pp. 3–5). Addison-Wesley Longman Publishing Co., Inc. 91. Wang, H., & Zaniolo, C., (1999). User-defined aggregates in database languages. In: International Symposium on Database Programming Languages (Vol. 1, pp. 43–60). Springer, Berlin, Heidelberg. 92. Wang, W., & Brooks, R. J., (2007). Empirical investigations of conceptual modeling and the modeling process. In: 2007 Winter Simulation Conference (Vol. 1, pp. 762–770). IEEE. 93. Yannakoudakis, E. J., Tsionos, C. X., & Kapetis, C. A., (1999). A new framework for dynamically evolving database environments. Journal of Documentation, 1, 3–9. 94. Zalipynis, R. A. R., (2020). BitFun: Fast answers to queries with tunable functions in geospatial array DBMS. Proceedings of the VLDB Endowment, 13(12), 2909–2912. 95. Zdonik, S. B., & Wegner, P., (1988). Language and methodology for object-oriented database environments. In: Data Types and Persistence (Vol. 1, pp. 155–171). Springer, Berlin, Heidelberg. 96. Zhang, B., Van, A. D., Wang, J., Dai, T., Jiang, S., Lao, J., & Gordon, G. J., (2018). A demonstration of the otterTune automatic database management system tuning service. Proceedings of the VLDB Endowment, 11(12), 1910–1913. 97. Zlatanova, S., (2006). 3D geometries in spatial DBMS. In: Innovations in 3D Geo Information Systems (Vol. 1, pp. 1–14). Springer, Berlin, Heidelberg.
CHAPTER 4

THE RELATIONAL MODEL

CONTENTS
4.1. Introduction
4.2. Brief History of the Relational Model
4.3. Terminology
4.4. Integrity Constraints
4.5. Views
References
4.1. INTRODUCTION
Annual revenues from new licenses for Relational Database Management Systems (RDBMSs) are estimated at between $6 billion and $10 billion U.S. dollars (or around $25 billion when related product sales are included), making the RDBMS the preeminent data-processing technology in use today. This software represents the second generation of database management systems (DBMSs) and is based on the relational data model proposed by E. F. Codd (1970). In the relational model, all data is logically structured within relations (tables) (Schek & Scholl, 1986). Each relation has a name and is made up of named attributes (columns) of data. Each row (or tuple) holds exactly one value for each attribute. The relational model's simple logical structure is one of its great strengths. Yet underneath this seemingly straightforward structure lies a sound theoretical foundation, something that the first generation of DBMSs (the network and hierarchical DBMSs) sorely lacked (Tyler & Lind, 1992).
4.2. BRIEF HISTORY OF THE RELATIONAL MODEL
E. F. Codd introduced the relational data model in his seminal paper 'A relational model of data for large shared data banks' (Codd, 1970). Although a set-oriented approach had been proposed earlier (Childs, 1968), this paper is now generally regarded as a landmark in database management systems (DBMSs). The aims of the relational data model were as follows (Atzeni et al., 2013):
• To allow a high degree of data independence. Application programs must not be affected by changes to the internal data representation, notably changes to file organizations, record orderings, or access paths.
• To provide a solid foundation for dealing with data semantics, consistency, and redundancy problems. In particular, Codd's paper introduced the concept of normalized relations, that is, relations with no repeating groups.
• To enable the growth of set-oriented data manipulation languages.
Although interest in the relational data model was widespread, the most significant research can be attributed to three projects with rather different perspectives. The experimental relational DBMS System R, developed in
the late 1970s at IBM's San José Research Facility in California, was the first of these (Astrahan et al., 1976). The project was designed to demonstrate the practicality of the relational model by providing an implementation of its data structures and operations. It also proved to be an important source of knowledge about implementation issues such as transaction management, concurrency control, recovery techniques, query optimization, data security, human factors, and user interface design, leading to the publication of numerous research papers and to the development of several prototypes. In particular, the System R project led to two major developments (Biber et al., 2008):
• the creation of SQL (pronounced 'S-Q-L' or sometimes 'See-Quel'), a structured query language that has since become the formal International Organization for Standardization (ISO) standard and the de facto standard language for relational DBMSs;
• the production, during the late 1970s and early 1980s, of various commercial relational DBMS products, such as IBM's DB2 and SQL/DS, and Oracle Corporation's Oracle.
The second major project in the development of the relational model was the Interactive Graphics Retrieval System (INGRES) project at the University of California at Berkeley, which was active at the same time as the System R project. The INGRES project involved the development of a prototype RDBMS, with the research concentrating on many of the same broad objectives as the System R project. This research produced the commercial products INGRES from Relational Technology Inc. (later Advantage Ingres Enterprise Relational Database from Computer Associates) and the Intelligent Database Machine from Britton Lee Inc., and advanced the general understanding of relational concepts (Codd, 1979).
The third contribution was the Peterlee Relational Test Vehicle at the IBM UK Scientific Centre at Peterlee (Todd, 1976). This project had a more theoretical orientation than the System R or INGRES projects and was significant principally for research into issues such as query processing and optimization, as well as functional extensions. In the late 1970s and early 1980s, commercial systems based on the relational model began to appear (Gardner et al., 2008). Although many do not fully conform to the definition of the relational data model, there are now several hundred RDBMSs for both mainframe and PC environments. Microsoft Access and Visual FoxPro from Microsoft, InterBase and JDataStore from Borland, and
R:Base from R:BASE Innovations are examples of PC-based RDBMSs (Yu et al., 2006). Owing to the success of the relational model, several non-relational systems now provide a relational user interface, irrespective of the underlying architecture. The principal network DBMS, Computer Associates' IDMS, has been renamed Advantage CA-IDMS and now supports a relational view of data. Other mainframe DBMSs that provide some relational functionality include Computer Corporation of America's Model 204 and Software AG's ADABAS.
Several extensions to the relational model have also been proposed, for example, to (Smith & Kirby, 2009):
• capture the meaning of data more closely (for example, Codd, 1979);
• support object-oriented concepts (for example, Stonebraker and Rowe, 1986).
4.3. TERMINOLOGY
The relational model is based on the mathematical concept of a relation, which is physically represented as a table. Codd, a trained mathematician, used terminology taken from mathematics, principally set theory and predicate logic. This section discusses the terminology and structural concepts of the relational data model (Getoor & Sahami, 1999).
4.3.1. Relational Data Structure
Relation: A relation is a table with columns and rows. An RDBMS requires only that the database be perceived by the user as tables. Note, however, that this perception applies only to the logical structure of the database: the external and conceptual levels of the ANSI-SPARC architecture. It does not apply to the physical structure of the database, which can be implemented using a variety of storage structures (Martin, 2013).
Attribute: An attribute is a named column of a relation. In the relational model, relations are used to hold information about the objects to be represented in the database. Each relation is represented as a two-dimensional table, with the records of the relation represented by the table's rows
and the attributes represented by the table's columns. Attributes can appear in any order and the relation will still be the same relation, and therefore convey the same meaning (Fan et al., 2019). For example, the Branch relation holds the details of branch offices, with columns for the attributes branchNo (the branch number), street, city, and postcode. Similarly, the Staff relation holds the details of employees, with columns for the attributes staffNo (the staff number), fName, lName, position, gender, DOB (date of birth), salary, and branchNo (the number of the branch the staff member works at). The Branch and Staff relations are shown in Figure 4.1. As you can see from this example, a column contains values of a single attribute only; for instance, the branchNo columns contain only the numbers of existing branch offices (Codd, 2002).
Domain: A domain is the set of allowable values for one or more attributes. Domains are an extremely powerful feature of the relational model. Every attribute in a relation is defined on a domain. Each attribute may have its own domain, or two or more attributes may be defined on the same domain. Figure 4.2 shows the domains for some of the attributes of the Branch and Staff relations (Sommestad et al., 2010). Note that, at any given time, there will typically be values in a domain that do not currently appear as values in the corresponding attribute. The domain concept is important because it allows the user to define, in one central place, the meaning and source of the values that attributes can hold. As a result, more information is available to the system when it undertakes a relational operation, and operations that are semantically incorrect can be avoided. For example, although the domain definitions for both of these attributes are character strings, it is not sensible to compare a street address with a telephone number.
The monthly rental on a property and the number of months the property has been rented, on the other hand, have different domains (the first a monetary value, the second an integer value), yet it is still a legal operation to multiply two values from these domains (Schek & Scholl, 1986).
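The domain idea can be sketched in a few lines of code. The following is a minimal illustration, not from the text, of how recording the domain of each attribute in one central place lets a system reject conceptually meaningless comparisons; the attribute and domain names are assumed for the example.

```python
# Hypothetical mapping from attribute name to the domain it is defined on.
DOMAINS = {
    "branchNo": "BranchNumbers",
    "telNo": "Telephones",
    "street": "StreetNames",
}

def comparable(attr1: str, attr2: str) -> bool:
    """Two attribute values are only meaningfully comparable when both
    attributes are defined on the same domain."""
    return DOMAINS[attr1] == DOMAINS[attr2]

print(comparable("branchNo", "branchNo"))  # True
print(comparable("street", "telNo"))       # False: both strings, different domains
```

Even though street addresses and telephone numbers are both stored as character strings, the domain check above flags a comparison between them as semantically incorrect.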
Figure 4.1. The Branch and Staff relations. Source: https://www.researchgate.net/figure/Relationship-between-the-number-of-staff-and-network-range_fig1_266068320.
Figure 4.2. The domains for some attributes of the Branch and Staff relations. Source: https://opentextbc.ca/dbdesign01/chapter/chapter-8-entity-relationship-model/.
As these two examples illustrate, a complete implementation of domains is not straightforward, and as a result many RDBMSs do not support them fully (Horsburgh et al., 2008).
Tuple: A tuple is a row of a relation. The elements of a relation are the rows, or tuples, of the table. Each row in the Branch relation contains four values, one for each attribute. Tuples can appear in any order and the relation will still be the same relation, and therefore convey the same meaning. The structure of a relation, together with a specification of the domains and any other restrictions on possible values, is referred to as its intension, which is usually fixed unless the definition of the relation is changed to include additional attributes. The tuples are referred to as the extension (or state) of the relation, which changes over time (Vassiliadis & Sellis, 1999).
Degree: The degree of a relation is the number of attributes it contains. The Branch relation in Figure 4.1 has four attributes, or degree four. This means that each row of the table is a four-tuple, containing four values. A relation with only one attribute would have degree one and be called a unary relation (or one-tuple). A relation with two attributes is called binary, a relation with three attributes is called ternary, and a relation with more than three attributes is called n-ary. The degree of a relation is a property of the intension of the relation (Johannesson, 1994).
Cardinality: The cardinality of a relation is the number of tuples it contains. By contrast with the degree, the cardinality of a relation changes as tuples are added or deleted. The cardinality is a property of the extension of the relation and is determined by the particular instance of the relation at any given moment. Finally, we define what we mean by a relational database.
Relational database: A collection of normalized relations, each with a distinct relation name. A relational database consists of relations that are appropriately structured. We refer to this appropriateness as normalization (Gougeon, 2010).
Alternative terminology: The terminology for the relational model can be quite confusing. Two sets of terms are in common use. In fact, a third set of terms is sometimes used as well: a relation may be referred to as a file, the tuples as records, and the attributes as fields. This terminology stems from the fact that, physically, the RDBMS may store each relation in a file. Table 4.1 summarizes the alternative terms for the relational data model (Pierce & Lydon, 2001).
Table 4.1. Alternative Names for Terms in Relational Models

Formal term     Alternative 1     Alternative 2
Relation        Table             File
Tuple           Row               Record
Attribute       Column            Field
4.3.2. Mathematical Relations
To understand the true meaning of the term relation, we have to review some concepts from mathematics. Suppose that we have two sets, D1 and D2, where D1 = {2, 4} and D2 = {1, 3, 5}. The Cartesian product of these two sets, written D1 × D2, is the set of all ordered pairs in which the first element is a member of D1 and the second element is a member of D2. An alternative way of expressing this is to find all combinations of elements with the first from D1 and the second from D2. In our case, we have: D1 × D2 = {(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)} (Martin, 2013). Any subset of this Cartesian product is a relation. For example, we could produce a relation R such that: R = {(2, 1), (4, 1)} We may specify which ordered pairs belong to the relation by giving a condition for their selection. For example, if we observe that R includes all those ordered pairs whose second element is 1, we could write R as (Kazemi & Poole, 2018): R = {(x, y) | x ∈ D1, y ∈ D2, and y = 1}
Using these same sets, we could form another relation, S, in which the first element is always twice the second. Thus, we could write S as (Armstrong, 1974): S = {(x, y) | x ∈ D1, y ∈ D2, and x = 2y} or, in this example, S = {(2, 1)}
since the Cartesian product contains only one ordered pair that satisfies this condition (Cozman & Mauá, 2015). The concept of a relation can easily be extended to three sets. Let D1, D2, and D3 be three sets. The Cartesian product D1 × D2 × D3 of these three sets is the set of all ordered triples such that the first element is from D1, the second element is from D2, and the third element is from D3. Any subset of this Cartesian product is a relation. Consider the following example (Mereish & Poteat, 2015):
D1 = {1, 3} D2 = {2, 4} D3 = {5, 6}
D1 × D2 × D3 = {(1, 2, 5), (1, 2, 6), (1, 4, 5), (1, 4, 6), (3, 2, 5), (3, 2, 6), (3, 4, 5), (3, 4, 6)}
Any set of these ordered triples is a relation. We can extend the three sets and define a general relation on n domains. Let D1, D2, ..., Dn be n sets. Their Cartesian product is defined as:
D1 × D2 × ... × Dn = {(d1, d2, ..., dn) | d1 ∈ D1, d2 ∈ D2, ..., dn ∈ Dn}
and is more usually written as (Gadia, 1988):
⨉ᵢ₌₁ⁿ Dᵢ
Any set of n-tuples from this Cartesian product is a relation on the n sets. Note that, in defining these relations, we have to specify the sets, or domains, from which we choose values (Smets & Jarzabkowski, 2013; Mesquita et al., 2008).
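The pair and triple examples above can be reproduced directly in a few lines of Python; this is a sketch for illustration, using itertools.product to compute the Cartesian product.

```python
from itertools import product

D1, D2 = {2, 4}, {1, 3, 5}

# Cartesian product D1 x D2: all ordered pairs (d1, d2) with d1 in D1, d2 in D2.
pairs = set(product(D1, D2))
assert pairs == {(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)}

# R: the subset of pairs whose second element is 1.
R = {(x, y) for (x, y) in pairs if y == 1}
assert R == {(2, 1), (4, 1)}

# S: the subset of pairs where the first element is twice the second.
S = {(x, y) for (x, y) in pairs if x == 2 * y}
assert S == {(2, 1)}

# The three-set case from the text: D1 x D2 x D3 has 2 * 2 * 2 = 8 triples.
E1, E2, E3 = {1, 3}, {2, 4}, {5, 6}
triples = set(product(E1, E2, E3))
assert len(triples) == 8 and (1, 2, 5) in triples
```

The same call generalizes to the n-ary case: `product(D1, D2, ..., Dn)` enumerates exactly the n-tuples of the Cartesian product, and any subset of that result is a relation on the n sets.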
4.3.3. Database Relations
Applying the above concepts to databases, we can define a relation schema.
Relation schema: A named relation defined by a set of attribute and domain name pairs (Pecherer, 1975).
Let A1, A2, ..., An be attributes with domains D1, D2, ..., Dn. Then the set {A1:D1, A2:D2, ..., An:Dn} is a relation schema. A relation R defined by a relation schema S is a set of mappings from the attribute names to their corresponding domains. Thus, relation R is a set of n-tuples (Brilmyer, 2018):
(A1:d1, A2:d2, ..., An:dn) such that d1 ∈ D1, d2 ∈ D2, ..., dn ∈ Dn
Each element in the n-tuple consists of an attribute and a value for that attribute. Normally, when we write out a relation as a table, we list the attribute names as column headings and write out the tuples as rows of the form (d1, d2, ..., dn), where each value is taken from the appropriate domain. In this way, a relation in the relational model can be regarded as any subset of the Cartesian product of the domains of the attributes. A table is simply a physical representation of such a relation (Aghili Ashtiani & Menhaj, 2014).
For example, the Branch relation in Figure 4.1 has attributes branchNo, street, city, and postcode, each with its corresponding domain. Any subset of the Cartesian product of the domains (that is, any set of 4-tuples in which the first element is from the domain BranchNumbers, the second is from the domain StreetNames, and so on) is a Branch relation. One of the 4-tuples is (Blau & McGovern, 2003):
{(B005, 22 Deer Rd, London, SW1 4EH)}
or more correctly:
{(branchNo: B005, street: 22 Deer Rd, city: London, postcode: SW1 4EH)}
This is called a relation instance. The Branch table is a convenient way of writing down all the 4-tuples that form the relation at a specific moment in time, which explains why table rows in the relational model are called tuples. In the same way that a relation has a schema, so does a relational database.
Relational database schema: A set of relation schemas, each with a distinct name (Cadiou, 1976).
If R1, R2, ..., Rn are a set of relation schemas, then we can write the relational database schema, or simply relational schema, R, as:
R = {R1, R2, ..., Rn}
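As a sketch, not taken from the text, a relation schema and one tuple of an instance can be modeled as plain Python structures, with the tuple written as an attribute-to-value mapping exactly as in the (branchNo: B005, ...) notation above; the domain names are assumed for the example.

```python
# Relation schema: attribute name -> domain name (domain names hypothetical).
branch_schema = {
    "branchNo": "BranchNumbers",
    "street": "StreetNames",
    "city": "CityNames",
    "postcode": "Postcodes",
}

# One tuple of a Branch relation instance, keyed by attribute name.
branch_tuple = {
    "branchNo": "B005",
    "street": "22 Deer Rd",
    "city": "London",
    "postcode": "SW1 4EH",
}

def conforms(t: dict, schema: dict) -> bool:
    """A tuple conforms to the schema when it supplies exactly one value
    for each attribute named in the schema, and no others."""
    return set(t) == set(schema)

print(conforms(branch_tuple, branch_schema))  # True
```

Because each value is keyed by its attribute name rather than by position, this representation mirrors the point made later in the chapter that attribute order carries no meaning.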
4.3.4. Properties of Relations
A relation has the following properties (Ehrhard, 2012):
• the relation has a name that is distinct from all other relation names in the relational schema;
• each cell of the relation contains exactly one atomic (single) value;
• each attribute has a distinct name;
• the values of an attribute are all from the same domain;
• each tuple is distinct; there are no duplicate tuples;
• the order of attributes has no significance;
• the order of tuples has no significance, theoretically. (However,
in fact, the ordering may affect the efficiency with which tuples are accessed.) To see what these restrictions mean, examine the Branch relation in Figure 4.1 again. Since each cell must hold only one value, it is illegal to store two postcodes for a single branch office in a single cell. In other words, relations do not contain repeating groups. A relation that satisfies this property is said to be normalized (Vyawahare et al., 2018). The column names listed at the tops of the columns correspond to the attributes of the relation. The values in the branchNo attribute are all from the BranchNumbers domain, so we could not place a postcode value in this column. There can be no duplicate tuples in a relation; for example, the row (B005, 22 Deer Rd, London, SW1 4EH) appears only once (Hull, 1986). Provided an attribute name is moved along with its attribute values, we can interchange columns. The table would represent the same relation if the city attribute were placed before the postcode attribute, although it makes sense to keep the address elements in their normal order for readability. Tuples may likewise be interchanged, so the records for branches B005 and B004 can be swapped and the relation will still be the same (Schumacher & Fuchs, 2012).
Most of the properties specified for relations derive from the properties of mathematical relations (Lyons-Ruth et al., 2004):
• When we derived the Cartesian product of sets with simple, single-valued elements such as integers, each element in each tuple was single-valued. Similarly, each cell of a relation contains exactly one value. However, a mathematical relation need not be normalized: Codd chose to disallow repeating groups in order to keep the relational model simple.
• In a relation, the possible values for a given position are determined by the set, or domain, on which the position is defined. In a table, the values in each column must come from the same attribute domain.
• In a set, no elements are repeated. Similarly, in a relation, there are no duplicate tuples.
• Since a relation is a set, the order of elements has no significance. Therefore, in a relation, the order of tuples is immaterial.
In a mathematical relation, however, the order of elements in a tuple is important: the tuple (1, 2), for example, is quite different from the tuple (2, 1). This is not the case for relations in the relational model, which specifically requires that the order of attributes be immaterial (Ramanujam et al., 2009). The reason is that the column headings define which attribute each value belongs to. This means that the order of column headings in the intension is immaterial; but once the structure of the relation is decided, the order of elements within the tuples of the extension must match the order of attribute names (Greenyer & Kindler, 2010).
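Two of these properties, the absence of duplicate tuples and the insignificance of tuple order, fall out naturally if a relation instance is modeled as a set of tuples. A small sketch follows; the B004 address values are illustrative sample data.

```python
# A relation instance as a Python set of tuples: duplicate tuples are
# eliminated automatically, and a set imposes no order on its elements.
branch = {
    ("B005", "22 Deer Rd", "London", "SW1 4EH"),
    ("B004", "32 Manse Rd", "Bristol", "BS99 1NZ"),
    ("B005", "22 Deer Rd", "London", "SW1 4EH"),  # duplicate tuple: dropped
}

print(len(branch))  # 2, not 3

# Note what a set of plain tuples does NOT give us: attribute order now
# matters, because positions rather than names identify the values. Keying
# each tuple by attribute name (as in Section 4.3.3) removes that dependence.
```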
4.3.5. Relational Keys
As stated earlier, there are no duplicate tuples within a relation. We therefore need to be able to identify one or more attributes (called relational keys) that uniquely identify each tuple in a relation. This section explains the terminology used for relational keys.
Superkey: An attribute, or set of attributes, that uniquely identifies a tuple within a relation (Chamberlin, 1976).
A superkey uniquely identifies each tuple within a relation. However, a superkey may contain additional attributes that are not necessary for unique identification, and we are interested in superkeys that contain only the minimum number of attributes required for unique identification (Kloesgen & Zytkow, 1994).
Candidate key: A superkey such that no proper subset of it is itself a superkey within the relation.
A candidate key, K, for a relation R has two properties:
• Uniqueness: in each tuple of R, the values of K uniquely identify that tuple; and
• Irreducibility: no proper subset of K has the uniqueness property.
A relation may have several candidate keys. When a key consists of more than one attribute, it is called a composite key (Vinciquerra, 1993). Consider the Branch relation shown in Figure 4.1. Given a value of city, we can identify several branch offices (for example, London has two branch offices), so this attribute cannot be a candidate key. On the other hand, since DreamHome allocates each branch office a unique branch number, then given a branch number value, branchNo, we can identify at most one tuple, so branchNo is a candidate key. Similarly, postcode is also a candidate key for this relation (He & Naughton, 2008).
Now consider the relation Viewing, which holds information about properties viewed by clients. The relation comprises a client number (clientNo), a property number (propertyNo), a viewing date (viewDate), and, optionally, a comment (comment). Given a client number, clientNo, there may be several corresponding viewings of different properties. Similarly, given a property number, propertyNo, the property may have been viewed by several clients (Hofmeyr, 2021). Therefore, neither clientNo nor propertyNo by itself can serve as a candidate key. However, the combination of clientNo and propertyNo identifies at most one tuple, so together clientNo and propertyNo form the (composite) candidate key for the Viewing relation. If we needed to allow for the possibility that a client might view a property more than once, we could add viewDate to the composite key; however, we assume that this is not necessary (Ceruti, 2021).
Note that an instance of a relation cannot be used to prove that an attribute or combination of attributes is a candidate key. The fact that there are no duplicate values at one particular moment in time does not mean that duplicates are impossible. However, the presence of duplicates in an instance can be used to show that an attribute combination is not a candidate key (Dong & Su, 2000). To decide whether duplicates are possible, we have to know the real-world meaning of the attribute(s) involved. Only by using this semantic information can we be certain that an attribute combination is a candidate key. For example, from the data in Figure 4.3, we might think that lName, the employee's surname, would be a suitable candidate key for the Staff relation. Although the Staff relation contains only one value of 'White,' a new member of staff with the surname 'White' might later join the company, which would invalidate the choice of lName as a candidate key (Paredaens et al., 2012).
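The point that an instance can refute, but never prove, a candidate key can be made concrete. The sketch below, with sample Viewing rows invented for the example, tests whether a combination of attributes is duplicate-free in a given instance.

```python
def unique_in_instance(relation, attrs):
    """True if the attribute combination has no duplicate values in this
    instance. A False result disproves a candidate key; a True result
    still needs real-world (semantic) knowledge to confirm one."""
    projected = [tuple(t[a] for a in attrs) for t in relation]
    return len(projected) == len(set(projected))

# Hypothetical Viewing instance: clientNo and propertyNo repeat on their
# own, but the combination of the two is unique.
viewing = [
    {"clientNo": "CR56", "propertyNo": "PA14", "viewDate": "24-May-13"},
    {"clientNo": "CR56", "propertyNo": "PG4",  "viewDate": "20-Apr-13"},
    {"clientNo": "CR62", "propertyNo": "PA14", "viewDate": "14-May-13"},
]

print(unique_in_instance(viewing, ("clientNo",)))               # False
print(unique_in_instance(viewing, ("propertyNo",)))             # False
print(unique_in_instance(viewing, ("clientNo", "propertyNo")))  # True
```

The last result is consistent with (clientNo, propertyNo) being the composite candidate key, but, as the text stresses, only the semantics of viewings can make that choice certain.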
Even though the Staff relation has only one value of ‘White,’ a subsequent member of the staff with both the surname ‘White’ could join the firm, negating the selection of lName as just a candidate key (Paredaens et al., 2012). Primary Key: The possible key for identifying individual tuples inside a connection (Bucciarelli et al., 2009). Because there are no duplicated tuples in such a relationship, each row may be uniquely identified. The main key is always present in a connection. In the worst scenario, the whole collection of attributes might be used as the unique identifier, but most of the time, a smaller selection of attributes is enough to identify the tuples. Alternative keys are alternative keys which are not chosen to be the main key. If we use branchNo as the main key in the Branch relationship, postcode becomes an alternative key. Because there is
The Creation and Management of Database Systems
only one candidate key for the Viewing relation, consisting of clientNo and propertyNo, these attributes automatically form the primary key (Allen & Terry, 2005). Foreign key: An attribute, or set of attributes, within one relation that matches the candidate key of some (possibly the same) relation (Selinger et al., 1989). When an attribute appears in more than one relation, its appearance usually represents a relationship between the tuples of the two relations. For example, the inclusion of branchNo in both the Branch and Staff relations is quite deliberate; it links each branch to the details of the staff working at that branch. In the Branch relation, branchNo is the primary key (Reilly, 2009). In the Staff relation, however, the branchNo attribute exists to match staff to the branch office they work in, so in Staff, branchNo is a foreign key. We say that the attribute branchNo in the Staff relation targets the primary key attribute branchNo in the home relation, Branch. As we shall see in the following chapter, these common attributes play an important role in manipulating data (Brookshire, 1993).
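The key concepts discussed above can be sketched in SQL. The following is a minimal illustration (not taken from the text) using Python's built-in SQLite; the table and column names follow the DreamHome examples, while the sample data values are invented for the demonstration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# branchNo is the candidate key chosen as primary key; postcode, the
# other candidate key, becomes an alternate key and is declared UNIQUE.
conn.execute("""CREATE TABLE Branch (
    branchNo TEXT PRIMARY KEY,
    postcode TEXT UNIQUE)""")

# clientNo and propertyNo together form the composite primary key of Viewing.
conn.execute("""CREATE TABLE Viewing (
    clientNo   TEXT NOT NULL,
    propertyNo TEXT NOT NULL,
    viewDate   TEXT,
    comment    TEXT,
    PRIMARY KEY (clientNo, propertyNo))""")

# In Staff, branchNo is a foreign key targeting the home relation Branch;
# lName is deliberately NOT declared as a key, since surnames can repeat.
conn.execute("""CREATE TABLE Staff (
    staffNo  TEXT PRIMARY KEY,
    lName    TEXT,
    branchNo TEXT REFERENCES Branch(branchNo))""")

conn.execute("INSERT INTO Branch VALUES ('B005', 'SW1 4EH')")
conn.execute("INSERT INTO Staff VALUES ('SL21', 'White', 'B005')")
# A second 'White' is perfectly legal, showing lName is not a candidate key:
conn.execute("INSERT INTO Staff VALUES ('SG37', 'White', 'B005')")
white_count = conn.execute(
    "SELECT COUNT(*) FROM Staff WHERE lName = 'White'").fetchone()[0]

# A second viewing of the same property by the same client duplicates the
# composite key, so it is rejected (adding viewDate to the key would allow it).
conn.execute("INSERT INTO Viewing VALUES ('CR56', 'PG36', '2023-05-01', NULL)")
dup_rejected = False
try:
    conn.execute("INSERT INTO Viewing VALUES ('CR56', 'PG36', '2023-06-01', NULL)")
except sqlite3.IntegrityError:
    dup_rejected = True

print(white_count, dup_rejected)  # 2 True
```

Note how the duplicate surname is accepted while the duplicate (clientNo, propertyNo) pair is not: only the semantics of the attributes, captured in the key declarations, tell the DBMS which duplicates matter.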
4.3.6. Representing Relational Database Schemas
A relational database consists of any number of normalized relations. Part of the relational schema for the DreamHome case study is as follows (Pirotte, 1982):
Figure 4.3. A DreamHome rental database instance. Source: https://www.transtutors.com/questions/create-the-dreamhome-rentaldatabase-schema-defined-in-section-3–2-6-and-insert-the--2009490.htm.
The Relational Model
Branch (branchNo, street, city, postcode)
Staff (staffNo, fName, lName, position, sex, DOB, salary, branchNo)
PropertyForRent (propertyNo, street, city, postcode, type, rooms, rent, ownerNo, staffNo, branchNo)
Client (clientNo, fName, lName, telNo, prefType, maxRent)
PrivateOwner (ownerNo, fName, lName, address, telNo)
Viewing (clientNo, propertyNo, viewDate, comment)
Registration (clientNo, branchNo, staffNo, dateJoined)
The common convention for representing a relation schema is to give the name of the relation followed by its attribute names in parentheses, with the primary key usually underlined. The conceptual model, or conceptual schema, is the collection of all such schemas for the database. Figure 4.3 shows an instance of this relational schema (Bagley, 2010).
4.4. INTEGRITY CONSTRAINTS
In the previous section we discussed the structural part of the relational data model. A data model also has two other parts: a manipulative part, defining the types of operation that are allowed on the data, and a set of integrity constraints, which ensure that the data is accurate. In this section we discuss the relational integrity constraints, and in the next chapter the relational manipulation operations (Calì et al., 2002). We have already seen an example of an integrity constraint: since every attribute has an associated domain, there are constraints (called domain constraints) that restrict the set of values allowed for the attributes of relations. In addition, there are two important integrity rules, which are constraints or restrictions that apply to all instances of the database. The two principal rules of the relational model are known as entity integrity and referential integrity. Other types of integrity constraint are multiplicity and general constraints, which we also present. Before we can define entity and referential integrity, it is necessary to understand the concept of nulls (Tansel, 2004).
4.4.1. Nulls
Null: Represents a value for an attribute that is currently unknown or is not applicable for this tuple (Calı et al., 2004). A null can be taken to mean the logical value 'unknown.' It can mean that a value is not applicable to a particular tuple, or it could merely mean that no value has yet been supplied. Nulls are a way to deal with incomplete or exceptional data (Zimmermann, 1975). However, a null is not the same as a numeric value of zero or a text string filled with spaces; zeros and spaces are values, whereas a null represents the absence of a value. Therefore, nulls should be treated differently from other values. Some authors use the term 'null value'; however, since a null is not a value but rather the absence of one, that term is deprecated (Cockcroft, 1997). For example, in the Viewing relation shown in Figure 4.3, the comment attribute may be undefined until the prospective renter has visited the property and returned his or her comment to the agency. Without nulls, it becomes necessary to introduce false data to represent this state, or to add additional attributes that may not be meaningful to the user (Graham & Urnes, 1992). In our example, we might try to represent a null comment with the value '1.' Alternatively, we might add a new attribute, hasCommentBeenSupplied, to the Viewing relation, containing Y (Yes) if a comment has been supplied and N (No) otherwise. Either approach can be confusing to the user (Rasdorf et al., 1987). Nulls can also cause implementation problems, because the relational model is based on first-order predicate calculus, which is a two-valued (Boolean) logic: the only values allowed are true and false. Allowing nulls means that we have to work with higher-valued logics, such as three- or four-valued logic (Codd, 1986, 1987, 1990) (Yazıcı & Sözat, 1998). The use of nulls in the relational model is a point of contention.
Codd later regarded nulls as an integral feature of the model (Codd, 1990). Others consider this approach to be misguided, believing that the missing-information problem is not fully understood, that no fully satisfactory solution has been found, and that, consequently, nulls should not be incorporated into the relational model (see, for example, Date, 1995) (Grefen & Apers, 1993). We are now in a position to define the two relational integrity rules.
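The difference between a null and an ordinary value, and the three-valued logic it forces on SQL, can be seen directly. This is an illustrative sketch using SQLite (the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Viewing (clientNo TEXT, propertyNo TEXT, comment TEXT)")
conn.execute("INSERT INTO Viewing VALUES ('CR56', 'PG36', NULL)")          # no comment yet
conn.execute("INSERT INTO Viewing VALUES ('CR62', 'PA14', 'too remote')")  # comment supplied

# A null is not an empty string, and it is not even equal to another null:
# any comparison involving NULL evaluates to 'unknown', so neither of the
# first two predicates matches the row with the missing comment.
eq_empty = conn.execute(
    "SELECT COUNT(*) FROM Viewing WHERE comment = ''").fetchone()[0]
eq_null = conn.execute(
    "SELECT COUNT(*) FROM Viewing WHERE comment = NULL").fetchone()[0]
# The special IS NULL predicate is the only way to test for absence:
is_null = conn.execute(
    "SELECT COUNT(*) FROM Viewing WHERE comment IS NULL").fetchone()[0]
print(eq_empty, eq_null, is_null)  # 0 0 1
```

The `comment = NULL` predicate returning no rows, even for the row that actually holds a null, is exactly the departure from two-valued logic that the text describes.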
4.4.2. Entity Integrity
The first integrity rule applies to the primary keys of base relations. For the moment, we define a base relation as a relation that corresponds to an entity in the conceptual schema; we give a more precise definition shortly (Demuth & Hussmann, 1999). Entity integrity: In a base relation, no attribute of a primary key can be null. By definition, a primary key is a minimal identifier that is used to identify tuples uniquely. This means that no subset of the primary key is sufficient to provide unique identification of tuples. Allowing a null for any part of a primary key would imply that not all of its attributes are needed to distinguish between tuples, which contradicts the definition of the primary key (Yang & Wang, 2001; Vermeer & Apers, 1995). For example, since branchNo is the primary key of the Branch relation, we should not be able to insert a tuple into the Branch relation with a null for the branchNo attribute. Consider also the composite primary key of the Viewing relation, comprising the client number (clientNo) and the property number (propertyNo). We should not be able to insert a tuple into the Viewing relation with a null for clientNo, a null for propertyNo, or nulls for both attributes (Motik et al., 2007). If we examine this rule in detail, we find some anomalies. First, why does the rule apply only to primary keys and not, more generally, to candidate keys, which also identify tuples uniquely? Second, why does the rule apply only to base relations? Consider, for example, the query 'List all comments from viewings,' using the data of the Viewing relation in Figure 4.3 (Mcminn et al., 2015). This produces a unary relation consisting of the attribute comment. By definition, this attribute must be a primary key, yet it contains nulls (corresponding to the viewings of PG36 and PG4 by client CR56).
However, since this relation is not a base relation, the rule does not apply, and the primary key may contain nulls. There have been several attempts to redefine this rule (see, for example, Codd, 1988; Date, 1990) (Qian, 1994).
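The entity integrity rule can be demonstrated concretely. The sketch below uses SQLite; note that SQLite specifically does not treat PRIMARY KEY as implying NOT NULL (a long-standing quirk of that engine), so NOT NULL is declared explicitly to get the standard behavior the rule requires:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite primary key of Viewing; NOT NULL makes SQLite enforce
# entity integrity on each component of the key.
conn.execute("""CREATE TABLE Viewing (
    clientNo   TEXT NOT NULL,
    propertyNo TEXT NOT NULL,
    viewDate   TEXT,
    comment    TEXT,
    PRIMARY KEY (clientNo, propertyNo))""")

# The three forbidden cases from the text: null clientNo, null propertyNo,
# and nulls for both attributes.
rejected = 0
for row in [
    (None, 'PG36', '2023-05-01', None),
    ('CR56', None, '2023-05-01', None),
    (None, None, '2023-05-01', None),
]:
    try:
        conn.execute("INSERT INTO Viewing VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected += 1
print(rejected)  # 3: every insert with a null in the primary key fails
```

All three inserts are rejected, matching the rule that no attribute of a primary key may be null.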
4.4.3. Referential Integrity
The second integrity rule applies to foreign keys. Referential integrity: If a foreign key exists in a relation, then either the foreign key value must match a candidate key value of some tuple in its home relation, or the foreign key value must be wholly null (Fan & Siméon, 2003).
For example, branchNo in the Staff relation is a foreign key targeting the branchNo attribute of the home relation, Branch. It should not be possible to create a staff record with branch number B025, for instance, unless the Branch relation already contains a tuple with branch number B025. However, we should be able to create a new staff record with a null branch number, to cater for the situation where a new member of staff has joined the company but has not yet been assigned to a particular branch office (Hammer & McLeod, 1975).
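Both halves of the rule (the match requirement and the wholly-null escape) can be shown in a short sketch. SQLite is used for illustration; it only enforces foreign keys when the pragma shown below is switched on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.execute("CREATE TABLE Branch (branchNo TEXT PRIMARY KEY)")
conn.execute("""CREATE TABLE Staff (
    staffNo  TEXT PRIMARY KEY,
    branchNo TEXT REFERENCES Branch(branchNo))""")
conn.execute("INSERT INTO Branch VALUES ('B005')")

conn.execute("INSERT INTO Staff VALUES ('SL21', 'B005')")  # matches a Branch tuple: OK
conn.execute("INSERT INTO Staff VALUES ('SG99', NULL)")    # wholly null foreign key: also OK
try:
    conn.execute("INSERT INTO Staff VALUES ('SA9', 'B025')")  # no branch B025 exists
except sqlite3.IntegrityError as e:
    print("rejected:", e)

staff_rows = conn.execute("SELECT COUNT(*) FROM Staff").fetchone()[0]
print(staff_rows)  # 2: only the two valid tuples were stored
```

The staff member with a null branch number is accepted, modeling the new recruit who has not yet been assigned to a branch, while the reference to the nonexistent branch B025 is rejected.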
4.4.4. General Constraints
General constraints: Additional rules, specified by the users or database administrators, that define or constrain some aspect of the enterprise. It is also possible for users to specify additional constraints that the data must satisfy. For example, if there is a rule that the number of staff allowed to work at a branch is at most 20, then the user must be able to specify this general constraint and expect the DBMS to enforce it (Golfarelli et al., 1999). In this case, it should not be possible to add a new member of staff to the Staff relation if the number of staff currently assigned to the relevant branch is already 20. Unfortunately, the level of support for general constraints varies considerably from one system to another (Hadzilacos & Tryfona, 1992).
4.5. VIEWS
In the three-level ANSI-SPARC architecture, a view is the structure of the database as it appears to a particular user. In the relational model, the term 'view' has a slightly different meaning. Rather than being the entire external model of a user's view, a view is a virtual or derived relation: a relation that does not necessarily exist in its own right but may be dynamically derived from one or more base relations. Thus, an external model can consist of both base (conceptual-level) relations and views derived from them. In this section we briefly discuss views in relational systems; we examine views in more detail and show how they can be created and used within SQL (Cosmadakis & Papadimitriou, 1984).
4.5.1. Terminology
The relations we have been dealing with so far in this chapter are base relations. Base relation: A named relation whose tuples are physically stored in the
database, and which corresponds to an entity in the conceptual schema (Dayal & Bernstein, 1982). We can define a view in terms of base relations. View: The dynamic result of one or more relational operations operating on the base relations to produce another relation. A view is a virtual relation that does not necessarily exist in the database but can be produced upon request, at the time of request (Braganholo et al., 2003). To the user, a view appears to be a relation and can be manipulated as if it were a base relation, but it does not need to be stored in the way base relations are (although its definition is stored in the system catalog). The contents of a view are defined as a query on one or more base relations (Pirahesh et al., 1994). Any operations on the view are automatically translated into operations on the relations from which it is derived. Views are dynamic: changes to the base relations that affect the view are immediately reflected in the view, and when a user makes permitted updates to the view, those changes are made to the underlying relations. In this section we describe the purpose of views and briefly examine the restrictions that apply to updates made through views; the discussion of how views are defined and processed is deferred (Scholl & Schek, 1990).
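The dynamic character of a view, holding no tuples of its own yet always reflecting the current state of its base relation, can be seen in a few lines. This is an illustrative sketch in SQLite (the Managers view name and sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, position TEXT)")
# The view's contents are defined purely as a query on the base relation:
conn.execute("""CREATE VIEW Managers AS
    SELECT staffNo FROM Staff WHERE position = 'Manager'""")

# The view stores nothing itself; before any inserts it is empty.
print(conn.execute("SELECT COUNT(*) FROM Managers").fetchone()[0])  # 0

conn.execute("INSERT INTO Staff VALUES ('SL21', 'Manager')")
conn.execute("INSERT INTO Staff VALUES ('SG37', 'Assistant')")

# Changes to the base relation appear in the view immediately,
# with no explicit refresh step.
managers = conn.execute("SELECT staffNo FROM Managers").fetchall()
print(managers)  # [('SL21',)]
```

Each query against the view is translated into a query against Staff at the time of request, which is what makes the view dynamic.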
4.5.2. Purpose of Views
The view mechanism is desirable for several reasons (Dayal & Bernstein, 1982):
• It provides a powerful and flexible security mechanism by hiding parts of the database from certain users. Users are not aware of the existence of any attributes or tuples that are missing from the view.
• It permits users to access data in a way that is customized to their needs, so that the same data can be seen by different users in different ways at the same time.
• It can simplify complex operations on the base relations. For example, if a view is defined as a combination (join) of two relations, users may perform simpler operations on the view, which the DBMS translates into equivalent operations on the join.
A view should be designed to support the user's current external model. Consider the following examples (Clifford & Tansel, 1985):
• A user might need Branch tuples that include the names of the managers as well as the other Branch attributes. This view is produced by joining the Branch relation with a restricted form of the Staff relation in which the staff position is 'Manager.'
• Some members of staff should see Staff tuples without the salary attribute.
• Attributes may be renamed, or the order in which they appear may be changed. For example, a user accustomed to referring to the branchNo attribute of branches by its full name, Branch Number, may see that column heading in the view.
• Some members of staff should see only the property records for those properties that they manage.
Although each of these examples demonstrates that views provide logical data independence, views also provide a more significant form of logical data independence that supports the reorganization of the conceptual model. For example, if a new attribute is added to a relation, existing users can remain unaware of its existence if their views are defined to exclude it. If an existing relation is rearranged or split up, a view may be defined so that users can continue to see their original views (Furtado & Casanova, 1985).
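Two of the scenarios above, hiding the salary attribute and restricting a member of staff to the properties they manage, can be sketched directly as views. The view names and data below are illustrative, not from the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, lName TEXT, salary REAL)")
conn.execute("CREATE TABLE PropertyForRent (propertyNo TEXT PRIMARY KEY, staffNo TEXT)")
conn.execute("INSERT INTO Staff VALUES ('SL21', 'White', 30000)")
conn.execute("INSERT INTO PropertyForRent VALUES ('PG36', 'SL21')")
conn.execute("INSERT INTO PropertyForRent VALUES ('PA14', 'SA9')")

# Vertical restriction: salary simply does not exist in this view's schema.
conn.execute("CREATE VIEW StaffPublic AS SELECT staffNo, lName FROM Staff")
# Horizontal restriction: only the properties overseen by SL21 are visible.
conn.execute("""CREATE VIEW SL21Properties AS
    SELECT propertyNo FROM PropertyForRent WHERE staffNo = 'SL21'""")

cols = [d[0] for d in conn.execute("SELECT * FROM StaffPublic").description]
rows = conn.execute("SELECT propertyNo FROM SL21Properties").fetchall()
print(cols)  # ['staffNo', 'lName']: no salary column for this user
print(rows)  # [('PG36',)]: PA14 is invisible through this view
```

A user granted access only to these views cannot discover that the salary attribute or the other property tuples exist, which is the security property described above.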
4.5.3. Updating Views
All updates to a base relation should be immediately reflected in all views that reference that base relation. Similarly, when a view is updated, the underlying base relation should be updated as well. However, there are restrictions on the types of modification that can be made through views. The conditions under which most systems determine whether an update through a view is allowed are summarized below (Pellicano et al., 2018):
• Updates are allowed through a view defined by a simple query involving a single base relation and containing either the primary key or a candidate key of that base relation.
• Updates are not allowed through views involving multiple base relations.
• Updates are not allowed through views involving aggregation or grouping operations.
Classes of views have been defined that are theoretically not updatable, theoretically updatable, and partially updatable. A survey on updating relational views is given by Furtado and Casanova (1985) (Barsalou et al., 1991).
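Support for these rules varies by system. SQLite, for instance, treats every view as read-only, so even a simple single-relation view containing the primary key, which the rules above would make updatable, must have its updates routed to the base relation with an INSTEAD OF trigger. The following sketch (trigger name and data invented) shows both the rejection and the workaround:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, lName TEXT)")
conn.execute("INSERT INTO Staff VALUES ('SL21', 'White')")
# A simple view on a single base relation, containing its primary key:
conn.execute("CREATE VIEW StaffNames AS SELECT staffNo, lName FROM Staff")

try:
    conn.execute("UPDATE StaffNames SET lName = 'Black' WHERE staffNo = 'SL21'")
except sqlite3.OperationalError as e:
    print("direct update rejected:", e)  # SQLite views are read-only

# Route updates on the view to the underlying base relation instead:
conn.execute("""
    CREATE TRIGGER StaffNames_upd INSTEAD OF UPDATE ON StaffNames
    BEGIN
        UPDATE Staff SET lName = NEW.lName WHERE staffNo = NEW.staffNo;
    END""")
conn.execute("UPDATE StaffNames SET lName = 'Black' WHERE staffNo = 'SL21'")

base_name = conn.execute(
    "SELECT lName FROM Staff WHERE staffNo = 'SL21'").fetchone()[0]
print(base_name)  # 'Black': the change through the view reached the base relation
```

The trigger makes the view behave as the rules intend: the update is translated into an update on the single base relation identified by the key.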
REFERENCES
1.
Aghili, A. A., & Menhaj, M. B., (2014). Construction and applications of a modified fuzzy relational model. Journal of Intelligent & Fuzzy Systems, 26(3), 1547–1555. 2. Allen, S., & Terry, E., (2005). Understanding relational modeling terminology. Beginning Relational Data Modeling, 1, 57–87. 3. Armstrong, W. W., (1974). Dependency structures of data base relationships. In: IFIP congress (Vol. 74, pp. 580–583). 4. Atzeni, P., Jensen, C. S., Orsi, G., Ram, S., Tanca, L., & Torlone, R., (2013). The relational model is dead, SQL is dead, and I don’t feel so good myself. ACM SIGMOD Record, 42(2), 64–68. 5. Bagley, S. S., (2010). Students, teachers and alternative assessment in secondary school: Relational models theory (RMT) in the field of education. The Australian Educational Researcher, 37(1), 83–106. 6. Barsalou, T., Siambela, N., Keller, A. M., & Wiederhold, G., (1991). Updating relational databases through object-based views. ACM SIGMOD Record, 20(2), 248–257. 7. Biber, P., Hupfeld, J., & Meier, L. L., (2008). Personal values and relational models. European Journal of Personality, 22(7), 609–628. 8. Blau, H., & McGovern, A., (2003). Categorizing unsupervised relational learning algorithms. In: Proceedings of the Workshop on Learning Statistical Models from Relational Data, Eighteenth International Joint Conference on Artificial Intelligence (Vol. 1, pp. 3–9). 9. Braganholo, V. P., Davidson, S. B., & Heuser, C. A., (2003). On the updatability of XML views over relational databases. In: WebDB (Vol. 1, pp. 31–36). 10. Brilmyer, G., (2018). Archival assemblages: Applying disability studies’ political/relational model to archival description. Archival Science, 18(2), 95–118. 11. Brookshire, R. G., (1993). A relational database primer. Social Science Computer Review, 11(2), 197–213. 12. Bucciarelli, A., Ehrhard, T., & Manzonetto, G., (2009). A relational model of a parallel and non-deterministic λ-calculus. 
In: International Symposium on Logical Foundations of Computer Science (Vol. 1, pp. 107–121). Springer, Berlin, Heidelberg.
13. Cadiou, J. M., (1976). On semantic issues in the relational model of data. In: International Symposium on Mathematical Foundations of Computer Science (Vol. 1, pp. 23–38). Springer, Berlin, Heidelberg. 14. Calı, A., Calvanese, D., De Giacomo, G., & Lenzerini, M., (2004). Data integration under integrity constraints. Information Systems, 29(2), 147–163. 15. Calì, A., Calvanese, D., Giacomo, G. D., & Lenzerini, M., (2002). Data integration under integrity constraints. In: International Conference on Advanced Information Systems Engineering (Vol. 1, pp. 262–279). Springer, Berlin, Heidelberg. 16. Ceruti, M. G., (2021). A review of database system terminology. Data Management, 13–31. 17. Chamberlin, D. D., (1976). Relational data-base management systems. ACM Computing Surveys (CSUR), 8(1), 43–66. 18. Clifford, J., & Tansel, A. U., (1985). On an algebra for historical relational databases: Two views. ACM SIGMOD Record, 14(4), 247– 265. 19. Cockcroft, S., (1997). A taxonomy of spatial data integrity constraints. GeoInformatica, 1(4), 327–343. 20. Codd, E. F., (1979). Extending the database relational model to capture more meaning. ACM Transactions on Database Systems (TODS), 4(4), 397–434. 21. Codd, E. F., (2002). A relational model of data for large shared data banks. In: Software Pioneers (Vol. 1, pp. 263–294). Springer, Berlin, Heidelberg. 22. Cosmadakis, S. S., & Papadimitriou, C. H., (1984). Updates of relational views. Journal of the ACM (JACM), 31(4), 742–760. 23. Cozman, F. G., & Mauá, D. D., (2015). Specifying probabilistic relational models with description logics. Proceedings of the XII Encontro Nacional de Inteligência Artificial e Computacional (ENIAC) (Vol. 1, pp. 4–8). 24. Dayal, U., & Bernstein, P. A., (1982). On the correct translation of update operations on relational views. ACM Transactions on Database Systems (TODS), 7(3), 381–416. 25. Dayal, U., & Bernstein, P. A., (1982). 
On the updatability of network views—Extending relational view theory to the network model. Information Systems, 7(1), 29–46.
26. Demuth, B., & Hussmann, H., (1999). Using UML/OCL constraints for relational database design. In: International Conference on the Unified Modeling Language (Vol. 1, pp. 598–613). Springer, Berlin, Heidelberg. 27. Dong, G., & Su, J., (2000). Incremental maintenance of recursive views using relational calculus/SQL. ACM SIGMOD Record, 29(1), 44–51. 28. Ehrhard, T., (2012). The Scott model of linear logic is the extensional collapse of its relational model. Theoretical Computer Science, 424, 20–45. 29. Fan, W., & Siméon, J., (2003). Integrity constraints for XML. Journal of Computer and System Sciences, 66(1), 254–291. 30. Fan, X., Li, B., Li, C., Sisson, S., & Chen, L., (2019). Scalable deep generative relational model with high-order node dependence. Advances in Neural Information Processing Systems, 1, 32. 31. Furtado, A. L., & Casanova, M. A., (1985). Updating relational views. Query Processing in Database Systems, 127–142. 32. Gadia, S. K., (1988). A homogeneous relational model and query languages for temporal databases. ACM Transactions on Database Systems (TODS), 13(4), 418–448. 33. Gardner, D., Goldberg, D. H., Grafstein, B., Robert, A., & Gardner, E. P., (2008). Terminology for neuroscience data discovery: Multi-tree syntax and investigator-derived semantics. Neuroinformatics, 6(3), 161–174. 34. Getoor, L., & Sahami, M., (1999). Using probabilistic relational models for collaborative filtering. In: Workshop on Web Usage Analysis and User Profiling (WEBKDD’99) (Vol. 1, pp. 1–6). 35. Golfarelli, M., Maio, D., & Rizzi, S., (1999). Vertical fragmentation of views in relational data warehouses. In: SEBD (Vol. 1, pp. 19–33). 36. Gougeon, N. A., (2010). Sexuality and autism: A critical review of selected literature using a social-relational model of disability. American Journal of Sexuality Education, 5(4), 328–361. 37. Graham, T. N., & Urnes, T., (1992). Relational views as a model for automatic distributed implementation of multi-user applications. In: Proceedings of the 1992 ACM Conference on Computer-Supported Cooperative Work (Vol. 1, pp. 59–66). 38. Greenyer, J., & Kindler, E., (2010). Comparing relational model transformation technologies: Implementing query/view/transformation with triple graph grammars. Software & Systems Modeling, 9(1), 21–46. 39. Grefen, P. W., & Apers, P. M., (1993). Integrity control in relational database systems—An overview. Data & Knowledge Engineering, 10(2), 187–223. 40. Hadzilacos, T., & Tryfona, N., (1992). A model for expressing topological integrity constraints in geographic databases. In: Theories and Methods of Spatio-Temporal Reasoning in Geographic Space (Vol. 1, pp. 252–268). Springer, Berlin, Heidelberg. 41. Hammer, M. M., & McLeod, D. J., (1975). Semantic integrity in a relational data base system. In: Proceedings of the 1st International Conference on Very Large Data Bases (Vol. 1, pp. 25–47). 42. He, J. S. K. T. G., & Naughton, C. Z. D. D. J., (2008). Relational databases for querying XML documents: Limitations and opportunities. In: Proceedings of VLDB (Vol. 1, pp. 302–314). 43. Hofmeyr, J. H. S., (2021). A biochemically-realizable relational model of the self-manufacturing cell. Biosystems, 207, 104463. 44. Horsburgh, J. S., Tarboton, D. G., Maidment, D. R., & Zaslavsky, I., (2008). A relational model for environmental and water resources data. Water Resources Research, 44(5), 33–38. 45. Hull, R., (1986). Relative information capacity of simple relational database schemata. SIAM Journal on Computing, 15(3), 856–886. 46. Johannesson, P., (1994). A method for transforming relational schemas into conceptual schemas. In: Proceedings of 1994 IEEE 10th International Conference on Data Engineering (Vol. 1, pp. 190–201). IEEE. 47. Kazemi, S. M., & Poole, D., (2018). Bridging weighted rules and graph random walks for statistical relational models. Frontiers in Robotics and AI, 5, 8. 48. Kloesgen, W., & Zytkow, J. M., (1994). Machine discovery terminology. In: KDD Workshop (Vol. 1, p. 463). 49. Lyons-Ruth, K., Melnick, S., Bronfman, E., Sherry, S., & Llanas, L., (2004). Hostile-helpless relational models and disorganized attachment patterns between parents and their young children: Review of research and implications for clinical work. Attachment Issues in Psychopathology and Intervention, 1, 65–94.
50. Martin, J. J., (2013). Benefits and barriers to physical activity for individuals with disabilities: A social-relational model of disability perspective. Disability and Rehabilitation, 35(24), 2030–2037. 51. Mcminn, P., Wright, C. J., & Kapfhammer, G. M., (2015). The effectiveness of test coverage criteria for relational database schema integrity constraints. ACM Transactions on Software Engineering and Methodology (TOSEM), 25(1), 1–49. 52. Mereish, E. H., & Poteat, V. P., (2015). A relational model of sexual minority mental and physical health: The negative effects of shame on relationships, loneliness, and health. Journal of Counseling Psychology, 62(3), 425. 53. Mesquita, L. F., Anand, J., & Brush, T. H., (2008). Comparing the resource‐based and relational views: Knowledge transfer and spillover in vertical alliances. Strategic Management Journal, 29(9), 913–941. 54. Motik, B., Horrocks, I., & Sattler, U., (2007). Adding integrity constraints to OWL. In: OWLED (Vol. 258, pp. 3–9). 55. Paredaens, J., De Bra, P., Gyssens, M., & Van, G. D., (2012). The Structure of the Relational Database Model (Vol. 17, pp. 22–27). Springer Science & Business Media. 56. Pecherer, R. M., (1975). Efficient evaluation of expressions in a relational algebra. In: ACM Pacific (Vol. 75, pp. 44–49). 57. Pellicano, M., Ciasullo, M. V., Troisi, O., & Casali, G. L., (2018). A journey through possible views of relational logic. In: Social Dynamics in a Systems Perspective (Vol. 1, pp. 195–221). Springer, Cham. 58. Pierce, T., & Lydon, J. E., (2001). Global and specific relational models in the experience of social interactions. Journal of Personality and Social Psychology, 80(4), 613. 59. Pirahesh, H., Mitschang, B., Südkamp, N., & Lindsay, B., (1994). Composite-object views in relational DBMS: An implementation perspective. Information Systems, 19(1), 69–88. 60. Pirotte, A., (1982). A precise definition of basic relational notions and of the relational algebra. 
ACM SIGMOD Record, 13(1), 30–45. 61. Qian, X., (1994). Inference channel-free integrity constraints in multilevel relational databases. In: Proceedings of 1994 IEEE Computer Society Symposium on Research in Security and Privacy (Vol.1, pp. 158–167). IEEE.
62. Ramanujam, S., Gupta, A., Khan, L., Seida, S., & Thuraisingham, B., (2009). R2D: Extracting relational structure from RDF stores. In: 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (Vol. 1, pp. 361–366). IEEE. 63. Rasdorf, W. J., Ulberg, K. J., & Baugh, J. W., (1987). A structurebased model of semantic integrity constraints for relational data bases. Engineering with Computers, 2(1), 31–39. 64. Reilly, C., (2009). The concept of knowledge in KM: A relational model. Electronic Journal of Knowledge Management, 7(1), 145–154. 65. Schek, H. J., & Scholl, M. H., (1986). The relational model with relation-valued attributes. Information Systems, 11(2), 137–147. 66. Scholl, M. H., & Schek, H. J., (1990). A relational object model. In: International Conference on Database Theory (Vol. 1, pp. 89–105). Springer, Berlin, Heidelberg. 67. Schumacher, R. F., & Fuchs, L. S., (2012). Does understanding relational terminology mediate effects of intervention on compare word problems?. Journal of Experimental Child Psychology, 111(4), 607–628. 68. Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A., & Price, T. G., (1989). Access path selection in a relational database management system. In: Readings in Artificial Intelligence and Databases (Vol. 1, pp. 511–522). Morgan Kaufmann. 69. Smets, M., & Jarzabkowski, P., (2013). Reconstructing institutional complexity in practice: A relational model of institutional work and complexity. Human Relations, 66(10), 1279–1309. 70. Smith, C. A., & Kirby, L. D., (2009). Putting appraisal in context: Toward a relational model of appraisal and emotion. Cognition and Emotion, 23(7), 1352–1372. 71. Sommestad, T., Ekstedt, M., & Johnson, P., (2010). A probabilistic relational model for security risk analysis. Computers & Security, 29(6), 659–679. 72. Tansel, A. U., (2004). Temporal data modeling and integrity constraints in relational databases. 
In: International Symposium on Computer and Information Sciences (Vol. 1, pp. 459–469). Springer, Berlin, Heidelberg.
73. Tyler, T. R., & Lind, E. A., (1992). A relational model of authority in groups. In: Advances in Experimental Social Psychology (Vol. 25, pp. 115–191). Academic Press. 74. Vassiliadis, P., & Sellis, T., (1999). A survey of logical models for OLAP databases. ACM SIGMOD Record, 28(4), 64–69. 75. Vermeer, M. W., & Apers, P. M., (1995). Object-oriented views of relational databases incorporating behavior. In: DASFAA (Vol. 1, pp. 26–35). 76. Vinciquerra, K. J., (1993). Terminology Database Record Standardization and Relational Organization in Computer-Assisted Terminology (pp. 170–171). ASTM Special Technical Publication, 1166. 77. Vyawahare, H. R., Karde, P. P., & Thakare, V. M., (2018). A hybrid database approach using graph and relational database. In: 2018 International Conference on Research in Intelligent and Computing in Engineering (RICE) (Vol. 1, pp. 1–4). IEEE. 78. Yang, X., & Wang, G., (2001). Mapping referential integrity constraints from relational databases to XML. In: International Conference on Web-Age Information Management (Vol. 1, pp. 329–340). Springer, Berlin, Heidelberg. 79. Yazıcı, A., & Sözat, M. İ., (1998). The integrity constraints for similarity‐based fuzzy relational databases. International Journal of Intelligent Systems, 13(7), 641–659. 80. Yu, K., Chu, W., Yu, S., Tresp, V., & Xu, Z., (2006). Stochastic relational models for discriminative link prediction. Advances in Neural Information Processing Systems, 1, 19. 81. Zimmermann, K., (1975). Different views of a data base: Coexistence between network model and relational model. In: Proceedings of the 1st International Conference on Very Large Data Bases (Vol. 1, pp. 535–537).
CHAPTER 5
DATABASE PLANNING AND DESIGN
CONTENTS
5.1. Introduction
5.2. The Database System Development Lifecycle
5.3. Database Planning
5.4. Definition of the System
5.5. Requirements Collection and Analysis
5.6. Database Design
References
5.1. INTRODUCTION
The success of many computer-based systems now depends on software rather than hardware. Unfortunately, the track record of software development is not particularly impressive. Software applications have proliferated in recent decades, ranging from small, relatively simple applications consisting of a few lines of code to large, complex programs with thousands of lines. Many of these programs had to be maintained continually (AlKodmany, 1999). This involved correcting faults that had been detected, implementing new user requirements, and modifying the software to run on new or upgraded platforms. The effort spent on maintenance began to consume resources at an alarming rate. As a result, many major software projects were late, over budget, unreliable, difficult to maintain, and performed poorly (Skaar et al., 2022). These problems led to what became known as the software crisis. Although this term was originally coined in the late 1960s, the crisis is still with us more than four decades later. As a result, some authors now refer to the software crisis as the software slump. A study in the United Kingdom by OASIG, a Special Interest Group concerned with the organizational aspects of information technology, reported the following findings about software projects (OASIG, 1996) (Teng & Grover, 1992):
• 80–90% do not meet their performance goals;
• approximately 80% are delivered late and over budget;
• approximately 40% fail or are abandoned;
• fewer than 40% fully address training and skills requirements;
• fewer than 25% properly integrate enterprise and technology objectives;
• only 10–20% meet all their success criteria.
an absence of a comprehensive set of criteria; absence of a suitable development technique; Poor design deconstruction into manageable elements;
Database Planning and Design
As a solution to these problems, a structured approach to software development called the Software Development Lifecycle (SDLC), or the Information Systems Lifecycle (ISLC), was proposed. When the software being developed is a database system, the lifecycle is referred to more specifically as the Database System Development Lifecycle (DSDLC) (Hernandez, 2013).
5.2. THE DATABASE SYSTEM DEVELOPMENT LIFECYCLE

Because a database system is a fundamental component of a larger organization-wide information system, the DSDLC is inherently linked with the lifecycle of the information system. Figure 5.1 shows the stages of the DSDLC; beneath each stage's name is a reference to the part of this chapter that covers that stage (Tetlay & John, 2009). It is important to recognize that the stages of the DSDLC are not strictly sequential: some of the earlier stages are revisited through feedback loops. For example, problems encountered during database design may require additional requirements collection and analysis. As there are feedback loops between most stages, Figure 5.1 shows only some of the more obvious ones. Table 5.1 summarizes the main activities associated with each stage of the DSDLC (Ruparelia, 2010).

For small database systems with a small number of users, the lifecycle need not be very complex. However, when designing a medium to large database system with hundreds to millions of users and thousands of queries and application programs, the lifecycle can become extremely complex. We concentrate here on the activities associated with the development of medium to large database systems. The main activities associated with each stage of the DSDLC are described in more detail in the following sections (Weitzel & Kerschberg, 1989).
Figure 5.1. The steps in the building of a database system. Source: https://www.slideshare.net/AfrasiyabHaider/database-developmentlife-cycle.
Table 5.1. A Synopsis of the Principal Activities Connected with Each Phase of the DSDLC
5.3. DATABASE PLANNING

The management activities that allow the stages of the DSDLC to be realized as efficiently and effectively as possible (Teng & Grover, 1992). Database planning must be integrated with the organization's overall IT strategy. There are three main issues to consider when formulating an IS strategy (Haslum et al., 2007):

•	identification of the corporate strategy and goals, followed by determination of information system needs;
•	evaluation of current information systems (IS) to assess their strengths and weaknesses; and
•	appraisal of IT opportunities that might yield competitive advantage.

The methodologies used to address these issues are outside the scope of this chapter; interested readers may consult Robson (1997) for a fuller treatment. An important first step in database planning
is to explicitly define the database system's mission statement (Taylor Jr et al., 2001). The mission statement defines the major aims of the database system. It is usually defined by those driving the database project within the organization (such as the Owner or Director). A mission statement clarifies the purpose of the database system and provides a clearer path toward the efficient and effective creation of the required system (Junk, 1998). Once the mission statement has been defined, the next step is to identify the mission objectives. Each mission objective should identify a particular task that the database system must be able to support. The assumption is that if the database system supports the mission objectives, the mission statement will be met. The mission statement and objectives may be accompanied by additional information that specifies, in general terms, the work to be done, the resources with which to do it, and the money to pay for it all (Álvarez-Romero et al., 2018).

Database planning should also include the development of standards that govern how data will be collected, how its format will be specified, what documentation will be needed, and how design and implementation should proceed. Standards can be time-consuming to develop and maintain, requiring resources both to set them up and to keep them up to date (Moe et al., 2006). However, a well-designed set of standards provides a basis for training staff and measuring quality control, and can ensure that work conforms to a pattern, irrespective of staff skills and experience. For example, particular rules may govern how data items are named in the data dictionary, which helps prevent both redundancy and inconsistency. Any legal or enterprise requirements concerning the data, such as the stipulation that some types of data must be treated confidentially, should be documented (Sievers et al., 2012).
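As an illustration of such naming standards, the short sketch below checks candidate data-dictionary names against a hypothetical convention (singular PascalCase for entity names, lowerCamelCase for attribute names). Both the convention and the sample names are invented for this example; a real organization would define its own rules.

```python
import re

# Hypothetical naming standard for the data dictionary:
# entity names are singular PascalCase, attribute names are lowerCamelCase.
ENTITY_PATTERN = re.compile(r"^[A-Z][a-zA-Z0-9]*$")
ATTRIBUTE_PATTERN = re.compile(r"^[a-z][a-zA-Z0-9]*$")

def check_names(entities, attributes):
    """Return the names that violate the (illustrative) naming standard."""
    bad = [e for e in entities if not ENTITY_PATTERN.match(e)]
    bad += [a for a in attributes if not ATTRIBUTE_PATTERN.match(a)]
    return bad

violations = check_names(
    entities=["PrivateOwner", "property_for_rent"],  # second name violates
    attributes=["ownerNo", "OwnerName"],             # second name violates
)
print(violations)  # ['property_for_rent', 'OwnerName']
```

A check like this, run over the data dictionary, is one concrete way a standard prevents the redundancy and inconsistency mentioned above.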
5.4. DEFINITION OF THE SYSTEM

The system definition describes the scope and boundaries of the database application, as well as the major user views. Before attempting to design a database system, it is essential to first establish the boundaries of the system and how it interfaces with other parts of the organization's information system. This must be done before we can move on to building the system. We must include within the system boundaries not only the current users and application areas, but also future users and applications that may fall within its scope. The boundaries and scope of the database system should also be set to accommodate
the important new user views that would be made possible by the database (Noor et al., 2009).
5.4.1. User Views

A user view defines what is required of a database system from the perspective of a particular job role (such as Supervisor or Manager) or enterprise application area (such as stock control, personnel, or marketing) (Johansson et al., 2017).
Figure 5.2. User views (1, 2, and 3) and (5 and 6) have overlapping requirements (shown as hatched areas), whereas user view 4 has distinct requirements. Source: https://slideplayer.com/slide/5694649/.
5.5. REQUIREMENTS COLLECTION AND ANALYSIS Gathering and processing data about the portion of the organization that will be assisted by the database system, and utilizing this data to determine the
system’s needs. This stage involves the collection and analysis of information about the part of the enterprise to be served by the database. There are various techniques for gathering this information, known collectively as fact-finding techniques. For each major user view (that is, job role or enterprise application area), the following information is gathered (Hall & Fagen, 2017):

•	a description of the data used or generated;
•	the details of how the data is to be used or generated; and
•	any additional requirements for the new database system.

This information is then analyzed to identify the requirements (or features) to be included in the new database system. These requirements are described in a set of documents collectively referred to as the requirements specification for the new database system (Arnold & Wade, 2015).

Requirements collection and analysis is a preliminary stage of database design. The amount of data gathered depends on the nature of the problem and the policies of the enterprise. Too much study too soon leads to analysis paralysis; too little thought risks wasting both time and money by working on a solution to the wrong problem (Westmark, 2004).

The information collected at this stage may be poorly structured and may include some informal requests, which must be converted into a more structured statement of requirements. This is achieved using requirements specification techniques such as Data Flow Diagrams (DFD), Structured Analysis and Design (SAD), and Hierarchical Input Process Output (HIPO) charts, all supported by documentation (Knight et al., 2003). Identifying the required functionality of a database system is a critical activity, as systems with inadequate or incomplete functionality will annoy users, which may lead to rejection or underuse of the system. However, excessive functionality can also be problematic, as it may make a system difficult to implement, maintain, use, or learn.
Another important activity associated with this stage is deciding how to manage the situation in which the database system has more than one user view. There are three main approaches to handling the requirements of a database system with multiple user views (Lantos, 1998):

•	the centralized approach;
•	the view integration approach; and
•	a combination of both approaches.
Figure 5.3. Managing multiple user views 1 to 3 using the centralized approach.
5.5.1. Centralized Approach

The requirements for each user view are merged into a single set of requirements for the database system. During the database design stage, a data model representing all user views is created. The centralized approach involves collating the requirements for the different user views into a single list. A name is given to the collection of user views that gives some indication of the functional area covered by all the merged views (Franco‐Santos et al., 2007). A global data model, representing all user views, is created during the database design stage. The global data model is composed of diagrams and documentation that formally describe the data requirements of the users. Figure 5.3 shows the management of user views 1 to 3 using the centralized approach. This approach is generally preferred when there is significant overlap in the requirements of the user views and the database system is not overly complex (Gacek et al., 1995).
5.5.2. View Integration Approach

The requirements for each user view remain as separate lists. During the database design stage, data models representing each user view are created and then
combined. The view integration approach involves treating the requirements for each user view as a separate set of requirements. We first create a data model for each user view during the database design stage (Chen et al., 2005). A data model that represents a single user view is called a local data model. Each model comprises diagrams and documentation that formally describe the requirements of one particular, but not all, user views of the database. The local data models are then merged, at a later stage of database design, into a global data model that represents all of the user requirements for the database. Figure 5.4 shows the management of user views 1 to 3 using the view integration approach. In general, this approach is recommended (Bedogni et al., 2012).
Figure 5.4. Managing multiple user views 1 to 3 using the view integration approach.
This approach tends to be preferred when there are significant differences between user views and when the database system is sufficiently complex to justify dividing the work into more manageable parts. To manage multiple user views in some large
database systems, a combination of the centralized and view integration approaches can be used (O’Neill et al., 1977). For example, the requirements for two or more user views may first be merged using the centralized approach, and the merged requirements used to build a local logical data model. The view integration approach is then used to merge this model with other local logical data models, producing a global logical data model. In this case, each local logical data model represents the requirements of two or more user views, and the final global logical data model represents the requirements of the database system's entire set of user views (Schalock & Luckasson, 2004).
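At the level of requirement lists, the combined approach above can be sketched as follows. The user views and requirement names are invented for illustration: views 1 and 2 overlap heavily, so they are merged first (the centralized step); the result is then integrated with the remaining view (the view integration step).

```python
# Invented requirement lists for three hypothetical user views.
view1 = {"store owner details", "store property details", "list properties"}
view2 = {"store property details", "record viewings"}
view3 = {"produce management reports"}

# Centralized step: merge the overlapping views' requirements into one list;
# duplicates collapse automatically because sets are used.
local_model_a = view1 | view2
local_model_b = view3

# View integration step: merge the local models into a global model.
global_model = local_model_a | local_model_b

print(len(global_model))  # 5 distinct requirements
```

The same shape carries over to real data models: the centralized step removes overlap early, while the integration step keeps independently developed parts manageable.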
5.6. DATABASE DESIGN

The process of creating a design for a database that will support the enterprise's mission statement and mission objectives. This section introduces the main approaches to database design. We also discuss the role that data modeling plays in database design and its applications. We then describe the three phases of database design: conceptual, logical, and physical design (Zilio et al., 2004).
5.6.1. Database Design Methodologies

The two main approaches to database design are termed "bottom-up" and "top-down." The bottom-up approach begins at the fundamental level of attributes (that is, the properties of entities and relationships), which, through analysis of the associations between them, are grouped into relations that represent types of entities and relationships between entities (Rao et al., 2002). The process of normalization involves identifying the required attributes and then aggregating them into normalized relations based on the functional dependencies among them. The bottom-up approach is best suited to the design of simple databases with a relatively small number of attributes. However, this approach becomes difficult when designing more complex databases with a larger number of attributes, where it is impractical to establish all of the functional dependencies among the attributes. Because the conceptual and logical data models for large databases may contain a very large number of attributes, it is important to adopt an approach that simplifies the design process (Navathe et al., 1984). Furthermore, it can be difficult to identify all of the attributes to be included in the data models in the early stages of establishing the data requirements
for a complex database. The top-down approach is better suited to the design of complex databases. This approach starts with the development of data models containing a few high-level entities and relationships, and then applies successive top-down refinements to identify lower-level entities, relationships, and their associated attributes (Shneiderman & Plaisant, 2010). The top-down approach can be illustrated using the concepts of the Entity-Relationship (ER) model, beginning with the identification of the entities and the relationships between them that are of interest to the organization. For example, we may begin by identifying the entities PropertyForRent and PrivateOwner, then the relationship between them, PrivateOwner Owns PropertyForRent, and finally the associated attributes, such as PrivateOwner (owner number, name, and address) and PropertyForRent (property number and address) (Naiburg et al., 2001). A high-level data model is developed using the concepts of the ER model. Additional database design approaches include the inside-out approach and the mixed strategy approach. The inside-out approach is related to the bottom-up approach, but differs in that it first identifies a set of major entities and then spreads out to consider other entities, relationships, and attributes associated with those identified first. The mixed strategy approach uses both the top-down and bottom-up approaches for different parts of the model before finally combining all the parts together (Schema, 1995).
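A sketch of where such a top-down ER design can end up, using SQLite as a stand-in DBMS. The column names, keys, and sample rows are illustrative choices, since the text only names the attributes informally; the foreign key implements the Owns relationship.

```python
import sqlite3

# Entities and relationship from the example above, expressed as SQL tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PrivateOwner (
        ownerNo   TEXT PRIMARY KEY,  -- owner number
        ownerName TEXT NOT NULL,
        address   TEXT
    );
    CREATE TABLE PropertyForRent (
        propertyNo TEXT PRIMARY KEY, -- property number
        address    TEXT NOT NULL,
        ownerNo    TEXT REFERENCES PrivateOwner(ownerNo)  -- Owns relationship
    );
""")

# Invented sample data.
conn.execute("INSERT INTO PrivateOwner VALUES ('CO46', 'Joe Keogh', '2 Fergus Dr')")
conn.execute("INSERT INTO PropertyForRent VALUES ('PA14', '16 Holhead', 'CO46')")

# Traversing the relationship with a join.
row = conn.execute("""
    SELECT p.propertyNo, o.ownerName
    FROM PropertyForRent p JOIN PrivateOwner o ON p.ownerNo = o.ownerNo
""").fetchone()
print(row)  # ('PA14', 'Joe Keogh')
```

Note that the refinement order mirrors the top-down method: entities first, then the relationship, then the attributes that flesh each table out.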
5.6.2. Data Modeling

The two main purposes of data modeling are to assist in understanding the meaning of the data and to facilitate communication about information requirements. Building a data model requires answering questions about entities, relationships, and attributes. In doing so, the designers discover the semantics of the enterprise's data, which exist whether or not they are recorded in a formal data model. All enterprises rely on entities, relationships, and attributes to function; however, unless they are properly documented, their meaning can be misinterpreted. A data model makes it easier to understand the meaning of the data, so we model data to ensure that we understand (Wiederhold, 1983):

•	each user's perspective of the data;
•	the use of data across different user views; and
•	the nature of the data itself, independent of its physical representations.
Data models can be used to convey the designer's understanding of the information requirements of the enterprise. Provided both parties are familiar with the model's notation, this facilitates communication between users and designers. Enterprises are standardizing the way that they model data by selecting a particular approach to data modeling and applying it to all their database development projects. The most popular high-level data model used in database design is based on the concepts of the Entity-Relationship (ER) model, and this is the model we make use of in this chapter (Hernandez, 2013).

Table 5.2. The Requirements for an Optimum Data Model
5.6.3. Database Design Phases

Database design consists of three main phases: conceptual, logical, and physical design. Conceptual database design is the process of constructing a model of the data used in an enterprise, independent of all physical considerations. This first phase involves the creation of a conceptual data model of the part of the enterprise that we want to model. The data model is built using the information documented in the users' requirements specification. Conceptual database design is entirely independent of implementation details such as the target DBMS software, application programs, programming languages, hardware platform, or any other physical considerations (Chin & Ozsoyoglu, 1981).

Throughout the process of developing a conceptual data model, the model is tested and validated against the requirements of the
users. The conceptual data model of the enterprise serves as a source of information for the next phase, logical database design (Teng & Grover, 1992).

Logical database design is the second phase of database design. It results in the creation of a logical data model of the part of the enterprise that we are seeking to model. The conceptual data model produced in the previous phase is refined and then mapped onto a logical data model. The logical data model is based on the target data model for the database (for example, the relational data model) (Navathe et al., 1986). Whereas a conceptual data model is independent of all physical considerations, a logical data model is derived with knowledge of the underlying data model of the target DBMS. In other words, we know whether the DBMS is, for example, relational, network, hierarchical, or object-oriented; however, we ignore all other aspects of the chosen DBMS, in particular physical details such as storage structures and indexes (Klochkov et al., 2016).

Throughout the development of a logical data model, the model is tested and validated against the users' requirements. The technique of normalization is used to test the correctness of a logical data model: normalization ensures that the relations derived from the data model do not contain redundant data, which can cause update anomalies when implemented. The logical data model should also be examined to ensure that it supports the transactions requested by the users (Letkowski, 2015).

The logical data model is used as a source of information for the subsequent phase, physical database design. It provides the physical database designer with a vehicle for making trade-offs, which are very important to efficient database design.
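The redundancy that normalization removes can be illustrated with plain Python structures. The data and the functional dependency (ownerNo → ownerName) are invented for this example: in the flat relation, the owner's name is repeated for every property they own, so renaming an owner means touching several rows (an update anomaly); decomposing on the dependency stores each name once.

```python
# An unnormalized "flat" relation: owner details repeat per property.
flat = [
    {"propertyNo": "PA14", "ownerNo": "CO46", "ownerName": "Joe Keogh"},
    {"propertyNo": "PL94", "ownerNo": "CO46", "ownerName": "Joe Keogh"},
    {"propertyNo": "PG21", "ownerNo": "CO87", "ownerName": "Carol Farrel"},
]

# Decompose on the functional dependency ownerNo -> ownerName:
# one relation holds each owner exactly once, the other keeps only the key.
owners = {r["ownerNo"]: r["ownerName"] for r in flat}
properties = [{"propertyNo": r["propertyNo"], "ownerNo": r["ownerNo"]} for r in flat]

print(owners)  # each owner's name is now stored exactly once
```

After decomposition, renaming an owner is a single update to `owners`, which is precisely the anomaly normalization is designed to prevent.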
The logical data model also plays an important role during the operational maintenance stage of the DSDLC, which follows design. Properly maintained and kept up to date, the data model allows future changes to be reflected accurately and efficiently (Finkelstein et al., 1988).

The third and final phase of database design is physical database design, in which the designer decides how the database is to be implemented. The previous phase of database design involved the development of a logical structure for the database, describing relations and enterprise constraints (Chowdary et al., 2008). Although this structure is DBMS-independent, it is developed in accordance with a particular data model,
such as the relational, network, or hierarchical model. However, before the physical database design can begin, the target DBMS must be selected; physical design is therefore tailored to a specific DBMS. There is feedback between physical and logical design, because decisions taken during physical design to improve performance may affect the structure of the logical data model (Dey et al., 1999).

The main objective of physical database design is to decide how the logical database structure is to be physically implemented. For the relational model, this involves (Carroll, 1987):

•	creating a set of relational tables and constraints from the information presented in the logical data model;
•	identifying the specific storage structures and access methods for the data, to optimize the performance of the database system; and
•	designing the system's security protection.

Ideally, conceptual and logical database design for larger systems should be kept separate from physical design, for three main reasons (Ling & Dobbie, 2004):

•	they deal with a different subject matter – the what rather than the how;
•	they are performed at a different time – the what must be understood before the how can be determined; and
•	they require different skills, which are often found in different people.

Database design is an iterative process: there is a starting point and an almost endless procession of refinements, and it is best viewed as a learning process. As the designers gain a deeper understanding of the workings of the enterprise and the meaning of its data, and express that understanding in the selected data models, the information gained may well require changes to other parts of the design (Kent & Williams, 1993). In particular, conceptual and logical database design are critical to the overall success of the system. If the designs do not accurately represent the enterprise, it will be extremely difficult, if not impossible, to define all of the required user views or to maintain database integrity. It may even prove difficult to define the physical implementation or to maintain acceptable system performance. On the other hand, the ability to adapt to change is one of the hallmarks of good database design. Therefore, it is worthwhile spending the time and effort necessary to produce the best possible design (Atassi et al., 2014).
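One typical physical-design trade-off — adding a secondary index so that a frequent query no longer scans the whole table — can be sketched with SQLite as a stand-in DBMS. The table, column, and index names are illustrative, and the `EXPLAIN QUERY PLAN` text shown in the comments is SQLite-specific and varies by version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PropertyForRent (propertyNo TEXT PRIMARY KEY, city TEXT, rent REAL)"
)

query = "SELECT * FROM PropertyForRent WHERE city = 'Glasgow'"

# Without an index on city, the optimizer must scan the whole table.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]

# The physical-design decision: add a secondary index for the frequent query.
conn.execute("CREATE INDEX idxCity ON PropertyForRent(city)")

# With the index, the optimizer can search it instead of scanning.
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]

print(plan_before)  # e.g. 'SCAN PropertyForRent'
print(plan_after)   # e.g. 'SEARCH PropertyForRent USING INDEX idxCity (city=?)'
```

The point is the feedback loop described above: observing plans like these may send the designer back to adjust the logical model or add further access structures.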
REFERENCES

1. Al-Kodmany, K., (1999). Using visualization techniques for enhancing public participation in planning and design: Process, implementation, and evaluation. Landscape and Urban Planning, 45(1), 37–45.
2. Álvarez-Romero, J. G., Mills, M., Adams, V. M., Gurney, G. G., Pressey, R. L., Weeks, R., & Storlie, C. J., (2018). Research advances and gaps in marine planning: Towards a global database in systematic conservation planning. Biological Conservation, 227, 369–382.
3. Arnold, R. D., & Wade, J. P., (2015). A definition of systems thinking: A systems approach. Procedia Computer Science, 44, 669–678.
4. Atassi, N., Berry, J., Shui, A., Zach, N., Sherman, A., Sinani, E., & Leitner, M., (2014). The PRO-ACT database: Design, initial analyses, and predictive features. Neurology, 83(19), 1719–1725.
5. Bedogni, A., Fusco, V., Agrillo, A., & Campisi, G., (2012). Learning from experience. Proposal of a refined definition and staging system for bisphosphonate-related osteonecrosis of the jaw (BRONJ). Oral Diseases, 18(6), 621.
6. Carroll, J. M., (1987). Strategies for extending the useful lifetime of DES. Computers & Security, 6(4), 300–313.
7. Chen, C. H., Khoo, L. P., & Yan, W., (2005). PDCS—a product definition and customization system for product concept development. Expert Systems with Applications, 28(3), 591–602.
8. Chin, F. Y., & Ozsoyoglu, G., (1981). Statistical database design. ACM Transactions on Database Systems (TODS), 6(1), 113–139.
9. Chowdary, V. M., Chandran, R. V., Neeti, N., Bothale, R. V., Srivastava, Y. K., Ingle, P., & Singh, R., (2008). Assessment of surface and sub-surface waterlogged areas in irrigation command areas of Bihar state using remote sensing and GIS. Agricultural Water Management, 95(7), 754–766.
10. Dey, D., Storey, V. C., & Barron, T. M., (1999). Improving database design through the analysis of relationships. ACM Transactions on Database Systems (TODS), 24(4), 453–486.
11. El-Mehalawi, M., & Miller, R. A., (2003). A database system of mechanical components based on geometric and topological similarity. Part I: Representation. Computer-Aided Design, 35(1), 83–94.
12. Finkelstein, S., Schkolnick, M., & Tiberio, P., (1988). Physical database design for relational databases. ACM Transactions on Database Systems (TODS), 13(1), 91–128.
13. Franco‐Santos, M., Kennerley, M., Micheli, P., Martinez, V., Mason, S., Marr, B., & Neely, A., (2007). Towards a definition of a business performance measurement system. International Journal of Operations & Production Management, 1, 2–9.
14. Gacek, C., Abd-Allah, A., Clark, B., & Boehm, B., (1995). On the definition of software system architecture. In: Proceedings of the First International Workshop on Architectures for Software Systems (Vol. 1, pp. 85–94).
15. Hall, A. D., & Fagen, R. E., (2017). Definition of system. In: Systems Research for Behavioral Science (Vol. 1, pp. 81–92). Routledge.
16. Haslum, P., Botea, A., Helmert, M., Bonet, B., & Koenig, S., (2007). Domain-independent construction of pattern database heuristics for cost-optimal planning. In: AAAI (Vol. 7, pp. 1007–1012).
17. Hernandez, M. J., (2013). Database Design for Mere Mortals: A Hands-on Guide to Relational Database Design. Pearson Education.
18. Hernandez, M. J., (2013). Database Design for Mere Mortals: A Hands-on Guide to Relational Database Design (Vol. 1, pp. 2–6). Pearson Education.
19. Johansson, Å., Skeie, Ø. B., Sorbe, S., & Menon, C., (2017). Tax Planning by Multinational Firms: Firm-Level Evidence from a Cross-Country Database, 1, 2–5.
20. Junk, M., (1998). Domain of definition of Levermore's five-moment system. Journal of Statistical Physics, 93(5), 1143–1167.
21. Kent, A., & Williams, J. G., (1993). Encyclopedia of Microcomputers: Multistrategy Learning to Operations Research: Microcomputer Applications (Vol. 1, pp. 2–9). CRC Press.
22. Klochkov, Y., Klochkova, E., Antipova, O., Kiyatkina, E., Vasilieva, I., & Knyazkina, E., (2016). Model of database design in the conditions of limited resources. In: 2016 5th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO) (Vol. 1, pp. 64–66). IEEE.
23. Knight, J. C., Strunk, E. A., & Sullivan, K. J., (2003). Towards a rigorous definition of information system survivability. In: Proceedings DARPA Information Survivability Conference and Exposition (Vol. 1, pp. 78–89). IEEE.
24. Lantos, P. L., (1998). The definition of multiple system atrophy: A review of recent developments. Journal of Neuropathology and Experimental Neurology, 57(12), 1099.
25. Letkowski, J., (2015). Doing database design with MySQL. Journal of Technology Research, 6, 1.
26. Ling, T. W., & Dobbie, G., (2004). Semistructured Database Design (Vol. 1, pp. 2–9). Springer Science & Business Media.
27. Moe, S., Drüeke, T., Cunningham, J., Goodman, W., Martin, K., Olgaard, K., & Eknoyan, G., (2006). Definition, evaluation, and classification of renal osteodystrophy: A position statement from Kidney Disease: Improving Global Outcomes (KDIGO). Kidney International, 69(11), 1945–1953.
28. Naiburg, E., Naiburg, E. J., & Maksimchuck, R. A., (2001). UML for Database Design (Vol. 1, pp. 2–4). Addison-Wesley Professional.
29. Navathe, S., Ceri, S., Wiederhold, G., & Dou, J., (1984). Vertical partitioning algorithms for database design. ACM Transactions on Database Systems (TODS), 9(4), 680–710.
30. Navathe, S., Elmasri, R., & Larson, J., (1986). Integrating user views in database design. Computer, 19(01), 50–62.
31. Noor, A. M., Alegana, V. A., Gething, P. W., & Snow, R. W., (2009). A spatial national health facility database for public health sector planning in Kenya in 2008. International Journal of Health Geographics, 8(1), 1–7.
32. O'Neill, J. P., Brimer, P. A., Machanoff, R., Hirsch, G. P., & Hsie, A. W., (1977). A quantitative assay of mutation induction at the hypoxanthine-guanine phosphoribosyl transferase locus in Chinese hamster ovary cells (CHO/HGPRT system): Development and definition of the system. Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, 45(1), 91–101.
33. Rao, J., Zhang, C., Megiddo, N., & Lohman, G., (2002). Automating physical database design in a parallel database. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data (Vol. 1, pp. 558–569).
34. Ruparelia, N. B., (2010). Software development lifecycle models. ACM SIGSOFT Software Engineering Notes, 35(3), 8–13.
35. Schalock, R. L., & Luckasson, R., (2004). American Association on Mental Retardation's definition, classification, and system of supports and its relation to international trends and issues in the field of intellectual disabilities. Journal of Policy and Practice in Intellectual Disabilities, 1(3, 4), 136–146.
36. Schema, C., (1995). Relational Database Design (Vol. 1, pp. 2–9). Prentice Hall Austria.
37. Shneiderman, B., & Plaisant, C., (2010). Designing the User Interface: Strategies for Effective Human-Computer Interaction (Vol. 1, pp. 2, 3). Pearson Education India.
38. Sievers, S., Ortlieb, M., & Helmert, M., (2012). Efficient implementation of pattern database heuristics for classical planning. In: International Symposium on Combinatorial Search (Vol. 3, No. 1, pp. 1–5).
39. Skaar, C., Lausselet, C., Bergsdal, H., & Brattebø, H., (2022). Towards a LCA database for the planning and design of zero-emissions neighborhoods. Buildings, 12(5), 512.
40. Taylor Jr., F. B., Toh, C. H., Hoots, K. W., Wada, H., & Levi, M., (2001). Towards definition, clinical and laboratory criteria, and a scoring system for disseminated intravascular coagulation. Thrombosis and Haemostasis, 86(11), 1327–1330.
41. Teng, J. T. C., & Grover, V., (1992). Factors influencing database planning: An empirical study. Omega, 20(1), 59–72.
42. Teng, J. T., & Grover, V., (1992). An empirical study on the determinants of effective database management. Journal of Database Management (JDM), 3(1), 22–34.
43. Tetlay, A., & John, P., (2009). Determining the Lines of System Maturity, System Readiness and Capability Readiness in the System Development Lifecycle, 1, 2–6.
44. Weitzel, J. R., & Kerschberg, L., (1989). Developing knowledge-based systems: Reorganizing the system development life cycle. Communications of the ACM, 32(4), 482–488.
45. Westmark, V. R., (2004). A definition for information system survivability. In: 37th Annual Hawaii International Conference on System Sciences, 2004: Proceedings (Vol. 1, p. 10). IEEE.
46. Wiederhold, G., (1983). Database Design (Vol. 1077, pp. 4–9). New York: McGraw-Hill.
47. Zilio, D. C., Rao, J., Lightstone, S., Lohman, G., Storm, A., Garcia-Arellano, C., & Fadden, S., (2004). DB2 design advisor: Integrated automatic physical database design. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases (Vol. 1, pp. 1087–1097).
CHAPTER 6

DATA MANIPULATION
CONTENTS
6.1. Introduction ..................................................................................... 160
6.2. Introduction to SQL ......................................................................... 160
6.3. Writing SQL Commands .................................................................. 165
6.4. Data Manipulation ........................................................................... 167
References .............................................................................................. 173
The Creation and Management of Database Systems
6.1. INTRODUCTION
In the previous chapters we described the relational model and relational languages in some detail. A particular language that has emerged from the development of the relational model is the Structured Query Language, commonly known as SQL. Over the past decades, SQL has become the dominant language for relational database management systems (RDBMSs). A standard for SQL was first defined by the American National Standards Institute (ANSI) in 1986 and was subsequently adopted in 1987 as an international standard by the International Organization for Standardization (ISO) (ISO, 1987) (Warnes et al., 2014). SQL is now supported by well over one hundred database systems, running on a wide range of hardware platforms from personal computers to mainframes. Important as SQL is, we make no attempt to cover every aspect of the language, because the standard is large and complex. In this chapter, we concentrate on the statements of the language that are concerned with data manipulation (Chatterjee & Segev, 1991).
Figure 6.1. A graphical example of the manipulation of data files. Source: https://www.solvexia.com/blog/5-top-tips-for-data-manipulation.
6.2. INTRODUCTION TO SQL
In this section we outline the objectives of SQL, provide a short history of the language, and discuss why it is so important to database systems (Kim, 1992).
Figure 6.2. SQL: A basic overview. Source: https://www.javatpoint.com/dbms-sql-introduction.
6.2.1. Objectives of SQL
Ideally, a relational language should allow a user to (Melton & Simon, 2001):
• create the database and relation structures;
• perform basic data management tasks, such as inserting, updating, and deleting data in relations; and
• run both simple and complex queries.
The language's command structure and syntax should be reasonably easy to learn, and it should perform these tasks with minimal user effort. Finally, the language should be portable; that is, it should conform to a widely accepted standard so that the same command structure and syntax can be used when moving from one database management system (DBMS) to the next. SQL was created to meet these criteria (Emerson et al., 1989).
SQL is an example of a transform-oriented language: a language designed to transform input relations into required outputs. As a language, the ISO SQL standard has two major components (Lans, 2006):
• a Data Definition Language (DDL) for defining the database structure and controlling access to the data; and
• a Data Manipulation Language (DML) for retrieving and updating data.
Before SQL:1999, SQL contained only these definition and manipulation commands; it had no flow-of-control statements such as IF ... THEN ... ELSE, GO TO, or DO ... WHILE. These had to be implemented in a host programming language or performed interactively according to the user's decisions. Owing to this lack of computational completeness, SQL can be used in two ways. The first is to use SQL interactively, by entering statements at a terminal. The second is to embed SQL statements in a procedural language (Ali, 2010). SQL is a relatively easy language to learn (Kofler, 2005):
• It is a non-procedural language: you specify what data you need rather than how to retrieve it. In other words, SQL does not require you to specify the access methods to the data.
• Like most modern languages, SQL is essentially free-format, which means that parts of statements do not have to be typed at particular positions on the screen.
The command structure consists of common English words such as CREATE TABLE, INSERT, and SELECT. Consider the following examples (Kumar et al., 2014):
– CREATE TABLE Staff (staffNo VARCHAR(5), lName VARCHAR(15), salary DECIMAL(7,2));
– INSERT INTO Staff VALUES ('SG16', 'Brown', 8300);
– SELECT staffNo, lName, salary FROM Staff WHERE salary > 10000;
SQL can be used by a wide range of users, including database administrators (DBAs), management staff, application developers, and many other types of end-user (de Haan, 2005). An international standard now exists for the SQL language, making it both the formal and the de facto standard language for defining and manipulating relational databases (ISO, 1992, 1999a) (Kriegel, 2011).
6.2.2. History of SQL
The history of the relational model (and consequently of SQL) began with E. F. Codd's seminal paper, written while he was working at IBM's Research Laboratory in San José (Codd, 1970). In 1974, D. Chamberlin, also of IBM San José, defined a language called the Structured English Query Language, or SEQUEL. A revised version, SEQUEL/2, was defined in 1976, but the name was subsequently changed to SQL for legal reasons (Chamberlin & Boyce, 1974; Chamberlin et al., 1976). Today, many people still pronounce SQL as 'See-Quel,' although the official pronunciation is 'S-Q-L' (Harkins & Reid, 2002).
IBM produced a prototype DBMS based on SEQUEL/2, called System R (Astrahan et al., 1976). The purpose of this prototype was to validate the feasibility of the relational model. Besides its other achievements, one of the most notable results credited to this project was the development of SQL. However, the roots of SQL lie in the language SQUARE (Specifying Queries As Relational Expressions), which predates the System R project. SQUARE was designed as a research language to implement relational algebra with English-like sentences (Boyce et al., 1975).
In the late 1970s, the database system Oracle was produced by what is now called the Oracle Corporation; it was probably the first commercial implementation of a relational DBMS based on SQL. Soon afterwards came INGRES, with a query language called QUEL which, although more structured than SQL, is less English-like. When SQL emerged as the standard database language, INGRES was converted into an SQL-based DBMS. IBM produced its first commercial RDBMS, SQL/DS, for the DOS/VSE and VM/CMS environments in 1981 and 1982, respectively, followed by DB2 for the MVS environment in 1983 (Haan et al., 2009).
In 1982, following an IBM proposal, the ANSI began work on a Relational Database Language (RDL). ISO joined the effort in 1983, and the two groups collaborated to produce a standard for SQL. (The name RDL was dropped in 1984, and the draft standard reverted to a form much closer to the existing dialects of SQL.) The first ISO standard, published in 1987, attracted considerable criticism. Date, an influential researcher in this field, argued that important elements, including integrity requirements of the relational model and certain relational operators, were missing (Gorman, 2014; Leng & Terpstra, 2010). He also pointed out that the language was extremely redundant; in other words, the same query could be formulated in several different ways (Date, 1986, 1987a, 1990). Much of the criticism was
genuine, and the standards bodies had recognized this before the standard was published. It was decided, however, that it was more important to release a standard as early as possible, to establish a common base from which the language and its implementations could develop, than to wait until all the features that people felt should be present could be defined and agreed upon (Willis, 2003). In 1989, ISO published an addendum that defined an 'Integrity Enhancement Feature' (ISO, 1989). In 1992, the ISO standard underwent its first major revision, referred to as SQL2 or SQL-92 (ISO, 1992). Although some features were defined in the standard for the first time, many of them had already been implemented, in part or in a similar form, in one or more of the many SQL dialects. The next version of the standard, commonly referred to as SQL:1999, was not formalized until 1999 (ISO, 1999a). This release included additional features for object data management. A further release, SQL:2003, appeared in late 2003 (Roof & Fergus, 2003).
Features provided by vendors beyond the standard are called extensions. For example, the standard specifies six different data types for data in an SQL database; many implementations supplement this list with a variety of extensions. An implementation of SQL is called a dialect. No two dialects are exactly alike, and no dialect currently matches the ISO standard exactly. Moreover, as database vendors introduce new functionality, they are expanding their SQL dialects and moving them even further apart (Jesse, 2018). The core of the SQL language, however, is showing signs of becoming more standardized. To claim conformance with the SQL:2003 standard, a vendor must implement a set of features known as Core SQL.
Many of the remaining features are divided into packages; for example, there are packages for object features and OLAP (OnLine Analytical Processing). Although SQL originated at IBM, its popularity prompted other vendors to create their own implementations. Today there are literally hundreds of SQL-based products, with new ones being announced regularly (El Agha et al., 2018).
6.2.3. Importance of SQL
SQL is the first and, so far, the only standard database language to gain wide acceptance. The only other standardized database language, the Network Database Language (NDL), based on the CODASYL network model, has never achieved comparable acceptance (Thomas et al., 1977). Nearly every major current vendor provides database products based on SQL or with an SQL interface, and most are represented on at least one of the standard-setting bodies. There is a huge investment in the SQL language by both vendors and users. It has become part of the application architectures of many large and influential organizations, such as the X/OPEN consortium for UNIX standards and IBM's Systems Application Architecture (SAA) (Der Lans, 2007). SQL has also been adopted as a Federal Information Processing Standard (FIPS), conformance with which is required for all sales of DBMSs to the US government. A consortium of vendors known as the SQL Access Group defined a set of enhancements to SQL that support interoperability across disparate systems. SQL is used in other standards as a definitional tool and even influences the development of other standards (Rose, 1989). Examples include ISO's Information Resource Dictionary System (IRDS) and Remote Data Access (RDA) standards. Academic interest in the language is also strong, providing both a theoretical basis for the language and the techniques needed to implement it successfully. This is especially true for query processing, data distribution, and security. Specialized versions of SQL are now being developed for new markets, such as OnLine Analytical Processing (OLAP) (Ntagwabira & Kang, 2010).
6.2.4. Terminology
The ISO SQL standard uses the terms tables, rows, and columns rather than the formal terms relations, tuples, and attributes. In our presentation of SQL we mostly use the ISO terminology. Note also that SQL does not adhere strictly to the definition of the relational model. For example, SQL allows a table produced by a SELECT statement to contain duplicate rows, it imposes an ordering on the columns, and it allows the user to order the rows of a result table (Duncan, 2018).
6.3. WRITING SQL COMMANDS
In this section we briefly describe the structure of an SQL statement and the notation we use to define the format of the various SQL constructs. An SQL statement consists of reserved words and user-defined words. Reserved words are a fixed part of the SQL language and have a fixed meaning. They
can’t be divided over paragraphs and should be spelt precisely as necessary. User-defined terms are composed either by the user (following specific grammar requirements) or reflect the names of different database objects, including columns, tables, views, and indexes (Ma et al., 2019). A collection of syntactic rules governs the construction of words in the sentence. However, it is not required by the standard, several variants of SQL do need and use of a transaction terminator (typically a semicolon ‘;’) to indicate the conclusion of each SQL statement. Many SQL query elements are generally case sensitive, meaning letters may be written in either top or bottom case. Actual textual data should be written precisely as it exists inside the database, which is the only relevant exception for the principle. If we save a user’s surname as ‘SMITH’ and subsequently searched for it with the phrase ‘Smith,’ that row would not be discovered (Kearns et al., 1977). Because SQL is an unrestricted language, capitalization, and process defining make a SQL statement or series of queries more accessible. Consider the following scenario (Saisanguansat & Jeatrakul, 2016): • •
A statement’s clauses should all start on such a new line; Every clause’s starting should be aligned with the beginnings of subsequent clauses; and • When a clause contains many parts, which are on its line and recessed at the beginning of the sentence to show the connection. We utilize the extended version of a Backus Naur Form (BNF) syntax to construct SQL commands over this and the following section (Julavanich et al., 2019): •
Reserved terms are represented by upper-case characters and must be spelt precisely as indicated. • Consumer terms are represented by smaller letters.; • A vertical bar () denotes a choice between two options, such as a, b, and c; • A needed element is indicated by curly brackets, such as a; and • Optional elements are indicated by square brackets, such as [a]. The ellipsis (…) can be used to show that an item may be repeated zero or more times. Consider the following scenario (Otair et al., 2018): {a | b} (, c . . .)
means either a or b followed by zero or more repetitions of c separated by commas. In practice, the DDL statements are used first, to create the database structure (tables) and access mechanisms (what each user can legally see), and then the DML statements are used to populate and query the tables. However, in this chapter we present the DML statements before the DDL statements to reflect the importance of DML statements to the general user (Batra, 2018).
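The earlier point about case can be checked with a small sketch (not from the book, using an invented one-row table in SQLite): keywords may be typed in any case, but character literals must match the stored data exactly:

```python
import sqlite3

# Hypothetical one-row table used only to illustrate the case rules.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Staff (lName VARCHAR(15))")
cur.execute("INSERT INTO Staff VALUES ('SMITH')")

# Keywords are case-insensitive: lowercase 'select ... where' works fine.
rows_upper = cur.execute("select lName from Staff where lName = 'SMITH'").fetchall()
# Literals are not: 'Smith' does not match the stored 'SMITH'.
rows_mixed = cur.execute("SELECT lName FROM Staff WHERE lName = 'Smith'").fetchall()
print(rows_upper)  # [('SMITH',)]
print(rows_mixed)  # []
```

In SQLite the default comparison is case-sensitive for string literals, matching the behavior described above; some dialects can be configured with case-insensitive collations.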
6.4. DATA MANIPULATION
This section examines the following SQL DML statements (Piyayodilokchai et al., 2013):
• SELECT – to query data in the database;
• INSERT – to insert data into a table;
• UPDATE – to update data in a table; and
• DELETE – to delete data from a table.
Owing to the complexity of the SELECT statement and the relative simplicity of the other DML statements, we devote most of this chapter to the SELECT statement and its various forms. We begin with simple queries and successively add more complexity to illustrate how more sophisticated queries that use sorting, grouping, aggregates, and queries on multiple tables can be written. The INSERT, UPDATE, and DELETE statements are discussed at the end of the chapter (Sarhan et al., 2017). We illustrate the SQL statements using the following tables from the worked case study:
Branch (branchNo, street, city, postcode)
Staff (staffNo, fName, lName, position, sex, DOB, salary, branchNo)
PropertyForRent (propertyNo, street, city, postcode, type, rooms, rent, ownerNo, staffNo, branchNo)
Client (clientNo, fName, lName, telNo, prefType, maxRent)
Literals: Before going into the SQL DML statements, it is important to understand the concept of literals. Constants used in SQL statements are
known as literals. There are different forms of literals for every data type supported by SQL. For simplicity, however, we can distinguish between literals that are enclosed in single quotation marks and those that are not. All non-numeric data values must be enclosed in single quotation marks; numeric data values must not be enclosed in quotation marks. For example, we could use literals to insert data into a table (Zhang & Zhang, 2018):
INSERT INTO PropertyForRent (propertyNo, street, city, postcode, type, rooms, rent, ownerNo, staffNo, branchNo)
VALUES ('PA14', '16 Holhead', 'Aberdeen', 'AB7 5SU', 'House', 6, 650.00, 'CO46', 'SA9', 'B007');
The value in column rooms is an integer literal and the value in column rent is a decimal number literal; these are not enclosed in single quotation marks. All other columns are character strings, which are enclosed in single quotation marks (Kline et al., 2008).
Simple Queries: The purpose of the SELECT statement is to retrieve and display data from one or more database tables. It is an extremely powerful command, capable of performing the equivalent of the relational algebra's Selection, Projection, and Join operations in a single statement. SELECT is the most frequently used SQL command and has the following general form (reconstructed here from the clause descriptions that follow) (Budiman et al., 2017):
SELECT [DISTINCT | ALL] {* | [columnExpression [AS newName]] [, . . .]}
FROM TableName [alias] [, . . .]
[WHERE condition]
[GROUP BY columnList] [HAVING condition]
[ORDER BY columnList];
Here, columnExpression represents a column name or an expression, TableName is the name of an existing database table or view that you have access to, and alias is an optional abbreviation for TableName. The sequence of processing in a SELECT statement is (Haan et al., 2009):
FROM specifies the table or tables to be used (Morton et al., 2010);
WHERE filters the rows subject to some condition;
GROUP BY forms groups of rows with the same column value;
HAVING filters the groups subject to some condition;
SELECT specifies which columns are to appear in the output;
ORDER BY specifies the order of the output.
The order of the clauses in the SELECT statement cannot be changed. The only two mandatory clauses are the first two, SELECT and FROM; the remainder are optional. The SELECT operation is closed: the result of a query on a table is another table. As we will see, there are many variations of this statement (Piyayodilokchai et al., 2011).
6.4.1. Example #1: Retrieve All Columns, All Rows
List the full details of all staff.
Since there are no restrictions specified in this query, the WHERE clause is unnecessary and all columns are required. We can write this query as:
SELECT fName, lName, staffNo, DOB, sex, salary, position, branchNo
FROM Staff;
Because many SQL retrievals require all columns of a table, SQL provides a quick way of specifying 'all columns': an asterisk (*) in place of the column names. The following statement is an equivalent and shorter way of expressing this query (Kemalis & Tzouramanis, 2008):
SELECT * FROM Staff;
Table 6.1 shows the result table in both cases.
Table 6.1. Table of Results for Example #1
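As a sketch (not from the book, using an invented two-row subset of the Staff table), we can confirm in SQLite that `SELECT *` expands to all columns in table order, so it returns the same rows as the explicit column list:

```python
import sqlite3

# Minimal invented Staff table: a subset of the chapter's columns.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Staff (staffNo TEXT, fName TEXT, lName TEXT, salary REAL)")
cur.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?)",
                [("SL21", "John", "White", 30000),
                 ("SG37", "Ann", "Beech", 12000)])

explicit = cur.execute("SELECT staffNo, fName, lName, salary FROM Staff").fetchall()
star = cur.execute("SELECT * FROM Staff").fetchall()
assert explicit == star  # '*' is shorthand for all columns in table order
print(star)
```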
6.4.2. Example #2: Retrieve Specific Columns, All Rows
Produce a list of salaries for all staff, showing only the staff number, the first and last names, and the salary details.
SELECT staffNo, fName, lName, salary
FROM Staff;
In this example, a new table is created from Staff containing only the designated columns staffNo, fName, lName, and salary, in the specified order. Table 6.2 shows the result of this operation. Note that, unless specified otherwise, the rows in the result table may not be sorted. Some DBMSs do sort the result by one or more columns (for example, Microsoft Access would sort this result table based on the primary key staffNo). We describe how to sort the rows of a result table in the next section (Renaud & Biljon, 2004).
Table 6.2. Table of Results for Example #2
6.4.3. Example #3: Use of DISTINCT
Produce a list of all the properties that have been viewed (Arcuri & Galeotti, 2019).
SELECT propertyNo
FROM Viewing;
Table 6.3 shows the result. Notice that, unlike the relational algebra Projection operation, SELECT does not eliminate duplicates when it projects over one or more columns, so there are several duplicate values. To eliminate the duplicates, we use the DISTINCT keyword and rewrite the query as (Rankins et al., 2010):
SELECT DISTINCT propertyNo
FROM Viewing;
After removing the duplicates, we obtain the result table shown in Table 6.4.
Table 6.3. Result Table for Example #3 (with duplicates)
Table 6.4. Result Table for Example #3 (duplicates removed)
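The duplicate-then-DISTINCT behavior of Tables 6.3 and 6.4 can be reproduced with a small sketch (not from the book; the viewing rows are invented) in SQLite:

```python
import sqlite3

# Invented viewing records: two properties were each viewed twice.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Viewing (clientNo TEXT, propertyNo TEXT)")
cur.executemany("INSERT INTO Viewing VALUES (?, ?)",
                [("CR56", "PA14"), ("CR76", "PG4"),
                 ("CR56", "PG4"), ("CR62", "PA14")])

# A plain SELECT keeps duplicates, unlike the relational algebra Projection.
with_dups = cur.execute("SELECT propertyNo FROM Viewing").fetchall()
print(with_dups)  # [('PA14',), ('PG4',), ('PG4',), ('PA14',)]

# DISTINCT eliminates them.
no_dups = cur.execute("SELECT DISTINCT propertyNo FROM Viewing").fetchall()
print(sorted(no_dups))  # [('PA14',), ('PG4',)]
```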
6.4.4. Example #4: Calculated Fields
Produce a list of monthly salaries for all staff, showing the staff number, the first and last names, and the salary details (Phewkum et al., 2019).
SELECT staffNo, fName, lName, salary/12
FROM Staff;
Since monthly salaries are required, this query is almost identical to Example #2. In this case, the desired result can be obtained simply by dividing the salary by 12, giving the result table shown in Table 6.5 (Bagui et al., 2006).
This is an example of the use of a calculated field (sometimes called a computed or derived field). In general, to use a calculated field you specify an SQL expression in the SELECT list. SQL expressions can involve addition, subtraction, multiplication, and division, and parentheses can be used to build complex expressions. More than one table column can be used in a calculated field; however, the columns referenced in an arithmetic expression must have a numeric type (Sadiq et al.,
2004). Note that the computed column appears here as the fourth column of the result, labeled col4. Normally, a column in the result table takes its name from the corresponding column of the database table from which it was retrieved; in this case, however, SQL does not know how to label the column. Some dialects give the column a name corresponding to its position in the table (for example, col4) (Gini, 2008); others may leave the column name blank or use the expression entered in the SELECT list. The ISO standard allows the column to be named using an AS clause. In the previous example, we could have written (Cosentino et al., 2015):
SELECT staffNo, fName, lName, salary/12 AS monthlySalary
FROM Staff;
Table 6.5. Table of Results for Example #4
Row Selection (WHERE Clause): The examples above use the SELECT statement to retrieve all rows from a table. However, we often need to restrict the rows that are retrieved. This is achieved with the WHERE clause, which consists of the keyword WHERE followed by a search condition that specifies the rows to be retrieved. The five basic search conditions (or predicates, in ISO terminology) are as follows (Zhang et al., 2009):
•
Comparison: compare the value of one expression with the value of another expression.
Range: test whether the value of an expression falls within a specified range of values.
Set membership: test whether the value of an expression equals one of a set of values.
Pattern match: test whether a string matches a specified pattern.
Null: test whether a column has a null (unknown) value.
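The five kinds of search condition can be sketched as follows (not from the book; the Staff rows are invented, and the query for each predicate type is marked in a comment), again using SQLite:

```python
import sqlite3

# Tiny invented Staff table; the last row has an unknown (NULL) salary.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Staff (staffNo TEXT, lName TEXT, position TEXT, salary REAL)")
cur.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?)", [
    ("SL21", "White", "Manager", 30000),
    ("SG37", "Beech", "Assistant", 12000),
    ("SG14", "Ford", "Supervisor", 18000),
    ("SA9", "Howe", "Assistant", None),
])

def q(sql):
    return cur.execute(sql).fetchall()

print(q("SELECT staffNo FROM Staff WHERE salary > 20000"))                        # comparison
print(q("SELECT staffNo FROM Staff WHERE salary BETWEEN 10000 AND 20000"))        # range
print(q("SELECT staffNo FROM Staff WHERE position IN ('Manager', 'Supervisor')")) # set membership
print(q("SELECT staffNo FROM Staff WHERE lName LIKE 'B%'"))                       # pattern match
print(q("SELECT staffNo FROM Staff WHERE salary IS NULL"))                        # null test
```

Note that the row with the NULL salary satisfies neither the comparison nor the range condition; only the IS NULL predicate selects it.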
REFERENCES
1. Ali, M., (2010). An introduction to Microsoft SQL server stream insight. In: Proceedings of the 1st International Conference and Exhibition on Computing for Geospatial Research & Application (Vol. 1, pp. 1–1).
2. Arcuri, A., & Galeotti, J. P., (2019). SQL data generation to enhance search-based system testing. In: Proceedings of the Genetic and Evolutionary Computation Conference (Vol. 1, pp. 1390–1398).
3. Bagui, S., Bagui, S. S., & Earp, R., (2006). Learning SQL on SQL Server 2005 (Vol. 1, pp. 2–9). O'Reilly Media, Inc.
4. Batra, R., (2018). SQL Primer (Vol. 1, pp. 183–187). Apress, Berkeley, CA.
5. Budiman, E., Jamil, M., Hairah, U., & Jati, H., (2017). Eloquent object relational mapping models for biodiversity information system. In: 2017 4th International Conference on Computer Applications and Information Processing Technology (CAIPT) (Vol. 1, pp. 1–5). IEEE.
6. Chatterjee, A., & Segev, A., (1991). Data manipulation in heterogeneous databases. ACM SIGMOD Record, 20(4), 64–68.
7. Cosentino, V., Izquierdo, J. L. C., & Cabot, J., (2015). Gitana: A SQL-based git repository inspector. In: International Conference on Conceptual Modeling (Vol. 1, pp. 329–343). Springer, Cham.
8. De Haan, L., (2005). Introduction to SQL, iSQL*Plus, and SQL*Plus. Mastering Oracle SQL and SQL*Plus, 1, 25–64.
9. Der, L. V., (2007). Introduction to SQL: Mastering the Relational Database Language, 4/E (Vol. 1, pp. 3–9). Pearson Education India.
10. Duncan, S., (2018). Practical SQL. Software Quality Professional, 20(4), 64–64.
11. El Agha, M. I., Jarghon, A. M., & Abu-Naser, S. S., (2018). SQL Tutor for Novice Students (Vol. 1, pp. 4–8).
12. Emerson, S. L., Darnovsky, M., & Bowman, J., (1989). The Practical SQL Handbook: Using Structured Query Language (Vol. 1, pp. 2–6). Addison-Wesley Longman Publishing Co., Inc.
13. Gini, R., (2008). Stata tip 56: Writing parameterized text files. Stata Journal, 8(199-2016-2514), 134–136.
14. Gorman, T., (2014). Introduction to SQL and SQL developer. In: Beginning Oracle SQL (Vol. 1, pp. 23–58). Apress, Berkeley, CA.
15. Haan, L. D., Fink, D., Gorman, T., Jørgensen, I., & Morton, K., (2009). Introduction to SQL, SQL*Plus, and SQL developer. In: Beginning Oracle SQL (Vol. 1, pp. 25–69). Apress.
16. Haan, L. D., Fink, D., Gorman, T., Jørgensen, I., & Morton, K., (2009). Writing and automating SQL*Plus scripts. In: Beginning Oracle SQL (Vol. 1, pp. 287–327). Apress.
17. Harkins, S. S., & Reid, M. W., (2002). Introduction to SQL server. In: SQL: Access to SQL Server (Vol. 1, pp. 307–370). Apress, Berkeley, CA.
18. Jesse, G., (2018). SQL: An introduction to SQL lesson and hands-on lab. In: Proceedings of the EDSIG Conference ISSN (Vol. 2473, p. 3857).
19. Julavanich, T., Nalintippayawong, S., & Atchariyachanvanich, K., (2019). RSQLG: The reverse SQL question generation algorithm. In: 2019 IEEE 6th International Conference on Industrial Engineering and Applications (ICIEA) (Vol. 1, pp. 908–912). IEEE.
20. Kearns, R., Shead, S., & Fekete, A., (1997). A teaching system for SQL. In: Proceedings of the 2nd Australasian Conference on Computer Science Education (Vol. 1, pp. 224–231).
21. Kemalis, K., & Tzouramanis, T., (2008). SQL-IDS: A specification-based approach for SQL-injection detection. In: Proceedings of the 2008 ACM Symposium on Applied Computing (Vol. 1, pp. 2153–2158).
22. Kim, W., (1992). Introduction to SQL/X. In: Future Databases' 92 (Vol. 1, pp. 2–7).
23. Kline, K., Kline, D., & Hunt, B., (2008). SQL in a Nutshell: A Desktop Quick Reference Guide (Vol. 1, pp. 1–4). O'Reilly Media, Inc.
24. Kofler, M., (2005). An introduction to SQL. The Definitive Guide to MySQL 5 (2nd edn., pp. 189–216).
25. Kriegel, A., (2011). Discovering SQL: A Hands-on Guide for Beginners (Vol. 1, pp. 5–7). John Wiley & Sons.
26. Kumar, R., Gupta, N., Charu, S., Bansal, S., & Yadav, K., (2014). Comparison of SQL with HiveQL. International Journal for Research in Technological Studies, 1(9), 2348–1439.
27. Lans, R. F. V. D., (2006). Introduction to SQL: Mastering the Relational Database Language (Vol. 1, pp. 2–5). Addison-Wesley Professional.
28. Leng, C., & Terpstra, W. W., (2010). Distributed SQL queries with bubblestorm. In: From Active Data Management to Event-Based Systems and More (Vol. 1, pp. 230–241). Springer, Berlin, Heidelberg.
29. Ma, L., Zhao, D., Gao, Y., & Zhao, C., (2019). Research on SQL injection attack and prevention technology based on web. In: 2019 International Conference on Computer Network, Electronic and Automation (ICCNEA) (Vol. 1, pp. 176–179). IEEE.
30. Melton, J., & Simon, A. R., (2001). SQL:1999: Understanding Relational Language Components (Vol. 1, pp. 6–9). Elsevier.
31. Morton, K., Osborne, K., Sands, R., Shamsudeen, R., & Still, J., (2010). Core SQL. In: Pro Oracle SQL (Vol. 1, pp. 1–27). Apress.
32. Naeem, M. A., Ullah, S., & Bajwa, I. S., (2012). Interacting with data warehouse by using a natural language interface. In: International Conference on Application of Natural Language to Information Systems (Vol. 1, pp. 372–377). Springer, Berlin, Heidelberg.
33. Ntagwabira, L., & Kang, S. L., (2010). Use of query tokenization to detect and prevent SQL injection attacks. In: 2010 3rd International Conference on Computer Science and Information Technology (Vol. 2, pp. 438–440). IEEE.
34. Otair, M., Al-Sardi, R., & Al-Gialain, S., (2008). An Arabic retrieval system with native language rather than SQL queries. In: 2008 First International Conference on the Applications of Digital Information and Web Technologies (ICADIWT) (Vol. 1, pp. 84–89). IEEE.
35. Phewkum, C., Kaewchaiya, J., Kobayashi, K., & Atchariyachanvanich, K., (2019). Scramble SQL: A novel drag-and-drop SQL learning tool. In: 2019 23rd International Computer Science and Engineering Conference (ICSEC) (Vol. 1, pp. 340–344). IEEE.
36. Piyayodilokchai, H., Panjaburee, P., Laosinchai, P., Ketpichainarong, W., & Ruenwongsa, P., (2013). A 5E learning cycle approach-based, multimedia-supplemented instructional unit for structured query language. Journal of Educational Technology & Society, 16(4), 146–159.
37. Piyayodilokchai, H., Ruenwongsa, P., Ketpichainarong, W., Laosinchai, P., & Panjaburee, P., (2011). Promoting students' understanding of SQL in a database management course: A learning cycle approach. International Journal of Learning, 17(11), 2–6.
38. Rankins, R., Bertucci, P., Gallelli, C., & Silverstein, A. T., (2010). Microsoft SQL Server 2008 R2 Unleashed (Vol. 1, pp. 2–9). Pearson Education.
39. Renaud, K., & Biljon, J. V., (2004). Teaching SQL—Which pedagogical horse for this course? In: British National Conference on Databases (Vol. 1, pp. 244–256). Springer, Berlin, Heidelberg.
40. Roof, L., & Fergus, D., (2003). Introduction to SQL Server CE. In: The Definitive Guide to the .NET Compact Framework (Vol. 1, pp. 453–481). Apress, Berkeley, CA.
41. Rose, J. A., (1989). Introduction to SQL, by Rick F. Van Der Lans, Addison-Wesley Publishing Company, Wokingham, England, 348 pages including index, 1988 (£16.95). Robotica, 7(4), 365–366.
42. Sadiq, S., Orlowska, M., Sadiq, W., & Lin, J., (2004). SQLator: An online SQL learning workbench. In: Proceedings of the 9th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education (Vol. 1, pp. 223–227).
43. Saisanguansat, D., & Jeatrakul, P., (2016). Optimization techniques for PL/SQL. In: 2016 14th International Conference on ICT and Knowledge Engineering (ICT&KE) (Vol. 1, pp. 44–48). IEEE.
44. Sarhan, A. A., Farhan, S. A., & Al-Harby, F. M., (2017). Understanding and discovering SQL injection vulnerabilities. In: International Conference on Applied Human Factors and Ergonomics (Vol. 1, pp. 45–51). Springer, Cham.
45. Thomas J. Watson IBM Research Center, Research Division, & Denny, G. H., (1977). An Introduction to SQL, a Structured Query Language (Vol. 1, pp. 4–9).
46. Warnes, G. R., Bolker, B., Gorjanc, G., Grothendieck, G., Korosec, A., Lumley, T., & Rogers, J., (2014). gdata: Various R programming tools for data manipulation. R Package Version, 2(3), 35.
47. Willis, T., (2003). Introduction to SQL Server 2000. In: Beginning SQL Server 2000 for Visual Basic Developers (Vol. 1, pp. 11–28). Apress, Berkeley, CA.
48. Zhang, H., & Zhang, X., (2018). SQL injection attack principles and preventive techniques for PHP site. In: Proceedings of the 2nd International Conference on Computer Science and Application Engineering (Vol. 1, pp. 1–9).
Data Manipulation
177
49. Zhang, Y., Xiao, Y., Wang, Z., Ji, X., Huang, Y., & Wang, S., (2009). ScaMMDB: Facing challenge of mass data processing with MMDB. In: Advances in Web and Network Technologies, and Information Management (Vol. 1, pp. 1–12). Springer, Berlin, Heidelberg.
CHAPTER 7

DATABASE CONNECTIVITY AND WEB TECHNOLOGIES
CONTENTS
7.1. Introduction
7.2. Database Connectivity
7.3. Internet Databases
7.4. Extensible Markup Language
References
7.1. INTRODUCTION
A database, as already noted, is a central repository for crucial corporate data. That data may be created by conventional business applications or by newer business channels such as the Internet and mobile devices like smartphones (Wahle et al., 2009). The data should be accessible to all commercial users, and those users require access through a variety of means, including spreadsheets, Visual Basic applications, Web front ends, and Microsoft Access reports and forms. This chapter discusses the topologies that programs employ to connect to databases (Coelho et al., 2011). The Internet has altered the way businesses of all sizes operate. Purchasing products and services over the Web, for instance, has become commonplace. In today's environment, interconnectivity occurs not only between an application and the database, but also among applications exchanging messages and data (Baca et al., 2009). The Extensible Markup Language (XML) standardizes the exchange of structured and unstructured data among programs. Because the Web and databases are becoming more intertwined, database experts must understand how to design, use, and maintain Web interfaces to databases (Migliore & Chinta, 2017).
7.2. DATABASE CONNECTIVITY
Database connectivity refers to the mechanisms and protocols through which application programs connect and communicate with data repositories. Because it provides an interface between the application program and the database, database connectivity software is also known as database middleware. The data repository, also called the data source, is the data management application, such as an Oracle RDBMS, an IBM DBMS, or a SQL Server DBMS, that will be used to store the data generated by the application program. Ideally, the data source could be located anywhere and hold any type of data. The data source could be, for instance, a relational database, a structured database, an Excel spreadsheet, or a text data file (Johnson, 2014). The importance of standard database connectivity interfaces cannot be overstated: just as SQL has become the dominant global data manipulation language, a common database connectivity interface is needed to enable applications to connect to data sources. Database connectivity can be accomplished in several ways; only the following interfaces are covered in this section (Ramakrishnan & Rao, 1997):
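To make the idea of a standard connectivity interface concrete, the sketch below uses the generic connect/execute/fetch pattern of Python's DB-API, with the built-in sqlite3 module standing in for the vendor's data source; the table and data are invented for illustration. The point is that the query code depends only on the common interface, so switching vendors changes little more than the connect call.

```python
import sqlite3

def top_customers(conn, limit):
    # This function depends only on the generic connect/execute/fetch
    # interface, not on which vendor's driver produced `conn`.
    cur = conn.execute(
        "SELECT name FROM customer ORDER BY balance DESC LIMIT ?", (limit,)
    )
    return [row[0] for row in cur.fetchall()]

# Stand-in data source; a different DBMS would change only this line.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, balance REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [("Alice", 120.0), ("Bob", 80.0), ("Cara", 200.0)])
print(top_customers(conn, 2))  # ['Cara', 'Alice']
```

This is the same division of labor the chapter describes: the middleware (here, the sqlite3 driver) hides the storage details behind a uniform interface.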
a. Native SQL connectivity;
b. OLE-DB (Object Linking and Embedding for Database), developed by Microsoft;
c. ODBC (Open Database Connectivity), DAO (Data Access Objects), and RDO (Remote Data Objects), developed by Microsoft;
d. JDBC (Java Database Connectivity), developed by Sun;
e. ADO.NET (ActiveX Data Objects), developed by Microsoft.
Most of these interfaces are Microsoft products. Even so, client applications that connect to databases mostly run on computers that use some form of Microsoft Windows. The data connectivity interfaces listed here are market leaders, and they are backed by many database vendors. In fact, ODBC, OLE-DB, and ADO.NET form the backbone of Microsoft's Universal Data Access (UDA) framework, a collection of technologies used to access and manipulate any type of data from any kind of data source through a single interface. Microsoft's database connectivity interfaces have evolved over time: each interface builds on the previous one, delivering greater functionality, flexibility, and support (Hackathorn, 1993).
7.2.1. Native SQL Connectivity
Most DBMS vendors offer their own connection mechanisms for their databases. Native SQL connectivity is the connection interface provided by the database vendor and exclusive to that vendor. The best-known example of this kind of native interface is the Oracle RDBMS. To connect a client application to an Oracle database, the Oracle SQL*Net interface must be installed and configured on the client machine. Figure 7.1 depicts the Oracle SQL*Net interface settings on a client PC (Ivanov & Carson, 2002). Native database connectivity interfaces are tailored to "their" database management system (DBMS), and they provide access to the vast majority of the database's features. Maintaining several native interfaces for different databases, however, can be burdensome for the programmer, so universal database connectivity is desirable. In most cases, the vendor's native connectivity interface is not the only way to connect to a database; most modern DBMS products support other database connectivity standards, the most common being ODBC (Brown et al., 2012).
7.2.2. ODBC, RDO, and DAO
Open Database Connectivity (ODBC), developed by Microsoft in the early 1990s, is a specialized version of the SQL Access Group CLI (Call Level Interface) standard for database access. ODBC is the most widely used database connectivity interface. It is a basic application programming interface (API) that allows any Windows application to access relational data sources using SQL. The Webopedia online dictionary defines an API as a set of routines, protocols, and tools for building software applications (Kötter, 2004).
Figure 7.1. Oracle native connectivity. Source: https://docs.oracle.com/en/database/oracle/oracle-database/119/dbseg/introduction-to-strong-authentication.html.
A good API makes software development easier by providing all the necessary building blocks; the programmer simply puts the blocks together. Most operating systems, including Microsoft Windows, provide an API so that developers can write programs consistent with the operating environment. Although APIs are designed for programmers, they ultimately benefit users because they ensure that all applications using a common API have similar user interfaces, which makes it easier for users to learn new programs (Stephan et al., 2001). ODBC was the first widely adopted database middleware standard, and it was quickly implemented in Windows applications. As programming languages matured, however, ODBC did not provide significant functionality beyond the ability to execute SQL and manipulate relational-style data, so programmers needed a more efficient way to access data. Microsoft developed two other data access APIs to answer that need (Norrie et al., 1998):
•	Data Access Objects (DAO) is an object-oriented API that enables Visual Basic programs to access MS Access, dBase, and MS FoxPro databases. DAO provided an optimized interface that exposed the functionality of the Jet data engine (on which the MS Access database is based) to programmers. The DAO interface can also be used to access other relational-style data sources.
•	Remote Data Objects (RDO) is an object-oriented application interface used to access remote database servers. RDO uses the lower-level DAO and ODBC for direct access to databases, and it was optimized for server-based databases such as Microsoft SQL Server and DB2.
Figure 7.2 shows how Windows applications can access local and remote relational data sources using ODBC, DAO, and RDO (Abdalhakim, 2009). As Figure 7.2 illustrates, client applications can use ODBC to access relational data sources, while the object interfaces of DAO and RDO offer richer functionality on top of the basic ODBC data services. ODBC, DAO, and RDO are implemented as shared code that is dynamically linked to the Windows operating system through dynamic-link libraries (DLLs), stored as .dll files. Running as a DLL, the code speeds up load and run times (Quirós et al., 2018).
Figure 7.2. Utilizing ODBC, RDO, and DAO to access databases. Source: https://www.oreilly.com/library/view/ado-activex-data/155659241150/ch01s01.html.
The basic ODBC architecture has three main components (Kuan et al., 2015):
•	an ODBC API through which application programs access ODBC functionality;
•	a driver manager that is in charge of managing all database connections; and
•	an ODBC driver that communicates directly with the DBMS.
The first step in using ODBC is defining the data source, which requires creating a data source name (DSN) (Jansons & Cook, 2006). To create a DSN, one must provide the following:
•	ODBC Driver: the driver to use when connecting to the data source. The ODBC driver is usually supplied by the database vendor, although Microsoft provides several drivers that connect to the most common databases. When using an Oracle DBMS, for instance, one would select the Oracle-supplied ODBC driver for Oracle or, if desired, the Microsoft-supplied ODBC driver for Oracle.
•	DSN Name: the name by which ODBC, and therefore applications, will identify the data source. ODBC offers two types of data sources: user and system. User data sources are available only to the user who created them; system data sources are available to all users, including operating system services (Lamb, 2007).
•	ODBC Driver Parameters: most ODBC drivers require specific parameters in order to connect to the database. For an MS Access database, for example, one must indicate the location of the Microsoft Access file and, if necessary, a username and password. For a DBMS server, one must provide the server name, the database name, and a username and password. The ODBC screens required to create a system ODBC data source for an Oracle DBMS are shown in Figure 7.3. Note that some ODBC drivers rely on the native driver provided by the DBMS vendor.
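The DSN and driver parameters described above are typically assembled into an ODBC connection string of semicolon-separated keyword=value pairs (DSN=...;UID=...;PWD=..., or DRIVER={...};SERVER=... for a "DSN-less" connection). As a small illustration, the helper below builds such a string; the DSN and credential values shown are invented for the example.

```python
def odbc_conn_str(dsn=None, driver=None, server=None, database=None,
                  uid=None, pwd=None):
    # Assemble an ODBC-style keyword=value connection string, either
    # DSN-based or "DSN-less" (driver and server named explicitly).
    parts = []
    if dsn:
        parts.append("DSN=%s" % dsn)
    if driver:
        parts.append("DRIVER={%s}" % driver)  # driver names are braced
    if server:
        parts.append("SERVER=%s" % server)
    if database:
        parts.append("DATABASE=%s" % database)
    if uid:
        parts.append("UID=%s" % uid)
    if pwd:
        parts.append("PWD=%s" % pwd)
    return ";".join(parts)

print(odbc_conn_str(dsn="OracleHR", uid="scott"))
# DSN=OracleHR;UID=scott
```

A DSN-less call such as `odbc_conn_str(driver="Oracle ODBC Driver", server="dbhost", database="hr", uid="scott", pwd="tiger")` would name the driver and server directly instead of referring to a preconfigured DSN.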
Figure 7.3. Setting up an Oracle ODBC data source. Source: https://stackoverflow.com/questions/511026065/how-do-i-setup-an-odbc-connection-to-oracle-using-firedac.
Once the ODBC data source is defined, application programmers can write to the ODBC API by issuing specific commands with the required parameters, and the ODBC driver manager routes the calls to the appropriate data source. The ODBC API standard defines three levels of compliance (Core, Level 1, and Level 2) that provide increasing amounts of functionality. Level 1, for example, might support most SQL DML and DDL statements, including subqueries and aggregate functions, but not procedural SQL or cursors. Database vendors can choose which level to support; however, to interact with ODBC, a vendor must implement all the features indicated in the respective ODBC API level of support (Stephens & Huhns, 1999). Figure 7.4 shows how Microsoft Excel uses ODBC to retrieve data from an Oracle RDBMS. Because most of the features these interfaces provide are geared toward accessing relational data sources, their usefulness with other types of data source was limited. With the rise of object-oriented programming languages, access to other, non-relational data sources became increasingly important (Huang et al., 2015).
Figure 7.4. MS Excel uses ODBC to connect to an Oracle database. Source: https://stackoverflow.com/questions/488196768/connecting-to-oracle-database-through-excel.
7.2.3. OLE-DB
Although ODBC, DAO, and RDO were widely used, they did not support non-relational data. To address this need and simplify data connectivity, Microsoft developed Object Linking and Embedding for Database (OLE-DB). Based on Microsoft's Component Object Model (COM), OLE-DB is database middleware that adds object-oriented functionality for access to both relational and non-relational data. OLE-DB was the first part of Microsoft's strategy to provide a unified object-oriented framework for the development of next-generation applications (Moffatt, 1996). OLE-DB is composed of a series of COM objects that provide low-level database connectivity for applications. Because OLE-DB is based on COM, the objects contain data and methods, also known as the interface. The OLE-DB model is better understood when its functionality is divided into two types of objects (Saxena & Kumar, 2012):
•	Consumers are objects (processes or applications) that request data. Data consumers request data by invoking the methods exposed by data provider objects and passing the required parameters.
•	Providers are objects that manage the connection with a data source and provide data to the consumers. There are two types of providers, data providers and service providers:
•	Data providers provide data to other processes. Database vendors create data provider objects that expose the functionality of the underlying data source.
•	Service providers provide additional functionality to consumers. The service provider sits between the data provider and the consumer: it requests data from the data provider, transforms the data, and then provides the transformed data to the data consumer. In other words, the service provider acts as both a data consumer of the data provider and a data provider for the data consumer. For example, a service provider could offer cursor management, transaction management, query processing, and indexing services.
Many vendors use OLE-DB objects to augment their ODBC support, in effect creating a shared object layer on top of their existing database connectivity (native or ODBC) with which applications can interact. OLE-DB objects expose functionality about the database; there are, for example, objects for relational data, structured data, and flat-file text data. The objects also implement specific tasks, such as establishing a connection, executing a query, invoking a stored procedure, defining a transaction, or invoking an OLAP function. Rather than having to support all functionality all the time, the database vendor can choose which functionality to implement, in a modular way, by using OLE-DB objects. Table 7.1 shows a few of the object-oriented classes used by OLE-DB and some of the methods the objects provide (Baber & Hodgkin, 1992).
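The provider/consumer split described above is a general design pattern, not something tied to COM. As a language-neutral sketch (plain Python classes standing in for the COM objects; the class names and CSV-style data are invented for illustration), a service provider wraps a data provider, transforms its data, and offers the same interface to the consumer:

```python
class ListDataProvider:
    # Data provider: exposes the rows of an underlying data source.
    def __init__(self, rows):
        self._rows = rows

    def fetch(self):
        return list(self._rows)

class SortingServiceProvider:
    # Service provider: consumes from a data provider, transforms the
    # data (here, sorting), and provides the result to the consumer.
    # It exposes the same fetch() interface as a data provider.
    def __init__(self, provider, key):
        self._provider = provider
        self._key = key

    def fetch(self):
        return sorted(self._provider.fetch(), key=self._key)

raw = ListDataProvider([("Bob", 80), ("Alice", 120)])
service = SortingServiceProvider(raw, key=lambda r: r[0])
print(service.fetch())  # [('Alice', 120), ('Bob', 80)]
```

Because the service provider implements the same interface it consumes, services such as sorting, cursor management, or indexing can be stacked without the consumer knowing they are there.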
Table 7.1. Example OLE-DB Interfaces and Classes
OLE-DB provided additional capabilities to the applications accessing the data. However, it lacked support for scripting languages, especially those used on the Web, such as Active Server Pages (ASP) and ActiveX. (A script is written in an interpreted programming language and executed at run time) (Vecchio et al., 2020). To provide that support, Microsoft developed a new object framework: ActiveX Data Objects (ADO), which offers a high-level, application-oriented interface to interact with OLE-DB, DAO, and RDO. ADO provides a unified interface through which programming languages can access data using OLE-DB objects. Figure 7.5 shows how the ADO/OLE-DB architecture relates to the native and ODBC connectivity options. ADO introduced a simplified object model, consisting of a handful of interoperating objects, to provide the data transformation services required by applications. Table 7.2 shows examples of ADO object types (Song & Gao, 2012). Table 7.2. Example ADO Objects
Figure 7.5. OLE-DB design. Source: http://www.differencebetween.net/technology/web-applications/difference-between-oledb-and-odbc/.
Although the ADO model is a vast improvement over the OLE-DB model, Microsoft now urges programmers to use ADO.NET, its latest data access framework (Lee et al., 2016).
7.2.4. ADO.NET
Based on ADO, ADO.NET is the data access component of Microsoft's .NET application development framework. The Microsoft .NET framework is a component-based platform for developing distributed, heterogeneous, interoperable applications that can manipulate any type of data over any network and in any programming language. A comprehensive analysis of the .NET framework is beyond the scope of this book (Khannedy, 2011). Consequently, this section introduces only ADO.NET, the basic data access component of the .NET architecture. It is essential to realize that the
.NET framework extends and enhances the functionality provided by ADO/OLE-DB. Two new features of ADO.NET that are critical for the development of distributed applications are DataSets and XML support (Hall & Kier, 2000). To appreciate the significance of this new approach, keep in mind that a DataSet is a disconnected, memory-resident representation of the database: it contains tables, columns, rows, relationships, and constraints. After the data are read from a data provider, they are placed in the memory-resident DataSet, which is then disconnected from the data provider. The data consumer application interacts with the data in the DataSet object to make changes to it; once processing is done, the DataSet data are synchronized with the data source and the changes are made permanent (Press et al., 2001). The DataSet is internally stored in XML format, and the data in it can be made persistent as XML documents. This is critical in today's distributed environments: you can think of the DataSet as an XML-based, in-memory database that represents the persistent data stored in the data source. The main components of the ADO.NET object model are shown in Figure 7.6 (Poo et al., 2008).
Figure 7.6. The ADO.NET framework. Source: https://www.codeproject.com/Articles/338643/ASP-NET-Providers-for-the-ADO-NET-Entity-Framework.
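The text above notes that a DataSet is internally stored in XML and can be persisted as XML documents. Purely as a language-neutral sketch of that idea (Python's standard xml.etree module; the table name and columns are invented for illustration), an in-memory table can be serialized with one element per row and one child element per column:

```python
import xml.etree.ElementTree as ET

def table_to_xml(table_name, rows):
    # Persist an in-memory table as an XML document: one element per
    # row, one child element per column, loosely the way a DataSet
    # is written out as XML.
    root = ET.Element("DataSet")
    for row in rows:
        elem = ET.SubElement(root, table_name)
        for col, value in row.items():
            ET.SubElement(elem, col).text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_doc = table_to_xml("Employee", [{"id": 1, "name": "Alice"}])
print(xml_doc)
# <DataSet><Employee><id>1</id><name>Alice</name></Employee></DataSet>
```

An XML document like this can be shipped between applications or saved to disk and later parsed back into an in-memory table, which is what makes the format attractive in distributed environments.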
The ADO.NET framework consolidates all data access functionality under one integrated object model, in which several objects interact to perform specific data manipulation functions. These objects are grouped into data providers and data consumers (Ternon et al., 2014). Data provider objects are supplied by the database vendors. However, ADO.NET comes with two basic data providers: one for OLE-DB data sources and one for SQL Server. Thus, ADO.NET can work with any database that was previously supported, including an ODBC database with an OLE-DB data provider. In addition, ADO.NET includes a highly optimized data provider for SQL Server. Whatever the data provider, it must support a set of specific objects in order to manipulate the data in the data source. Some of these objects are shown in Figure 7.6; a brief description of each follows (Liu et al., 2018):
•	Connection object defines the data source used, the server name, the database, and so on. This object enables the client application to open and close a database connection.
•	Command object represents a database command to be executed over a given database connection. This object contains the SQL statement or stored procedure to be run by the database. Executing a SELECT command returns a set of rows and columns.
•	DataReader object is a specialized object that creates a read-only session with the database to retrieve data sequentially (forward only), in a very fast manner.
•	DataAdapter object is in charge of managing a DataSet object, and it is the most specialized object in the ADO.NET framework. It contains SelectCommand, InsertCommand, UpdateCommand, and DeleteCommand objects, which it uses to populate a DataSet and to synchronize its data with the persistent data source (Hodge et al., 2016).
•	DataSet object is the in-memory representation of the data in the database. This object contains two main objects: the DataTableCollection object, which holds a collection of DataTable objects that make up the in-memory database, and the DataRelationCollection object, which holds a collection of objects describing the data relationships and ways to associate one row in one table with the related row in another table.
•	DataTable object represents the data in tabular format. One very important property of this object is PrimaryKey, which allows entity integrity to be enforced. The DataTable object is, in turn, composed of three main objects:
•	DataColumnCollection contains one or more column definitions. Each column definition has attributes such as the column name, the data type, whether nulls are allowed, and maximum and minimum values.
•	DataRowCollection contains zero, one, or more rows with data as described in the DataColumnCollection.
•	ConstraintCollection contains the definitions of the constraints for the table. Two types of constraints are supported: ForeignKeyConstraint and UniqueConstraint (Huang et al., 2012).
A DataSet is, in effect, a simple database with rows, tables, and constraints. More importantly, a DataSet does not require a permanent connection to the data source: the DataAdapter uses the SelectCommand object to populate the DataSet from a data source, and once the DataSet is populated it is completely independent of the data source, which is why it is called disconnected (Shum, 1997). Furthermore, the DataTable objects in a DataSet can come from different data sources. For instance, an Oracle database might hold the EMPLOYEE table and a SQL Server database the SALES table; a DataSet could then relate the two tables as though they were located in the same database (Harris & Reiner, 1990). In short, the DataSet object offers a way for applications to support genuinely heterogeneous distributed databases. The ADO.NET framework is designed to work in disconnected environments, in which applications exchange messages in request/reply fashion. The most common example of a disconnected system is the Internet: modern applications use the Internet as the network platform and the Web browser as the graphical user interface (GUI) (Deng et al., 2010).
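ADO.NET itself is used from .NET languages such as C#. Purely as a language-neutral sketch of the disconnected pattern just described (fill an in-memory copy, work on it with no open connection state, then synchronize the changes back), the following uses Python with sqlite3 as a stand-in data source; the employee table and the 10% raise are invented for illustration:

```python
import sqlite3

# Stand-in data source (a real DataSet could even mix several DBMSs).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 1000.0)")

# 1. Fill the in-memory "DataSet" (here, a plain list of dicts).
dataset = [{"id": r[0], "salary": r[1]}
           for r in conn.execute("SELECT id, salary FROM employee")]

# 2. Work disconnected: no cursor or lock is held while the consumer
#    application edits the in-memory copy.
for row in dataset:
    row["salary"] *= 1.10

# 3. Synchronize the offline changes back to the data source (the role
#    the DataAdapter's UpdateCommand plays in ADO.NET).
conn.executemany("UPDATE employee SET salary = ? WHERE id = ?",
                 [(r["salary"], r["id"]) for r in dataset])
print(conn.execute("SELECT salary FROM employee").fetchone()[0])
```

Between steps 1 and 3 the data source could be unreachable entirely, which is exactly the property that makes the disconnected model suitable for request/reply environments such as the Web.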
7.2.5. Java Database Connectivity (JDBC)
Java is an object-oriented programming language, created by Sun Microsystems, that runs on top of Web browser software, and it is among the most widely used languages for Web development. Sun designed Java as a "write once, run anywhere" environment: a programmer can write a Java application once and then run it in many environments without any modification (Apple
OS X, Microsoft Windows, IBM AIX, and so on). Java's portable architecture is the basis of its cross-platform capabilities (Robinson et al., 2010). Applets are pre-processed pieces of Java code that run in a virtual machine environment on the host operating system. The boundaries of this environment are well defined, and interaction with the host operating system is closely monitored; Sun supplies Java runtime environments for most operating systems. Another advantage of Java is its on-demand architecture: when a Java application starts, it can dynamically download all its modules or required components over the Internet (Regateiro et al., 2017). Java applications use predefined APIs to access data outside the Java runtime environment. JDBC is an API that allows a Java program to interact with a wide range of data sources: relational databases, tabular data sources, spreadsheets, and text files. JDBC allows a Java program to establish a connection with a data source, prepare and send SQL statements to the database server, and evaluate the query results (Coetzee & Eloff, 2002; Luo et al., 2018). One of the main advantages of JDBC is that it lets a company leverage its existing investments in technology and personnel training: programmers can use their SQL skills to manipulate the data in the company's databases. In fact, JDBC supports both direct access to the database server and access through database middleware; furthermore, it provides a way to connect to a database via an ODBC driver. Figure 7.7 illustrates the basic JDBC architecture and the various database access styles (Li & Yang, 2020). The components and functionality of any database access middleware are comparable. One advantage of JDBC over other middleware is that it requires no client configuration: the JDBC driver is automatically downloaded and installed as part of the Java applet download. Because Java is a Web-based technology, applications can connect to a database using a simple URL. When the URL is entered, the Java architecture comes into play: the required applets (including the JDBC database driver and all the configuration information) are downloaded to the client, and the applets are securely executed in the client's runtime environment (Manzotti et al., 2015; Ng et al., 2009).
Figure 7.7. Framework of JDBC. Source: https://www.tutorialspoint.com/jdbc/jdbc-introduction.htm.
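As noted above, JDBC applications reach the database through a URL, which follows the general scheme jdbc:<subprotocol>:<subname>. JDBC code itself is written in Java; purely as an illustration of the URL structure (in Python, so the sketch is self-contained, with a hypothetical Oracle-style URL), a parser for that scheme might look like this:

```python
def parse_jdbc_url(url):
    # Split a jdbc:<subprotocol>:<subname> URL into its parts. The
    # subprotocol names the driver family; the subname is interpreted
    # by that driver (host, port, database, and so on).
    scheme, subprotocol, subname = url.split(":", 2)
    if scheme != "jdbc":
        raise ValueError("not a JDBC URL: %r" % url)
    return {"subprotocol": subprotocol, "subname": subname}

# Hypothetical example URL in the style used by Oracle thin drivers;
# the exact subname format varies by vendor.
parts = parse_jdbc_url("jdbc:oracle:thin:@dbhost:1521:hr")
print(parts["subprotocol"], parts["subname"])
```

In a real Java program, this URL would simply be passed to `DriverManager.getConnection`, and the driver manager would route the request to the driver registered for the `oracle` subprotocol.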
Every day, more organizations invest money in developing and extending their Web presence and in finding new ways to do business on the Internet. That business generates an ever-growing amount of data, which is stored in databases. Java and the .NET framework are examples of how companies increasingly rely on the Web as a critical business resource; in fact, the Internet is expected to become the next development platform. The sections below examine Web databases and how to use them in more detail (Chen et al., 2000).
7.3. INTERNET DATABASES
All over the globe, millions of people use the Internet to connect to databases via Web browsers or data services (for example, using smartphone apps to obtain weather information). The ability to connect to databases over the Internet opens the door to new, advanced services that (Minkiewicz et al., 2016):
•	permit rapid responses to competitive pressures by bringing new products and services to market quickly;
•	increase customer satisfaction through the creation of Web-based support services;
•	allow mobile smart devices to access data anywhere and at any time via the Internet; and
•	ensure the rapid, effective dissemination of information through universal access from across the street or around the world.
Given these advantages, many organizations rely on their information technology departments to create universal data access (UDA) architectures based on Internet standards. Table 7.3 shows a sample of Internet technology characteristics and the benefits they provide (Van Steensel & Winter, 1998). Table 7.3. Features and Advantages of Internet Technologies
It is easy to see why many database professionals consider the DBMS's connection to the Internet a critical element in IS development in today's business and global information environment. As the following sections describe, the Web substantially affects database application development, in particular the development and administration of user interfaces and database connectivity. A Web-based database interface, however, does not make database design and implementation issues go away. In the end, whether you buy something online or in a store, the system-level details of the transaction are essentially the same, and both require the same basic database structures and relationships. If there is one thing to remember at this point, it is this (Liu et al., 2003): the consequences of poor database design, implementation, and management are magnified in an environment in which transactions are counted in hundreds of thousands per day rather than in hundreds (Shahabi et al., 2000). The Internet is rapidly changing how information is generated, accessed, and distributed. At the core of this change are the Web's ability to access data in (local and remote) databases, the simplicity of its interface, and its cross-platform functionality. The Web has helped create a new information dissemination standard. The following sections examine how Web-to-database middleware enables end users to interact with databases over the Internet (Ouzzani et al., 2000).
7.3.1. Web-to-Database Middleware
In general, the Web server is the main hub through which all Internet services are accessed. For example, when an end user uses a Web browser to query a database dynamically, the client browser requests a Web page. When the Web server receives the page request, it looks for the page on its hard disk; when it finds the page (for example, a stock quote or a product catalog page), it sends it back to the client (Yerneni et al., 1998). Modern Web sites, however, rely heavily on dynamic Web pages: in this database-query scenario, the Web server must construct the contents of the page before sending it to the client's browser. The problem is that the database query result must be incorporated into the page before it is returned to the client, and neither the Web browser nor the Web server knows how to connect to a database and read data from it. Therefore, to support this type of request, the Web server's functionality must be extended so that it can understand and process database requests. This task is handled by a server-side extension (Blinowska & Durka, 2005). A server-side extension is a program that interacts directly with the Web server to handle specific types of requests. In the database query example, the server-side extension program retrieves the data from the database and passes it to the Web server, which in turn sends it to the client's browser for display. The extension makes it possible to retrieve and present query results, but more importantly it does so transparently (Xu, 2012): it provides its services to the Web server in a way that is invisible to the client browser. In short, the server-side extension adds significant functionality to the Web server and, by extension, to the Internet (Minkiewicz et al., 2015). Database server-side extension programs are also known as Web-to-database middleware. Figure 7.8 shows the interaction between the browser, the Web server, and the Web-to-database middleware (Khan et al., 2001):
•	The client browser sends a page request to the Web server.
•	The Web server receives and validates the request. It then passes the query to the Web-to-database middleware for processing. Generally, the requested page contains some form of scripting language that enables database interaction.
Figure 7.8. Web-to-database middleware. Source: https://flylib.com/books/en/2.6772.1.65/1/.
The Creation and Management of Database Systems
• The Web-to-database middleware reads, validates, and executes the script. In this case, it uses the database connectivity layer to connect to the database and run the query.
• The database server executes the query and returns the result to the Web-to-database middleware (Sylvester, 1997).
• The Web-to-database middleware compiles the query results, dynamically generates an HTML-formatted page that includes the database data, and sends it to the Web server.
• The Web server sends the newly created HTML page, which now contains the result set, to the client browser.
• The client browser displays the page on the local computer.
The interaction between the Web server and the Web-to-database middleware is crucial to the proper deployment of an Internet database. Therefore, the middleware must be well integrated with the other Internet services and with the components involved in its use. For instance, when Web-to-database middleware is installed, the software should verify the type of Web server and install itself to match that server's requirements. In addition, how well the Web server and the Web-to-database service communicate will depend on the interfaces the Web server provides (Bouguettaya et al., 1999).
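The middleware steps above can be sketched in a few lines. The function and column names below are illustrative (not from any real middleware product), and SQLite stands in for the database server; the point is the flow: receive the scripted query from the Web server, execute it through the database connectivity layer, and merge the result set into an HTML page.

```python
import sqlite3

def handle_page_request(conn, query):
    """Hypothetical Web-to-database middleware step: run the query
    embedded in the requested page and return a finished HTML page."""
    rows = conn.execute(query).fetchall()            # execute via the connectivity layer
    body = "".join(f"<tr><td>{c}</td><td>{p:.2f}</td></tr>" for c, p in rows)
    return f"<html><body><table>{body}</table></body></html>"  # build the HTML page

# Demo: the Web server would pass along the query scripted into the page.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (p_code TEXT, p_price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [("2345-AA", 189.99), ("2345-AB", 99.87)])
page = handle_page_request(conn, "SELECT p_code, p_price FROM product")
print(page)
```

The returned string is what the middleware would hand back to the Web server for delivery to the client browser.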
7.3.2. Web Server Interfaces Extending Web server functionality requires proper communication between the Web server and the Web-to-database middleware. (Database professionals often use the word interoperate to indicate that each party can respond to the communications of the other. This book uses communicate to imply such two-way collaboration.) If a Web server is to communicate successfully with an external program, both programs must use a standard way to exchange messages and to respond to requests. A Web server interface defines how a Web server communicates with external programs. Two well-defined Web server interfaces are currently available (de Leeuw et al., 2012): • Common Gateway Interface (CGI) • Application programming interface (API) CGI uses script files that perform specific functions based on the parameters that the client passes to the Web server. The script file is a small program containing commands written in a programming language, most commonly Perl, Visual Basic, or C#. Using the parameters passed by the Web server, the script can connect to the database and retrieve data from it. The script then converts the retrieved data to HTML format and passes it to the Web server, which delivers the HTML-formatted page to the client (Meng et al., 1998). The main disadvantage of using CGI scripts is that the script file is an external program that is executed separately for each user request. This arrangement decreases system performance. For instance, if there are 200 concurrent requests, the script is loaded 200 separate times, which takes significant CPU and memory resources away from the Web server. The script's language, and how the script is written, can also affect system performance; using an interpreted language or writing the script inefficiently, for instance, will decrease performance (Mohr et al., 1998). A newer Web server interface standard, the API, is faster and more efficient than CGI scripts. APIs are more efficient because they are implemented as shared code or as dynamic-link libraries (DLLs). That is, the API is treated as if it were part of the Web server program and is called dynamically as needed (Van Dulken, 1999). APIs are faster than CGI scripts because the code resides in memory, so there is no need to run a separate program for each request; instead, the same API serves all requests. Another advantage is that an API can use a shared connection to the database instead of creating a new one every time, as CGI scripts do. Although APIs process requests more efficiently, they have some disadvantages. Because the API and the Web server share the same memory space, an API error can bring the server down. Another problem is that APIs are specific to the Web server and to the operating system. At the time of writing, well-known Web server APIs include (de Macedo et al., 2008):
• ISAPI (Internet Server API) for Microsoft Windows Web servers.
• WSAPI (WebSite API) for O'Reilly Web servers.
• JDBC, which provides database connectivity for Java applications.
Figure 7.9 illustrates the various types of Web server interfaces (Lemkin et al., 2005).
Figure 7.9. Web server API and CGI interfaces. Source: https://frankfu.click/database/database-system-design-management/chapter-15-database-and-internet/.
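The shape of a CGI-style script can be sketched as follows. Every name here is illustrative: a real CGI program would read the query string from its environment and write to standard output, but the essential contract is the same — parse the parameters the Web server passes in, then emit an HTTP header block followed by the HTML body.

```python
from urllib.parse import parse_qs

def cgi_respond(query_string):
    """Sketch of a CGI-style handler: parameters in, header + HTML out."""
    params = parse_qs(query_string)
    code = params.get("p_code", ["?"])[0]
    # A CGI script must emit its header block before the body, separated
    # by a blank line.
    return ("Content-Type: text/html\r\n\r\n"
            f"<html><body>Requested product: {code}</body></html>")

response = cgi_respond("p_code=2345-AA&qty=2")
print(response)
```

Because a process like this is launched once per request, 200 concurrent requests mean 200 launches — which is exactly the overhead that in-process APIs avoid by staying resident in the server's memory.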
Regardless of the type of Web server interface used, the Web-to-database middleware program must be able to connect to the database. That connection can be made in one of two ways (Ou & Zhang, 2006):
• Use the database vendor's native SQL access middleware. For example, if you are using Oracle, you can use SQL*Net.
• Use a generic database connectivity interface such as Open Database Connectivity (ODBC), OLE-DB (Object Linking and Embedding for Database), ActiveX Data Objects (ADO), the ADO.NET (ActiveX Data Objects for .NET) interface, or Java Database Connectivity (JDBC) for Java.
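The value of a generic interface such as ODBC or JDBC is that application code targets one standard call interface while the driver hides the vendor specifics. As a stand-in analogy, Python's DB-API plays a comparable role: the sketch below (function names invented) codes against the generic connect/execute/fetch pattern, using sqlite3 as the "driver"; another DB-API-compliant module could be passed in without changing the function.

```python
import sqlite3

def fetch_rows(connect, dsn, query):
    """Generic data access: any driver exposing the standard
    connect/execute/fetchall interface can be plugged in."""
    conn = connect(dsn)          # driver-specific connect, generic signature
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()             # always release the connection

# sqlite3 acts as the vendor driver here.
rows = fetch_rows(sqlite3.connect, ":memory:", "SELECT 1 + 1")
print(rows)
```

Swapping the first argument for a different driver's connect function is the DB-API analogue of pointing ODBC at a different data source name.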
7.3.3. The Web Browser The Web browser is application software on the client computer, such as Microsoft Internet Explorer or Mozilla Firefox, that lets end users navigate the Web. When the user clicks a hyperlink, the browser sends an HTTP GET page request over the TCP/IP Internet protocol to the target Web server
(Mai et al., 2014). The Web browser's job is to interpret the HTML code that it receives from the Web server and to display the various page components in a standardized way. Unfortunately, the browser's interpreting and presentation capabilities are not sufficient for developing Web-based applications. That is because the Web is a stateless system; at any given time, a Web server does not know the status of any of the clients communicating with it (Bradley, 2003). In other words, there is no open communication line between the server and each client accessing it, which of course would be impractical on the Web! Instead, client and server computers interact in very short "conversations" that follow the request-response model. Because the browser is concerned only with the current page, a second page has no way of knowing what was done on the first. The client and server computers communicate only when the client requests a page, such as when the user clicks a link, and the server sends the requested page to the client (Falk, 2005). Once the client receives the page and its components, the client/server communication ends. Therefore, although you may be browsing a page and believe that the communication is still open, you are actually only browsing the HTML document stored in the browser's local cache (temporary directory). The server does not know what the user is doing with the page: what data is entered in a form, which option is selected, and so on. To respond to the client's selections on the Web page, you must navigate to a new page, losing track of whatever was done before (Kiley, 1997)! The Web browser's function is to display a page on the client computer. Using HTML alone, the browser has no computational abilities beyond formatting output text and accepting form field inputs. Even when the browser accepts form field data, there is no way to perform immediate data entry validation.
To perform such crucial client-side processing, the Web depends on other Web programming languages such as Java, JavaScript, and VBScript (Chen et al., 1996). In effect, the browser resembles a dumb terminal that can only display data and perform rudimentary processing such as accepting form data inputs. Plug-ins and other client-side extensions are needed to improve the capabilities of the Web browser. On the server side, Web application servers provide the necessary processing power (Bouguettaya et al., 1998).
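The request-response model described above can be sketched in a few lines. All names here are invented for illustration: because the server keeps no open line to any browser, each request must carry a token (much like a cookie) that lets the server look up the "conversation" it belongs to — that is how applications simulate state over a stateless protocol.

```python
import secrets

sessions = {}  # server-side state, keyed by the token the client resends

def handle_request(token, field, value):
    """Each call models one independent request-response exchange."""
    if token is None or token not in sessions:
        token = secrets.token_hex(8)   # no token yet: start a new conversation
        sessions[token] = {}
    sessions[token][field] = value     # remember the client's choice server-side
    return token                       # the client must send this back next time

t1 = handle_request(None, "item", "2345-AA")   # first request: no token yet
t2 = handle_request(t1, "qty", "2")            # client resends the token
print(t1 == t2, sessions[t1])
```

Without the token, the second request would be indistinguishable from a brand-new visitor — precisely the "the second page has no way of knowing what was done on the first" problem.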
7.3.4. Client-Side Extensions Client-side extensions add functionality to the Web browser. Although client-side extensions come in various forms, the most common are (Kerse et al., 2001): • JavaScript and Java. • Plug-ins. • VBScript and ActiveX. A plug-in is an external application that the Web browser invokes when needed. Because it is an external application, the plug-in is operating-system-specific. The plug-in is associated with a data object, generally through the file extension, to allow the browser to handle data that it does not support natively. For instance, if one of the page components is a PDF document, the Web browser will receive the data, recognize it as a Portable Document Format object, and launch Adobe Acrobat Reader on the client computer to display the document (Jacobsen & Andenæs, 2011). As mentioned earlier, Java runs on top of the Web browser software. Java programs are compiled and stored on the Web server. (Java is similar to C++ in many respects.) Calls to Java routines are embedded inside the HTML page. When the browser finds such a call, it downloads the Java classes from the Web server and executes them on the client machine. Java's main advantage is that it enables application developers to develop an application once and run it in many environments. (Interoperability is a crucial consideration when developing Web applications.) In practice, however, different client browsers are not always fully compatible, which limits portability (Gordon, 2003). JavaScript is a scripting language (a language that enables the execution of a series of commands or macros) that allows Web authors to design interactive sites. Because it is simpler to write, JavaScript is easier to learn than Java. JavaScript code is embedded in Web pages.
It is downloaded with the Web page and is activated when a specific event occurs, such as a mouse click on an object or a page being loaded from the server into memory (Fatima & Wasnik, 2016). ActiveX is Microsoft's alternative to Java. ActiveX is a specification for writing programs that run inside Microsoft's client browser. Because it is oriented mainly toward Windows applications, ActiveX has limited portability. ActiveX extends the Web browser by adding controls to Web pages. (Examples of such controls are drop-down lists, a calendar, a slider, and a calculator.) These controls, which are periodically
retrieved from the Web server, make it possible to manipulate data inside the browser. ActiveX controls can be written in many programming languages, C++ and Visual Basic being the most common. Microsoft's .NET framework allows ActiveX-based applications (such as ADO.NET) to run across multiple operating systems (Braxton et al., 2003). VBScript is another Microsoft product that is used to extend browser functionality. VBScript is derived from Microsoft Visual Basic. Like JavaScript, VBScript code is embedded inside an HTML page and is activated by triggering events such as clicking a link. From the developer's point of view, using routines that permit data validation on the client side is an absolute necessity (Dubey & Chueh, 1998). For example, when data is entered on a Web form and no data validation is done on the client side, the entire data set must be sent to the Web server. That scenario requires the server to perform all data validation, wasting valuable CPU processing cycles. Therefore, one of the most basic requirements for Web applications is client-side data input validation. The vast majority of data validation functions are written in Java, JavaScript, VBScript, or ActiveX (Liu et al., 1998).
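The kind of checks a client-side validation routine performs can be sketched as follows. In practice this logic runs as JavaScript in the browser; it is written in Python here only for consistency with the other examples in this chapter, and the form fields and rules are invented for illustration.

```python
def validate_order_form(form):
    """Return a list of error messages; an empty list means the form
    may be submitted to the server."""
    errors = []
    if not form.get("customer_id", "").isdigit():
        errors.append("customer_id must be numeric")
    try:
        if int(form.get("quantity", "0")) <= 0:
            errors.append("quantity must be positive")
    except ValueError:
        errors.append("quantity must be a whole number")
    return errors

ok = validate_order_form({"customer_id": "10011", "quantity": "3"})
bad = validate_order_form({"customer_id": "abc", "quantity": "0"})
print(ok, bad)
```

Catching both errors before submission means the server never spends cycles on a request that was doomed to fail.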
7.3.5. Web Application Servers A Web application server is a middleware application that expands the functionality of Web servers by linking them to a wide range of services, such as databases, search engines, and directory systems. The Web application server also provides a consistent runtime environment for Web applications. Web application servers can be used to (Bl Bard & Davies, 1995):
• Connect to and query a database from a Web page.
• Present database data on a Web page using various formats.
• Create dynamic Web search pages.
• Create Web pages to insert, update, and delete database data.
• Enforce referential integrity in the application program logic.
• Use simple and nested queries and program logic to represent business rules.
Web application servers provide features such as (Etzioni & Weld, 1995):
• An integrated development environment with session management and support for persistent application variables.
• Security and user authentication via user IDs and passwords.
• Computational languages to represent and store business logic in the application server.
• Automatic generation of HTML pages integrated with Java, JavaScript, ASP, VBScript, and other languages.
• Performance and fault-tolerance features.
• Database access with transaction management capabilities.
• Access to multiple services, such as file transfers (FTP), database connectivity, and e-mail.
As of this writing, popular Web application servers include Adobe's ColdFusion/JRun, Oracle's WebLogic Server, IBM's WebSphere Application Server, NetObjects' Fusion, and Apple's WebObjects. All Web application servers offer the ability to connect to multiple data sources and other services. They differ in the range of features they provide, their robustness, scalability, ease of use, compatibility with other Web and database tools, and the extent of the development environment they offer (Di Martino et al., 2019). Current-generation systems involve more than just the development of Web-enabled database applications. They also require applications capable of interacting with one another and with non-Web-based systems. Clearly, systems must be able to exchange data in a standards-based format. That is the role of XML (Bray et al., 2000).
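The "consistent runtime environment" a Web application server provides can be sketched with a minimal WSGI application (WSGI is Python's standard gateway between Web servers and applications; the product list is hard-coded here where a real application would run a database query, and all names are illustrative).

```python
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    """Minimal WSGI application: the server calls this once per request."""
    products = [("2345-AA", 189.99)]        # stand-in for a database query
    rows = "".join(f"<li>{c}: {p}</li>" for c, p in products)
    body = f"<html><body><ul>{rows}</ul></body></html>".encode()
    start_response("200 OK", [("Content-Type", "text/html")])
    return [body]

# Drive the app the way a server would, using a synthetic WSGI environment.
environ = {}
setup_testing_defaults(environ)
captured = {}
def start_response(status, headers):
    captured["status"] = status
result = b"".join(app(environ, start_response))
print(captured["status"], result[:40])
```

Because every request arrives through the same `environ`/`start_response` contract, the same application code runs unchanged under any compliant server — the portability argument the section makes for application servers generally.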
7.4. EXTENSIBLE MARKUP LANGUAGE The Internet has brought about new technologies that facilitate the exchange of business information among business partners and consumers. Companies are using the Internet to create new systems that integrate their data in order to increase efficiency and reduce costs. Electronic commerce enables all types of organizations to market and sell products and services to millions of users around the world. E-commerce transactions can take place between businesses (business-to-business, or B2B) or between a business and a consumer (business-to-consumer, or B2C) (Friedman et al., 1999). Most e-commerce transactions take place between businesses. Because B2B e-commerce integrates business processes among companies, it requires the transfer of business information among the different business entities. However, how businesses represent, identify, and use data varies considerably from company to company (Bryan, 1998).
Until recently, a Web-based purchase order would typically take the form of an HTML document. The HTML Web page displayed by the Web browser would include both formatting tags and the order details. HTML tags describe how something looks on the Web page, such as bold type or heading style, and they often come in pairs to start and end a formatting feature (Sperberg-McQueen, 2000). If an application wants to get the order data from the Web page, there is no easy way to extract details such as the order number, date, customer number, item, quantity, price, or payment details from an HTML document. The HTML page can only describe how to display the order in the browser; it cannot manipulate the order's data elements, such as the date, shipping information, payment information, and so on. To solve this problem, a new markup language known as XML was developed (Castan et al., 2001). XML is a metalanguage used to represent and manipulate data elements. XML is designed to facilitate the exchange of structured documents, such as orders and receipts, over the Internet. The World Wide Web Consortium (W3C) published the first XML 1.0 standard definition in 1998. That standard set the stage for giving XML the real-world appeal of being a true vendor-independent platform. It is not surprising, then, that XML has rapidly become the data exchange standard for e-commerce applications (Sanin & Szczerbicki, 2006). The XML metalanguage allows the definition of new tags, such as <ProdPrice>, to describe the data elements used in an XML document. The X in XML refers to the language's ability to be extended with new tags; XML is said to be extensible. XML is derived from the Standard Generalized Markup Language (SGML), an international standard for the publication and delivery of highly complex technical documents. Documents used by the aviation industry and the military services, for instance, are too complex and unwieldy for the Web.
Like HTML, which is also derived from SGML, an XML document is a text file. However, it has a few important additional characteristics (Huh, 2014):
• XML allows the definition of new tags to describe data elements, such as <ProductId>.
• XML is case sensitive: <ProductID> is not the same as <ProductId>.
• XML tags must be well formed; that is, each opening tag has a corresponding closing tag. For example, the product identifier would be written as <ProductId>2345-AA</ProductId>.
• XML tags must be properly nested. A properly nested XML fragment might look like this: <Product><ProductId>2345-AA</ProductId></Product>.
• You can use the <!-- and --> symbols to enter comments in an XML document.
• The XML and xml prefixes are reserved for XML tags only.
XML is not a new version of HTML or a replacement for it. XML is concerned with the description and representation of data rather than with the way data is displayed. XML provides the semantics that facilitate the sharing, exchange, and manipulation of structured documents across organizational boundaries (Yoshida et al., 2005). XML and HTML perform complementary rather than overlapping functions. Extensible Hypertext Markup Language (XHTML) is the next generation of HTML, based on the XML framework. The XHTML specification expands the HTML standard to include XML features. Although more powerful than HTML, XHTML requires strict adherence to syntax rules (Vuong et al., 2001). As an example of how XML is used for data exchange, consider a B2B scenario in which Company A uses XML to exchange product data with Company B over the Web. The contents of the ProductList.xml document are shown in Figure 7.10 (Baski & Misra, 2011).
Figure 7.10. The productlist.xml page’s contents. Source: https://www.slideshare.net/SaranyaPsg/1-xml-fundamentals-1972357783.
The XML example in Figure 7.10 illustrates the following basic XML features (Berman, 2005):
• The first line represents the XML document declaration, and it is mandatory.
• Every XML document has a root element. In the example, the second line declares the ProductList root element.
• The root element contains child elements or subelements. In the example, line 3 declares Product as a child element of ProductList.
• Each element can contain subelements. For example, each Product element is composed of several child elements: P_CODE, P_DESCRIPT, P_INDATE, P_QOH, P_MIN, and P_PRICE.
• The XML document reflects a hierarchical tree structure in which elements are related in a parent-child relationship; each parent element can have many child elements. For example, the root element is ProductList, Product is the child element of ProductList, and Product has six child elements: P_CODE, P_DESCRIPT, P_INDATE, P_QOH, P_MIN, and P_PRICE (Yousfi et al., 2020).
Once Company B receives ProductList.xml, it can process the document, assuming that it understands the tags created by Company A. The meaning of the XML tags in Figure 7.10 is fairly self-evident, but there is no easy way to validate the data or to check whether the data is complete. For example, you could encounter a P_INDATE value of 25/14/2009—but is that a valid value? What happens if Company B also expects a Vendor element? How can companies share descriptions of their business data elements? The next section shows how Document Type Definitions and XML schemas are used to address such concerns (Katehakis et al., 2001).
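The rules above can be exercised programmatically. The sketch below parses a document shaped like the ProductList.xml of Figure 7.10 (abridged to two of the six child elements) with Python's standard XML parser: walking the parent-child hierarchy, showing that tag names are case sensitive, and showing that a document whose tags do not nest properly is rejected outright as not well formed.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<ProductList>
  <Product><P_CODE>2345-AA</P_CODE><P_PRICE>189.99</P_PRICE></Product>
  <Product><P_CODE>2345-AB</P_CODE><P_PRICE>99.87</P_PRICE></Product>
</ProductList>"""

root = ET.fromstring(doc)                       # parses only if well formed
codes = [p.findtext("P_CODE") for p in root.findall("Product")]
print(root.tag, codes)

# Tag names are case sensitive: <P_code> does not match <P_CODE>.
assert root.find("Product").findtext("P_code") is None

# Improperly nested tags make the document not well formed.
try:
    ET.fromstring("<Product><P_CODE>2345-AA</Product></P_CODE>")
except ET.ParseError:
    print("not well-formed")
```

Note that the parser enforces only well-formedness; validating values such as P_INDATE against a data description is the job of the DTDs and schemas discussed next.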
REFERENCES
1. Abdalhakim, H., (2009). Addressing burdens of open database connectivity standards on the users. In: 2009 Third International Symposium on Intelligent Information Technology Application Workshops (Vol. 1, pp. 305–308). IEEE.
2. Baber, J. C., & Hodgkin, E. E., (1992). Automatic assignment of chemical connectivity to organic molecules in the Cambridge structural database. Journal of Chemical Information and Computer Sciences, 32(5), 401–406.
3. Baca, A., Dabnichki, P., Heller, M., & Kornfeind, P., (2009). Ubiquitous computing in sports: A review and analysis. Journal of Sports Sciences, 27(12), 1335–1346.
4. Baski, D., & Misra, S., (2011). Metrics suite for maintainability of extensible markup language web services. IET Software, 5(3), 320–341.
5. Bazghandi, A., (2006). Web database connectivity methods (using MySQL) in windows platform. In: 2006 2nd International Conference on Information & Communication Technologies (Vol. 2, pp. 3577–3581). IEEE.
6. Berman, J. J., (2005). Pathology data integration with eXtensible markup language. Human Pathology, 36(2), 139–145.
7. Bl Bard, J., & Davies, J. A., (1995). Development, databases and the internet. BioEssays, 17(11), 999–1001.
8. Blinowska, K. J., & Durka, P. J., (2005). Efficient application of internet databases for new signal processing methods. Clinical EEG and Neuroscience, 36(2), 123–130.
9. Bouguettaya, A., Benatallah, B., & Edmond, D., (1998). Reflective data sharing in managing internet databases. In: Proceedings 18th International Conference on Distributed Computing Systems (Cat. No. 98CB36183) (Vol. 1, pp. 172–181). IEEE.
10. Bouguettaya, A., Benatallah, B., Ouzzani, M., & Hendra, L., (1999). Using java and CORBA for implementing internet databases. In: Proceedings 15th International Conference on Data Engineering (Cat. No. 99CB36337) (Vol. 1, pp. 218–227). IEEE.
11. Bradley, G., (2003). Introduction to extensible markup language (XML) with operations research examples. Newsletter of the INFORMS Computing Society, 24(1), 1–20.
12. Braxton, S. M., Onstad, D. W., Dockter, D. E., Giordano, R., Larsson, R., & Humber, R. A., (2003). Description and analysis of two internet-based databases of insect pathogens: EDWIP and VIDIL. Journal of Invertebrate Pathology, 83(3), 185–195.
13. Bray, T., Paoli, J., Sperberg-McQueen, C. M., Maler, E., Yergeau, F., & Cowan, J., (2000). Extensible Markup Language (XML), 1, 3–7.
14. Brown, J. A., Rudie, J. D., Bandrowski, A., Van, H. J. D., & Bookheimer, S. Y., (2012). The UCLA multimodal connectivity database: A web-based platform for brain connectivity matrix sharing and analysis. Frontiers in Neuroinformatics, 6, 28.
15. Bryan, M., (1998). An introduction to the extensible markup language (XML). Bulletin of the American Society for Information Science, 25(1), 11–14.
16. Carmichael, P., (2002). Extensible markup language and qualitative data analysis. In: Forum Qualitative Sozialforschung/Forum: Qualitative Social Research (Vol. 3, No. 2).
17. Castan, G., Good, M., & Roland, P., (2001). Extensible markup language (XML) for music applications: An introduction. The Virtual Score, 12, 95–102.
18. Chen, H., Schuffels, C., & Orwig, R., (1996). Internet categorization and search: A self-organizing approach. Journal of Visual Communication and Image Representation, 7(1), 88–102.
19. Chen, J., DeWitt, D. J., Tian, F., & Wang, Y., (2000). NiagaraCQ: A scalable continuous query system for internet databases. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (Vol. 1, pp. 379–390).
20. Coelho, P., Aguiar, A., & Lopes, J. C., (2011). OLBS: Offline location based services. In: 2011 Fifth International Conference on Next Generation Mobile Applications, Services and Technologies (Vol. 1, pp. 70–75). IEEE.
21. Coetzee, M., & Eloff, J., (2002). Secure database connectivity on the www. In: Security in the Information Society (Vol. 1, pp. 275–286). Springer, Boston, MA.
22. De Leeuw, N., Dijkhuizen, T., Hehir‐Kwa, J. Y., Carter, N. P., Feuk, L., Firth, H. V., & Hastings, R., (2012). Diagnostic interpretation of array data using public databases and internet sources. Human Mutation, 33(6), 930–940.
23. De Macedo, D. D., Perantunes, H. W., Maia, L. F., Comunello, E., Von, W. A., & Dantas, M. A., (2008). An interoperability approach based on asynchronous replication among distributed internet databases. In: 2008 IEEE Symposium on Computers and Communications (Vol. 1, pp. 658–663). IEEE.
24. Deng, Y., Tanga, Z., & Yunhua, C. M. C., (2010). Information integration based on open geospatial database connectivity specification. In: ISPRS Technical Commission IV, ASPRS/CaGIS 2010 Fall Specialty Conference (Vol. 1, pp. 4–8).
25. Di Martino, S., Fiadone, L., Peron, A., Riccabone, A., & Vitale, V. N., (2019). Industrial internet of things: Persistence for time series with NoSQL databases. In: 2019 IEEE 28th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE) (Vol. 1, pp. 340–345). IEEE.
26. Dubey, A. K., & Chueh, H., (1998). Using the extensible markup language (XML) in automated clinical practice guidelines. In: Proceedings of the AMIA Symposium (Vol. 1, p. 735). American Medical Informatics Association.
27. Etzioni, O., & Weld, D. S., (1995). Intelligent agents on the internet: Fact, fiction, and forecast. IEEE Expert, 10(4), 44–49.
28. Falk, H., (2005). State library databases on the internet. The Electronic Library, 23(4), 492–498.
29. Fatima, H., & Wasnik, K., (2016). Comparison of SQL, NoSQL and NewSQL databases for internet of things. In: 2016 IEEE Bombay Section Symposium (IBSS) (Vol. 1, pp. 1–6). IEEE.
30. Friedman, C., Hripcsak, G., Shagina, L., & Liu, H., (1999). Representing information in patient reports using natural language processing and the extensible markup language. Journal of the American Medical Informatics Association, 6(1), 76–87.
31. Gordon, A., (2003). Terrorism and knowledge growth: A databases and internet analysis. In: Research on Terrorism (Vol. 1, pp. 124–138). Routledge.
32. Hackathorn, R. D., (1993). Enterprise Database Connectivity: The Key to Enterprise Applications on the Desktop (Vol. 1, pp. 4–8). John Wiley & Sons, Inc.
33. Hall, L. H., & Kier, L. B., (2000). Molecular connectivity chi indices for database analysis and structure-property modeling. In: Topological Indices and Related Descriptors in QSAR and QSPAR (Vol. 1, pp. 317–370). CRC Press.
34. Harris, P., & Reiner, D. S., (1990). The lotus DataLens approach to heterogeneous database connectivity. IEEE Data Eng. Bull., 13(2), 46–51.
35. Hodge, M. R., Horton, W., Brown, T., Herrick, R., Olsen, T., Hileman, M. E., & Marcus, D. S., (2016). Connectome DB—sharing human brain connectivity data. Neuroimage, 124, 1102–1107.
36. Huang, H., Nguyen, T., Ibrahim, S., Shantharam, S., Yue, Z., & Chen, J. Y., (2015). DMAP: A connectivity map database to enable identification of novel drug repositioning candidates. In: BMC Bioinformatics (Vol. 16, No. 13, pp. 1–11). BioMed Central.
37. Huang, H., Wu, X., Pandey, R., Li, J., Zhao, G., Ibrahim, S., & Chen, J. Y., (2012). C2Maps: A network pharmacology database with comprehensive disease-gene-drug connectivity relationships. BMC Genomics, 13(6), 1–14.
38. Huh, S., (2014). Coding practice of the journal article tag suite extensible markup language. Sci. Ed., 1(2), 105–112.
39. Ivanov, S., & Carson, J. H., (2002). Database connectivity. In: New Perspectives on Information Systems Development (Vol. 1, pp. 449–460). Springer, Boston, MA.
40. Jacobsen, H. E., & Andenæs, R., (2011). Third year nursing students' understanding of how to find and evaluate information from bibliographic databases and internet sites. Nurse Education Today, 31(8), 898–903.
41. Jansons, S., & Cook, G. J., (2006). Web-enabled database connectivity: A comparison of programming, scripting, and application-based access. Information Systems Management, 1, 4–6.
42. Johnson, R. A., (2014). Java database connectivity using SQLite: A tutorial. International Journal of Information, Business and Management, 6(3), 207.
43. Katehakis, D. G., Sfakianakis, S., Tsiknakis, M., & Orphanoudakis, S. C., (2001). An infrastructure for integrated electronic health record services: The role of XML (extensible markup language). Journal of Medical Internet Research, 3(1), e826.
44. Kerse, N., Arroll, B., Lloyd, T., Young, J., & Ward, J., (2001). Evidence databases, the internet, and general practitioners: The New Zealand story. New Zealand Medical Journal, 114(1127), 89.
45. Khan, L., Mcleod, D., & Shahabi, C., (2001). An adaptive probe-based technique to optimize join queries in distributed internet databases. Journal of Database Management (JDM), 12(4), 3–14.
46. Khannedy, E. K., (2011). MySQL Dan Java Database Connectivity (Vol. 1, pp. 4–9). Bandung: StripBandung.
47. Kiley, R., (1997). Medical databases on the internet: Part 1. Journal of the Royal Society of Medicine, 90(11), 610–611.
48. Kötter, R., (2004). Online retrieval, processing, and visualization of primate connectivity data from the CoCoMac database. Neuroinformatics, 2(2), 127–144.
49. Kuan, L., Li, Y., Lau, C., Feng, D., Bernard, A., Sunkin, S. M., & Ng, L., (2015). Neuroinformatics of the Allen mouse brain connectivity atlas. Methods, 73(1), 4–17.
50. Lamb, J., (2007). The connectivity map: A new tool for biomedical research. Nature Reviews Cancer, 7(1), 54–60.
51. Lee, J. M., Kyeong, S., Kim, E., & Cheon, K. A., (2016). Abnormalities of inter- and intra-hemispheric functional connectivity in autism spectrum disorders: A study using the autism brain imaging data exchange database. Frontiers in Neuroscience, 10(1), 191.
52. Lemkin, P. F., Thornwall, G. C., & Evans, J., (2005). Comparing 2-D electrophoretic gels across internet databases. The Proteomics Protocols Handbook, 1, 279–305.
53. Li, Z., & Yang, L., (2020). Underlying mechanisms and candidate drugs for COVID-19 based on the connectivity map database. Frontiers in Genetics, 11, 558557.
54. Liu, T. P., Hsieh, Y. Y., Chou, C. J., & Yang, P. M., (2018). Systematic polypharmacology and drug repurposing via an integrated L1000-based connectivity map database mining. Royal Society Open Science, 5(11), 181321.
55. Liu, X., Liu, L. C., Koong, K. S., & Lu, J., (2003). An examination of job skills posted on internet databases: Implications for information systems degree programs. Journal of Education for Business, 78(4), 191–196.
56. Liu, Z., Du, X., & Ishii, N., (1998). Integrating databases in internet. In: 1998 Second International Conference. Knowledge-Based Intelligent Electronic Systems: Proceedings KES'98 (Cat. No. 98EX111) (Vol. 3, pp. 381–385). IEEE.
57. Luo, B., Gu, Y. Y., Wang, X. D., Chen, G., & Peng, Z. G., (2018). Identification of potential drugs for diffuse large b-cell lymphoma based on bioinformatics and connectivity map database. Pathology-Research and Practice, 214(11), 1854–1867.
58. Mai, P. T. A., Nurminen, J. K., & Di Francesco, M., (2014). Cloud databases for internet-of-things data. In: 2014 IEEE International Conference on Internet of Things (iThings), and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) (Vol. 1, pp. 117–124). IEEE.
59. Manzotti, G., Parenti, S., Ferrari-Amorotti, G., Soliera, A. R., Cattelani, S., Montanari, M., & Calabretta, B., (2015). Monocyte-macrophage differentiation of acute myeloid leukemia cell lines by small molecules identified through interrogation of the connectivity map database. Cell Cycle, 14(16), 2578–2589.
60. Meng, W., Liu, K. L., Yu, C., Wang, X., Chang, Y., & Rishe, N., (1998). Determining Text Databases to Search in the Internet, 1, 4, 5.
61. Migliore, L. A., & Chinta, R., (2017). Demystifying the big data phenomenon for strategic leadership. SAM Advanced Management Journal, (07497075), 82(1).
62. Minkiewicz, P., Darewicz, M., Iwaniak, A., Bucholska, J., Starowicz, P., & Czyrko, E., (2016). Internet databases of the properties, enzymatic reactions, and metabolism of small molecules—Search options and applications in food science. International Journal of Molecular Sciences, 17(12), 2039.
63. Minkiewicz, P., Iwaniak, A., & Darewicz, M., (2015). Using internet databases for food science organic chemistry students to discover chemical compound information. Journal of Chemical Education, 92(5), 874–876.
64. Moffatt, C., (1996). Designing client-server applications for enterprise database connectivity. In: Database Reengineering and Interoperability (Vol. 1, pp. 215–234). Springer, Boston, MA.
65. Mohr, E., Horn, F., Janody, F., Sanchez, C., Pillet, V., Bellon, B., & Jacq, B., (1998). FlyNets and GIF-DB, two internet databases for molecular interactions in Drosophila melanogaster. Nucleic Acids Research, 26(1), 89–93.
66. Ng, C. K., White, P., & McKay, J. C., (2009). Development of a web database portfolio system with PACS connectivity for undergraduate health education and continuing professional development. Computer Methods and Programs in Biomedicine, 94(1), 26–38.
67. Norrie, M. C., Palinginis, A., & Wurgler, A., (1998). OMS connect: Supporting multidatabase and mobile working through database connectivity. In: Proceedings 3rd IFCIS International Conference on Cooperative Information Systems (Cat. No. 98EX122) (Vol. 1, pp. 232–240). IEEE.
68. Ou, C., & Zhang, K., (2006). Teaching with databases: Begin with the internet. TechTrends, 50(5), 46.
69. Ouzzani, M., Benatallah, B., & Bouguettaya, A., (2000). Ontological approach for information discovery in internet databases. Distributed and Parallel Databases, 8(3), 367–392.
70. Poo, D., Kiong, D., & Ashok, S., (2008). Java database connectivity. In: Object-Oriented Programming and Java (Vol. 1, pp. 297–314). Springer, London.
71. Press, W. A., Olshausen, B. A., & Van, E. D. C., (2001). A graphical anatomical database of neural connectivity. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 356(1412), 1147–1157.
72. Quirós, M., Gražulis, S., Girdzijauskaitė, S., Merkys, A., & Vaitkus, A., (2018). Using SMILES strings for the description of chemical connectivity in the crystallography open database. Journal of Cheminformatics, 10(1), 1–17.
73. Ramakrishnan, S., & Rao, B. M., (1997). Classroom projects on database connectivity and the web. ACM SIGCSE Bulletin, 29(1), 116–120.
74. Regateiro, D. D., Pereira, Ó. M., & Aguiar, R. L., (2017). SPDC: Secure proxied database connectivity. In: DATA (Vol. 1, pp. 56–66).
75. Robinson, J. L., Laird, A. R., Glahn, D. C., Lovallo, W. R., & Fox, P. T., (2010). Metaanalytic connectivity modeling: Delineating the functional connectivity of the human amygdala. Human Brain Mapping, 31(2), 173–184.
Database Connectivity and Web Technologies
215
76. Sanin, C., & Szczerbicki, E., (2006). Extending set of experience knowledge structure into a transportable language extensible markup language. Cybernetics and Systems: An International Journal, 37(2, 3), 97–117. 77. Saxena, V., & Kumar, S., (2012). Object-Oriented Database Connectivity for Hand Held Devices, 1, 4–8. 78. Shahabi, C., Khan, L., & McLeod, D., (2000). A probe-based technique to optimize join queries in distributed internet databases. Knowledge and Information Systems, 2(3), 373–385. 79. Shum, A. C., (1997). Open Database Connectivity Development of the Context Interchange System (Vol. 1, pp. 2–5). Doctoral dissertation, Massachusetts Institute of Technology. 80. Song, H., & Gao, L., (2012). Use ORM middleware realize heterogeneous database connectivity. In: 2012 Spring Congress on Engineering and Technology (Vol. 1, pp. 1–4). IEEE. 81. Sperberg-McQueen, C. M., (2000). Extensible Markup Language (XML) 1.0 (Vol. 1, pp. 4–9). World Wide Web Consortium. 82. Stephan, K. E., Kamper, L., Bozkurt, A., Burns, G. A., Young, M. P., & Kötter, R., (2001). Advanced database methodology for the collation of connectivity data on the macaque brain (CoCoMac). Philosophical Transactions of the Royal Society of London; Series B: Biological Sciences, 356(1412), 1159–1186. 83. Stephens, L. M., & Huhns, M. N., (1999). Database Connectivity Using an Agent-Based Mediator System. Univ. of S. Carolina, report A, 38. 84. Sylvester, R. K., (1997). Incorporation of internet databases into pharmacotherapy coursework. American Journal of Pharmaceutical Education, 61(1), 50–54. 85. Ternon, E., Agyapong, P., Hu, L., & Dekorsy, A., (2014). Databaseaided energy savings in next generation dual connectivity heterogeneous networks. In: 2014 IEEE Wireless Communications and Networking Conference (WCNC) (Vol. 1, pp. 2811–2816). IEEE. 86. Van, D. S., (1999). Free patent databases on the internet: A critical view. World Patent Information, 21(4), 253–257. 87. Van, S. M. A. M., & Winter, R. 
M., (1998). Internet databases for clinical geneticists‐an overview 1. Clinical Genetics, 53(5), 323–330. 88. Vecchio, F., Miraglia, F., Judica, E., Cotelli, M., Alù, F., & Rossini, P. M., (2020). Human brain networks: A graph theoretical analysis of
216
89.
90.
91. 92.
93.
94.
The Creation and Management of Database Systems
cortical connectivity normative database from EEG data in healthy elderly subjects. GeroScience, 42(2), 575–584. Vuong, N. N., Smith, G. S., & Deng, Y., (2001). Managing security policies in a distributed environment using extensible markup language (XML). In: Proceedings of the 2001 ACM Symposium on Applied Computing (Vol. 1, pp. 405–411). Wahle, S., Magedanz, T., Gavras, A., Hrasnica, H., & Denazis, S., (2009). Technical infrastructure for a pan-European federation of testbeds. In: 2009 5th International Conference on Testbeds and Research Infrastructures for the Development of Networks & Communities and Workshops (Vol. 1, pp. 1–8). IEEE. Xu, D., (2012). Protein databases on the internet. Current Protocols in Protein Science, 70(1), 2–6. Yerneni, R., Papakonstantinou, Y., Abiteboul, S., & Garcia-Molina, H., (1998). Fusion queries over internet databases. In: International Conference on Extending Database Technology (Vol. 1, pp. 55–71). Springer, Berlin, Heidelberg. Yoshida, Y., Miyazaki, K., Kamiie, J., Sato, M., Okuizumi, S., Kenmochi, A., & Yamamoto, T., (2005). Two‐dimensional electrophoretic profiling of normal human kidney glomerulus proteome and construction of an extensible markup language (XML)‐based database. Proteomics, 5(4), 1083–1096. Yousfi, A., El Yazidi, M. H., & Zellou, A., (2020). xMatcher: Matching extensible markup language schemas using semantic-based techniques. International Journal of Advanced Computer Science and Applications, 1, 11(8).
CHAPTER 8
DATABASE ADMINISTRATION AND SECURITY
CONTENTS
8.1. Introduction
8.2. The Role of a Database in an Organization
8.3. Introduction of a Database
8.4. The Evolution of Database Administration Function
References
8.1. INTRODUCTION
To appreciate the monetary worth of data, consider what is recorded in a company's database: information about customers, vendors, inventories, operations, and so on. How many opportunities would be lost if that data were destroyed? What would it cost to replace? A business whose entire data store is destroyed, for instance, would incur enormous costs and expenses (Mata-Toledo & Reyes-Garcia, 2002). An accounting firm's troubles would be compounded if the loss occurred during tax season. Data loss puts any business in a precarious position: the firm may be unable to manage everyday operations, it may lose clients who expect prompt and professional service, and it may miss opportunities to grow its customer base (Bertino & Sandhu, 2005). Information is a valuable resource that is derived from data. When information is precise and timely, it drives activities that strengthen product quality and generate wealth. In practice, a company operates in a data-information-decision cycle: users apply intelligence to data to generate information, which in turn forms the basis of the knowledge used in decision making. Figure 8.1 depicts this cycle (Leong-Hong & Marron, 1978).
Figure 8.1. The cycle of data-information-decision making. Source: https://slideplayer.com/slide/5985787/.
As Figure 8.1 shows, decisions made by top-level managers trigger actions at lower organizational levels. These actions generate additional data to
monitor the operation of the firm. The additional data then re-enter the data-information-decision cycle. Data thus serve as the foundation for decision making, long-term planning, monitoring, and operational tracking (Zygiaris, 2018). Effective asset management is a fundamental performance factor for any firm. To manage data as a business asset, executives must understand the value of data and of the information derived from it. Some businesses (such as credit bureaus) have data as their primary product, and their viability depends entirely on data administration (Said et al., 2009). Most firms continually seek new ways to maximize the value of the data they collect. This value may take many forms, from data warehouses that enable better relationship management to closer links with suppliers and customers that support automated logistics operations. As businesses grow more reliant on data, the accuracy of that data becomes increasingly vital. Such firms face a growing threat from dirty data, that is, data that are inaccurate or inconsistent. Data may become dirty for several reasons, including (Bullers Jr et al., 2006):
• Lack of integrity-constraint enforcement (not null, uniqueness, referential integrity, etc.);
• Data-entry errors;
• Use of synonyms and/or homonyms across different systems;
• Use of nonstandard abbreviations in character data; and
• Different decompositions of composite attributes into simple attributes across systems.
Some causes of dirty data, such as missing constraints, can be addressed at the database level by applying the constraints correctly. Other sources are harder to resolve. One common source of dirty data is the migration of data between systems, as when a new database system is established. Efforts to control dirty data are often referred to as data quality initiatives (Tahir & Brézillon, 2013). Data quality is a comprehensive approach to ensuring the accuracy, validity, and timeliness of data. The comprehensiveness is crucial: data quality is concerned with more than simply cleansing dirty data; it also seeks to prevent
future data errors and to build user confidence in the data. Large-scale data quality initiatives are complex and costly, so these activities must be aligned with company objectives and have the support of senior management. Although data quality initiatives vary widely from organization to organization, most incorporate a combination of (Paul & Aithal, 2019):
• A data governance structure accountable for data quality;
• Assessments of the current level of data quality;
• Definition of data quality standards in line with enterprise objectives; and
• Adoption of tools and procedures to ensure the quality of new data.
Several kinds of tools can support data quality initiatives. In particular, many vendors offer data profiling and master data management (MDM) software to help ensure data quality. Data profiling software gathers statistics about, and assesses, existing data sources. These programs analyze current data sets to discover patterns, which can then be compared against organizationally defined standards (El-Bakry & Hamada, 2010). Such analysis helps the business understand the quality of its data and identify sources of dirty data. MDM software helps prevent dirty data by coordinating shared data across multiple systems. MDM provides a "master" copy of entities, such as customers, that appear in several organizational systems. Although these technologies contribute significantly to data quality, the complete answer to high-quality data within an organization rests chiefly with data administration and management (Kahn & Garceau, 1985).
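The two technical ideas above, enforcing integrity constraints at the database level and profiling existing data, can be illustrated with a small sketch using Python's built-in sqlite3 module. The table and column names are hypothetical, chosen only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Integrity constraints reject dirty data at the database level.
conn.execute("""
    CREATE TABLE vendor (
        vendor_id   INTEGER PRIMARY KEY,
        vendor_name TEXT NOT NULL UNIQUE
    )""")
conn.execute("""
    CREATE TABLE product (
        product_id  INTEGER PRIMARY KEY,
        price       REAL NOT NULL CHECK (price >= 0),
        vendor_id   INTEGER NOT NULL REFERENCES vendor(vendor_id)
    )""")

conn.execute("INSERT INTO vendor VALUES (1, 'Acme')")
conn.execute("INSERT INTO product VALUES (10, 19.99, 1)")

# A row referencing a nonexistent vendor violates referential integrity
# and is rejected, instead of silently becoming dirty data.
rejected = False
try:
    conn.execute("INSERT INTO product VALUES (11, 5.00, 99)")
except sqlite3.IntegrityError:
    rejected = True

# A minimal profiling query: row count, distinct values, and null count
# per column, the kind of statistics data profiling tools gather.
row = conn.execute("""
    SELECT COUNT(*),
           COUNT(DISTINCT vendor_id),
           COUNT(*) - COUNT(vendor_id)
    FROM product""").fetchone()
print("rejected bad row:", rejected)
print("rows, distinct vendors, null vendor_ids:", row)
```

Commercial profiling and MDM tools do far more than this, but the principle is the same: declare the rules once in the database, and measure the data against them continuously.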
8.2. THE ROLE OF A DATABASE IN AN ORGANIZATION
Different personnel in different departments use data for different purposes; data management must therefore embrace the concept of data sharing. Used correctly, the DBMS enables (Yao et al., 2007):
• Interpretation and presentation of data in meaningful formats, by transforming data into information;
• Distribution of data and information to the right people at the right time;
• Preservation and monitoring of data for appropriate periods of time; and
• Control over data duplication and use, both internal and external.
Whatever the type of organization, the database's primary purpose is to support managerial decision making at all levels of the business while preserving data privacy and security (Khan et al., 2014). A company's management structure may be divided into three levels: top, middle, and operational. Top-level management makes strategic decisions, middle management makes tactical decisions, and operational management makes day-to-day operational decisions. Short-term operational decisions affect only daily operations, for example, deciding to discount a product to clear it from inventory. Tactical decisions have a longer time horizon and affect larger-scale operations, for example, changing a product's price in response to competitive pressures (Armstrong & Densham, 1990). Strategic decisions affect the organization's long-term well-being or even its survival, for example, shifting pricing strategies across product lines to capture market share. The DBMS must give each management level a useful view of the data and support the degree of decision making required at that level. The following activities are typical at each management level. At the top management level, the database must be able to (Yusuf & Nnadi, 1988):
• Provide the data necessary for strategic decision making, strategic planning, policy formulation, and goal definition;
• Provide access to internal and external data to identify growth opportunities and to chart the direction of that growth (direction here refers to the character of the business: could the company become a service organization, a manufacturing organization, or some combination of the two?);
• Provide a framework for defining and enforcing organizational policies (remembering that such policies are translated into business rules at lower levels of the organization);
• Improve the likelihood of a positive return on investment by seeking new ways to reduce costs and/or boost productivity; and
• Provide feedback to monitor whether the company is achieving its goals.
At the middle management level, the database must be able to (Rabinovitch & Wiese, 2007):
• Provide the data necessary for tactical decisions and planning;
• Monitor and control the allocation and use of organizational resources, and evaluate the performance of the various departments; and
• Provide a framework for enforcing and ensuring the security and privacy of the data in the database. Security means protecting the data against accidental or intentional use by unauthorized users. Privacy deals with the rights of individuals and organizations to determine the "who, what, when, why, and how" of data usage.
At the operational management level, the database must be able to (Bhardwaj et al., 2012):
• Represent and support the company's operations as accurately as possible. The data model must be flexible enough to incorporate all essential present and expected data;
• Produce query results within specified time frames. Remember that performance requirements become more stringent at the lower levels of the organization and its operations; at the operational management level, the database must therefore support faster responses to a greater number of transactions; and
• Enhance the company's short-term operations by providing timely information for customer support and for application and systems maintenance.
A common goal for any database is to guarantee a consistent flow of data across the organization. The company's database is also known as the corporate or enterprise database. The enterprise database is defined as "the company's collected data that supports all present and expected future operations." Most of today's successful organizations depend on their enterprise databases to support all of their operations, from design to implementation, from sales to services, and from daily decision making to strategic planning (Dewett & Jones, 2001).
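The security and privacy responsibilities described above are commonly enforced through the DBMS's authorization subsystem, which decides who may perform which action on which data. A minimal role-based sketch in Python; the roles, tables, and permission sets below are illustrative assumptions, not any specific DBMS's API:

```python
# Hypothetical role-based access check: each role maps to the set of
# (table, action) pairs it is permitted to perform.
ROLE_PERMISSIONS = {
    "operational": {("orders", "read"), ("orders", "write")},
    "middle_mgmt": {("orders", "read"), ("sales_summary", "read")},
    "top_mgmt":    {("sales_summary", "read"), ("strategy", "read")},
}

def is_authorized(role: str, table: str, action: str) -> bool:
    """Return True if the given role may perform the action on the table."""
    return (table, action) in ROLE_PERMISSIONS.get(role, set())

print(is_authorized("operational", "orders", "write"))   # True
print(is_authorized("middle_mgmt", "orders", "write"))   # False
```

In a production DBMS the same idea is expressed declaratively (for example, with GRANT/REVOKE statements in SQL), but the underlying model is the same: privacy policy decides what each role should see, and security mechanisms enforce that decision.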
8.3. INTRODUCTION OF A DATABASE
Having a computerized database does not by itself guarantee that the data will be used properly to give managers the best options. A database management system (DBMS) is a tool for managing data; like any tool, it must be used effectively to produce the desired results. Consider this
metaphor: in a craftsman's hands, a hammer can help produce fine furniture; in a child's hands, it can do harm. The mere existence of a computerized database is not the answer to a company's problems; its proper management and use are (King, 2006). Introducing a DBMS is a major change and challenge; the DBMS can have a significant impact on the company, favorable or unfavorable depending on how it is managed. One important consideration, for instance, is tailoring the DBMS to the company rather than forcing the company to conform to the DBMS (Yarkoni et al., 2010). The primary concern should be the needs of the company, not the features of the DBMS. However, a DBMS cannot be introduced without affecting the company. The flow of new, better-managed data has a significant effect on how the company operates and, consequently, on its corporate culture (Motomura et al., 2008). The introduction of a DBMS into an organization has been described as a process comprising three key components (Schoenbachler & Gordon, 2002):
• Technological: the DBMS hardware and software;
• Managerial: the administrative functions; and
• Cultural: corporate resistance to change.
The technological aspect involves selecting, installing, configuring, and maintaining the DBMS so that it manages data storage, access, and security effectively. The person or people responsible for the technological side of the DBMS must have the technical skills to provide or secure adequate support for the DBMS's various users: programmers, administrators, and end users. Database administration staffing is therefore a critical technological concern in a DBMS installation. The selected personnel must have the right mix of technical and managerial skills to ensure a smooth transition to the new environment (Kalmegh & Navathe, 2012). The managerial component of a DBMS installation should not be underestimated. A high-quality DBMS, like the best racing car, does not by itself guarantee a high-quality information system (Dunwell et al., 2000). Introducing a DBMS into an organization requires careful planning to create an appropriate management structure that supports the person or people responsible for administering the DBMS. Sound monitoring and controlling functions must also be built into the organizational structure.
Administrative personnel must have good communication skills and a thorough grasp of organizational and business principles (Chen et al., 2021). Top management must be committed to the new system and must define and support the organization's data administration responsibilities, goals, and roles. The cultural impact of introducing a database system must also be assessed carefully. The DBMS's existence affects people, functions, and interactions. For example, additional staff may be hired, current personnel may be given new duties, and job performance may be evaluated by new standards (Goldewijk, 2001). A cultural impact is to be expected, because the database approach creates a more controlled and structured information flow. Department managers accustomed to handling their own data must relinquish exclusive control over their data to the data administration function and share the data with the rest of the company. Application developers must learn and follow new design and development standards (Chen et al., 2004; Amano & Maeda, 1987). Managers may be faced with what they perceive as information overload and may need time to adjust to the new environment. When the new database goes live, users may be reluctant to use the information it provides, questioning its value or accuracy. (Some will be surprised, if not annoyed, to discover that the facts contradict their preconceived notions and deeply held beliefs.) The database administrator must be prepared to welcome end users, listen to their concerns, address those concerns where possible, and educate end users about the system's uses and benefits (Buhrmester, 1998).
8.4. THE EVOLUTION OF DATABASE ADMINISTRATION FUNCTION
The origins of data administration can be traced to the old, decentralized world of the file system. The cost of data and management duplication in such file systems prompted the creation of a centralized data administration function known as the electronic data processing (EDP) or data processing (DP) department (Fry & Sibley, 1976). The DP department's task was to pool all computing resources to provide operational support to every division. The DP administration function was given the authority to manage all of the company's existing file systems and to resolve the data and management conflicts created by the duplication and/or misuse of data (Kahn & Garceau, 1985).
The introduction of the DBMS and its shared view of data brought a new level of data management sophistication, and the DP department evolved into an information systems (IS) department (Sherif, 1984). The IS department's responsibilities grew in two directions:
• A service function, to provide end users with ongoing data management support; and
• A production function, to provide end users with tailored solutions for their data needs through integrated application or management information systems.
The IS department's internal organization mirrored this functional emphasis. Figure 8.2 depicts how most IS departments were organized. As demand for application development grew, the IS application development section was subdivided by the type of system it supported: accounting, inventory, marketing, and so on (Mistry et al., 2013). This development, however, split the data management responsibilities: the database operations section was in charge of installing, monitoring, and controlling DBMS operations, while the application development section was in charge of gathering database requirements and performing logical database design (Gillenson, 1991).
Figure 8.2. The IS department’s internal organization. Source: https://www.researchgate.net/figure/IIASAs-internal-organizationalstructure_fig1_318585146.
The size and role of the DBA function, and its placement within a company's organizational structure, vary from company to company. The DBA function may be defined as either a staff or a line position in the
organization structure (Barki et al., 1993). Placing the DBA function in a staff position often creates a consulting environment, in which the DBA can devise the data administration strategy but lacks the authority to enforce it or to resolve conflicts. In a line position, the DBA has the responsibility and authority to plan, define, implement, and enforce the policies, standards, and procedures used in the data administration activities. Figure 8.3 illustrates the two placement alternatives (Jain & Ryu, 1988).
Figure 8.3. The placement of the DBA function. Source: https://slideplayer.com/slide/13196309/.
There is no standard way to fit the DBA function into an organization's structure, partly because the DBA function itself is probably the most dynamic of an organization's functions. In fact, the rapid evolution of DBMS technology keeps forcing organizational structures to change (Weber & Everest, 1979):
• The growth of distributed database systems may compel a company to decentralize its data administration function. The distributed database requires a systems DBA to define and delegate the responsibilities of each local DBA, placing new and more complex coordination activities on the systems DBA.
• The increasing sophistication and power of microcomputer-based DBMS packages makes it easier to create user-friendly, cost-effective, department-specific solutions. However, such an environment also invites data duplication, not to mention the problems created by people who lack the technical expertise to produce good database designs. In short, the microcomputer environment requires the database administrator (DBA) to develop a new set of managerial and technical skills (Teng & Grover, 1992).
It is common practice to define the DBA function by dividing DBA activities according to the phases of the Database Life Cycle (DBLC). Under this approach, the DBA function requires personnel to cover the following activities (Houser & Pecchioli, 2000):
• Database planning, including the definition of standards, procedures, and enforcement;
• Database requirements gathering and conceptual design;
• Logical and transaction design;
• Physical design and implementation;
• Database testing and debugging;
• Database operation and maintenance, including installation, conversion, and migration;
• Database training and support; and
• Data quality monitoring and reporting.
Figure 8.4 depicts a DBA functional organization based on this model (Aiken et al., 2013).
Figure 8.4. A DBA functional organization. Source: https://documen.site/download/chapter-15–29_pdf.
Keep in mind that a company may have several incompatible DBMSs installed to support different operations. For example, it is not unusual for an organization to run a hierarchical DBMS to handle daily transactions at the operational level and a relational DBMS to meet the ad hoc data needs of middle and top management (Leonard, 1990). Various microcomputer DBMSs may also be installed in the different departments. In such a scenario, the company may assign a separate DBA to each DBMS, with a systems administrator acting as the general coordinator of all the DBAs, as Figure 8.5 depicts (Ramanujam & Capretz, 2005).
Figure 8.5. Multiple database administrators in an organization. Source: http://onlineopenacademy.com/database-administrator/.
There is an increasing tendency toward specialization in the data management function. For example, the organization charts of some larger companies distinguish between a DBA and a data administrator (DA). The DA, also known as the information resource manager (IRM), reports directly to top management and is given greater responsibility and authority than the DBA, although the two functions overlap somewhat (Guynes & Vanecek, 1995; Roddick, 1995). The DA is responsible for controlling the company's overall data resources, both computerized and manual. Because the DA handles not only computerized data but also data outside the scope of the DBMS, the DA's job covers a broader range of activities than the DBA's. The DBA's placement within the expanded organizational structure may vary from company to company; depending on how the structure is defined, the DBA may report to the DA, the IRM, the IS manager, or even the company's CEO (Rose, 1991; Thomas et al., 2006).
REFERENCES
1. Aiken, P., Gillenson, M., Zhang, X., & Rafner, D., (2013). Data management and data administration: Assessing 25 years of practice. In: Innovations in Database Design, Web Applications, and Information Systems Management (Vol. 1, pp. 289–309). IGI Global.
2. Amano, K., & Maeda, T., (1987). Database management in research environment. In: Empirical Foundations of Information and Software Science III (Vol. 1, pp. 3–11). Springer, Boston, MA.
3. Armstrong, M. P., & Densham, P. J., (1990). Database organization strategies for spatial decision support systems. International Journal of Geographical Information Systems, 4(1), 3–20.
4. Barki, H., Rivard, S., & Talbot, J., (1993). A keyword classification scheme for IS research literature: An update. MIS Quarterly, 1, 209–226.
5. Bertino, E., & Sandhu, R., (2005). Database security-concepts, approaches, and challenges. IEEE Transactions on Dependable and Secure Computing, 2(1), 2–19.
6. Bhardwaj, A., Singh, A., Kaur, P., & Singh, B., (2012). Role of fragmentation in distributed database system. International Journal of Networking & Parallel Computing, 1(1), 2–7.
7. Buhrmester, D., (1998). Need Fulfillment, Interpersonal Competence, and the Developmental Contexts of Early Adolescent Friendship, 1, 1–9.
8. Bullers, Jr. W. I., Burd, S., & Seazzu, A. F., (2006). Virtual machines-an idea whose time has returned: Application to network, security, and database courses. ACM SIGCSE Bulletin, 38(1), 102–106.
9. Chen, J., Hong, H., Huang, M., & Kubik, J. D., (2004). Does fund size erode mutual fund performance? The role of liquidity and organization. American Economic Review, 94(5), 1276–1302.
10. Chen, X., Yang, H., Liu, G., & Zhang, Y., (2021). NUCOME: A comprehensive database of nucleosome organization referenced landscapes in mammalian genomes. BMC Bioinformatics, 22(1), 1–15.
11. Dewett, T., & Jones, G. R., (2001). The role of information technology in the organization: A review, model, and assessment. Journal of Management, 27(3), 313–346.
12. Dunwell, J. M., Khuri, S., & Gane, P. J., (2000). Microbial relatives of the seed storage proteins of higher plants: Conservation of structure and diversification of function during evolution of the cupin superfamily. Microbiology and Molecular Biology Reviews, 64(1), 153–179.
13. El-Bakry, H. M., & Hamada, M., (2010). A developed watermark technique for distributed database security. In: Computational Intelligence in Security for Information Systems 2010 (Vol. 1, pp. 173–180). Springer, Berlin, Heidelberg.
14. Fry, J. P., & Sibley, E. H., (1976). Evolution of data-base management systems. ACM Computing Surveys (CSUR), 8(1), 7–42.
15. Gillenson, M. L., (1991). Database administration at the crossroads: The era of end-user-oriented, decentralized data processing. Journal of Database Management (JDM), 2(4), 1–11.
16. Goldewijk, K. K., (2001). Estimating global land use change over the past 300 years: The HYDE database. Global Biogeochemical Cycles, 15(2), 417–433.
17. Guynes, C. S., & Vanecek, M. T., (1995). Data management issues in information systems. Journal of Database Management (JDM), 6(4), 3–13.
18. Houser, J., & Pecchioli, M., (2000). Database administration for spacecraft operations-the integral experience. ESA Bulletin, 1, 100–107.
19. Jain, H. K., & Ryu, H. S., (1988). The Issue of Site Autonomy in Distributed Database Administration, 1, 2–5.
20. Kahn, B. K., & Garceau, L. R., (1985). A developmental model of the database administration function. Journal of Management Information Systems, 1(4), 87–101.
21. Kalmegh, P., & Navathe, S. B., (2012). Graph database design challenges using HPC platforms. In: 2012 SC Companion: High Performance Computing, Networking Storage and Analysis (Vol. 1, pp. 1306–1309). IEEE.
22. Khan, S. A., Saqib, M., & Al Farsi, B., (2014). Critical Role of a Database Administrator: Designing Recovery Solutions to Combat Database Failures, 1, 3–9.
23. King, W. R., (2006). The critical role of information processing in creating an effective knowledge organization. Journal of Database Management (JDM), 17(1), 1–15.
24. Leonard, B., (1990). Quality control for a shared multidisciplinary database. Data Quality Control: Theory and Pragmatics, 112, 43.
232
The Creation and Management of Database Systems
25. Leong-Hong, B., & Marron, B. A., (1978). Database Administration: Concepts, Tools, Experiences, and Problems (Vol. 28, pp. 4–8). National bureau of standards. 26. Mata-Toledo, R. A., & Reyes-Garcia, C. A., (2002). A model course for teaching database administration with personal oracle 8 i. Journal of Computing Sciences in Colleges, 17(3), 125–130. 27. Mistry, J., Finn, R. D., Eddy, S. R., Bateman, A., & Punta, M., (2013). Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Research, 41(12), e120-e121. 28. Motomura, N., Miyata, H., Tsukihara, H., Okada, M., Takamoto, S., & Organization, J. C. S. D., (2008). First report on 30-day and operative mortality in risk model of isolated coronary artery bypass grafting in Japan. The Annals of Thoracic Surgery, 86(6), 1866–1872. 29. Paul, P., & Aithal, P. S., (2019). Database security: An overview and analysis of current trend. International Journal of Management, Technology, and Social Sciences (IJMTS), 4(2), 53–58. 30. Rabinovitch, G., & Wiese, D., (2007). Non-linear optimization of performance functions for autonomic database performance tuning. In: Third International Conference on Autonomic and Autonomous Systems (ICAS’07) (Vol. 1, pp. 48–48). IEEE. 31. Ramanujam, S., & Capretz, M. A., (2005). ADAM: A multi-agent system for autonomous database administration and maintenance. International Journal of Intelligent Information Technologies (IJIIT), 1(3), 14–33. 32. Roddick, J. F., (1995). A survey of schema versioning issues for database systems. Information and Software Technology, 37(7), 383– 393. 33. Rose, E., (1991). Data modeling for non-standard data. Journal of Database Management (JDM), 2(3), 8–21. 34. Said, H. E., Guimaraes, M. A., Maamar, Z., & Jololian, L., (2009). Database and database application security. ACM SIGCSE Bulletin, 41(3), 90–93. 35. Schoenbachler, D. D., & Gordon, G. L., (2002). 
Trust and customer willingness to provide information in database-driven relationship marketing. Journal of Interactive Marketing, 16(3), 2–16.
Database Administration and Security
233
36. Sherif, M. A. J., (1984). The Impact of Database Systems on Organizations: A Survey with Special Reference to the Evolution of the Database Administration Function (Vol. 1, pp.3–9). Doctoral dissertation, City University Londo. 37. Tahir, H., & Brézillon, P., (2013). Shared context for improving collaboration in database administration. International Journal of Database Management Systems, 5(2), 13. 38. Teng, J. T., & Grover, V., (1992). An empirical study on the determinants of effective database management. Journal of Database Management (JDM), 3(1), 22–34. 39. Thomas, P. D., Kejariwal, A., Guo, N., Mi, H., Campbell, M. J., Muruganujan, A., & Lazareva-Ulitsky, B., (2006). Applications for protein sequence–function evolution data: MRNA/protein expression analysis and coding SNP scoring tools. Nucleic Acids Research, 34(suppl_2), 645–650. 40. Weber, R., & Everest, G. C., (1979). Database administration: Functional, organizational, & control perspectives. EDPACS: The EDP Audit, Control, and Security Newsletter, 6(7), 1–10. 41. Yao, B., Yang, X., & Zhu, S. C., (2007). Introduction to a large-scale general purpose ground truth database: Methodology, annotation tool and benchmarks. In: International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition (Vol. 1, pp. 169– 183). Springer, Berlin, Heidelberg. 42. Yarkoni, T., Poldrack, R. A., Van, E. D. C., & Wager, T. D., (2010). Cognitive neuroscience 2.0: Building a cumulative science of human brain function. Trends in Cognitive Sciences, 14(11), 489–496. 43. Yusuf, A., & Nnadi, G., (1988). Structure and functions of computer database systems. Vikalpa, 13(4), 37–44. 44. Zygiaris, S., (2018). DataBase: Administration and security. In: Database Management Systems (Vol. 1, pp. 3–10). Emerald Publishing Limited.
INDEX
A
Active Server Pages (ASPs) 188
ActiveX Data Objects (ADO) 188, 200
Ada 81
advertising 225
Amazon 2
asset control 219
authenticity constraint 125
automatic transmission 56
B
basic application programming interface (API) 182
budget 140
business rules 48, 50, 51, 52, 66, 67, 68, 69, 70, 71, 72
C
C# 7, 32
cloud services 2
Cloud storage 2
COBOL 81, 84
commercial contract 52
commercial database management systems 74
competent judgment 5
complex relational data software 56
Component Object Model (COM) 186
computer-based programs 3
computer database 6
computer language 7
computer simulation 2
conceptual design 149
Conference on Data Systems and Languages 74
D
Data 45, 47, 48, 52, 54, 56, 66, 69, 70, 72
data administration 54, 219, 224, 226, 230
database architecture 47, 51
database configuration 52
Database connectivity 180, 181, 211
database design 141, 143, 146, 147, 149, 150, 151, 152, 153, 154, 155, 156, 158
Database development 143
database management systems (DBMSs) 2
database modeling 46, 62
Database planning 144
database schema 78, 80, 81, 82
database server 8, 9, 10, 29, 31, 32
Database System Development Lifecycle (DSDLC) 141
database systems 2, 10, 13, 24, 28, 31, 34, 35, 36, 37, 38, 40, 42, 43
Data Base Task Group (DBTG) 74
data communication 181, 186
data consistency 125
Data Definition Language (DDL) 162
data dictionary 144
data ecosystem 6
Data entry 219
Data Flow Diagrams (DFD) 146
Data gathering 3
data integrity 112, 133
Data manipulation 82
Data Manipulation Language (DML) 162
data modeling 46, 47, 55, 60, 70
data modification 160
data processing (DP) 224
Data quality 219, 227
data retrieval 83
data security 50
dataset 129
data source 180, 181, 184, 185, 187, 191
data source name (DSN) 184
data system 3
data transformation 82
data warehouses 219
decision-making 219, 221
deductive reasoning 114
desktop database 10
dynamic-link libraries (DLLs) 183
E
electronic data processing (EDP) 224
Extensible Markup Language (XML) 180, 209, 215
F
Facebook 2
file organization system 9
financial reporting 225
Flickr 2
Fortran 81
G
Google 2, 3
graphics generator 85
H
Hierarchical Input Process Output (HIPO) 146
I
information corruption 8
information model 46, 66
Information Resource Dictionary System (IRDS) 165
information systems (IS) 143
Information Systems Lifecycle (ISLC) 141
Interactive Graphics Retrieval System (INGRES) 113
International Organization for Standardization (ISO) 113
Internet 180, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 204, 205, 209, 211, 213, 215
inventory management system 47
J
Java 7, 32, 81
L
logical design 149
M
macro environment 3
management system 7, 8
master data management (MDM) 220
metadata 82, 92, 100
Microsoft Access reports 180
mission statement 144, 149
mobile devices 180
N
Network Database Language (NDL) 164
O
object-oriented programming languages 185
OnLine Analytical Processing (OLAP) 165
Open Database Connectivity (ODBC) 182, 200
operating system 78, 98
Operational Logical Processing 164
operational management 221, 222
organizational sustainability 6
P
Pascal 81
personality 54
physical design 149, 151, 153
policy development 221
Privacy 222
Procedural languages 83
programming language 55, 58
Q
Query-By-Example (QBE) 84
query language 83, 85
R
referential integrity 125, 127, 138
Relational Database Language (RDL) 163
Relational Database Management System (RDBMS) 112
relational data model 112, 113, 114, 117, 120, 122, 125, 126
relational paradigm 113, 114, 126, 128
Reliable data 6
Remote Data Access (RDA) 165
S
social networks 2
software crisis 140
Software Development Lifecycle (SDLC) 141
software failure 140
software systems 52
sound judgment 46
stale information 219
Standards Planning and Requirements Committee (SPARC) 74
standard transmitting 56
statistics 3, 4, 5, 6, 7, 8, 9, 11, 12, 14, 17, 18, 19, 21, 22, 27, 28, 30, 33
stock control 145
strategy development 221
Structured Analysis and Design (SAD) 146
Structured Query Language (SQL) 59
Systems Application Architecture (SAA) 165
T
Twitter 2
Twitter posts 2
U
Universal Data Access (UDA) 181
V
vernacular 164
video files 2
Visual Basic applications 180
Visual Studio 7
W
Web development 188
Web front ends 180
Web interfaces 180, 199
Y
Yahoo 2