English Pages 298 Year 2003
Teradata Database
Introduction to Teradata Warehouse
Teradata Database Release V2R5.1
Teradata Warehouse Release 7.1
B035-1091-083A
November 2003
The product described in this book is a licensed product of NCR Corporation. BYNET is an NCR trademark registered in the U.S. Patent and Trademark Office. CICS, CICS/400, CICS/600, CICS/ESA, CICS/MVS, CICSPLEX, CICSVIEW, CICS/VSE, DB2, DFSMS/MVS, DFSMS/ VM, IBM, NQS/MVS, OPERATING SYSTEM/2, OS/2, PS/2, MVS, QMS, RACF, SQL/400, VM/ESA, and VTAM are trademarks or registered trademarks of International Business Machines Corporation in the U. S. and other countries. DEC, DECNET, MICROVAX, VAX and VMS are registered trademarks of Digital Equipment Corporation. HEWLETT-PACKARD, HP, HP BRIO, HP BRIO PC, and HP-UX are registered trademarks of Hewlett-Packard Co. KBMS is a trademark of Trinzic Corporation. INTERTEST is a registered trademark of Computer Associates International, Inc. MICROSOFT, MS-DOS, MSN, The Microsoft Network, MULTIPLAN, SQLWINDOWS, WIN32, WINDOWS, WINDOWS 2000, and WINDOWS NT are trademarks or registered trademarks of Microsoft Corporation. SAS, SAS/C, SAS/CALC, SAS/CONNECT, and SAS/CPE are registered trademarks of SAS Institute Inc. SOLARIS, SPARC, SUN and SUN OS are trademarks of Sun Microsystems, Inc. TCP/IP protocol is a United States Department of Defense Standard ARPANET protocol. TERADATA and DBC/1012 are registered trademarks of NCR International, Inc. UNICODE is a trademark of Unicode, Inc. UNIX is a registered trademark of The Open Group. X and X/OPEN are registered trademarks of X/Open Company Limited. YNET is a trademark of NCR Corporation. THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN “AS-IS” BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF IMPLIED WARRANTIES, SO THE ABOVE EXCLUSION MAY NOT APPLY TO YOU. 
IN NO EVENT WILL NCR CORPORATION (NCR) BE LIABLE FOR ANY INDIRECT, DIRECT, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS OR LOST SAVINGS, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. The information contained in this document may contain references or cross references to features, functions, products, or services that are not announced or available in your country. Such references do not imply that NCR intends to announce such features, functions, products, or services in your country. Please consult your local NCR representative for those features, functions, products, or services available in your country. Information contained in this document may contain technical inaccuracies or typographical errors. Information may be changed or updated without notice. NCR may also make improvements or changes in the products or services described in this information at any time without notice. To maintain the quality of our products and services, we would like your comments on the accuracy, clarity, organization, and value of this document. Please e-mail: [email protected] or write: Information Engineering NCR Corporation 100 North Sepulveda Boulevard El Segundo, CA 90245-4361 U.S.A. Any comments or materials (collectively referred to as “Feedback”) sent to NCR will be deemed non-confidential. NCR will have no obligation of any kind with respect to Feedback and will be free to use, reproduce, disclose, exhibit, display, transform, create derivative works of and distribute the Feedback and derivative works thereof without limitation on a royalty-free basis. Further, NCR will be free to use any ideas, concepts, know-how or techniques contained in such Feedback for any purpose whatsoever, including developing, manufacturing, or marketing products or services incorporating Feedback. Copyright © 1999–2003, NCR Corporation All Rights Reserved
Preface

Supported Software Release
This book supports Teradata® Database Release V2R5.1 and Teradata® Warehouse Release 7.1.
Changes to This Book
This book includes the following changes to support the current release:

Date: November 2003
Description:
• Changed the title from Introduction to Teradata RDBMS to Introduction to Teradata Warehouse.
• Reorganized the book, placing chapters into sections so that information is easier to find.
• Added the following features to the appropriate chapters:
  • Hot Standby Nodes
  • Disk I/O Integrity Checking
  • SQL Functions
  • User-Defined Functions
  • Encryption
  • Database Object Use Count
• Added information about:
  • Teradata Tools and Utilities
  • Teradata Meta Data Services
• Updated information in existing sections. This information is marked by change bars in the margins.

Date: December 2002
Description:
Chapter 2
• Moved the “Data Communications” section to Chapter 6, and moved the “Database Management Tools” section, with updated features, to Chapter 9
• Rewrote the following section: “Disk Arrays”
• Added the Cylinder Read feature to “The Teradata File System” section
Chapter 3
• Added the following new features:
  • “Referential Constraints”
  • “Batch Referential Integrity”
• Added information about rows and columns to:
  • “Derived Tables”
  • “Tables, Rows, and Columns”
• Moved the index information to the new Chapter 5: “Data Distribution and Access Methods.”
Chapter 4
• Added the following new sections:
  • “What Are Macros”
  • “Table Constraints”
  • “Default Database”
  • “Identity Column”
• Updated the following sections:
  • “SQL Data Types”
  • “Data Type Attributes”
  • “Data Manipulation”
  • “SELECT Statement and Set Operators”
  • “What Are Teradata Stored Procedures”
• Moved the following sections to Chapter 9: “Database Management and Analysis Tools”:
  • “Basic Teradata Query Utility”
  • “Teradata RDBMS Preprocessor2”
Chapter 5
• Created Chapter 5: “Data Distribution and Access Methods” and consolidated index information in this chapter.
• Added the following new features:
  • “Partitioned Primary Indexes”
  • “Sparse Join Indexes”
• Updated the following sections:
  • “Join Indexes”
  • “Hashing”
  • “Index Specification”
  • “Strengths and Weaknesses of Various Types of Indexes”
  • “Secondary Indexes”
Chapter 6
• Updated the following section: “Summary of Data Dictionary Views”
Chapter 7
• Updated the following sections:
  • “Stored Procedures as SQL Applications”
  • “The EXPLAIN Statement”
  • “Data Communications”
  • “CLIv2 for Channel-Attached Systems”
  • “CLIv2 for Network-Attached Systems”
  • “Other Types of Data Communications”
Chapter 8
• Added new chapter: Chapter 8: “International Language Support”
Chapter 9
• Added a new chapter titled Chapter 9: “Database Management and Analysis Tools.” The chapter contains updated descriptions of the database management tools from Chapter 2 and the new database query analysis tools in release V2R5.0.
• Added the following new tools:
  • “Priority Scheduler Administrator”
  • “Teradata Index Wizard”
  • “Database Query Log”
• Added the following sections that were previously in Chapter 4: “Structured Query Language (SQL)”:
  • “Basic Teradata Query Utility”
  • “Teradata RDBMS Preprocessor2”
• Updated the following sections:
  • “Query Session”
  • “Teradata Dynamic Query Manager”
  • “Teradata SQL Assistant”
Chapter 10
• Updated the following sections:
  • “Vproc Migration”
  • “Fallback Tables”
  • “Journaling”
  • “Teradata Archive/Recovery”
  • “BYNETs”
  • “RAID Disk Units”
Chapter 11
• Deleted Chapter 11: “System Maximum Capacities”. You can find this information in Teradata RDBMS SQL Reference, Volume 1. Chapter 11 is now titled Chapter 11: “Concurrency Control and Transaction Recovery”
• Updated the following section: “Deadlocks and Deadlock Resolution”
Chapter 12
• Added the following section: “User-Level Password Attributes”
Chapter 13
• Updated the following sections:
  • “How to Create Databases”
  • “Account String Expansion”
  • “Account Performance Groups”
  • “Summary Mode”
  • “How to Control Collection and Logging of ResUsage Data”
• Added the following new section: “Roles and Profiles”

Date: June 2001
Description:
• Added the following features and enhancements:
  • Single Sign On
  • Hash Index
  • UTF-8 character set support
  • Integrated Database Query Manager
  • Resource Check Tool
  • 128 K data block support
  • Increased number of global and volatile temporary tables
• Updated the following features:
  • Stored procedures
  • Indexes
  • Backup utilities

Date: September 2000
Description:
• Updated the glossary.
• Removed references to UNISYS and KBMS/Intellect.
• Replaced the reference to Nomad2 with NOMAD.

Date: June 2000
Description:
• COLLECT STATISTICS: Collected statistics are now stored in a spool table so that you can collect statistics at the same time you execute queries against the table.
• Fallback: You can now define join index subtables with fallback.
• Online Analytical Processing (OLAP): The operation of OLAP sampling is now optimized in the file system by only accessing the data blocks that contain the target row positions instead of scanning all the data blocks.
• PDE Tools Utility (NT Only): A new PDE tools utility allows you to run the ctl, DBS control, DIP, and Vproc Manager utilities from any TPA or non-TPA or from the AWS. You can also start, reset, or stop the PDEs.
• EXPLAIN: The size of the EXPLAIN text is now unlimited.
• Stored Procedures: Teradata has developed the Teradata Stored Procedures (TDSP) feature.

Date: April 1999
Description:
• A new virtual BYNET driver has eliminated the need for the vnet driver on systems that have no BYNET hardware.
• Addition of the use of triggers with SQL statements.
• Addition of a join index to improve performance.
• Internationalization of Kanji characters.
• Increase in the maximum number of vprocs to 16 K.
• Increase in the row size to 64 K.
• Addition of OLAP features for statistical functions, extended date/calendar capability, and sampling.
• Addition of Timestamping to the Data Dictionary.
About This Book

Purpose
This book provides an introduction to the Teradata Warehouse covering:
• Teradata Database and Teradata Tools and Utilities
• Teradata Database architecture and the relational model
• Applications and data communications
• Data definitions and data manipulation using Structured Query Language (SQL)
• System administration and security

Audience
This book is intended for users who interface with the Teradata Warehouse. Such individuals may include database users or administrators.

How This Book Is Organized
This book contains the following chapters:

Chapter 1: “Teradata Warehouse” provides an introduction to the data warehouse concepts.

Section 1: “Teradata Warehouse Overview” contains the following chapters:
Chapter 2: “Teradata Warehouse Overview” describes the components of the Teradata Warehouse.
Chapter 3: “Teradata Database Architecture” describes the Teradata Database system hardware and its design.
Chapter 4: “International Language Support” describes the language support capabilities of the Teradata Database.

Section 2: “The Teradata Database Structure” contains the following chapters:
Chapter 5: “Structured Query Language (SQL)” provides information about how to use SQL to manipulate data in the Teradata Database.
Chapter 6: “Application Development” provides information about developing applications for the Teradata Database.
Chapter 7: “The Teradata Database Model” contains basic information about the tables, rows, and columns that make up the database model.
Chapter 8: “Data Distribution and Access Methods” provides information about distributing data to and retrieving data from the Teradata Database.
Chapter 9: “Data Dictionary” describes the structure and content of the system tables in the Data Dictionary.
Chapter 10: “Teradata Meta Data Services” provides information about Teradata Meta Data Services, which allows you to store, access, and administer metadata on the Teradata Database.
Chapter 11: “Other Database Objects” provides more information about views, macros, stored procedures, and triggers.

Section 3: “Teradata Database System Operation” contains the following chapters:
Chapter 12: “Normalization and Referential Integrity” describes the following:
• How normalization reduces complex data to simpler, stable data structures
• How referential integrity protects data
Chapter 13: “Data Communication Between Client and Teradata Database” describes how the client and the Teradata Database exchange information.
Chapter 14: “Reliability” describes how fault-tolerant hardware and software increase the reliability of the Teradata Database.

Section 4: “Management and Monitoring” contains the following chapters:
Chapter 15: “Concurrency Control and Transaction Recovery” describes the mechanisms that prevent concurrently operating sessions from damaging the data that resides within the Teradata Database.
Chapter 16: “Database Management and Analysis Tools” describes the tools that you can use to manage the hardware and software that make up the Teradata Database.
Chapter 17: “Security and Integrity” describes how to prevent unauthorized access to the information in the Teradata Database.
Chapter 18: “System Administration” describes space allocation, roles and profiles, accounting, and maintenance capabilities on the Teradata Database.
Chapter 19: “System Monitoring” describes the various aspects of monitoring the Teradata Database, including the monitoring tools used to track system performance.

Prerequisites
To gain an understanding of Teradata Warehouse, you should be familiar with the following:
• Basic computer technology
• System hardware
• Teradata Tools and Utilities

List of Acronyms
The following acronyms, listed in alphabetical order, are used in this book:

1NF           First Normal Form
2NF           Second Normal Form
2PC           Two-Phase Commit
3NF           Third Normal Form
4NF           Fourth Normal Form
5NF           Fifth Normal Form
AMP           Access Module Process
ANSI          American National Standards Institute
API           Application Programming Interface
ARC           Teradata Archive/Recovery Utility
ASCII         American Standard Code for Information Interchange
ASE           Account String Expansion
AWS           Administration Workstation
AWT           AMP Worker Task
BCNF          Boyce-Codd Normal Form
BTEQ          Basic Teradata Query Facility
BYNET         Banyan Network (high-speed interconnect)
CICS          Customer Information Control System
CLIv2         Call-Level Interface, Version 2
CNS           Console Subsystem
DB2           DATABASE 2
DBC           Database Computer
DBQAT         Database Query Analysis Tools
DBQL          Database Query Log
DBS           Database System or Database Software
DBW           Database Window
DDE           Dynamic Data Exchange
DDL           Data Definition Language
DIP           Database Initialization Program
DML           Data Manipulation Language
DNS           Domain Name System
DSS           Decision Support System
EBCDIC        Extended Binary Coded Decimal Interchange Code
FIPS          Federal Information Processing Standards
GDO           Globally Distributed Object
HI            Hash Index
IBM           International Business Machines Corporation
ID            Identification
IMS           Information Management System
I/O           Input/Output
ISV           Independent Software Vendor
JBOD          Just a Bunch Of Disks
JI            Join Index
LAN           Local Area Network
LUN           Logical Unit
MDS           Meta Data Services
MIPS          Millions of Instructions Per Second
MOSI          Micro Operating System Interface
MPP           Massively Parallel Processing
MTDP          Micro Teradata Director Program
MVS           Multiple Virtual Storage
NPPI          Non-Partitioned Primary Index
NUPI          Non-Unique Primary Index
NUSI          Non-Unique Secondary Index
ODBC          Open Database Connectivity
OS/VS         Operating System/Virtual Storage
OTB           Open Teradata Backup
PDE           Parallel Database Extensions
PE            Parsing Engine
PI            Primary Index
PL/I          Programming Language 1
PJ/NF         Projection-Join Normal Form
PP2           Preprocessor2
PPI           Partitioned Primary Index
PUT           Parallel Upgrade Tool
QCD           Query Capture Database
QCF           Query Capture Facility
RAID          Redundant Array of Independent Disks
RCT           Resource Check Tools
RI            Referential Integrity
SCSI          Small Computer System Interface
SIA           Shared Information Architecture
SMP           Symmetric Multi-Processing
SNMP          Simple Network Management Protocol
SQL           Structured Query Language
SR            Single Request
SSO           Single Sign On
TCP/IP        Transmission Control Protocol/Internet Protocol
TDP           Teradata Director Program
TDSP          Teradata Stored Procedures
Teradata DQM  Teradata Dynamic Query Manager
TPA           Trusted Parallel Application
TS/API        Transparency Series/Application Program Interface
UPI           Unique Primary Index
USI           Unique Secondary Index
VM/CMS        Virtual Machine/Conversational Monitor System
VM            Virtual Machine
vproc         Virtual Processor
VS            Virtual Storage

Technical Information on the Web
The NCR home page (http://www.ncr.com) provides links to numerous sources of information about Teradata. Among the links provided are sites that deal with the following subjects:
• Contacting technical support
• Enrolling in customer education courses
• Ordering and downloading product documentation
• Accessing case studies of customer experiences with the Teradata Database
• Accessing third-party industry analyses of Teradata Warehouse products
• Accessing white papers
• Viewing or subscribing to various online periodicals

Contents

Preface
  Supported Software Release
  Changes to This Book
  About This Book
  List of Acronyms
  Technical Information on the Web

Chapter 1: Teradata Warehouse
  What Is a Data Warehouse
  The Next Step for the Data Warehouse
  Strategic Queries
  Tactical Queries
  Teradata Warehouse

Section 1: Teradata Warehouse Overview

Chapter 2: Teradata Warehouse Overview
  What Is the Teradata Database
  Attachment Methods
  How to Communicate with the Teradata Database Using SQL
  Purpose in Development
  Shared Information Architecture
  Teradata Database Server Software
  Parallel Upgrade Tool
  What Are Teradata Tools and Utilities
  For More Information

Chapter 3: Teradata Database Architecture
  SMP and MPP Machines
  The BYNET
  Boardless BYNET
  Disk Arrays
  Logical Units
  Pdisks and Vdisks
  Cliques
  Hot Standby Nodes
  Virtual Processors
  Parsing Engine
  Access Module Processor
  AMP Clusters
  Parsing Engine Request Processing
  The Dispatcher
  The AMPs
  Example: SQL Statement
  Parallel Database Extensions
  Trusted Parallel Applications
  PDE and MPP Systems
  Start and Stop PDE
  The Teradata File System
  Cylinder Read
  Disk I/O Integrity Checking
  Workstation Types and Available Platforms
  System Console
  Administration Workstation
  Teradata Database Window
  How the Database Window Communicates with Teradata Database
  Running DBW
  For More Information

Chapter 4: International Language Support
  Character Set Overview
  What Is a Repertoire
  Character Representation
  External and Internal Character Sets
  Character Data Translation
  What Teradata Database Supports
  Teradata Database Character Data Storage
  Internal Server Character Sets
  User Data
  System Dictionary Data
  Language Support Modes
  Default Character Set for User Data
  Character Set for System Dictionary Data
  Character Set for Dictionary Data Other Than Object Names
  Standard Language Support Mode
  LATIN Character Set
  Compatible Languages
  Japanese Language Support Mode
  Advantages of Storing System Dictionary Data Using KANJI1
  Advantages of Storing User Data Using UNICODE
  Extended Support
  For More Information

Section 2: The Teradata Database Structure

Chapter 5: Structured Query Language (SQL)
  Why SQL
  What is SQL
  Data Definition Language
  Data Control Language
  Data Manipulation
  SQL Data Types
  Teradata and ANSI-Compliant Data Types
  Data Type Attributes
  Statement Punctuation
  SQL Statements and Requests
  The SELECT Statement
  SELECT Statement and Set Operators
  SELECT Statement and Joins
  SQL Functions
  Scalar Functions
  Aggregate Functions
  Ordered Analytical Functions
  User-Defined Functions
  Creating User-Defined Functions
  SQL Statements Related to Functions
  Cursors
  For More Information

Chapter 6: Application Development
  Types of SQL Development
  Explicit SQL Development
  Implicit SQL Development
  Embedded SQL Applications
  What Is Embedded SQL
  How Does an Application Program Use Embedded SQL
  Supported Languages and Platforms
  Macros as SQL Applications
  SQL Used to Create a Macro
  Macro Usage
  SQL Used to Modify a Macro
  SQL Used to Delete a Macro
  Teradata Stored Procedures as SQL Applications
  SQL Used to Create Stored Procedures
  Stored Procedure Example
  SQL Used to Execute a Stored Procedures
  The EXPLAIN Statement
  How Is EXPLAIN Useful
  EXPLAIN With Simple Join Index Example
  Third-Party Development
  TS/API Products
  Compatible Third-Party Software Products
  Performance Monitor/Application Programming Interface
  For More Information
Chapter 7: The Teradata Database Model
What is a Relational Model
What is a Relational Database
Set Theory and Relational Database Terminology
Tables, Rows, and Columns
Table Constraints
Permanent and Temporary Tables
Global Temporary Tables
Volatile Temporary Tables
Derived Tables
Rows and Columns
For More Information
Chapter 8: Data Distribution and Access Methods
Teradata Database Indexes
Primary Indexes
Primary Index Characteristics
How Are Primary Keys and Primary Indexes Related
Partitioned Primary Indexes
Non-partitioned Primary Indexes
How Do Partitioned and Non-Partitioned Primary Indexes Compare
Secondary Indexes
Secondary Index Subtables
How Do Primary and Secondary Indexes Compare
Join Indexes
Single-Table Join Indexes
Multi-Table Join Indexes
Aggregate Join Indexes
Sparse Join Indexes
Hash Indexes
Index Specification
Creating Indexes
Strengths and Weaknesses of Various Types of Indexes
Hashing
Identity Column
For More Information
Chapter 9: Data Dictionary
What is the Data Dictionary
Data Dictionary Content
What Is in a Data Dictionary Table
Teradata Database Data Dictionary Views
What Is in a View
Why Use Views
Who Uses Data Dictionary Views
SQL Access to the Data Dictionary
For More Information

Chapter 10: Teradata Meta Data Services
What Is Metadata
Types of Metadata
Teradata Meta Data Services
Creating the Teradata Meta Data Repository
Connecting to the Teradata Meta Data Repository
For More Information

Chapter 11: Other Database Objects
What Are Views
SQL Statements Related to Views
Restrictions on Using Views
What Are Teradata Stored Procedures
Why Use Stored Procedures
Elements of a Teradata Stored Procedure
What Are Macros
SQL Statements Related to Macros
Single-User and Multi-User Macros
Macro Processing
What Are Triggers
Types of Triggers
When Do Triggers Fire
ANSI-Specified Order
Trigger Functions
SQL Statements Related to Triggers
Elements of a Trigger
Restrictions on Triggers
For More Information

Section 3: Teradata Database System Operation
Chapter 12: Normalization and Referential Integrity
Normalization
Normal Forms
Relational Database Terminology
First, Second, and Third Normal Forms
First Normal Form
Second Normal Form
Third Normal Form
Advantages of Normalization
Boyce-Codd Normal Form and Higher Normal Forms
Boyce-Codd Normal Form
Fourth Normal Form
Fifth Normal Form
Referential Integrity
Referential Integrity in the Teradata Database
Referential Integrity Terminology
Referencing (Child) Table
Referenced (Parent) Table
Why Is Referential Integrity Important
Referential Integrity Constraints
Referential Constraints
Batch Referential Integrity
Rules for Referential Integrity Constraints
Referential Constraint Checks
For More Information
Chapter 13: Data Communication Between Client and Teradata Database
Attachment Methods
CLIv2 for Channel-Attached Systems
What CLIv2 for Channel-Attached Clients Does
Teradata Director Program
Server
CLIv2 for Network-Attached Systems
What CLIv2 for Network-Attached Clients Does
Micro Teradata Director Program
Micro Operating System Interface
Other Types of Data Communications
WinCLI
ODBC
JDBC
For More Information

Chapter 14: Reliability
Software Fault Tolerance
Vproc Migration
Fallback Tables
AMP Clusters
One-Cluster Configuration
Smaller Cluster Configuration
Journaling
Teradata Archive/Recovery
Table Rebuild Utility
Hardware Fault Tolerance
For More Information
Section 4: Management and Monitoring
Chapter 15: Concurrency Control and Transaction Recovery
What is Concurrency Control
What is Recovery
Concept of a Transaction
Definition of a Transaction
Definition of Serializability
Transaction Semantics
ANSI Mode Transactions
BEGIN TRANSACTION/END TRANSACTION Statements
Roll Back an ANSI Transaction
Teradata Mode Transactions
BEGIN TRANSACTION/END TRANSACTION Statements
Roll Back a Teradata Mode Transaction
Concept of a Lock
Overview of Teradata Database Locking
Why Do Database Management Systems Require Locking
Lock Levels
Levels of Locks Types
Automatic Database Lock Levels
Deadlocks and Deadlock Resolution
Host Utility Locks
HUT Lock Types
HUT Lock Characteristics
System and Media Recovery
System Restarts
Transaction Recovery
Down AMP Recovery
Two-Phase Commit Protocol
Definition of Participant
Definition of Coordinator
For More Information
Chapter 16: Database Management and Analysis Tools
Teradata Tools and Utilities - Archive Utilities
Teradata Archive/Recovery Utility
Open Teradata Backup
Teradata Tools and Utilities - Data Load and Export Utilities
Teradata MultiLoad
Teradata FastLoad
Teradata Parallel Data Pump
Teradata FastExport Utility
Database Management Tools
Teradata Database - Active Session and Configuration
System Resource Management
Teradata Database - Ferret Utility
Teradata Database - Priority Scheduler Utility
Teradata Tools and Utilities - Teradata Statistics Wizard
Teradata Database - Teradata Dynamic Query Manager
Teradata Database - Teradata MultiTool
Database Query Analysis Tools
Teradata Tools and Utilities - Teradata Index Wizard
What Can the Teradata Index Wizard Do
Demographics
Teradata Database - Query Capture Facility
QCD Schema Improvement
Teradata Index Wizard Support
Teradata Tools and Utilities - Teradata Visual Explain
Teradata Database - Database Query Log
Teradata Database - Target-Level Emulation
Teradata Tools and Utilities - Teradata System Emulation Tool
Teradata Database - Database Object Use Count
Query Facilities
Teradata Tools and Utilities - Basic Teradata Query Utility
BTEQ Support
BTEQ Communication
Teradata Tools and Utilities - Teradata SQL Assistant
Teradata Tools and Utilities - Preprocessor2
For More Information
Chapter 17: Security and Integrity
Security and Integrity
System Integrity
System Security
Resource Access Control
User Identifiers
Client Identifiers
Logon Policies
TDP Security
Single Sign On
Encryption
Network Data Encryption
Logon Encryption and the Teradata Gateway
Security Features
Password Attributes
User-Level Password Attributes
DBC.DBase Table
SQL Used to Control Logon
Data Access Control
Ownership and Implicit Rights
System Views for Access Information
Security Policies and Physical Access Control
Principle Considerations of a Security Policy
Key Implementation Elements of a Security Policy
Auditing and Accountability
For More Information
Chapter 18: System Administration
Space Allocation for Databases and Users
Databases and Users
How to Create a Finance and Administration Database
How to Create Databases
How to Create Users
Roles and Profiles for Users
Accounting
Session Management
Establishing a Session
Logon Operands
Session Requests
Account String Expansion
Account Performance Groups
Maintenance Utilities
For More Information
Chapter 19: System Monitoring
Teradata Manager
System and Configuration Status
Resource Usage Monitoring
Resource Usage Tables and Views
Resource Usage Data Categories
Resource Usage Data Handling
Resource Usage Macros
How to Control Collection and Logging of Resource Usage Data
Summary Mode
Performance Monitoring
The TDPTMON
System Management Facility
The Performance Monitor/Application Interface
For More Information

Index
Chapter 1: Teradata Warehouse

This chapter presents an overview of the Teradata® Warehouse. Topics include:
• What is a data warehouse
• What is the next step in the development of the data warehouse
What Is a Data Warehouse

Originally, the data warehouse was a historical database containing data derived from an active operational database. The data in the warehouse was:
• Subject-oriented
• Integrated
• Identified by a timestamp
• Nonvolatile, that is, once loaded, data was not updated or removed
Rows in the tables supporting the operational database were loaded into a historical database (the data warehouse) once they were older than a well-defined cutoff date. To support this capability, the data in the data warehouse contained a timestamp, which distinguished it from the data in the tables of the operational database.
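The aging rule described above can be sketched as follows. This is a minimal illustration that uses Python's standard-library sqlite3 module as a stand-in for the Teradata Database; the table names (orders, orders_history), the column names, the sample rows, and the cutoff date are all hypothetical and chosen only for the example.

```python
import sqlite3


def build_demo():
    """Create a hypothetical operational table and an empty historical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
    )
    # The historical table carries an extra load timestamp column.
    conn.execute(
        "CREATE TABLE orders_history "
        "(order_id INTEGER, order_date TEXT, amount REAL, load_ts TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, "2003-01-15", 120.0),
            (2, "2003-06-30", 75.5),
            (3, "2003-11-02", 42.0),
        ],
    )
    conn.commit()
    return conn


def archive_old_rows(conn, cutoff, load_ts):
    """Copy rows older than `cutoff` into the historical table, stamp them
    with `load_ts`, and remove them from the operational table.
    Returns the number of rows moved."""
    with conn:  # copy and delete commit together, or roll back together
        conn.execute(
            "INSERT INTO orders_history (order_id, order_date, amount, load_ts) "
            "SELECT order_id, order_date, amount, ? FROM orders WHERE order_date < ?",
            (load_ts, cutoff),
        )
        cur = conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))
    return cur.rowcount
```

Dates are stored as ISO 8601 text so that string comparison matches date order. The load_ts column plays the role of the timestamp that distinguishes warehouse rows from operational rows.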
The Next Step for the Data Warehouse

The concept of active data warehousing evolved as part of the data warehouse environment. The data warehouse was an enterprise-wide, centralized database that stored information gathered from operational databases. This data was typically used to make strategic business decisions. The active concept takes the traditional data warehouse one step further by allowing you to ask questions that produce answers that are important not only to strategic decision making but to tactical decision making as well.
Strategic Queries

Strategic queries are used when taking a proactive approach to the future. They can produce information that you can use to develop a cohesive long-term plan or course of action. The stored data that supports strategic queries must be historical in nature so that it provides a fair representation of what has happened in the past. Strategic queries involve processing large volumes of data, and because the end result provides information that is used in the long term, response time is less critical. Queries written to support the strategic decision-making process are typically ad hoc and are generally not repeated.
Tactical Queries

Tactical queries also help prepare for the future, but for the near future: they are reactive and event driven. Tactical queries share some of the data requirements of strategic queries, in that they often act on historical information. Because strategic queries provide information that supports long-term decision making, the data need not be the latest. Because tactical queries support short-term decisions, the data from which tactical queries derive answers must be current or fresh. How fresh the data must be depends on the questions you ask.
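The contrast between the two query types can be sketched roughly as follows. The example uses Python's standard-library sqlite3 module as a stand-in for the Teradata Database; the sales table, its columns, and the sample rows are hypothetical and serve only to show the shape of each query.

```python
import sqlite3


def build_sales_demo():
    """Create a hypothetical sales table with a few rows of history."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales "
        "(customer_id INTEGER, region TEXT, sale_date TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [
            (101, "East", "2002-03-10", 100.0),
            (102, "East", "2002-09-01", 50.0),
            (101, "East", "2003-11-20", 30.0),
            (103, "West", "2003-05-05", 80.0),
        ],
    )
    conn.commit()
    return conn


def strategic_query(conn):
    """Strategic style: scan the whole history and summarize it.
    Returns (region, year, total amount) rows."""
    return conn.execute(
        "SELECT region, substr(sale_date, 1, 4), SUM(amount) "
        "FROM sales "
        "GROUP BY region, substr(sale_date, 1, 4) "
        "ORDER BY region, substr(sale_date, 1, 4)"
    ).fetchall()


def tactical_query(conn, customer_id):
    """Tactical style: fetch one customer's most recent sale,
    which is only useful if the table holds fresh data."""
    return conn.execute(
        "SELECT sale_date, amount FROM sales "
        "WHERE customer_id = ? ORDER BY sale_date DESC LIMIT 1",
        (customer_id,),
    ).fetchone()
```

The strategic query touches every row and tolerates data that is somewhat out of date, while the tactical query touches a handful of rows and depends on the latest sale having already been loaded.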
Teradata Warehouse

In the active environment of the Teradata Warehouse, data is captured from many sources, for example, customer orders, inventory and shipping applications, direct mail and e-mail, phone calls, and so forth. This data is stored in the Teradata Warehouse along with data from operational systems and other sources. Teradata provides utilities that load data in a timely, continuous fashion or in batch loads. This data provides a single source, or one version of the truth, for all those who seek information from the warehouse. The Teradata Warehouse encompasses not only the Teradata Database, the information repository, but also Teradata Tools and Utilities, a comprehensive suite of management tools and utilities.
The suite is organized into the following functional categories:
The following category of utility… | Is used…
mainframe | in a mainframe environment.
Teradata Utility Pak | in a network-attached environment.
Teradata preprocessors | to access the Teradata Database by interpreting Teradata SQL statements in C or COBOL programs.
load and unload | to load data into and unload data from the Teradata Database.
database management utilities | to control the Teradata Database.
query analysis tools | to analyze the performance of the Teradata Database and improve the efficiency of the queries run against it.
storage management | to back up and restore data on the Teradata Database.
Teradata Meta Data Services | to store, administer, and navigate the metadata in a Teradata Warehouse.
Section 1:
Teradata Warehouse Overview
Chapter 2:
Teradata Warehouse Overview
This chapter presents an overview of the Teradata Warehouse and its components. Topics include:
• A definition of the Teradata Database
• Purpose in development
• Shared information architecture
• Teradata Database server software
• Parallel Upgrade Tool
• Teradata Tools and Utilities
What Is the Teradata Database
The Teradata Warehouse evolved from the concept of an enterprise-wide, centralized database that was used to store information gathered from operational databases. The Teradata Database hardware and software and Teradata Tools and Utilities provide a complete relational database management system to support the active data warehouse concept.
Attachment Methods
To support its role in the active environment, the Teradata Database can use either of two attachment methods to connect to other operational computer systems, as shown in the following table:
This attachment method… | Allows the system to be attached…
channel | directly to an I/O channel of a mainframe computer.
network | to intelligent workstations through a Local Area Network (LAN).
How to Communicate with the Teradata Database Using SQL
Structured Query Language (SQL) is the language of relational database communication. To manipulate data in the Teradata Database, you issue the appropriate SQL statement. With the Teradata Database, you can access, store, and operate on data using Teradata Structured Query Language (Teradata SQL). Teradata SQL, which is broadly compatible with IBM and ANSI SQL, extends the capabilities of SQL by adding Teradata-specific extensions to the generic SQL statements. For more information about SQL, see Chapter 5: “Structured Query Language (SQL).”
When you develop applications for the Teradata Database, you should use the most current Teradata SQL syntax because it is the most ANSI-compliant. Teradata SQL still supports older applications written in previous non-ANSI-compliant versions of Teradata SQL. You can run transactions in either Teradata or ANSI mode and these modes can be set or changed.
Teradata has an international customer base. To accommodate communications in different languages, Teradata supports non-Latin character sets, for example, Japanese, Chinese, and so forth. For detailed information about international character set support, see Chapter 4: “International Language Support.”
Users of the client systems send requests to the Teradata Database through a choice of supported utilities and interfaces. For information about the interfaces, see “What Are Teradata Tools and Utilities” on page 2-7 and “Teradata Database Server Software” on page 2-5.
Purpose in Development
Teradata has designed a database that allows users to view and manage large amounts of data as a collection of related tables. Some of the capabilities of the Teradata Database are listed in the following table:
Teradata Database provides… | That…
capacity | includes: • Terabytes of detailed data stored in billions of rows • Thousands of millions of instructions per second (MIPS) to process data
parallel processing | makes Teradata Database faster than other relational systems.
single data store | • can be accessed by network-attached and channel-attached systems. • supports the requirements of many diverse clients.
fault tolerance | automatically detects and recovers from hardware failures.
data integrity | ensures that transactions either complete or roll back to a stable state if a fault occurs.
scalable growth | allows expansion without sacrificing performance.
SQL | serves as a standard access language that permits users to control data.
Teradata developers designed the Teradata Database from mostly off-the-shelf hardware components. The result was an inexpensive, high-quality system that exceeded the performance of conventional relational database management systems. The hardware components of the Teradata Database evolved from those of a simple database machine into those of a general-purpose, massively parallel computer running the database software as a trusted parallel application (TPA). The architecture includes both single-node, symmetric multi-processing (SMP) systems and multi-node, massively parallel processing (MPP) systems in which the distributed functions communicate by means of a fast interconnect structure. The interconnect structure in the current architecture is the BYNET for MPP systems and the boardless BYNET for SMP systems.
Shared Information Architecture A design goal of the Teradata Database was to provide a single data store for a variety of client architectures. This single source approach greatly reduces data duplication and inaccuracies that can creep into data that is maintained in multiple stores. This approach to data storage is known as the single version of the truth, and Teradata used Shared Information Architecture (SIA) to implement the database. SIA eliminates the need for maintaining duplicate databases on multiple platforms. With the SIA, most mainframe clients, network-attached workstations, and personal computers can access and manipulate the same database simultaneously. The following figure illustrates the principle of the SIA. In this figure the mainframes are attached via channel connections and other systems are attached via network connections.
[Figure 1091F001: an IBM MVS mainframe and an IBM VM mainframe (channel connections), and a personal computer running Windows and a UNIX workstation (Local Area Network connections), all accessing the Teradata Database single data store.]
Teradata Database Server Software
Teradata Database program software resides on the server and implements the relational database environment. The server software includes the following functional modules:
Teradata Database Server Software
This module… | Provides…
Database Window | a tool that you can use to control the operation of the Teradata Database.
Teradata Gateway | communications support. The server-resident program provides a pathway for applications running on network-attached clients to access the Teradata Database. The Teradata Gateway runs as a separate operating system task. The Gateway software validates messages from clients that generate sessions over the network, and it controls encryption.
Parallel Database Extensions (PDE) | a software interface layer on top of the operating system that enables the database to operate in a parallel environment. For more information about PDE, see “Parallel Database Extensions” on page 3-15.
Teradata Database management software | the following modules: • Request dispatcher • Session controller • Access module processor (AMP) • Teradata file system. For more information about the Teradata file system, see “The Teradata File System” on page 3-16.
Parsing Engine | the following modules: • Parser • Optimizer • Step Generator • Dispatcher
Parallel Upgrade Tool
The Parallel Upgrade Tool (PUT) automates much of the installation process for Teradata Database software. There are two major operational modes for PUT:
The operational mode… | Does the following…
Major upgrade | upgrades one or more software products to the next version.
Patch upgrade | applies patch packages to one or more software products.
What Are Teradata Tools and Utilities
Teradata Database runs with or without a channel- or network-attached client. Teradata Tools and Utilities is a comprehensive suite of management tools and utilities designed to operate in the client environment. The information in the following tables describes the available Teradata Tools and Utilities that can be installed on the client, recognizing that the client may be the computer system that runs the Teradata Database program software as well.
The following table contains information about the utilities available for use on channel-attached mainframe clients:
Mainframe Utilities
This package… | Provides… | For…
Basic Teradata Query (BTEQ) | an interactive and batch query processor/report generator | channel-attached clients.
Customer Information Control System (CICS) | an interface that enables CICS macro or command-level application programs to access Teradata Database resources | channel-attached clients.
Host Utility Consoles (HUTCNS) | access to a number of AMP-based utilities | channel-attached clients.
IBM IMS/DC | an Information Management System (IMS) interface to the Teradata Database | channel-attached clients.
Teradata Archive/Recovery Utility | a means to save and restore data | channel-attached clients.
Teradata Call-Level Interface Version 2 (CLIv2) | a collection of callable service routines that provide the interface between applications and the Teradata Gateway. The Gateway is the interface between CLI and the server | channel-attached clients.
Teradata Director Program (TDP) | a high-performance interface for messages sent between the client and the Teradata Database | channel-attached clients.
Teradata C, COBOL, and PL/I Preprocessor2 (PP2) | a method of accessing data stored in the Teradata Database. Preprocessor2 interprets and expands Teradata SQL statements incorporated into an application program. | channel-attached clients.
Teradata Transparency Series/Application (TS/API) | gateway services allowing products that access either DB2 or SQL/DS databases to access data stored on the Teradata Database | channel-attached clients.
The following table contains information about the Teradata Tools and Utilities available for use by channel- and network-attached clients:
Teradata Utility Pak
This package… | Provides… | For…
BTEQ | an interactive and batch query processor/report generator | channel- and network-attached clients.
ODBC | access to the Teradata Database from various tools, increasing the portability of access | network-attached clients.
OLE DB provider | an interface for accessing and manipulating all types of data | network-attached clients.
Teradata Administrator | an interface that you can use to perform database administration tasks | network-attached clients.
Teradata Call-Level Interface Version 2 (CLIv2) | callable service routines that provide the interface between applications and the Teradata Gateway. Teradata Gateway is the interface between CLI and the server. | channel- and network-attached clients.
Teradata Driver for JDBC Interface | platform-independent, Java-application access to the Teradata Database from various tools, increasing portability of data | network-attached clients.
Teradata MultiTool | an interface to various Teradata Database utilities | network-attached clients.
Teradata SQL Assistant | a means of retrieving data from any ODBC-compliant database server and of manipulating and storing the data on your desktop PC | network-attached clients.
Teradata Tools and Utilities provides tools that you can use to develop applications that access the Teradata Database by interpreting SQL statements in C, COBOL, or Programming Language 1 (PL/I) programs. The following table contains information about available preprocessors for use by channel- and network-attached clients:
Teradata Preprocessors - Application Development
This package… | Provides… | For…
Teradata COBOL Preprocessor | a mechanism for embedding SQL in COBOL programs | channel-attached clients and some network-attached clients.
Teradata C Preprocessor | a mechanism for embedding SQL in C programs | channel- and network-attached clients.
Teradata PL/I Preprocessor | a mechanism for embedding SQL in PL/I programs | channel-attached clients.
The following table contains information about the load and unload utilities available for use by channel- and network-attached clients:
Load and Unload Utilities
This package… | Provides… | For…
Data Connector | a block-level I/O interface to one or more access modules that interface to a data storage device | channel- and network-attached clients.
Teradata FastExport | a means of extracting large volumes of data from the Teradata Database | channel- and network-attached clients.
Teradata FastLoad | high-performance data loading from client files into empty tables | channel- and network-attached clients.
Teradata MultiLoad | high-performance data maintenance, including inserts, updates, and deletions to existing tables | channel- and network-attached clients.
Teradata Tools and Utilities Access Modules | a block-level I/O interface to data residing on a specific external data storage device | channel- and network-attached clients.
Teradata TPump | continuous update of tables; performs insert, update, and delete operations or a combination of these operations on multiple tables using the same source feed | channel- and network-attached clients.
Teradata Warehouse Builder | a means to load data into and export data from any accessible database in the Teradata Database or other data store for which an access operator or an access module exists | channel- and network-attached clients.
The following table contains information about the database management tools available for use by channel- and network-attached clients:
Database Management Utilities
This utility… | Provides… | For…
Teradata Dynamic Query Manager | a means to manage access to and use of the Teradata Database resources. | channel- and network-attached clients.
Teradata Manager | a graphical systems management platform containing a suite of specialized tools and applications for monitoring and controlling Teradata Database resource usage on one or more systems | network-attached clients.
Teradata Performance Monitor | an orderly presentation of performance, usage, status, contention, and availability data for Teradata Database at the overall, resource, and session levels | network-attached clients.
The following table contains information about Teradata Database Query Analysis Tools (DBQAT) for use by network-attached clients:
Database Query Analysis Tools
This utility… | Provides… | For…
Teradata Index Wizard | analyses of various SQL query workloads and suggestions of candidate indexes to enhance the performance of those queries | network-attached clients.
Teradata Statistics Wizard | automation for collecting workload statistics, or for selecting recommended indexes or columns for statistics collection or re-collection | network-attached clients.
Teradata System Emulation Tool | the capability to examine the query plans generated by the test system optimizer as if the queries were processed on the production system | network-attached clients.
Teradata Visual Explain | a simplified depiction of the execution plan of complex SQL statements | network-attached clients.
The following table contains information about the storage management utilities available for use by channel- and network-attached clients:
Storage Management Utilities
This utility… | Provides… | For…
Archive/Recovery | a means of archiving data to tape and restoring tape data to the Teradata Database | channel-attached clients.
Open Teradata Backup (OTB), which includes the following: • NetVault • NetBackup | open architecture products for backup and restore functions for Microsoft Windows clients | network-attached clients.
Note: Contact Teradata Global Sales Support for information about the controlled distribution of NetBackup.
For More Information
For more information on the topics presented in this chapter, see the following Teradata Database and Teradata Tools and Utilities books:
IF you want to learn more about… | THEN see…
Archive utilities | Teradata Archive/Recovery Utility Reference
BTEQ | Basic Teradata Query Reference
Communication using CLIv2 | Teradata Call-Level Interface Version 2 Reference for Channel-Attached Systems; Teradata Call-Level Interface Version 2 Reference for Network-Attached Systems
Database Query Log | Database Administration; Data Dictionary; Performance Optimization; SQL Reference: Data Definition Statements; SQL Reference: Statement and Transaction Processing
Embedded SQL | Teradata Preprocessor2 for Embedded SQL Programmer Guide; SQL Reference: Stored Procedures and Embedded SQL
General Teradata Database architecture | Database Design
JDBC | Teradata Driver for the JDBC Interface User Guide
Load and unload utilities | Teradata FastExport Reference; Teradata FastLoad Reference; Teradata MultiLoad Reference; Teradata Parallel Data Pump Reference
ODBC | Teradata ODBC Driver User Guide
Parallel Upgrade Tool | Parallel Upgrade Tool (PUT) for MP-RAS User Guide; Parallel Upgrade Tool (PUT) for Windows NT and Windows 2000 User Guide
Preprocessor2 | Teradata Preprocessor2 for Embedded SQL Programmer Guide
Priority Scheduler | Utilities - Volume 2, G-S
Query Capture Database | Database Design; Teradata Manager User Guide; SQL Reference: Data Definition Statements; SQL Reference: Statement and Transaction Processing
SQL syntax and lexicon | SQL Reference: Fundamentals
Teradata Database utilities | Utilities
Teradata Director Program | Teradata Director Program Reference
Teradata Dynamic Query Manager | Teradata Dynamic Query Manager Administrator Guide; Teradata Dynamic Query Manager User Guide
Teradata Index Wizard | Teradata Index Wizard User Guide
Teradata Manager | Teradata Manager User Guide
Teradata SQL Assistant | Teradata SQL Assistant for Microsoft Windows User Guide
Teradata Statistics Wizard | Teradata Statistics Wizard User Guide
Teradata System Level Emulation | Database Design; Teradata System Emulation Tool User Guide
Teradata Visual Explain | Teradata Visual Explain User Guide
TS/API products | Teradata Transparency Series/Application Programming Interface User Guide
Chapter 3:
Teradata Database Architecture
This chapter briefly describes the Teradata Database hardware components and software architecture. The hardware that supports Teradata Database software is based on off-the-shelf Symmetric Multiprocessing (SMP) technology. The hardware can be combined with a communications network that connects the SMP systems to form Massively Parallel Processing (MPP) systems. Topics include:
• SMP and MPP platforms and the BYNET
• Disk arrays
• Cliques
• Hot standby nodes
• Virtual processors
• Request processing
• Parallel Database Extensions
• Teradata file system
• Workstations
• Teradata Database Window
SMP and MPP Machines
The components of the SMP and Massively Parallel Processing (MPP) hardware platforms are:
Component | Description | Function
Processor Node | A hardware assembly containing several tightly coupled Central Processing Units (CPUs) in an SMP configuration. A single processor node is connected to one or more disk arrays with the following installed on the node: • Teradata Database software • Client interface software • Operating system • Multiple processors with shared memory • Failsafe power provisions. Note: An MPP is a configuration of two or more loosely coupled SMP nodes with shared SCSI access to multiple disk arrays. | Serves as the hardware platform upon which the database software operates.
BYNET | Hardware interprocessor network to link nodes on an MPP system. Note: Single-node SMP systems use a software-configured virtual BYNET driver to implement BYNET services. | Implements broadcast, multicast, or point-to-point communication between processors, depending on the situation.
These platforms use virtual processors that run a set of software processes on a node under the Parallel Database Extensions (PDE). Virtual processors (vprocs) provide the parallel environment that enables the Teradata Database to run on SMP and MPP systems. For more information about the PDE and vprocs, see the following sections in this chapter:
• “Parallel Database Extensions” on page 3-15
• “Virtual Processors” on page 3-8
The BYNET
At the most elementary level, you can look at the BYNET as a bus that loosely couples all the SMP nodes in a multinode system. However, this view does an injustice to the BYNET, because the capabilities of the network range far beyond those of a simple system bus. The BYNET also possesses high-speed logic arrays that provide bidirectional broadcast, multicast, and point-to-point communication and merge functions.
A multinode system has at least two BYNETs. This creates a fault-tolerant environment and enhances interprocessor communication. Load-balancing software optimizes the transmission of messages over the BYNETs. If one BYNET should fail, the second can handle the traffic.
The total bandwidth for each network link to a processor node is 10 megabytes. The total throughput available for each node is 20 megabytes, because each node has two network links and the bandwidth is linearly scalable. For example, a 16-node system has 320 megabytes of bandwidth for point-to-point connections. The total, available broadcast bandwidth for any size system is 20 megabytes.
The BYNET software also provides a standard TCP/IP interface for communication among the SMP nodes. The following figure shows how the BYNET connects individual SMP nodes to create an MPP system.
[Figure GG01B002: the BYNET interconnect linking four SMP nodes, which connect through SCSI buses to the disk arrays.]
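The bandwidth figures quoted for the BYNET follow from simple arithmetic, which the short sketch below reproduces. The constants come from the text above; the function names are illustrative only and are not part of any Teradata interface.

```python
# Figures taken from the text: 10 megabytes per network link,
# two links per node, and a fixed 20 megabytes of broadcast bandwidth.
LINK_BANDWIDTH_MB = 10
LINKS_PER_NODE = 2
BROADCAST_BANDWIDTH_MB = 20


def node_throughput_mb() -> int:
    """Total throughput available to a single node."""
    return LINK_BANDWIDTH_MB * LINKS_PER_NODE


def point_to_point_bandwidth_mb(nodes: int) -> int:
    """Point-to-point bandwidth scales linearly with the node count."""
    if nodes < 1:
        raise ValueError("a system has at least one node")
    return nodes * node_throughput_mb()


def broadcast_bandwidth_mb(nodes: int) -> int:
    """Broadcast bandwidth is the same for a system of any size."""
    if nodes < 1:
        raise ValueError("a system has at least one node")
    return BROADCAST_BANDWIDTH_MB


if __name__ == "__main__":
    print(point_to_point_bandwidth_mb(16))  # the 16-node example from the text
    print(broadcast_bandwidth_mb(16))
```

Only the point-to-point figure grows with the number of nodes; the broadcast figure stays constant.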
Boardless BYNET
Single-node SMP systems use Boardless BYNET (or virtual BYNET) software to simulate the BYNET hardware driver. Both the SMP and MPP machines run the set of software processes called vprocs on a node under the Parallel Database Extensions (PDE) software layer. For more information about the PDE, see “Parallel Database Extensions” on page 3-15. Vprocs come in two types: Access Module Processors (AMPs) and Parsing Engines (PEs). For more detailed information on vprocs, see “Virtual Processors” on page 3-8.
Disk Arrays
Teradata employs Redundant Array of Independent Disks (RAID) storage technology to provide data protection at the disk level. You use the RAID Manager to group disk drives into arrays to ensure that data is available in the event of a disk failure. Each array typically consists of from one to four ranks of disks, with up to five disks per rank. Redundant implies that either data, functions, or components are duplicated in the architecture of the array.
Logical Units
The RAID Manager uses drive groups. A drive group is a set of drives that have been configured into one or more Logical Units (LUNs). A LUN is a portion of every drive in a drive group. This portion is configured to represent a single disk. Each LUN is uniquely identified and on NCR UNIX MP-RAS systems is sliced into one or more UNIX slices. The operating system recognizes a LUN as its disk and is not aware that it is actually writing to spaces on multiple disk drives. This technique allows RAID technology to provide data availability without affecting the operating system. The PDE translates LUNs into virtual disks (vdisks) using slices (in NCR UNIX MP-RAS) or partitions (in Microsoft Windows 2000).
Pdisks and Vdisks
A pdisk is the portion of a LUN that is assigned to an AMP. For information about the role that AMPs play in the Teradata Database architecture, see “Virtual Processors” on page 3-8. Each pdisk is uniquely identified and independently addressable. The group of pdisks assigned to an AMP is collectively identified as a vdisk. Using vdisks instead of direct connections to physical disk drives permits the use of RAID technology without affecting Teradata Database.
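The relationship between LUNs, pdisks, and vdisks described above can be pictured as a simple grouping. The sketch below is a toy data model, not the PDE implementation: the `Pdisk` class, its field names, and the `build_vdisks` function are hypothetical and exist only to show that a vdisk is the set of pdisks assigned to one AMP.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pdisk:
    """A uniquely identified portion of a LUN assigned to one AMP (toy model)."""
    pdisk_id: int
    lun_id: int
    amp_id: int


def build_vdisks(pdisks):
    """Group pdisks by AMP; each group stands for that AMP's vdisk."""
    vdisks = defaultdict(list)
    for pdisk in pdisks:
        vdisks[pdisk.amp_id].append(pdisk.pdisk_id)
    return dict(vdisks)


if __name__ == "__main__":
    layout = [
        Pdisk(pdisk_id=0, lun_id=0, amp_id=0),
        Pdisk(pdisk_id=1, lun_id=1, amp_id=0),
        Pdisk(pdisk_id=2, lun_id=0, amp_id=1),
    ]
    print(build_vdisks(layout))
```

In this model the AMP addresses only its vdisk, which is why the RAID layout underneath can change without affecting the database software.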
Cliques
The clique is a feature of multinode systems that physically groups nodes together by multiported access to common disk array units. Inter-node disk array connections are made using SCSI buses. Shared SCSI-II paths enable redundancy to ensure that loss of a processor node or disk controller does not limit data availability. The nodes do not share data. They only share access to the disk arrays. The following figure illustrates a four-node clique.
[Figure GG01A003: a four-node clique. Node 1, Node 2, Node 3, and Node 4, each labeled MCA and Q 720, connect by SCSI to a shared DAC.]
A clique is the mechanism that supports the migration of vprocs, the AMPs and PEs under PDE, following a node failure. If a node in a clique fails, then AMP and PE vprocs migrate to other nodes in the clique and continue to operate while recovery occurs on their home node. PEs for channel-attached hardware cannot migrate because they are dependent on the hardware that is physically attached to the node to which they are assigned. PEs for LAN-attached connections do migrate when a node failure occurs, as do all AMPs.
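The migration rules above can be sketched in a few lines. This is an illustrative model only: the `Vproc` class, the `kind` labels, and the round-robin choice of target node are assumptions made for the example, since the text does not say how the system distributes migrating vprocs among the surviving nodes.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import List, Optional


@dataclass
class Vproc:
    name: str
    kind: str                    # "AMP", "PE_LAN", or "PE_CHANNEL" (hypothetical labels)
    home_node: str
    current_node: Optional[str]  # None means the vproc is not running


def fail_node(vprocs: List[Vproc], clique_nodes: List[str], failed: str) -> None:
    """Move vprocs off a failed node to the other nodes in its clique."""
    survivors = [node for node in clique_nodes if node != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    targets = cycle(survivors)   # assumption: simple round-robin placement
    for vproc in vprocs:
        if vproc.current_node != failed:
            continue
        if vproc.kind == "PE_CHANNEL":
            # Channel PEs depend on hardware attached to their node,
            # so they cannot migrate.
            vproc.current_node = None
        else:
            # AMPs and LAN-attached PEs migrate within the clique.
            vproc.current_node = next(targets)


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    vprocs = [
        Vproc("amp0", "AMP", "node1", "node1"),
        Vproc("pe0", "PE_LAN", "node1", "node1"),
        Vproc("pe1", "PE_CHANNEL", "node1", "node1"),
    ]
    fail_node(vprocs, nodes, "node1")
    for vproc in vprocs:
        print(vproc.name, vproc.current_node)
```

Each vproc keeps its `home_node`, which reflects the statement that migrated vprocs continue to operate elsewhere while recovery occurs on their home node.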
Hot Standby Nodes
The Hot Standby Node feature allows spare nodes to be incorporated into the production environment so that the Teradata Database can take advantage of the presence of the spare nodes to improve availability. A hot standby node is a node that:
• Is a member of a clique
• Does not normally participate in the trusted parallel application (TPA)
• Can be brought into the TPA to compensate for the loss of a node in the clique
[Figure 1091A001: a clique made up of Node 1, Node 2, Node 3, and a Hot Standby Node, each labeled MCA and Q 720, connected by SCSI to a shared DAC.]
Configuring a hot standby node can eliminate the system-wide performance degradation associated with the loss of a single node in a single clique. When a node fails, the Hot Standby Node feature migrates all AMP and PE vprocs on the failed node to other nodes in the system, including the node that you have designated as the hot standby. The hot standby node becomes a production node. When the failed node returns to service, it becomes the new hot standby node. Configuring hot standby nodes eliminates:
• Restarts that are required to bring a failed node back into service.
• Degraded service period when vprocs have migrated to other nodes in a clique.
Virtual Processors
The versatility of the Teradata Database is based on virtual processors (vprocs) that eliminate dependency on specialized physical processors. Vprocs are a set of software processes that run on a node under the Teradata Parallel Database Extensions (PDE) within the multitasking environment of the operating system. The following table contains information about the two types of vprocs:
Type | Description
PE | The PE performs session control and dispatching tasks as well as parsing functions.
AMP | The AMP performs database functions to retrieve and update data on the vdisks.
A single system can support a maximum of 16,384 vprocs. The maximum number of vprocs per node can be as high as 128. Each vproc is a separate, independent copy of the processor software, isolated from other vprocs, but sharing some of the physical resources of the node, such as memory and CPUs. Multiple vprocs can run on an SMP platform or a node.
Vprocs and the tasks running under them communicate using unique-address messaging, as if they were physically isolated from one another. This message communication is done using the Boardless BYNET Driver software on single-node platforms or BYNET hardware and BYNET Driver software on multinode platforms.
Parsing Engine
The PE is the vproc that communicates with the client system on one side and with the AMPs (via the BYNET) on the other side. Each PE executes the database software that manages sessions, decomposes SQL statements into steps, possibly parallel, and returns the answer rows to the requesting client.
The PE software consists of the following elements:
Parsing Engine Elements | Process
Parser | Decomposes SQL into relational data management processing steps
Optimizer | Determines the most efficient path to access data
Generator | Generates and packages steps
Dispatcher | • Receives processing steps from the parser and sends them to the appropriate AMPs • Monitors the completion of steps and handles errors encountered during processing
Session Control | • Manages session activities, such as logon, password validation, and logoff • Recovers sessions following client or server failures
Access Module Processor
The AMP is the heart of the Teradata Database. The AMP is a vproc that controls the management of the Teradata Database and the disk subsystem, with each AMP being assigned to a vdisk.
AMP functions include… | For example…
database management tasks | • accounting. • journaling. • locking tables, rows, and databases. • during query processing: sorting, joining data rows, and aggregation. • output data conversion.
file-system management | disk space management.
Each AMP, as represented in the following figure, manages a portion of the physical disk space. Each AMP stores its portion of each database table within that space.
[Figure: two Parsing Engines connected through the BYNET to four AMPs, each AMP with its own disk storage.]
AMP Clusters
AMPs are grouped into logical clusters to enhance the fault-tolerant capabilities of the Teradata Database. For more information on this method of creating additional fault tolerance in a system, see Chapter 14: “Reliability.”
Parsing Engine Request Processing SQL is the language that you use to make requests of the Teradata Database. The SQL parser handles all incoming SQL requests. It processes an incoming request as follows: Stage
1
Process
The Parser looks in the Request cache to determine if the request is already there. IF the request is…
THEN the Parser…
in the Request cache
reuses the plastic steps found in the cache and passes them to gncApply. Go to stage 8 after checking access rights (stage 4). Plastic steps are directives to the database management system that do not contain data values.
not in the Request cache
2
begins processing the request with the Syntaxer.
The Syntaxer checks the syntax of an incoming request. IF there are…
THEN the Syntaxer…
no errors
converts the request to a parse tree and passes it to the Resolver.
errors
passes an error message back to the requestor and stops.
3
The Resolver adds information from the Data Dictionary (or cached copy of the information) to convert database, table, view, stored procedure, and macro names to internal identifiers.
4
The security module checks access rights in the Data Dictionary. IF the access rights are…
THEN the Security module…
valid
passes the request to the Optimizer.
not valid
aborts the request and passes an error message and stops.
Introduction to Teradata Warehouse
3 – 11
Parsing Engine Request Processing Stage
Process
5
The Optimizer determines the most effective way to implement the SQL request.
6
The Optimizer scans the request to determine where locks should be placed, then passes the optimized parse tree to the Generator.
7
The Generator transforms the optimized parse tree into plastic steps and passes them to gncApply.
8
gncApply takes the plastic steps produced by the Generator and transforms them into concrete steps. Concrete steps are directives to the AMPs that contain any needed user- or session-specific values and any needed data parcels.
9
gncApply passes the concrete steps to the Dispatcher.
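The stages above can be sketched as a small pipeline. This is a toy model in Python, not Teradata code: every function name, the dictionary used as a Request cache, and the string-based "steps" are illustrative assumptions. Only the stage order, the cache shortcut, and the plastic/concrete distinction come from the table.

```python
# Toy model of the parsing-engine stages; all names are hypothetical.
request_cache = {}  # request text -> plastic steps (no data values)

def syntaxer(sql):
    # Stage 2: reject a malformed request, otherwise build a "parse tree".
    if not sql.strip().endswith(";"):
        raise ValueError("syntax error")
    return sql.strip().rstrip(";").split()

def resolver(tree, dictionary):
    # Stage 3: replace known object names with internal identifiers.
    return [dictionary.get(token, token) for token in tree]

def check_access(user, rights):
    # Stage 4: abort when the user holds no access right.
    if user not in rights:
        raise PermissionError("access denied")

def generate_plastic_steps(tree):
    # Stages 5-7 collapsed: emit steps that still contain a placeholder.
    return ["LOCK", "READ " + " ".join(tree), "END TRANSACTION"]

def gnc_apply(plastic_steps, value):
    # Stage 8: bind the request-specific value to get concrete steps.
    return [step.replace(":acct", str(value)) for step in plastic_steps]

def process(sql, value, user, dictionary, rights):
    cached = sql in request_cache              # Stage 1: cache lookup
    if cached:
        check_access(user, rights)             # stage 4 still applies
        plastic = request_cache[sql]
    else:
        tree = resolver(syntaxer(sql), dictionary)
        check_access(user, rights)
        plastic = generate_plastic_steps(tree)
        request_cache[sql] = plastic
    return gnc_apply(plastic, value), cached

sql = "SELECT * FROM Table_01 WHERE AcctNo = :acct;"
names = {"Table_01": "tbl#1"}
first, was_cached = process(sql, 129317, "alice", names, {"alice"})
again, was_cached_again = process(sql, 5, "alice", names, {"alice"})
```

The second call skips the Syntaxer, Resolver, and Generator because the plastic steps are reused; only the value binding in `gnc_apply` differs between the two calls.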
The Dispatcher
The Dispatcher controls the sequence in which steps are executed. It also passes the steps to the BYNET to be distributed to the AMP database management software as follows:
Stage 1: The Dispatcher receives concrete steps from gncApply.
Stage 2: The Dispatcher places the first step on the BYNET; tells the BYNET whether the step is for one AMP, several AMPs, or all AMPs; and waits for a completion response. Whenever possible, the Teradata Database performs steps in parallel to enhance performance. If there are no dependencies between a step and the following step, the following step can be dispatched before the first step completes, and the two execute in parallel. If there is a dependency, for example, the following step requires as input the data produced by the first step, then the following step cannot be dispatched until the first step completes.
Stage 3: The Dispatcher receives a completion response from all expected AMPs and places the next step on the BYNET. It continues to do this until all the AMP steps associated with a request are done.
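The dependency rule in stage 2 can be illustrated with a short scheduling sketch. It is an assumption-laden simplification: the step names and the `dispatch_waves` function are hypothetical, and the sketch generalizes "a step and the following step" to an arbitrary dependency map.

```python
def dispatch_waves(steps):
    """Group steps into waves. Steps in the same wave have no unmet
    dependencies, so they could be dispatched and run in parallel.
    `steps` maps a step name to the set of steps it needs as input."""
    done, waves = set(), []
    pending = dict(steps)
    while pending:
        ready = sorted(name for name, deps in pending.items() if deps <= done)
        if not ready:
            raise ValueError("circular dependency")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del pending[name]
    return waves

steps = {"scan_a": set(), "scan_b": set(), "join": {"scan_a", "scan_b"}}
waves = dispatch_waves(steps)
print(waves)  # [['scan_a', 'scan_b'], ['join']]
```

The two scans are independent and share a wave; the join needs their output, so it waits for both to complete.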
The AMPs
AMPs obtain the rows required to process the requests (assuming that the AMPs are processing a SELECT statement). The BYNET transmits messages to and from the AMPs. An AMP step can be sent to one of the following:
• One AMP
• A selected set of AMPs, called a dynamic BYNET group
• All AMPs in the system
The following figure is based on the example in the next section. If access is through a primary index, and a request is for a single row, the PE transmits steps to a single AMP, as shown at PE 1. If the request is for many rows (an all-AMP request), the PE makes the BYNET broadcast the steps to all AMPs, as shown at PE 2. To minimize system overhead, the PE can send a step to a subset of AMPs.
[Figure HD14A001: PE 1 and PE 2 connect through the BYNET (or Boardless BYNET) to AMP 1 through AMP 4, each with its own disk. AMP 1 stores rows R1, R5, R9; AMP 2 stores R2, R6, R10; AMP 3 stores R3, R7, R11; AMP 4 stores R4, R8, R12.]
Example: SQL Statement
As an example, consider the following Teradata SQL statements using a table containing checking account information. The example assumes that the AcctNo column is the unique primary index for Table_01. For information about the types of indexes used by Teradata, see Chapter 8: “Data Distribution and Access Methods.”
1. SELECT * FROM Table_01 WHERE AcctNo = 129317 ;
2. SELECT * FROM Table_01 WHERE AcctBal > 1000 ;
In this example:
• PEs 1 and 2 receive requests 1 and 2.
• The data for account 129317 is contained in table row R9 and stored on AMP 1.
• Information about all account balances is distributed evenly among the disks of all four AMPs.
The following table lists the steps involved in processing the sample Teradata SQL statements:
Stage 1: PE 1 determines that the request is a primary index retrieval, which calls for the access and return of one specific row.
Stage 2: The Dispatcher in PE 1 issues a message to the BYNET containing an appropriate read step and R9/AMP 1 routing information. After AMP 1 returns the desired row, PE 1 transmits the data to the client.
Stage 3: The PE 2 Parser determines that this is an all-AMPs request, then issues a message to the BYNET containing the appropriate read step to be broadcast to all four AMPs.
Stage 4: After the AMPs return the results, PE 2 transmits the data to the TDP.
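The routing decision in stages 1 and 3 can be sketched as follows. This is a minimal illustration, assuming a stand-in hash (`zlib.crc32`) to pick an AMP; the actual row-hashing algorithm is not described in this section, and the function names are hypothetical.

```python
import zlib

AMPS = ["AMP 1", "AMP 2", "AMP 3", "AMP 4"]

def amp_for_key(key):
    # Stand-in hash: a primary-index value always maps to the same AMP.
    return AMPS[zlib.crc32(str(key).encode()) % len(AMPS)]

def route(primary_index_value=None):
    # Equality on the unique primary index -> one AMP (request 1).
    if primary_index_value is not None:
        return [amp_for_key(primary_index_value)]
    # No primary-index value, e.g. AcctBal > 1000 -> all AMPs (request 2).
    return list(AMPS)

single = route(129317)   # one AMP receives the read step
broadcast = route()      # every AMP receives the read step
```

Because the stand-in hash differs from the real one, the AMP chosen here need not match AMP 1 in the figure; the point is only that request 1 touches a single AMP while request 2 is broadcast.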
The following table lists the sequence of AMP step processing:
Step 1, Lock: Serializes access in situations where concurrent access would compromise data consistency. For some simple requests using Unique Primary Index (UPI), Non-unique Primary Index (NUPI), or Unique Secondary Index (USI) access, the lock step may be incorporated into step 2. For information about indexes and their uses, see Chapter 8: “Data Distribution and Access Methods.”
Step 2, Operation: Performs the requested task. For complicated queries, there may be hundreds of operation steps.
Step 3, End transaction: Causes the locks acquired in step 1 to be released. The end transaction step tells all AMPs that worked on the request that processing is complete.
Parallel Database Extensions
Parallel Database Extensions (PDE) are a software interface layer on top of the operating system. The operating system can be either UNIX MP-RAS or Windows 2000. PDE provides the Teradata Database with the ability to:
• Run the Teradata Database in a parallel environment
• Execute vprocs
• Apply a flexible priority scheduler to Teradata Database sessions
• Debug the operating system kernel and the Teradata Database using resident debugging facilities
Trusted Parallel Applications
The PDE provides a series of parallel operating system services to a special class of tasks called a trusted parallel application (TPA). On an SMP or MPP system, the TPA is the Teradata Database. TPA services include:
• Facilities to manage parallel execution of the TPA on multiple nodes
• Dynamic distribution of execution processes
• Coordination of all execution threads, whether on the same or on different nodes
• Balancing of the TPA workload within a clique
• Resident debugging facilities in addition to kernel and application debuggers
PDE and MPP Systems
The PDE also enables an MPP system to:
• Take advantage of hardware features such as the BYNET and shared disk arrays
• Process user applications that were written on non-Trusted Parallel Application (non-TPA) nodes and disks
Start and Stop PDE
You can start, reset, and stop the PDE on Windows systems using the Teradata MultiTool utility and on UNIX MP-RAS systems using the xctl utility. For information about the ctl and xctl utilities, see “Maintenance Utilities” on page 18-10.
The Teradata File System
The special-purpose Teradata file system is a layer of software between the Teradata Database layer and the PDE layer. Teradata file system service calls allow the Teradata Database to store and retrieve data efficiently without being concerned about the specific low-level operating system interfaces.
The data block is a disk-resident structure that contains one or more rows from the same table and is the physical I/O unit for the Teradata file system. Data blocks are stored in physical disk space units called sectors, which are logically grouped together in cylinders.
Cylinder Read
Cylinder Read, a capability of the Teradata file system, allows full-file scan operations to run efficiently by reading the cylinder-resident data blocks with a single I/O operation. This means the system incurs I/O overhead once per cylinder, as opposed to being incurred once per data block when blocks are read individually. The system benefits from the reduction in I/O time for operations such as table-scans and joins that process most or all of the data blocks of a table.
Block sizes range between 6144 bytes and nearly 128 KB, or from 12 to 255 sectors. You can set the default maximum data block size as follows:
• As a system default, using the DBS Control utility.
• For a table, using the DATABLOCKSIZE specifier on the CREATE TABLE statement.
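The block-size range quoted above follows from simple sector arithmetic. The sketch below assumes a 512-byte sector, which is the size implied by 12 sectors equalling 6144 bytes; the function name is illustrative.

```python
SECTOR_BYTES = 512  # assumed sector size; 12 * 512 = 6144 matches the text

def block_bytes(sectors):
    # Data blocks span 12 to 255 sectors according to the text above.
    if not 12 <= sectors <= 255:
        raise ValueError("data blocks span 12 to 255 sectors")
    return sectors * SECTOR_BYTES

smallest = block_bytes(12)    # 6144 bytes
largest = block_bytes(255)    # 130560 bytes, just under 128 KB (131072)
print(smallest, largest)
```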
Disk I/O Integrity Checking
To detect data corruption in the file system metadata, the Teradata Database verifies the following:
• Version numbers
• Segment lengths
• Block types
• Block hole addresses in the data block, cylinder index (CI), and master index (MI) internal file system structures
To help detect corrupt data in these structures, disk I/O integrity checking calculates an end-to-end checksum at various user-selectable data sampling rates.
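The trade-off behind a selectable sampling rate can be shown with a small sketch. Everything here is an assumption made for illustration: CRC-32 stands in for whatever checksum the file system actually computes, the 4-byte word size is arbitrary, and `sample_every` is a hypothetical parameter rather than a documented setting.

```python
import zlib

def sampled_checksum(block, sample_every):
    # Checksum only every `sample_every`-th 4-byte word of the block.
    words = [block[i:i + 4] for i in range(0, len(block), 4)]
    return zlib.crc32(b"".join(words[::sample_every]))

block = bytes(range(256))
corrupted = block[:4] + b"\xff" + block[5:]   # damage one byte in word 1

full_detects = sampled_checksum(block, 1) != sampled_checksum(corrupted, 1)
half_detects = sampled_checksum(block, 2) != sampled_checksum(corrupted, 2)
print(full_detects, half_detects)
```

Sampling every word catches the damaged byte, while sampling every second word skips word 1 and misses it. A lower sampling rate costs less per I/O but leaves more of the block unverified.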
You can specify the CHECKSUM option as follows:
Set the CHECKSUM option to one of the following levels of checking:
• NONE
• LOW
• MEDIUM
• HIGH
• ALL
Using either of the following:
a. One of the following statements:
   • CREATE TABLE
   • CREATE JOIN INDEX
   • CREATE HASH INDEX
   • ALTER TABLE
b. The DBS Control utility, based on the type of table you want to check. For example, you may want to assign a higher level of checking to a user table than you assign to a temporary table.
Workstation Types and Available Platforms
Workstations provide a window into the interworkings of the Teradata Database. The following types of workstations are available:
• System console
• Administration Workstation
Some of the workstation types are only available on specific platforms. The following table shows which workstations are appropriate for the different platforms and how workstations are connected to the node.
Type of Workstation: System console
  Platform: SMP
  Description: Connected directly to the SMP node
Type of Workstation: Administration Workstation
  Platform: MPP
  Description: Network-connected through an Ethernet card on the node
Type of Workstation: UNIX workstation, or PC with X Windows server
  Platform: SMP and MPP
  Description: Connected remotely through network using an Ethernet card on the node
System Console
The role of the system console is to:
• Provide an input mechanism for the system and database administrators
• Display system status
• Display current system configuration
• Display performance statistics
• Allow you to control various utilities
Administration Workstation
The Administration Workstation (AWS) performs many of the functions of a system console for MPP systems. The AWS is an intelligent workstation whose primary roles are to:
• Provide an input mechanism for the system and database administrator
• Provide a single-system view in the multinode environment
• Monitor system performance
Teradata Database Window
The Teradata Database Window (DBW) allows database or system administrators to control the operation of the Teradata Database. Running in a graphical X Windows or Microsoft Windows 2000 environment, the DBW is also the primary vehicle for starting and controlling the operation of the Teradata Database utilities.
How the Database Window Communicates with Teradata Database
The DBW communicates with the Teradata Database through the console subsystem (CNS), which is part of the PDE software. Because the CNS software manages this communication, you might see CNS messages from the system.
From the DBW main window, you can access the following subwindows:
• Applications 1 through 4: run one Teradata Database utility or program at a time in each of the four subwindows.
• DBS I/O: view messages from Teradata Database programs that are not running in DBW application subwindows. For example, some SQL diagnostics appear here.
• Supervisor: issue commands and invoke utilities.
Running DBW
You can run DBW from the following locations:
• System Console
• Administration Workstation (AWS)
• Remote workstation or PC
For More Information
For more information on the topics presented in this chapter, see the following Teradata Database books.
IF you want to learn more about the Database Window, THEN see Database Window.
IF you want to learn more about general Teradata Database software architecture, THEN see Database Design.
IF you want to learn more about system process flows, THEN see Database Design.
Chapter 4: International Language Support
This chapter describes the capabilities of Teradata international language support. Topics include:
• Character set overview
• External and internal character sets
• Teradata Database character data storage
• Language support modes
• Standard language support
• Japanese language support
• Extended support
• Enabling international character support
Character Set Overview
To manipulate data successfully, the Teradata Database must be able to store and retrieve the characters that constitute a given written language. To manage storage and retrieval, the database determines the repertoire of characters required and provides a scheme for representing strings of these characters on a computer.
What Is a Repertoire
Consider English for example. To write English, you need the alphabetic characters, A–Z, the digits, 0–9, and various punctuation characters. Many applications also commonly require the characters a–z, the lower case counterparts of A–Z.
If an application is written in French, you need the alphabetic characters that are required for English, plus accented characters, for example, the é. However, some applications may need accented characters for English as well. The word résumé, borrowed from French, is often displayed in its accented form in English text. Similarly, ö may be used in English text to spell coördinate.
You can see that a repertoire comprises the characters that we need to write a language, and clearly, what we include in our repertoire determines what we can write, and how we must write it.
Character Representation
Representing strings of characters is essentially a two-step process:
• Creating a mapping between each character required and an integer.
• Devising an encoding scheme for placing a sequence of numbers into memory.
The simplest systems map the required characters to small integers between 0 and 255, and encode sequences of characters as sequences of bytes with the appropriate numeric values. Representing characters for repertoires that require more than 256 characters, such as Japanese, Chinese, and Korean, requires more complex schemes.
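The two steps can be seen directly in Python, used here only as a neutral illustration and not as a description of Teradata internals. The codec names `latin-1` and `utf-16-be` are Python's, chosen as examples of a one-byte scheme and a two-byte scheme.

```python
text = "r\u00e9sum\u00e9"                     # résumé
code_points = [ord(ch) for ch in text]       # step 1: character -> integer
single_byte = text.encode("latin-1")         # step 2: one byte per character
print(code_points)       # [114, 233, 115, 117, 109, 233]
print(len(single_byte))  # 6

kanji = "\u65e5\u672c"                        # two Japanese characters
kanji_points = [ord(ch) for ch in kanji]     # both values exceed 255
two_byte = kanji.encode("utf-16-be")         # two bytes per character
print(kanji_points, len(two_byte))
```

Every character of résumé fits in the range 0 to 255, so one byte per character is enough. The Japanese characters map to integers far above 255, so a single-byte scheme cannot represent them and a wider encoding is needed.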
External and Internal Character Sets
Client systems communicate with the Teradata Database using their own external format for numbers and character strings. The Teradata Database converts numbers and strings to its own internal format when importing the data, and converts numbers and strings back to the appropriate form for the client when exporting the data.
This approach allows data to be exchanged between mutually incompatible client data formats. Take, for example, channel-attached clients using EBCDIC-based character sets and network-attached clients using ASCII-based character sets. Both clients can access and modify the same data in the database.
Character Data Translation
Teradata Database translates the characters:
• Received from a client system into a form suitable for storage and processing on the server.
• Returned to a client into a form suitable for storage, display, printing, and processing on that client.
Thus, the server translates data from client form to server form and from server form to client form.
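A minimal sketch of this two-way translation follows. It assumes Python's `cp037` codec as a representative EBCDIC client character set and `ascii` as the network client's set; a Python string stands in for the server's internal form. The function names are hypothetical.

```python
def import_from_client(raw, client_charset):
    # Client form -> server form.
    return raw.decode(client_charset)

def export_to_client(value, client_charset):
    # Server form -> client form.
    return value.encode(client_charset)

ebcdic_bytes = b"\xc8\xc5\xd3\xd3\xd6"             # "HELLO" as sent by an EBCDIC client
stored = import_from_client(ebcdic_bytes, "cp037")
ascii_bytes = export_to_client(stored, "ascii")    # same data for an ASCII client
print(stored, ascii_bytes)
```

The EBCDIC and ASCII byte sequences differ completely, yet both clients read and write the same stored value because each transfer is translated through the server form.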
What Teradata Database Supports
The Teradata Database supports many external client character sets, and allows each application to choose the internal server character set best suited to each column of character data in the Teradata Database. No matter which server character set you choose, communication with the client is always in the client character set (also known as the session charset).
Teradata Database Character Data Storage
The Teradata Database uses internal server character sets to represent user data and data in the Data Dictionary within the system.
Internal Server Character Sets
Server character sets include:
• LATIN
• UNICODE
• KANJI1
• KANJISJIS
• GRAPHIC
User Data
User data refers to character data that you store in a character data type column on the Teradata Database.
System Dictionary Data
The term system dictionary data refers to the names of the following objects as they are stored in the Data Dictionary on the Database:
• Tables
• Databases
• Users
• Columns
• Views
• Macros
• Triggers
• Join indexes
• Hash indexes
• Stored procedures
• User-defined functions
Language Support Modes
During system initialization (sysinit) the database administrator can optimize the database for one of two language support modes:
• Standard
• Japanese
The language support mode determines the:
• Character set that Teradata Database uses to store system dictionary data.
• Default character set for user data.
Language support mode   System dictionary data character set   User data default character set
Standard                LATIN                                  LATIN
Japanese                KANJI1                                 UNICODE
Default Character Set for User Data
The language support mode sets the default server character set for a user if the DEFAULT CHARACTER SET clause does not appear in the CREATE USER statement. To override the default character set for a user, you can use the DEFAULT CHARACTER SET clause in a CREATE USER statement.
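The defaulting rule can be written as a small lookup. The table values come from the language support mode table above; the function and parameter names are hypothetical, and the sketch models only the decision, not any actual CREATE USER processing.

```python
MODE_DEFAULTS = {
    "Standard": {"dictionary": "LATIN", "user_data": "LATIN"},
    "Japanese": {"dictionary": "KANJI1", "user_data": "UNICODE"},
}

def user_default_charset(mode, default_character_set_clause=None):
    # An explicit DEFAULT CHARACTER SET clause overrides the mode default.
    if default_character_set_clause is not None:
        return default_character_set_clause
    return MODE_DEFAULTS[mode]["user_data"]

print(user_default_charset("Standard"))             # LATIN
print(user_default_charset("Japanese"))             # UNICODE
print(user_default_charset("Standard", "UNICODE"))  # UNICODE
```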
Character Set for System Dictionary Data
The character set that Teradata Database uses to store system dictionary data cannot be changed after you enable the language support mode during the sysinit process.
IF you optimize the database for one of these language support modes, THEN the names of objects stored in the Data Dictionary can contain the following:
• Standard: only western European characters. Characters outside the ASCII range (all accented characters, for example) cannot appear in a regular identifier. Rather, they can only occur in a delimited identifier (one that is enclosed in double quotes).
• Japanese: Japanese characters, but only if you use the Teradata-supplied Japanese client character sets. Japanese characters are stored using the KANJI1 server character set. KANJI1 data cannot necessarily be shared between clients with differing client character sets. If you use other multibyte client character sets, such as UTF8, Korean, or Chinese, only characters in the ASCII range can appear in an object name. Accented characters cannot be used.
Character Set for Dictionary Data Other Than Object Names
Object names are only a small part of the character data that Teradata Database stores in the Data Dictionary. Teradata Database always uses the UNICODE server character set to store character data other than object names in the Data Dictionary, no matter which language support mode you enable.
Standard Language Support Mode
If you choose the standard language support mode, then Teradata Database stores system dictionary data and user data using the LATIN character set.
LATIN Character Set
Standard language support provides Teradata Database internal coding for the entire set of printable characters from the ISO 8859-1 (Latin1) and ISO 8859-15 (Latin9) standard, including diacritical marks such as ä, ñ, Ÿ, Œ, and œ, though the Z with caron in Latin9 is not supported. ASCII control characters are also supported for the standard language set.
Note: The ASCII referred to in this chapter is based on Standard ASCII (X’00’ to X’7F’) with Teradata extensions to cover ISO 8859-1 (Latin1) and ISO 8859-15 (Latin9). ASCII, as used here, represents the characters that can be stored as the LATIN server character set, referred to as Teradata LATIN. The EBCDIC referred to in this chapter is the Teradata extended ASCII mapped to the corresponding EBCDIC code points.
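To see why both standards are named, it helps to check which characters each one covers. The sketch below uses Python's `iso8859_1` and `iso8859_15` codecs as references for the two ISO standards; it says nothing about how Teradata LATIN is encoded internally, and the helper name is illustrative.

```python
def encodable(ch, codec):
    # True when the character has a code point in the given character set.
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

samples = {"a with diaeresis": "\u00e4", "n with tilde": "\u00f1", "oe ligature": "\u0153"}
coverage = {
    name: (encodable(ch, "iso8859_1"), encodable(ch, "iso8859_15"))
    for name, ch in samples.items()
}
for name, (in_latin1, in_latin9) in coverage.items():
    print(name, in_latin1, in_latin9)
```

The characters ä and ñ exist in both standards, while œ exists only in Latin9, which is why support limited to Latin1 would not cover all the characters listed above.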
Compatible Languages
The LATIN server character set that Teradata Database uses in standard language support mode is sufficient for you to use client character sets that support the international languages listed in the following table:
International Languages That Are Compatible with Standard Language Support
Albanian    English   Germanic                        Portuguese
Basque      Estonian  Greenlandic                     Rhaeto-Romantic
Breton      Faroese   Icelandic                       Romance
Catalonian  Finnish   Irish Gaelic (new orthography)  Samoan
Celtic      French    Italian                         Scottish Gaelic
Cornish     Frisian   Latin                           Spanish
Danish      Galician  Luxemburgish                    Swahili
Dutch       German    Norwegian                       Swedish
Japanese Language Support Mode
If you enable the Japanese language support mode during the sysinit process, Teradata Database, by default, stores user data using the UNICODE server character set and stores system dictionary data using the KANJI1 server character set.
Advantages of Storing System Dictionary Data Using KANJI1
The KANJI1 server character set is compatible with the Teradata-supplied Japanese client character sets, allowing you to use object names containing Kanji characters, Hiragana, Zenkaku (fullwidth) and Hankaku (halfwidth) Katakana, Zenkaku Romaji (Latin), and various other characters. You can also use the ASCII characters from other client character sets to name objects that are stored in the Data Dictionary.
Advantages of Storing User Data Using UNICODE
Unicode is a 16-bit encoding of virtually all characters in all current languages in the world. The Teradata UNICODE server character set supports Unicode 2.1, and is designed eventually to store all character data on the server. UNICODE may be used to store all characters from all single- and multibyte client character sets. User data stored as UNICODE can be shared among heterogeneous clients.
Extended Support
Extended support allows you to customize the Teradata Database to provide additional support for local character set usage. A sufficiently privileged user can create single-byte and multibyte client character sets that support, with certain constraints, any subset of the Unicode repertoire. Moreover, such a user can customize a collation for the entire Unicode repertoire. Extended support is available on systems that have been enabled with standard language support or Japanese language support.
For More Information
For more information on the topics presented in this chapter, see the following Teradata Database books:
IF you want to learn more about data formatting, THEN see International Character Set Support.
Section 2: The Teradata Database Structure
Chapter 5: Structured Query Language (SQL)
This chapter describes SQL, which is the ANSI standard language for relational database management. All application programming facilities ultimately make queries against the Teradata Database using SQL because it is the only language the Teradata Database understands. To enhance the capabilities of SQL, Teradata has added extensions that are unique to Teradata. This comprehensive language is referred to as Teradata SQL.
The first part of this chapter describes the data definition and manipulation capabilities of SQL. This includes basic statements used for describing and defining entities and for manipulating and retrieving data. Topics include:
• SQL statements and related topics
• SQL functions
• User-defined functions
• Cursors
Why SQL
SQL has the advantage of being the most commonly used language for relational database management systems. Because of this, both the data structures in the Teradata Database and the commands for manipulating those structures are controlled using SQL. Additionally, all applications, including those written in a client language with embedded SQL, macros, and ad-hoc SQL queries, are written and executed using the same set of instructions and syntax.
Other database management systems use different languages for data definition and data manipulation and may not permit ad-hoc queries of the database. Teradata Database lets you use one language to define, query, and update your data.
What is SQL
In principle, the SQL language is a combination of at least three subordinate languages and the SELECT statement. The languages allow you to define database objects, to define user access to those objects, and to manipulate the data stored within them. These languages form the principal functional families of SQL:
• Data Definition Language (DDL)
• Data Control Language (DCL)
• Data Manipulation Language (DML)
• SELECT
The following sections contain information about the functional families of Teradata SQL.
Data Definition Language
You use DDL to define the structure and instances of a database. This section describes the data definition capabilities of Teradata SQL, emphasizing the basic definition statements and data types. DDL provides statements for the definition and description of database objects. The following table summarizes the basic DDL statements:
Statement: Action performed
• CREATE: Defines a new database object, such as a database, user, table, trigger, index, macro, stored procedure or view, depending on the object of the CREATE statement
• DROP: Removes a table, database, user, trigger, index, macro, stored procedure or view definition, depending on the object of the DROP statement
• ALTER: Changes a table, column, referential constraint, or trigger
• ALTER PROCEDURE: Recompiles a stored procedure
• MODIFY: Changes a database or user definition
• RENAME: Changes the names of tables, triggers, views, stored procedures, and macros
• REPLACE: Replaces macros, triggers, stored procedures, or views
• SET: Specifies time zones and the collation or character set for a session
• COLLECT: Collects statistics on a column, group of columns, or index
Successful execution of a DDL statement automatically creates, updates, or removes entries in the Data Dictionary. For information about the contents of the Data Dictionary, see Chapter 9: “Data Dictionary.”
Data Control Language
You use DCL statements to grant and revoke access to database objects and change ownership of those objects from one user or database to another. The results of DCL statement processing also are recorded in the Data Dictionary. The following table summarizes the basic DCL statements:
Statement: Action
• GRANT/REVOKE: Controls access rights of the users on an object
• GRANT LOGON/REVOKE LOGON: Controls logon rights to a host (client) or host group (if the special security user is enabled)
• GIVE: Gives a database object to another database object
• HELP and SHOW: Provides help about object definitions such as:
   • HELP DATABASE
   • HELP TABLE
   • HELP CONSTRAINT
   • HELP PROCEDURE
   • HELP TRIGGER, and so forth
  Provides help about:
   • Sessions and statistics
   • SQL statement syntax
  Displays the SQL used to create the table, with all defaults explicitly shown
Data Manipulation
You use DML statements to manipulate and process database values. You can insert new rows into a table, update one or more values in stored rows, or delete a row.
The following table summarizes the basic DML statements:
Statement: Description
• INSERT: Inserts new rows into a table. For more information about a special case of INSERT, see Atomic Upsert later in this table.
• UPDATE: Modifies data in one or more rows of a table. For more information about a special case of UPDATE, see Atomic Upsert later in this table.
  Atomic Upsert: The upsert form of the UPDATE DML statement is a Teradata extension to the ANSI SQL-99 standard designed to enhance the performance of the Teradata TPump utility by allowing the statement to support atomic upsert. For more information about how TPump operates, see “What Are Teradata Tools and Utilities” on page 2-7. This feature allows Teradata TPump and all other CLIv2-, ODBC-, and JDBC-based applications to perform single-row upsert operations using an optimally efficient single-pass strategy. This single-pass upsert is called atomic to emphasize that its component UPDATE and INSERT SQL statements are grouped together and performed as a single, or atomic, SQL statement.
• DELETE: Removes a row (or rows) from a table.
• COMMENT: Inserts a text comment for a database object.
• MERGE: Combines both UPDATE and INSERT in a single SQL statement. Supports primary index operations only, similar to Atomic Upsert but with fewer constraints.
• ABORT, ROLLBACK, COMMIT, BEGIN TRANSACTION, END TRANSACTION: Allow you to better manage transactions.
• CHECKPOINT: Checkpoints a journal. CHECKPOINT is a function that writes records to a restart log table that you can use to restart in case of a hardware or software system failure.
• DATABASE: Specifies a default database.
• ECHO: Echoes a string or command to a client.
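The update-or-insert semantics of an upsert can be emulated as follows. This sketch uses Python's built-in `sqlite3` module as a stand-in database; it is not Teradata syntax, and it issues two statements inside one transaction rather than the single-pass, single-statement form described above. The `accounts` table and `upsert` function are hypothetical.

```python
import sqlite3

def upsert(conn, acct_no, balance):
    # Try the UPDATE first; INSERT only when no row matched.
    # The `with` block commits both statements together or rolls both back.
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET balance = ? WHERE acct_no = ?",
            (balance, acct_no),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO accounts (acct_no, balance) VALUES (?, ?)",
                (acct_no, balance),
            )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (acct_no INTEGER PRIMARY KEY, balance REAL)")
upsert(conn, 129317, 1000.0)   # no such row yet, so this inserts
upsert(conn, 129317, 1250.0)   # row exists now, so this updates
rows = conn.execute("SELECT acct_no, balance FROM accounts").fetchall()
print(rows)  # [(129317, 1250.0)]
```

The caller never needs to know whether the row already exists, which is the convenience that both the upsert form of UPDATE and MERGE provide.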
SQL Data Types
A data type phrase does the following:
• Determines how data is stored on the Teradata Database
• Specifies how data is presented to the user
You must specify a data type for each column when you use SQL to create a table because Teradata Database does not provide a default data type. You can include a data type to specify data conversions in expressions.
Teradata and ANSI-Compliant Data Types
Teradata Database supports two modes of data types: ANSI and Teradata. ANSI-mode data types adhere to the ANSI SQL standard. Teradata-mode data types were written in older non-ANSI-compliant versions of Teradata SQL. Teradata Database supports the following SQL data types:
• Teradata SQL data types, including:
   • Byte
   • Graphic
• ANSI-compliant SQL data types, including:
   • Binary Large Objects (BLOBs)
   • Character
   • Character Large Objects (CLOBs)
   • DateTime
   • Interval
   • Numeric
Data Type Attributes
You can use Teradata SQL to define the attributes of a data value. Data type attributes control the following:
• Import format (internal representation of stored data)
• Export format (how data is presented for a column or an expression result)
You must define data type attributes when you define a column. You can override the default values of data type attributes. For example, when you create a table, you can use a FORMAT phrase to override the output format of a data type.
The following table summarizes data type attributes:
Data Type Attribute     ANSI    Teradata Extension to ANSI
NOT NULL                X
UPPERCASE                       X
[NOT] CASESPECIFIC              X
FORMAT quote_string             X
TITLE quote_string              X
NAMED name                      X
DEFAULT number          X
DEFAULT USER            X
DEFAULT DATE            X
DEFAULT TIME            X
DEFAULT NULL            X
WITH DEFAULT                    X
CHARACTER SET           X
Statement Punctuation
A typical SQL statement consists of a statement keyword, one or more column names, a database name, a table name, and one or more optional clauses introduced by keywords. You can use the following punctuation to separate or identify the parts of an SQL statement:
Syntax element (name): Function in an SQL statement
• .  (period): separates database names from table names and table names from a particular column name (for example, personnel.employee.deptno).
• ,  (comma): separates and distinguishes column names in the select list, or column names or parameters in an optional clause.
• '  (apostrophe): delimits the boundaries of character string constants.
• ( )  (left and right parentheses): groups expressions or defines the limits of a phrase.
• ;  (semicolon): separates statements in multi-statement requests and terminates requests submitted via certain utilities such as BTEQ.
• "  (quotation marks): identifies user names which might otherwise conflict with SQL keywords, or would not be valid names in the absence of the quotation marks.
• :  (colon): prefixes reference parameters or client system variables.
To include an apostrophe or show possession in a title, double the apostrophes.
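The doubling rule applies to SQL string constants generally. As a quick illustration, assuming Python's `sqlite3` module as a stand-in for any SQL engine (the literal text is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two adjacent apostrophes inside the constant stand for one apostrophe.
(title,) = conn.execute("SELECT 'Manager''s Report'").fetchone()
print(title)  # Manager's Report
```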
SQL Statements and Requests
A typical SQL statement consists of the following:
• A statement keyword
• One or more column names
• A database name
• A table name
• One or more optional clauses introduced by keywords
For example, in the following single-statement request, the statement keyword is SELECT:
    SELECT deptno, name, salary
    FROM personnel.employee
    WHERE deptno IN(100, 500)
    ORDER BY deptno, name ;
This statement uses the following names:
• Deptno, name, and salary (the column names, which make up the select list)
• Personnel (the database name)
• Employee (the table name)
The search condition, or WHERE clause, is introduced by the keyword WHERE:
    WHERE deptno IN(100, 500)
The sort ordering, or ORDER BY clause, is introduced by the keywords ORDER BY:
    ORDER BY deptno, name
Teradata offers the following ways to invoke an executable statement:
• Interactively from a terminal
• Embedded within an application program
• Dynamically created within an embedded application
• Embedded within a stored procedure
• Dynamically created within a stored procedure
• Via a trigger
• Embedded within a macro
The SELECT Statement
The SELECT statement is probably the most frequently used SQL statement. It specifies the table columns from which to obtain the data you want, the corresponding database (if different from the current default database), and the table or tables that you need to reference within that database. The SELECT statement further specifies how, in what format, and in what order the system returns the set of result data.
You can use the following variations with the SELECT statement to request data from the Teradata Database:
• DISTINCT option
• FROM list
• WHERE clause, including subqueries
• SAMPLE clause
• GROUP BY clause
• HAVING clause
• QUALIFY clause
• ORDER BY clause
  • CASESPECIFIC option
  • International sort orders
• WITH clause
• Query expressions and set operators
Another variation is the SELECT INTO statement, which is used in embedded SQL and stored procedures. This statement selects at most one row from a table and assigns the values in that row to host variables in embedded SQL or to local variables or parameters in Teradata stored procedures.
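A few of the variations listed above can be sketched as follows. The queries assume the personnel.employee table, with deptno, name, and salary columns, from the earlier example; they are illustrations rather than statements taken from the Teradata reference manuals.

```sql
-- DISTINCT option, WHERE clause with a subquery, and ORDER BY clause:
-- departments that have at least one employee paid above the overall average.
SELECT DISTINCT deptno
FROM personnel.employee
WHERE salary > (SELECT AVG(salary) FROM personnel.employee)
ORDER BY deptno;

-- SAMPLE clause: return a random sample of 10 rows.
SELECT name, salary
FROM personnel.employee
SAMPLE 10;
```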
SELECT Statement and Set Operators
The SELECT statement is the only SQL statement that can use the set operators UNION, INTERSECT, and MINUS/EXCEPT. These set operators allow you to manipulate the answers to two or more queries by combining the results of each query into a single result set.
You can use the set operators within the following operations:
• View definitions
• Derived tables
• Subqueries
• INSERT SELECT clauses
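As a brief sketch of two set operators, the following queries combine results from personnel.employee with a second table. The personnel.contractor table is hypothetical and is assumed to have name and deptno columns of compatible types.

```sql
-- UNION: names that appear in either table for department 100
-- (duplicate rows are removed).
SELECT name FROM personnel.employee   WHERE deptno = 100
UNION
SELECT name FROM personnel.contractor WHERE deptno = 100;

-- MINUS: departments that have employees but no contractors.
SELECT deptno FROM personnel.employee
MINUS
SELECT deptno FROM personnel.contractor;
```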
SELECT Statement and Joins
A SELECT statement can reference data in two or more tables, and the relational join combines the data from the referenced tables. Defining the join in the SELECT statement lets the system retrieve the combined data more efficiently than it could without a join.
You can specify both inner joins and outer joins:
• An inner join selects data from two or more tables that meets specific join conditions. Each source must be named, and the join condition (that is, the common relationship among the tables to be joined) must be specified in a WHERE clause.
• The outer join is an extension of the inner join that includes rows that qualify for a simple inner join, as well as a specified set of rows that do not match the join conditions expressed by the query.
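The difference between the two join types can be sketched as follows. The personnel.department table and its deptname column are assumptions made for illustration.

```sql
-- Inner join: returns only employees whose deptno matches a department row.
SELECT e.name, d.deptname
FROM personnel.employee e, personnel.department d
WHERE e.deptno = d.deptno;

-- Left outer join: returns every employee; deptname is NULL
-- for employees whose deptno has no matching department row.
SELECT e.name, d.deptname
FROM personnel.employee e
LEFT OUTER JOIN personnel.department d
  ON e.deptno = d.deptno;
```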
SQL Functions
SQL is a nonprocedural language. That means you use SQL statements to tell the Teradata Database what you want. You do not include instructions about how to get it. In procedural languages, such as C++, BASIC, or COBOL, you write instructions that define how to get what you want. It is a simple, but important, distinction.
Procedural languages contain functions that perform complex operations. The usual SQL statements do not support many functions. However, to reduce the reliance on ancillary application code, SQL does support the following standard functions:
• Scalar
• Aggregate
• Ordered analytical

Scalar Functions
You can use a scalar function in place of a column name in an expression. A scalar function works on input parameters to create a result. When it is part of an expression, the function is invoked in parallel as needed whenever expressions are evaluated for an SQL statement. When a function completes, its result is used by the expression in which the function was referenced.
Aggregate Functions
Sometimes the information you want can only be derived from data in a set of rows, instead of individual rows. Aggregate functions produce results from sets of relational data that you have grouped (optionally) using a GROUP BY clause. Aggregate functions process each set and produce one result for each set.
The following table lists a few examples of aggregate functions:

The function…   Returns the…
AVG             arithmetic average of the values in a specified column.
COUNT           number of qualified rows.
MAX             maximum column value for the specified column.
MIN             minimum column value for the specified column.
SUM             arithmetic sum of a specified column.
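For instance, the five functions in the table could be applied to each department as sketched below. The query assumes the personnel.employee table with deptno and salary columns; each group of rows produces exactly one result row.

```sql
SELECT deptno,
       COUNT(*)    AS employees,
       MIN(salary) AS lowest,
       MAX(salary) AS highest,
       AVG(salary) AS average,
       SUM(salary) AS payroll
FROM personnel.employee
GROUP BY deptno;
```

Without the GROUP BY clause (and without deptno in the select list), the same functions would treat the whole table as one set and return a single row.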
Ordered Analytical Functions
Ordered analytical functions are primarily statistical algorithms. They work over a range of data for a particular set of rows in some specific order to produce a result for each row in the set. Like aggregate functions, ordered analytical functions are called for each item in a set. But unlike an aggregate function, an ordered analytical function produces a result for each detail item.
Ordered analytical functions allow you to perform sophisticated data mining on the information in your databases to get the answers to questions that standard SQL alone cannot provide.
The following table lists a few examples of ordered analytical functions:

The following function…   Returns the…
MSUM                      sum using the current row and a number of preceding rows that you specify. This is called a moving sum.
RANK                      ordered ranking of rows based on the value of the column being ranked.
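The two functions in the table might be used as sketched below, with the Teradata-specific forms MSUM(value, width, sort column) and RANK(sort column). The daily_sales table is hypothetical; the second query assumes personnel.employee as in earlier examples.

```sql
-- Hypothetical table: daily_sales (sale_date DATE, amount DECIMAL(10,2)).
-- Moving sum over the current row and the two preceding rows, ordered by date.
SELECT sale_date, amount,
       MSUM(amount, 3, sale_date) AS moving_sum
FROM daily_sales;

-- Rank employees by salary; one result is produced for each detail row.
-- QUALIFY then keeps only the five top-ranked rows.
SELECT name, salary, RANK(salary) AS salary_rank
FROM personnel.employee
QUALIFY RANK(salary) <= 5;
```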
User-Defined Functions
You can create user-defined functions (UDFs) to address your particular data needs and to fill the void where standard SQL functions are lacking. These special functions can translate into time-saving measures by preprocessing data, or by optimizing query processing. You can use UDFs to map and manipulate non-text data, such as images, in a way that is impossible with standard SQL constructs.
You can write new:
• Scalar functions, similar to the standard LOG, SQRT, ABS, and TRIM functions
• Aggregate functions, similar to SUM, MAX, MIN, and AVG

Creating User-Defined Functions
You create the source code for UDFs using the C programming language. Then you use the CREATE FUNCTION statement and provide the location of the UDF source code. The Teradata Database does the rest of the work, including validating the CREATE FUNCTION statement and compiling the C source. The source may be on the client system or on the server. Any compilation errors are reported. If no errors occur, the Teradata Database links the function object into a dynamically linked library (DLL) and distributes it to all nodes in the system. The UDF is usable as soon as CREATE FUNCTION completes.
Teradata customers can purchase precompiled UDFs from third-party vendors. To protect their intellectual property, vendors may not wish to make their source available. In those instances, they can simply provide a package in the form of a DLL. The DLL code does not have to be written in C, but the code must use C parameter-passing conventions. Teradata customers can use an option in the CREATE FUNCTION statement to provide just the object. The Teradata Database distributes the object automatically to all nodes. Installing just the object is also useful for sites that develop UDFs on a development system and then transfer the object to the production system.
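A CREATE FUNCTION statement for a scalar UDF might look roughly like the sketch below. The function name find_text, its parameters, its 'T' return value, and the source file path are all hypothetical, and the exact spelling of the EXTERNAL NAME string (here assumed to mean client-resident source) should be checked against SQL Reference: UDF Programming before use.

```sql
-- Sketch only: find_text and the path udfsrc/find_text.c are hypothetical.
CREATE FUNCTION find_text (searched_string VARCHAR(500),
                           pattern         VARCHAR(500))
RETURNS CHAR
LANGUAGE C
NO SQL
PARAMETER STYLE TD_GENERAL
EXTERNAL NAME 'CS!find_text!udfsrc/find_text.c';

-- Once created, the UDF is used like a built-in scalar function.
SELECT name
FROM personnel.employee
WHERE find_text(name, 'smith') = 'T';
```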
SQL Statements Related to Functions
In addition to creating new functions, you can replace a function by specifying the REPLACE keyword. The CREATE FUNCTION statement conforms to SQL-99. The REPLACE option is a Teradata extension to the ANSI standard.
The following table provides information about the privileges you need to create and replace functions:

IF you want to…                THEN you must have the following privilege…
create a function              CREATE FUNCTION on the database in which you want to create the function.
replace an existing function   DROP FUNCTION on the function or the database containing the function.

The following table contains the SQL statements associated with UDFs:

Use the following statement…   To…
CREATE FUNCTION                originate a new function.
DROP FUNCTION                  remove a function.
REPLACE FUNCTION               change a function.
SHOW FUNCTION                  display the definition of a function, including the CREATE and REPLACE text. Source code appears if the user has DROP FUNCTION privilege on the UDF.
HELP database                  display the specific name and type of function. Types include:
                               • F for function
                               • A for aggregate function
HELP FUNCTION                  display the function name, list of parameters, their type, and any comment associated with the parameter.
COMMENT                        add a comment about the function.
RENAME FUNCTION                change the name of a function.
Cursors
Traditional application development languages cannot deal with results tables without some kind of intermediary mechanism because SQL is a set-oriented language. The intermediary mechanism is the cursor.
A cursor is a pointer that the application program uses to move through a results table. You declare a cursor for a SELECT statement, and then open the named cursor. The act of opening the cursor executes the SQL statement. You use the FETCH... INTO... statement to individually fetch and write the rows into host variables. The application can then use the host variables to do computations.
Teradata Preprocessor2 uses cursors to mark or tag the first row accessed by an SQL query. Preprocessor2 then increments the cursor as needed.
Stored procedures use cursors to fetch one result row at a time and then execute SQL and SQL control statements as required for each row. Local variables or parameters from the stored procedure can be used for computations.
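The row-at-a-time behavior of a cursor inside a stored procedure can be sketched with a FOR loop, as below. The procedure name, the SalaryAudit table, and the Salary column of Employee are hypothetical; treat the sketch as an illustration of the pattern, not as syntax quoted from the stored procedures reference.

```sql
-- Hypothetical procedure: copies high earners into an audit table, one row at a time.
CREATE PROCEDURE LogHighEarners (IN threshold DECIMAL(10,2),
                                 OUT logged   INTEGER)
BEGIN
  SET logged = 0;
  -- emp_row exposes the columns of the current cursor row by their aliases.
  FOR emp_row AS emp_cur CURSOR FOR
    SELECT EmpNo AS c_empno, Salary AS c_salary
    FROM Employee
    WHERE Salary > :threshold
  DO
    INSERT INTO SalaryAudit (EmpNo, Salary)
      VALUES (:emp_row.c_empno, :emp_row.c_salary);
    SET logged = logged + 1;   -- computation with a procedure parameter
  END FOR;
END;
```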
For More Information
For more information on the topics presented in this chapter, see the following Teradata Database books:

IF you want to learn more about…   THEN see…
Large Objects                      SQL Reference: Data Definition Statements
                                   SQL Reference: Data Manipulation Statements
                                   SQL Reference: Data Types and Literals
                                   SQL Reference: Functions and Operators
                                   SQL Reference: Fundamentals
                                   SQL Reference: UDF Programming
Teradata SQL                       Database Design
                                   SQL Reference: Fundamentals
                                   Teradata SQL Assistant for Microsoft Windows User Guide
User-Defined Functions             SQL Reference: UDF Programming
Chapter 6: Application Development
This chapter describes the tools used to develop applications for the Teradata Database and the interfaces used to establish communications between the applications and the Teradata Database. Topics include:
• Types of SQL applications
• The importance of the EXPLAIN statement
• Third-party development
Types of SQL Development
Application development for the Teradata Database falls into one of two categories:
• Explicit SQL
• Implicit SQL

Explicit SQL Development
Under explicit SQL application development, you have the following tools:
• Embedded SQL
• Macros
• Stored procedures
• EXPLAIN statement
More information about each tool is provided in the following sections of this chapter.

Implicit SQL Development
Under implicit SQL application development, you have tools, such as Teradata and third-party products, that generate SQL as their output. More information about third-party products is provided in the following sections of this chapter.
Embedded SQL Applications
This section describes using embedded SQL in applications.

What Is Embedded SQL
When you write applications using embedded SQL, you insert SQL statements into your native language application program. Because third-generation application development languages do not have facilities for dealing with results sets, embedded SQL contains extensions to executable SQL that permit declarations.
Embedded SQL declarations include:
• Code to encapsulate the SQL from the native application language
• Cursor definition and manipulation
A cursor is a pointer device that you use to read through a results table one record/row at a time. For more information about cursors, see “Cursors” on page 5-16.

How Does an Application Program Use Embedded SQL
The client application languages that support embedded SQL are all compiled languages. SQL is not defined for any of them. For this reason, you must precompile your embedded SQL code to translate the SQL into native code before you can compile the source using a native compiler.
The precompiler tool is called Preprocessor2, and you use it to:
• Read your application source code to look for the defined SQL code fragments
• Isolate all the SQL code in the application, interpret its intent, and translate it into Call Level Interface (CLI) calls
• Comment out all the SQL source
The output of the precompiler is native language source code with CLI calls substituted for the SQL source. After the precompiler generates the output, you can process the converted source code with the native language compiler.
For information about the Call Level Interface communications interface, see Chapter 13: “Data Communication Between Client and Teradata Database.”
Supported Languages and Platforms
Preprocessor2 supports the following application development languages on the specified platforms:

Application Development Language   Platform
C                                  • IBM mainframe clients
                                   • UNIX clients
COBOL                              • IBM mainframe clients
                                   • Some workstation clients
PL/I                               • IBM mainframes
Macros as SQL Applications
Teradata macros are SQL statements that the server stores and executes. Macros provide an easy way to execute frequently used SQL operations. Macros are particularly useful for enforcing data integrity rules, providing data security, and improving performance.

SQL Used to Create a Macro
You use the CREATE MACRO statement to create Teradata macros. The format of CREATE MACRO is similar to CREATE VIEW.
For example, suppose you want to define a macro for adding new employees to the Employee table and incrementing the EmpCount field in the Department table. The CREATE MACRO statement looks like this:
    CREATE MACRO NewEmp (name   (VARCHAR(12)),
                         number (INTEGER, NOT NULL),
                         dept   (INTEGER, DEFAULT 100)
                        )
    AS (INSERT INTO Employee (Name, EmpNo, DeptNo)
        VALUES (:name, :number, :dept) ;
        UPDATE Department
        SET EmpCount=EmpCount+1
        WHERE DeptNo=:dept ;
       ) ;
This macro defines parameters that users must fill in each time they execute the macro. A leading colon (:) indicates a reference to a parameter within the macro.
Macro Usage
The following example shows how to use the NewEmp macro to insert data into the Employee and Department tables. The information to be inserted is the name, employee number, and department number for employee H. Goldsmith. The EXECUTE macro statement looks like this:
    EXECUTE NewEmp ('Goldsmith H', 10015, 600);
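Because the dept parameter of NewEmp is declared with DEFAULT 100, a caller can also omit it. The sketch below passes the remaining arguments by parameter name; the employee name and number are made-up values.

```sql
-- Named parameters; dept is omitted, so the macro uses its default of 100.
EXECUTE NewEmp (name = 'Rivera M', number = 10016);
```

After this request, the new row in Employee carries DeptNo 100, and the UPDATE inside the macro increments EmpCount for department 100.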
SQL Used to Modify a Macro
The following example shows how to modify a macro. Suppose you want to change the NewEmp macro so that the default department number is 300 instead of 100. The REPLACE MACRO statement looks like this:
    REPLACE MACRO NewEmp (name   (VARCHAR(12)),
                          number (INTEGER, NOT NULL),
                          dept   (INTEGER, DEFAULT 300)
                         )
    AS (INSERT INTO Employee (Name, EmpNo, DeptNo)
        VALUES (:name, :number, :dept) ;
        UPDATE Department
        SET EmpCount=EmpCount+1
        WHERE DeptNo=:dept ;
       ) ;

SQL Used to Delete a Macro
The following example shows how to delete a macro. Suppose you want to drop the NewEmp macro from the database. The DROP MACRO statement looks like this:
    DROP MACRO NewEmp;
Teradata Stored Procedures as SQL Applications
Teradata stored procedures are database applications created by combining SQL control statements with other SQL elements and condition handlers. They provide a procedural interface to the Teradata Database and many of the same benefits as embedded SQL. Teradata stored procedures conform to the ANSI SQL-99 (SQL3) standard with some exceptions.

SQL Used to Create Stored Procedures
Teradata SQL supports creating, modifying, dropping, renaming, and controlling access rights of stored procedures through DDL and DCL statements.
You can create or replace a stored procedure through the COMPILE command in Basic Teradata Query Facility (BTEQ) and BTEQ for Microsoft Windows systems (BTEQWIN). You must specify a source file as input for the COMPILE command.
You can also create or modify a stored procedure using the CREATE PROCEDURE or REPLACE PROCEDURE statement from CLIv2, ODBC, and JDBC applications, and the Teradata SQL

Stored Procedure Example
Assume you want to create a stored procedure named NewProc that you can use to add new employees to the Employee table and retrieve the department name of the department to which the employee belongs. You can also report an error, in case the row that you are trying to insert already exists, and handle that error condition.
The following stored procedure definition includes nested, labeled compound statements. The compound statement labeled L3 is nested within the outer compound statement L1. Note that the compound statement labeled L2 is the handler action clause of the condition handler.
This stored procedure defines parameters that must be filled in each time it is called (executed). The parameters are indicated with a leading colon (:) character when used in an SQL statement other than a control statement inside the procedure.
    CREATE PROCEDURE NewProc (IN name CHAR(12),
                              IN num INTEGER,
                              IN dept INTEGER,
                              OUT dname CHAR(10),
                              INOUT p1 VARCHAR(30))
    L1: BEGIN
      DECLARE CONTINUE HANDLER FOR SQLSTATE value '23505'
      L2: BEGIN
        SET p1='Duplicate Row';
      END L2;
      L3: BEGIN
        INSERT INTO Employee (Name, EmpNo, DeptNo)
          VALUES (:name, :num, :dept);
        SELECT DeptName
          INTO :dname FROM Department
          WHERE DeptNo = :dept;
        IF SQLCODE <> 0 THEN LEAVE L3;
        ...
      END L3;
    END L1;
SQL Used to Execute a Stored Procedure
After you compile a stored procedure, it is stored as an object in the Teradata Database. You can execute stored procedures from Teradata client utilities using the SQL CALL statement. Arguments for all input (IN or INOUT) parameters of the stored procedure must be submitted with the CALL statement.
BTEQ and other Teradata client utilities support stored procedure execution and DDL operations. These include:
• CLIv2
• JDBC
• ODBC
• PP2
  DDL statements are not supported from PP2; that is, you cannot create or modify stored procedures from PP2.
• Teradata SQL Assistant
• BTEQWIN (BTEQ for Windows)
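A call to the NewProc procedure from an interactive utility such as BTEQ might look like the sketch below. The argument values are made up, and the way output arguments are written (here, the parameter name dname as a placeholder for the OUT parameter and a constant for the INOUT parameter) is an assumption to verify against SQL Reference: Stored Procedures and Embedded SQL.

```sql
-- IN arguments: name, num, dept. OUT placeholder: dname. INOUT argument: initial value for p1.
CALL NewProc ('Jonathan', 1066, 34, dname, 'None');
```

If the row already exists, the condition handler in NewProc sets p1 to 'Duplicate Row', and that value is returned through the INOUT parameter.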
You can use the following DDL statements with stored procedures:

Use This Statement…           To…
CREATE PROCEDURE              direct the stored procedure compiler to create a procedure from the SQL statements in the remainder of the statement text.
ALTER PROCEDURE               direct the stored procedure compiler to recompile a stored procedure created in an earlier version of Teradata Database without executing SHOW PROCEDURE and REPLACE PROCEDURE statements.
DROP PROCEDURE                drop a stored procedure.
RENAME PROCEDURE              rename a procedure.
REPLACE PROCEDURE             direct the stored procedure compiler to replace the definition of an existing stored procedure. If the specified stored procedure does not exist, create a new procedure by that name from the SQL statements in the remainder of the source text.
HELP PROCEDURE … ATTRIBUTES   view all the parameters and parameter attributes of a procedure, or the creation time attributes of a procedure.
HELP 'SPL'                    display a list of all DDL and control statements associated with stored procedures.
HELP 'SPL command_name'       display help about the command you have named.
SHOW PROCEDURE                view the current definition (source text) of a procedure. The text is returned in the same format as defined by the creator.
The EXPLAIN Statement
Teradata SQL supplies a very powerful EXPLAIN statement that allows you to see the execution plan of a query. The EXPLAIN modifier in front of any SQL statement displays the execution plan for that statement, which is parsed and optimized in the usual fashion, but is not submitted for execution.

How Is EXPLAIN Useful
The EXPLAIN statement not only explains how a statement will be processed, but provides an estimate of the number of rows involved and the performance impact of the request.
When you perform an EXPLAIN against any SQL statement, that statement is parsed and optimized. The access and join plans generated by the optimizer are returned in the form of a text file that explains the (possibly parallel) steps used in the execution of the statement. Also included is the relative time required to complete the statement given the statistics with which the optimizer had to work. If the statistics are not reasonably accurate, the time estimate may not be accurate.
EXPLAIN helps you to evaluate complex queries and to develop alternative, more efficient, processing strategies. You may be able to get a better plan by collecting more statistics on more columns, or by defining additional secondary indexes. Your knowledge of the actual demographics information may allow you to identify row count estimates that seem badly wrong, and help to pinpoint areas where additional statistics would be helpful.
EXPLAIN With Simple Join Index Example
The EXPLAIN example results from joining tables with the following table definitions.
    CREATE TABLE customer
      (c_custkey INTEGER,
       c_name CHAR(26),
       c_address VARCHAR(41),
       c_nationkey INTEGER,
       c_phone CHAR(16),
       c_acctbal DECIMAL(13,2),
       c_mktsegment CHAR(21),
       c_comment VARCHAR(127))
    UNIQUE PRIMARY INDEX( c_custkey );

    CREATE TABLE orders
      (o_orderkey INTEGER NOT NULL,
       o_custkey INTEGER,
       o_orderstatus CHAR(1),
       o_totalprice DECIMAL(13,2) NOT NULL,
       o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL,
       o_orderpriority CHAR(21),
       o_clerk CHAR(16),
       o_shippriority INTEGER,
       o_comment VARCHAR(79))
    UNIQUE PRIMARY INDEX(o_orderkey);

    CREATE TABLE lineitem
      (l_orderkey INTEGER NOT NULL,
       l_partkey INTEGER NOT NULL,
       l_suppkey INTEGER,
       l_linenumber INTEGER,
       l_quantity INTEGER NOT NULL,
       l_extendedprice DECIMAL(13,2) NOT NULL,
       l_discount DECIMAL(13,2),
       l_tax DECIMAL(13,2),
       l_returnflag CHAR(1),
       l_linestatus CHAR(1),
       l_shipdate DATE FORMAT 'yyyy-mm-dd',
       l_commitdate DATE FORMAT 'yyyy-mm-dd',
       l_receiptdate DATE FORMAT 'yyyy-mm-dd',
       l_shipinstruct VARCHAR(25),
       l_shipmode VARCHAR(10),
       l_comment VARCHAR(44))
    PRIMARY INDEX( l_orderkey );
The following statement defines a join index on these tables.
    CREATE JOIN INDEX order_join_line AS
    SELECT ( l_orderkey, o_orderdate, o_custkey, o_totalprice ),
           ( l_partkey, l_quantity, l_extendedprice, l_shipdate )
    FROM lineitem
    LEFT JOIN orders ON l_orderkey = o_orderkey
    ORDER BY o_orderdate
    PRIMARY INDEX (l_orderkey);
The following EXPLAIN shows that the optimizer used the newly created join index, order_join_line, even though there is no reference to the index in the SQL text.
    EXPLAIN
    SELECT o_orderdate, o_custkey, l_partkey, l_quantity, l_extendedprice
    FROM lineitem , orders
    WHERE l_orderkey = o_orderkey;

    Explanation
    --------------------------------------------------------------
    1) First, we lock a distinct LOUISB."pseudo table" for read on a Row Hash to prevent global deadlock for LOUISB.order_join_line.
    2) Next, we lock LOUISB.order_join_line for read.
    3) We do an all-AMPs RETRIEVE step from join index table LOUISB.order_join_line by way of an all-rows scan with a condition of ("NOT (LOUISB.order_join_line.o_orderdate IS NULL)") into Spool 1, which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated to be 1,000,000 rows. The estimated time for this step is 4 minutes and 27 seconds.
    4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
For information about the types of indexes that Teradata supports, see Chapter 8: “Data Distribution and Access Methods.”
Third-Party Development
The Teradata Database supports many third-party software products. The two general components of supported products include those of the transparency series and the native interface products.

TS/API Products
The Transparency Series/Application Program Interface (TS/API) product provides a gateway between the IBM mainframe relational database products DB2 (MVS/TSO) and SQL/DS (VM/CMS) and the Teradata Database. TS/API permits an SQL statement formulated for either DB2 or SQL/DS to be translated into Teradata SQL to allow DB2 or SQL/DS applications to access data stored in a Teradata Database.

Compatible Third-Party Software Products
Many third-party, interactive query products operate in conjunction with the Teradata Database, permitting queries formulated in a native query language to access a Teradata Database. The list of supported third-party products changes frequently. For a current list, contact your NCR sales office.

Performance Monitor/Application Programming Interface
The Performance Monitor/Application Programming Interface (PM/API) provides a way for third-party performance monitoring programs to access Performance Monitor and Production Control (PM and PC) functions resident within Teradata Database. PM and PC data is available using a specialized PM/API subset of the Call-Level Interface Version 2 (CLIv2).
For More Information
For more information on the topics presented in this chapter, see the following Teradata Database and Teradata Tools and Utilities books:

IF you want to learn more about…             THEN see…
BTEQ                                         Basic Teradata Query Reference
Call-Level interface programming             Teradata Call-Level Interface Version 2 Reference for Channel-Attached Systems
                                             Teradata Call-Level Interface Version 2 Reference for Network-Attached Systems
Embedded SQL                                 SQL Reference: Stored Procedures and Embedded SQL
                                             Teradata Preprocessor2 for Embedded SQL Programmer Guide
JDBC                                         Teradata Driver for the JDBC Interface User Guide
ODBC                                         Teradata ODBC Driver User Guide
Performance                                  PM/API Reference
                                             Teradata Manager User Guide
                                             Resource Usage Macros and Tables
Teradata Director Program                    Teradata Director Program Reference
Teradata SQL data manipulation statements    SQL Reference: Data Manipulation Statements
Teradata SQL preprocessor                    Teradata Preprocessor2 for Embedded SQL Programmer Guide
Teradata stored procedures                   SQL Reference: Stored Procedures and Embedded SQL
TS/API products                              Teradata Transparency Series/Application Programming Interface User Guide
Chapter 7: The Teradata Database Model
This chapter describes the mathematical concepts on which relational databases are modeled and discusses some of the objects that are part of a relational database. Topics include:
• The relational model
• The relational database
• Tables, rows, and columns
What is a Relational Model
The relational model for database management is based on concepts derived from the mathematical theory of sets. Roughly speaking, set theory defines a table as a relation. The number of rows is the cardinality of the relation, and the number of columns is the degree.
Any manipulation of a table in a relational database has a consistent, predictable outcome, because the mathematical operations on relations are well defined. By way of comparison, database-management products based on hierarchical, network, or object-oriented architectures are not built on rigorous theoretical foundations. Therefore, the behavior of such products is not as predictable as that of relational products.
The SQL optimizer in the database uses relational algebra to build the most efficient access path to requested data. The optimizer can readily adapt to changes in system variables by rebuilding access paths without programmer intervention. This adaptability is necessary because database definitions can change from time to time.
What is a Relational Database
Users perceive a relational database as a collection of objects (for example, tables, views, macros, stored procedures, and triggers) that are easily manipulated using SQL directly or through specifically developed applications.

Set Theory and Relational Database Terminology
Relational databases are a generalization of the mathematics of set theory relations. Thus, the correspondences between set theory and relational databases are not always direct. The following table (relation) shows the correspondence between set theory and relational database terms:

Set Theory Term   Relational Database Term
Relation          Table
Tuple             Row (or record)
Attribute         Column
Tables, Rows, and Columns
Tables are two-dimensional objects consisting of rows and columns. Data is organized in table format and presented to the users of a relational database. References between tables define the relationships and constraints of data inside the tables themselves.

Table Constraints
You can define conditions that must be met before the Teradata Database writes a given value to a column in a table. These conditions are called constraints. Constraints can include value ranges, equality or inequality conditions, and intercolumn dependencies. The Teradata Database supports constraints at both the column and table levels.
During table creation and modification, you can specify constraints on single column values as part of a column definition or on multiple columns using the CREATE and ALTER statements.
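As a sketch of the two constraint levels described above, the following definition places a value-range constraint on a single column and an intercolumn dependency at the table level. The employee_pay table, its columns, and the constraint name bonus_cap are hypothetical.

```sql
CREATE TABLE employee_pay (
  emp_no  INTEGER        NOT NULL,
  salary  DECIMAL(10,2)  CHECK (salary >= 0),       -- column-level constraint
  bonus   DECIMAL(10,2),
  CONSTRAINT bonus_cap CHECK (bonus <= salary)       -- table-level, spans two columns
)
UNIQUE PRIMARY INDEX (emp_no);
```

An INSERT or UPDATE that would leave bonus greater than salary is rejected before the value is written.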
Permanent and Temporary Tables

To manipulate tabular data, you must submit a query in a language that the database understands. In the case of the Teradata Database, the language is SQL. You can store the results of multiple SQL queries in tables. Permanent storage of tables is necessary when different sessions and users must share table contents.

When tables are required for only a single session, the system creates temporary tables. Using this type of table, you can save query results for use in subsequent queries within the same session. Also, you can break down complex queries into smaller queries by storing results in a temporary table for use during the same session. When the session ends, the system automatically drops the temporary table.

Global Temporary Tables

Global temporary tables are tables that exist only for the duration of the SQL session in which they are used. The contents of these tables are private to the session, and the system automatically drops the table at the end of that session. However, the system saves the global temporary table definition permanently in the Data Dictionary. The saved definition may be shared by multiple users and sessions, with each session getting its own instance of the table.
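A minimal sketch of this behavior follows. The table gt_dept_totals and the employee table it reads from are hypothetical names chosen for illustration.

```sql
-- The definition is stored once in the Data Dictionary (hypothetical names).
CREATE GLOBAL TEMPORARY TABLE gt_dept_totals (
  dept_no       INTEGER,
  total_salary  DECIMAL(12,2)
)
ON COMMIT PRESERVE ROWS;

-- Each session that references the table gets its own private instance.
INSERT INTO gt_dept_totals
SELECT dept_no, SUM(salary)
FROM employee
GROUP BY dept_no;

-- A later query in the same session can reuse the saved result.
SELECT dept_no, total_salary
FROM gt_dept_totals
WHERE total_salary > 1000000;
```

ON COMMIT PRESERVE ROWS is used here so that the rows remain available to subsequent requests in the session; whether that is the right choice depends on how the session uses transactions.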
Volatile Temporary Tables

If you need a temporary table for a single use only, you can define a volatile temporary table. The definition of a volatile temporary table resides in memory but does not survive across a system restart. Using volatile temporary tables improves performance even more than using global temporary tables because the system does not store the definitions of volatile temporary tables in the Data Dictionary. Access-rights checking is not necessary because only the creator can access the volatile temporary table.
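The following sketch shows one way a volatile table might hold an intermediate result for a single session. The table and column names are hypothetical.

```sql
-- Hypothetical names; the definition exists only for this session.
CREATE VOLATILE TABLE vt_dept_totals (
  dept_no       INTEGER,
  total_salary  DECIMAL(12,2)
)
ON COMMIT PRESERVE ROWS;

-- Store an intermediate result for use later in the same session.
INSERT INTO vt_dept_totals
SELECT dept_no, SUM(salary)
FROM employee
GROUP BY dept_no;
```

Because nothing is recorded in the Data Dictionary, another session cannot see or reuse this definition; it would have to issue its own CREATE VOLATILE TABLE statement.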
Derived Tables

A special type of temporary table is the derived table. You can specify a derived table in an SQL SELECT statement.
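For example, a derived table might be used as sketched below to compare each employee with the departmental average. The employee table and its columns are hypothetical; the derived table d exists only while the statement runs.

```sql
-- d is a derived table: it is defined inside the FROM clause
-- and is discarded when the SELECT completes.
SELECT e.last_name, e.salary, d.avg_salary
FROM employee AS e,
     (SELECT dept_no, AVG(salary)
      FROM employee
      GROUP BY dept_no) AS d (dept_no, avg_salary)
WHERE e.dept_no = d.dept_no
  AND e.salary > d.avg_salary;
```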
Rows and Columns

A column always contains the same kind of information. For example, a table that has information about employees would have columns for the first name and last name, and nothing other than the employee names should be placed in those columns.

A row is one instance of all the columns in a table. For example, each row in the employee table would contain the first name and the last name for that employee, among other things.

The rows and columns in a table represent entities or relationships. An entity is a person, place, or thing about which the table contains information. The table mentioned in the previous paragraphs contains information about the employee entity. Each table holds only one kind of row.

The relational model requires that each row in a table be uniquely identified. To accomplish this, you define a primary key to identify each row in the table. For more information about primary keys, see “How Are Primary Keys and Primary Indexes Related” on page 8-3.
For More Information

For more information on the topics presented in this chapter, see the following Teradata Database books:

| IF you want to learn more about… | THEN see… |
| --- | --- |
| Relational model | Database Design |
| Tables, rows, and columns | Database Design; Database Administration; SQL Reference: Fundamentals |
Chapter 8: Data Distribution and Access Methods

This chapter describes how the Teradata Database handles data distribution and access. Topics include:
• Indexes
• Hashing
• Identity Column
Teradata Database Indexes

An index is a physical mechanism used to store and access the rows of a table. Indexes on tables in a relational database function much like indexes in books: they speed up information retrieval.

In general, the Teradata Database uses indexes to:
• Distribute data rows.
• Locate data rows.
• Improve performance. (Indexed access is usually more efficient than searching all rows of a table.)
• Ensure uniqueness of the index values. Only one row of a table can have a particular value in the column or columns defined as a unique index.

The Teradata Database supports the following types of indexes:
• Primary index, which may be unique or non-unique and optionally partitioned
• Secondary index, which may be unique or non-unique
• Join index
• Hash index

These indexes are discussed in the following sections.
Primary Indexes

The Teradata Database requires exactly one primary index for each table.

Primary Index Characteristics

The most efficient access method is through the primary index. Choosing a primary index means balancing two design goals that sometimes conflict: good distribution of data across the AMPs, and a match with the most common usage pattern of the table.

Primary indexes:
• Affect the distribution of rows across AMPs
• Do not have subtables
• Can be unique or non-unique
• May or may not be partitioned

For information about partitioned indexes, see “Partitioned Primary Indexes” on page 8-5.

How Are Primary Keys and Primary Indexes Related

The values chosen for the primary index of a table are frequently the same values identified as the primary key during the data modeling process, but no hard and fast rule makes this so. In fact, physical database design considerations often lead to a choice of values other than those of the primary key for the primary index of a table.
The following table describes some of the relationships between primary keys and primary indexes:

| Primary Key | Primary Index |
| --- | --- |
| Constraint used to ensure referential integrity | Physical access mechanism |
| Required by the Teradata Database only if referential integrity checks are to be performed | Required by Teradata Database |
| Column limit is 64 if the Teradata Database performs referential integrity checks; no arbitrary limit if it performs no referential integrity checks | 64-column limit |
| Defined by CREATE TABLE statement | Defined by CREATE TABLE statement |
| Must be unique | May be unique or non-unique |
| Identifies a row uniquely | Distributes rows |
| Values cannot be changed | Values can be changed |
| May not be null | May be null |
| Does not imply access path | Defines most common access path |
| Causes a unique primary index or unique secondary index to be created | N/A |
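The distinction can be sketched with a table whose primary key and primary index are defined on different columns. The table and column names below are hypothetical.

```sql
-- Hypothetical table: emp_no identifies each row (primary key),
-- while dept_no is chosen as the primary index for distribution and access.
CREATE TABLE employee (
  emp_no     INTEGER NOT NULL,
  dept_no    INTEGER,
  last_name  VARCHAR(30),
  CONSTRAINT pk_employee PRIMARY KEY (emp_no)
)
PRIMARY INDEX (dept_no);
```

In this sketch the primary index is non-unique, so the uniqueness of emp_no has to be enforced some other way; per the last row of the table above, the PRIMARY KEY constraint would be expected to cause a unique secondary index to be created.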
Partitioned Primary Indexes

Both unique and non-unique primary indexes can be partitioned. A partitioned primary index, like a non-partitioned primary index, provides an access path to the rows in the base table via the primary index values.

Non-partitioned Primary Indexes

You can define a primary index as either partitioned or non-partitioned. The non-partitioned primary index is the standard Teradata Database primary index.

When a table is created with a partitioned primary index, the rows are hashed to the appropriate AMPs and assigned to an appropriate partition based on the value of a partitioning expression that you define when you create or alter the table. Once assigned to a partition, the rows are stored in row hash order.

How Do Partitioned and Non-Partitioned Primary Indexes Compare

Partitioned primary indexes are designed to optimize range queries while still providing efficient primary index join strategies. A range query requests data that falls within specified boundaries.

The following table provides a comparison of partitioned and non-partitioned primary index capabilities:

| Capabilities | Partitioned | Non-Partitioned |
| --- | --- | --- |
| Hash partitioned, that is, distributed to the AMPs by the hash of the primary index columns | Yes | Yes |
| Partitioned on each AMP on some set of columns | Yes | No |
| Ordered by hash of the primary index columns on each AMP | Yes (within each partition) | Yes |
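As a sketch, a partitioned primary index and a range query that might benefit from it could look like the following. The sales table, its columns, and the monthly partitioning scheme are assumptions made for illustration.

```sql
-- Rows are hashed to AMPs on (storeid, deptid), then placed in a
-- monthly partition on each AMP according to sales_date.
CREATE TABLE sales (
  storeid        INTEGER NOT NULL,
  deptid         INTEGER NOT NULL,
  sales_date     DATE NOT NULL,
  sales_dollars  DECIMAL(12,2)
)
PRIMARY INDEX (storeid, deptid)
PARTITION BY RANGE_N(sales_date BETWEEN DATE '2003-01-01'
                                AND     DATE '2003-12-31'
                                EACH INTERVAL '1' MONTH);

-- A range query: only the partition for March needs to be examined.
SELECT storeid, SUM(sales_dollars)
FROM sales
WHERE sales_date BETWEEN DATE '2003-03-01' AND DATE '2003-03-31'
GROUP BY storeid;
```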
Secondary Indexes

Secondary indexes allow access to information in a table by alternate, less frequently used paths and improve performance by avoiding full table scans. Secondary indexes add to table overhead in terms of disk space and maintenance. However, you can drop and recreate secondary indexes as needed.

Secondary indexes:
• Do not affect the distribution of rows across AMPs
• Can be unique or non-unique
• Are used by the optimizer when the indexes can improve query performance

Secondary Index Subtables

The system builds subtables for all secondary indexes. The subtable contains the rows that associate the secondary index value with one or more rows in the base table. When column values change, the system updates the rows in the subtable. When you drop the secondary index, the system physically removes the subtable.
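The following sketch shows secondary indexes being created and dropped on an existing table. The employee table and its columns are hypothetical.

```sql
-- Unique secondary index: at most one row per ssn value.
CREATE UNIQUE INDEX (ssn) ON employee;

-- Non-unique secondary index: many rows may share a last name.
CREATE INDEX (last_name) ON employee;

-- Dropping the index removes its subtable; the base table is unchanged.
DROP INDEX (last_name) ON employee;
```

One possible use of this flexibility is to drop a secondary index before a large load and recreate it afterward, so that the subtable is not maintained row by row during the load.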
How Do Primary and Secondary Indexes Compare

The following table provides a brief comparison of primary and secondary index features:

| Feature | Primary | Secondary |
| --- | --- | --- |
| Is required | Yes | No |
| Can be unique or non-unique | Both | Both |
| Affects row distribution | Yes | No |
| Create and drop dynamically | No | Yes |
| Improves access | Yes | Yes |
| Create using multiple data types | Yes | Yes |
| Requires separate physical structure | No | Yes, a subtable |
| Requires extra processing overhead | No | Yes |
Join Indexes

A join index is an indexing structure containing columns from one or more base tables. Some queries can be satisfied by examining only the join index when all referenced columns are stored in the index. Such queries are said to be covered by the join index. Other queries may use the join index to qualify a few rows, then refer to the base tables to obtain requested columns that are not stored in the join index. Such queries are said to be partially covered by the index.

Because the Teradata Database supports multi-table, partially-covering join indexes, all types of join indexes, except the aggregate join index, can be joined to their base tables to retrieve columns that are referenced by a query but are not stored in the join index. Aggregate join indexes can be defined for commonly used aggregation queries.

Much like secondary indexes, join indexes impose additional processing on insert and delete operations, and on update operations that change the value of columns stored in the join index. The performance trade-off considerations are similar to those for secondary indexes.

Single-Table Join Indexes

A single-table join index replicates some or all of the columns of a base table in a separate structure that is hashed on a frequently used join column (usually chosen to match the primary index of the table to which the base table is most often joined) rather than on the primary index of the original base table.
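A minimal sketch of a single-table join index follows. It assumes a hypothetical orders table whose own primary index is order_id and which is frequently joined to a customer table on cust_id.

```sql
-- Replicates three columns of orders, hashed on cust_id instead of order_id,
-- so that the copies land on the same AMPs as the matching customer rows
-- (assuming customer has its primary index on cust_id).
CREATE JOIN INDEX ord_by_cust AS
SELECT cust_id, order_id, order_total
FROM orders
PRIMARY INDEX (cust_id);
```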
Multi-Table Join Indexes

When queries frequently request a particular join, it may be beneficial to predefine the join with a multi-table join index. The optimizer can use the predefined join instead of performing the same join repetitively.
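For instance, a frequently requested customer-to-orders join might be predefined as sketched here. The customer and orders tables and their columns are hypothetical.

```sql
-- Predefines the join between two hypothetical tables.
CREATE JOIN INDEX cust_ord_ji AS
SELECT c.cust_id, c.cust_name, o.order_id, o.order_total
FROM customer c, orders o
WHERE c.cust_id = o.cust_id
PRIMARY INDEX (cust_id);
```

A query that joins the same two tables on cust_id and references only these columns could then be a candidate for being covered by the join index, although the choice remains with the optimizer.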
Aggregate Join Indexes

When query performance is of utmost importance, aggregate join indexes offer an extremely efficient, cost-effective method of resolving queries that frequently specify the same aggregate operations on the same column or columns. When aggregate join indexes are available, the system does not have to repeat aggregate calculations for every query.
You can define an aggregate join index on two or more tables, or on a single table. A single-table aggregate join index includes a summary table with:
• A subset of columns from a base table
• Additional columns for the aggregate summaries of the base table columns

You can create an aggregate join index using:
• SUM function. A SUM aggregate join index contains a hidden column containing the row count, so that AVERAGE can be calculated from the join index.
• COUNT function
• GROUP BY clause
Sparse Join Indexes

Another capability of the join index allows you to index a portion of the table, using the WHERE clause in the CREATE JOIN INDEX statement to limit the rows indexed. You can limit the rows that are included in the join index to a subset of the rows in the table based on an SQL query result. Any join index, whether simple or aggregate, multi-table or single-table, can be sparse.

For example, the following DDL creates J1, which is an aggregate join index containing only the sales records from 2003:

    CREATE JOIN INDEX J1 AS
    SELECT storeid, deptid, SUM(sales_dollars) AS sum_sd
    FROM sales
    WHERE EXTRACT(YEAR FROM sales_date) = 2003
    GROUP BY storeid, deptid;

When you enter a query, the optimizer determines whether accessing J1 gives the correct answer and is more efficient than accessing the base tables. The optimizer would select this sparse join index only for queries that restrict themselves to data from the year 2003.
Hash Indexes

The hash index provides a space-efficient index structure that can be hash distributed to AMPs in various ways. The index has characteristics similar to a single-table join index with a row identifier that provides transparent access to the base table. A hash index may be simpler to create than a corresponding join index and takes somewhat less disk storage.

The hash index has been designed to improve query performance in a manner similar to a single-table join index. In particular, you can specify a hash index to:
• Cover columns in a query so that the base table does not need to be accessed
• Serve as an alternate access method to the base table in a join or retrieval operation
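A hash index definition might look like the following sketch. The orders table and its columns are hypothetical, and the BY and ORDER BY HASH clauses reflect the general form of the statement as understood here; consult SQL Reference: Data Definition Statements for the exact rules that apply to your release.

```sql
-- Covers cust_id and order_total for a hypothetical orders table.
-- BY names the columns used to distribute the index rows across the AMPs;
-- ORDER BY HASH stores them in hash order on each AMP.
CREATE HASH INDEX ord_hidx (cust_id, order_total) ON orders
BY (cust_id)
ORDER BY HASH (cust_id);
```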
Index Specification

All tables require a primary index. If you do not specify a column or set of columns as the primary index for the table, then CREATE TABLE specifies a primary index by default.

Creating Indexes

The following table provides general information about creating indexes.

| To specify a… | Use the following statement… | And the following clause… |
| --- | --- | --- |
| unique primary index (UPI) | CREATE TABLE | UNIQUE PRIMARY INDEX |
| non-unique primary index (NUPI) | CREATE TABLE | PRIMARY INDEX |
| unique secondary index (USI) | CREATE TABLE | UNIQUE INDEX |
| non-unique secondary index (NUSI) | CREATE TABLE | INDEX |
| | CREATE INDEX | N/A |
| join index (Note: A join index can provide an index across multiple tables.) | CREATE JOIN INDEX | N/A |
| hash index (Note: A hash index can provide an index across multiple tables.) | CREATE HASH INDEX | N/A |

Indexes are also created when the PRIMARY KEY and UNIQUE constraints are specified.
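The CREATE TABLE clauses listed above can be combined in one statement, as in this sketch. The employee table and its columns are hypothetical.

```sql
-- One UPI, one USI, and one NUSI defined when the table is created.
CREATE TABLE employee (
  emp_no     INTEGER NOT NULL,
  ssn        CHAR(9) NOT NULL,
  last_name  VARCHAR(30),
  dept_no    INTEGER
)
UNIQUE PRIMARY INDEX (emp_no)
UNIQUE INDEX (ssn)
INDEX (last_name);
```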
Strengths and Weaknesses of Various Types of Indexes

Teradata Database does not require or allow users to explicitly dictate how indexes should be used for a particular query. The Teradata Database optimizer costs all of the reasonable alternatives and selects the least expensive.

The object of any query plan is to return accurate results as quickly as possible. Therefore, the optimizer uses an index or indexes only if the index speeds up query processing. In some cases, the optimizer processes the query without using any index.
Selection of indexes:
• Can have a direct impact on overall Teradata performance
• Is not always a straightforward process
• Is based partly on usage expectations
The following comparison assumes execution of a simple SELECT statement and explains the strengths and weaknesses of some of the various indexing methods:

Unique Primary Index (UPI)
Strengths:
• is the most efficient access method when the SQL statement contains the primary index value
• involves one AMP and one row
• requires no spool file (for a simple SELECT)
• can obtain the most granular locks
Weaknesses:
• none, provided that the column or columns making up the index are well chosen

Non-unique Primary Index (NUPI)
Strengths:
• provides efficient access when the SQL statement contains the primary index value
• involves one AMP
• can obtain granular locks, but not as fine as a UPI
• may not require a spool file as long as the number of rows returned is small
Weaknesses:
• may slow down INSERTs
• may decrease the efficiency of SELECTs containing the primary index value when some values are repeated in many rows

Unique Secondary Index (USI)
Strengths:
• provides efficient access when the SQL statement contains the USI values, and you do not specify primary index values
• involves two AMPs and one row
• requires no spool file (for a simple SELECT)
Weaknesses:
• requires additional overhead for INSERTs, UPDATEs, MERGEs, and DELETEs

Non-unique Secondary Index (NUSI)
Strengths:
• provides efficient access when the number of rows per value in the table is small
• involves all AMPs and probably multiple rows
• provides access using information that may be more readily available than a UPI value, such as employee last name, compared to an employee number
• may require a spool file
Weaknesses:
• requires additional overhead for INSERTs, UPDATEs, MERGEs, and DELETEs
• will not be used by the optimizer if the number of data blocks accessed is a significant percentage of the data blocks in the table, because the optimizer will determine that a full table scan is cheaper

Full table scan
Strengths:
• accesses each row only once
• provides access using any arbitrary set of column conditions
Weaknesses:
• examines every row
• usually requires a spool file, possibly as large as the base table

Multi-table join index
Strengths:
• can eliminate the need to perform certain joins and aggregates repetitively
• may be able to satisfy a query without referencing the base tables
• can have a different primary index from that of the base table
• can replace an NUSI or a USI
Weaknesses:
• requires additional overhead for INSERTs, UPDATEs, MERGEs, and DELETEs for any of the base tables that contribute to the multi-table join index
• usually is not suitable for data in tables subjected to a large number of daily INSERTs, UPDATEs, MERGEs, and DELETEs
• imposes some restrictions on operations performed on the base table

Single-table join index
Strengths:
• can isolate frequently used columns (or their aggregates) from those that are seldom used
• can reduce the number of physical I/Os when only commonly used columns are referenced
• can have a different primary index from that of the base table
Weaknesses:
• requires additional overhead for INSERTs, UPDATEs, MERGEs, and DELETEs
• imposes some restrictions on operations performed on the base table

Sparse join index
Strengths:
• can be stored in less space than an ordinary join index
• reduces the additional overhead associated with INSERTs, UPDATEs, MERGEs, and DELETEs to the base table when compared with an ordinary join index
• can exclude common values that occur in many rows to help ensure that the optimizer chooses to use the join index to access the less common values
Weaknesses:
• requires additional overhead for INSERTs, UPDATEs, MERGEs, and DELETEs to the base table
• imposes some restrictions on operations performed on the base table
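Because the optimizer, not the user, chooses the access method, one way to see which method it would pick for a given request is to prefix the request with the EXPLAIN modifier. The sketch below assumes a hypothetical employee table with a UPI on emp_no and a NUSI on last_name.

```sql
-- Primary index value supplied: a single-AMP, single-row retrieval
-- would be the expected plan.
EXPLAIN
SELECT last_name, dept_no
FROM employee
WHERE emp_no = 1001;

-- NUSI value supplied: the optimizer may use the index, or may decide
-- that a full table scan is cheaper if many rows share the value.
EXPLAIN
SELECT emp_no, dept_no
FROM employee
WHERE last_name = 'Smith';
```

EXPLAIN returns a description of the plan rather than the rows themselves, so it can be run without the cost of executing the query.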
Hashing

The Teradata Database uses hashing to distribute data to disk storage and uses indexes to access the data. Because the architecture of the Teradata Database is massively parallel, it requires an efficient means of distributing and retrieving its data. That efficient method is hashing. All Teradata indexes are based on (or partially based on) row hash values rather than table column values.

For primary indexes, the Teradata Database obtains a row hash by hashing the primary index value. The row hash and a sequence number, which is assigned to distinguish between rows with the same row hash within a table, are collectively called a row identifier and uniquely identify each row in a table. A partition identifier is also part of the row identifier in the case of partitioned primary index tables. For more information on partitioned primary indexes, see “Partitioned Primary Indexes” on page 8-5.

For secondary indexes, the Teradata Database implements the index as a row identifier based on the:
• Hash of the secondary index value
• Actual value of the secondary index
• List of row identifiers for rows with that secondary index value
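The effect of hashing on row distribution can be inspected with the hash-related SQL functions. The following sketch counts how many rows of a hypothetical employee table, with primary index emp_no, hash to each AMP.

```sql
-- HASHROW computes the row hash of the primary index value,
-- HASHBUCKET maps the row hash to a hash bucket,
-- and HASHAMP maps the bucket to the AMP that owns it.
SELECT HASHAMP(HASHBUCKET(HASHROW(emp_no))) AS amp_no,
       COUNT(*) AS row_count
FROM employee
GROUP BY 1
ORDER BY 1;
```

Roughly equal counts per AMP would suggest that the chosen primary index distributes the rows evenly; a few AMPs with much larger counts would suggest skew.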
Identity Column

Identity Column is a column attribute option defined in the ANSI standard. When associated with a column, this attribute causes the system to generate a unique, table-level number for every row that is inserted into the table.

Identity columns have many applications, including the automatic generation of UPIs, USIs, and primary keys. For example, an identity column can serve as a UPI to ensure even data distribution when you import data from a system that does not have a primary index.

For more information about indexes, see “Teradata Database Indexes” on page 8-2.
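As a sketch of the import scenario, the following hypothetical table uses an identity column as its UPI. The table and column names, and the START WITH and INCREMENT BY values, are illustrative assumptions.

```sql
-- order_key is generated by the system for every inserted row
-- and serves as the unique primary index.
CREATE TABLE imported_orders (
  order_key      INTEGER GENERATED ALWAYS AS IDENTITY
                 (START WITH 1 INCREMENT BY 1 NO CYCLE),
  src_order_ref  VARCHAR(40),
  order_total    DECIMAL(12,2)
)
UNIQUE PRIMARY INDEX (order_key);

-- The generated column is omitted from the INSERT.
INSERT INTO imported_orders (src_order_ref, order_total)
VALUES ('A-1001', 250.00);
```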
For More Information

For more information on the topics presented in this chapter, see the following Teradata Database books:

| IF you want to learn more about… | THEN see… |
| --- | --- |
| Identity columns | SQL Reference: Data Definition Statements; SQL Reference: Data Manipulation Statements |
| Indexes and hashing | Database Design; SQL Reference: Data Definition Statements; SQL Reference: Statement and Transaction Processing |
Chapter 9: Data Dictionary

The Data Dictionary is a set of system tables that contain data about user databases and properties of those databases, in addition to a great deal of administrative information about the Teradata Database.

This chapter provides information about the Data Dictionary. Topics include:
• Definition of the Data Dictionary
• Data Dictionary views
• SQL used to access the Data Dictionary
What is the Data Dictionary

The Data Dictionary comprises tables and views that reside in the system database called DBC. These tables and views are reserved for use by the system and contain information, called metadata, about the data associated with the Teradata Database.

Data Dictionary Content

Data Dictionary system tables include current definitions, control information, and general information about the following:
• Databases
• Users
• Roles
• Profiles
• Accounts
• Tables
• Views
• Columns
• Indexes
• Constraints
• Sessions and session attributes
• Triggers
• Access rights
• Journal tables
• Disk space
• Events
• Resource usage
• Macros
• Stored procedures
• Logs
• Rules
• Translations
• Character sets
• Statistics
• User-defined functions
What Is in a Data Dictionary Table

The following lists describe what is stored in the Data Dictionary when you create some of the most important objects.

WHEN you create one of the following objects, THEN the definition of the object is stored along with the following details:

a table
• table name and table location
• database name, creator name, and user names of all owners in the hierarchy
• each column in the table, including column name, data type, length, and phrases
• user/creator access privileges on the table
• indexes defined for the table
• constraints defined for the table
• table backup and protection, including fallback status and permanent journals
• date and time the object was created

a database
• database name, creator name, owner name, and account name
• space allocation, including permanent, spool, and temporary space
• number of fallback tables
• collation type
• password string and password change date
• creation time stamp
• logon and account logon rules
• the date and time the database was last altered and the name that altered it
• role and profile names
• a unique identifier for the name of the UDF library

a user
• user name, creator name, and owner name
• the date and time the password was last modified
• space allocation, including permanent, spool, and temporary space
• default account, database, collation, character type, and date form
• creation time stamp
• name and time stamp of the last alteration made to the user
• role and profile name

WHEN you create one of the following objects, THEN the following details are entered in the Data Dictionary:

a view or macro
• the text of the view or macro
• creation time attributes
• user and creator access privileges

a stored procedure
• creation time attributes
• parameters, including parameter name, parameter type, data type, and default format
• user and creator access privileges

a trigger
• the IDs of the table, the trigger, the database and subject table database, the user who created the trigger, and the user who last updated the trigger
• time stamp for the last update
• indexes
• trigger name, whether the trigger is enabled, the event that fires the trigger, and the order in which triggers fire
• default character set
• creation text and time stamp
• overflow text, that is, trigger text that exceeds a specified limit
• fallback tables

a user-defined function
• database name, function name, and specific name
• number, data type, and style of parameters
• function ID, function type, and external name
• source file language
• character type
• external file reference
• platform
Teradata Database Data Dictionary Views

You can examine the information in the system tables in database DBC directly or through a series of views. Typically, you use views to obtain information on the objects in the Data Dictionary rather than querying the actual tables, which can be very large. The database administrator controls who has access to views.

What Is in a View

A view is a virtual table that you see as a base table. Think of a view as a dynamic window to the underlying tables in the database. A view is constructed from one or more base tables, or views. However, a view usually presents only a subset of the columns and rows in the base table or tables that comprise the view.

Some view columns do not exist in the underlying base tables. For example, it is possible to present data summaries in a view (for example, an average), which you cannot maintain in a base table.

You can create hierarchies of views in which views can be created on views. This can be useful, but you should be aware that deleting any of the lower-level views invalidates dependencies of higher-level views in the hierarchy.
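As an illustration, the following sketch defines a view that presents a summary column, and then a second view built on the first. The employee table and both view names are hypothetical.

```sql
-- avg_salary does not exist in the base table; the view computes it.
CREATE VIEW dept_salary_summary (dept_no, avg_salary) AS
SELECT dept_no, AVG(salary)
FROM employee
GROUP BY dept_no;

-- A view created on a view. Dropping dept_salary_summary would
-- invalidate this higher-level view.
CREATE VIEW high_paid_depts (dept_no, avg_salary) AS
SELECT dept_no, avg_salary
FROM dept_salary_summary
WHERE avg_salary > 75000;
```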
Why Use Views

There are at least four reasons to use views. Views provide all of the following:
• A simplified user perception of the database
• Security for restricting table access and updates
• Well-defined, well-tested, high-performance access to data
• Logical data independence, which minimizes application modification if base tables require restructuring
Who Uses Data Dictionary Views

Some Data Dictionary views may be restricted to special types of users, while others are accessible by all users. The database administrator controls access to views by granting access rights.

The following list defines the information needs of various types of users.

End users need to know:
• Objects to which the user has access
• Types of access available to the user
• Access rights the user has granted to other users

Supervisory users need to know:
• How to create and organize databases
• How to monitor space usage
• How to define new users
• How to allocate access privileges
• How to create indexes
• How to perform archiving operations

Database administrators need to know:
• Performance
• Status and statistics
• Errors
• Accounting

Security administrators need to know:
• Access logging rules generated by the execution of BEGIN LOGGING statements
• Results of access checking events, logged as specified by the access logging rules

Operations control users need to know:
• Archive and recovery activities
SQL Access to the Data Dictionary

Every time you log on to the Teradata Database, perform an SQL query, or type a password, you are using the Data Dictionary.

For security and data integrity reasons, the only SQL DML command you can use on the Data Dictionary is the SELECT statement. You cannot use the INSERT, UPDATE, MERGE, or DELETE SQL statements to alter the Data Dictionary in any way.

You can use SELECT to examine any view in the Data Dictionary to which your database administrator has granted you access. For example, if you need to access information in the Personnel database, then you can query the DBC.Databases view as shown:

    SELECT Databasename, Creatorname, Ownername, Permspace
    FROM DBC.Databases
    WHERE Databasename = 'Personnel';

The query above produces a report like this:

| Databasename | Creatorname | Ownername | Permspace |
| --- | --- | --- | --- |
| Personnel | Jones | Jones | 1,000,000 |
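Other Data Dictionary views can be queried in the same way. The sketch below assumes that your administrator has granted access to the DBC.Tables and DBC.Columns views and that a table named Employee exists in the Personnel database; the table name is hypothetical, and the view column names should be checked against the Data Dictionary book for your release.

```sql
-- List the objects defined in the Personnel database.
SELECT TableName, TableKind, CreatorName, CreateTimeStamp
FROM DBC.Tables
WHERE DatabaseName = 'Personnel'
ORDER BY TableName;

-- List the columns of one (hypothetical) table in that database.
SELECT ColumnName, ColumnType, ColumnLength
FROM DBC.Columns
WHERE DatabaseName = 'Personnel'
  AND TableName = 'Employee';
```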
For More Information

For more information on the topics presented in this chapter, see the following Teradata Database book:

| IF you want to learn more about… | THEN see… |
| --- | --- |
| Data Dictionary | Data Dictionary |
Chapter 10: Teradata Meta Data Services

The Teradata® Meta Data Services product provides a means of storing, administering, and navigating metadata in a Teradata Warehouse. It is the only metadata management system optimized for and integrated with the Teradata Database environment.

Topics include:
• What is metadata
• Types of metadata
• Teradata Meta Data Services
What Is Metadata

Metadata is the term applied to the definitions of the data stored in the Teradata Warehouse. Simply put, metadata is data about data.

In a transaction processing database environment, a Data Dictionary generally satisfies the need for data about data. In the data warehouse environment, the requirements for a more elaborate metadata storage system can exceed the capabilities of the Data Dictionary.

Metadata plays an important role across the Teradata Warehouse architecture. In the operational database environment, that role is very formal. All development should use metadata as a standard part of the design and development process.

As far as the data warehouse is concerned, metadata is used to locate data. Without it, you cannot interact with the data in the data warehouse because you have no means of knowing how the tables are structured, what the precise definitions of the data are, or where the data originated.
Types of Metadata

Metadata has been around for as long as there have been programs and data. But in the world of data warehouses, metadata takes on a new level of importance. Using metadata, you can make the most effective use of the Teradata Warehouse.

Metadata allows the decision support system (DSS) analyst to navigate through the possibilities. The major component of the DSS environment is archival data, that is, data with a timestamp. Because archival data is timestamped, it makes sense to store metadata with the actual occurrences of data, which are timestamped as well.

The following list contains information about the types of metadata that are stored.

For the data model:
• description
• specification
• the layout of the physical data model tables
• relation between the data model and the data warehouse

For the data warehouse:
• data source (system of record)
• definition of the system of record
• mapping from the system of record to the data warehouse and other places defined in the environment
• table structures and attributes
• any relationship or artifacts of relationships
• transformation of data as it passes into the data warehouse
• history of extracts
• extract logging
• common routines for data access

For columns:
• columns in a row
• order in which the columns appear
• physical structure of the columns
• any variable-length columns
• any columns with NULL values
• unit of measure of any numeric columns
• any encoding used

For the database design:
• description of the layouts used
• structure of data as known to the programmers and analysts
Teradata Meta Data Services Teradata Meta Data Services (MDS) is software that creates a repository in the Teradata Warehouse in which metadata is stored. MDS also permits the DSS analyst to administer and navigate metadata in the warehouse. Teradata Meta Data Services is the only metadata management system optimized for and integrated with the Teradata Warehouse environment. The following table provides information about the benefits of Teradata Meta Data Services to several user groups: For this type of user…
Teradata MDS…
application developers
• Provides a persistent store for application metadata so that developers can concentrate on developing application functions. • Allows the developer to manipulate metadata with the same techniques used to manipulate other data. • Provides security (MDS controls the read and write access). • Allows metadata to be shared between applications. This allows integration of tools such as ordered analytical functions and data mining tools. • Allows application data to be modeled around Teradata Database metadata maintained by MDS. MDS maintains the metadata so that the application is kept current with warehouse database changes.
database administrator
• Provides a common repository for Teradata Warehouse components. • Provides a single shared copy of metadata, or a single version of the truth. One copy eliminates multiple islands of redundant metadata that can cause confusion and administrative difficulties. • Provides the capabilities to browse through data in the repository and to drill-down to see successive levels of detail. • Shows interrelationships between different data definitions. • Provides impact analysis of proposed changes.
business user
• Provides the foundation for a “warehouse view” of enterprise computing. • Allows business analysts to quickly determine where their data comes from, how it was changed, when it was last updated, and how the answer was determined. This greatly increases the value of the detail data and implicitly the value of the metadata. • Supports third-party tools that can be used to import metadata into MDS for viewing. • Supports a web browser that provides general reporting and search capabilities and shows strategic metadata relationships.
Creating the Teradata Meta Data Repository The Teradata MDS repository is a set of tables that resides in the Teradata Database. You must use MDS program software to create these tables before metadata can be added, stored, or accessed.
Connecting to the Teradata Meta Data Repository
Each system running a Teradata MDS application must have the following:
• The appropriate Teradata ODBC driver
• An ODBC System Data Source Name (DSN) connection to the Teradata Database where the MDS repository resides
For More Information For more information on the topics presented in this chapter, see the following Teradata Meta Data Services and Teradata Tools and Utilities books: IF you want to learn more about…
THEN see…
Teradata Meta Data Services
• Teradata Meta Data Services Installation and Administrator Guide
• Teradata Meta Data Services Programmer Guide
Teradata ODBC driver
Teradata ODBC Driver User Guide
Chapter 11: Other Database Objects
This chapter provides more information about a few of the database objects stored in the Teradata Database. Topics include:
• Views
• Stored Procedures
• Macros
• Triggers
What Are Views
Views are virtual tables that you can use as if they were physical tables to retrieve data. A view defines its columns from underlying tables and/or other views. Views are integral to the Data Dictionary because view definitions are stored there. For more information about the role views play in the Data Dictionary, see “What Is in a View” and “Why Use Views” on page 9-6.
A view does not contain data and is not materialized until an SQL statement references it. Views are useful because they can simplify access to information in the Teradata Database.
SQL Statements Related to Views The following table lists SQL statements that you can use to implement and change views: Use…
To…
CREATE VIEW
name the view and columns contained in the view. define a SELECT on one or more columns from other tables and/or views.
REPLACE VIEW
alter the characteristics of an existing view.
Restrictions on Using Views You can use views as if they were tables in SELECT statements. Views are subject to some restrictions regarding the INSERT, UPDATE, MERGE, and DELETE statements. For more information, see “SQL Access to the Data Dictionary” on page 9-8.
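The behavior described above, a view that holds no data of its own and is evaluated only when a statement references it, can be sketched as follows. This is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in, not the Teradata Database; the employee table, its columns, and the dept_100_staff view are hypothetical names chosen for the example, and Teradata CREATE VIEW syntax and view restrictions may differ.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER, name TEXT, dept_no INTEGER, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Lee", 100, 52000.0), (2, "Kim", 200, 61000.0)],
)

# The view names a SELECT on columns of the base table.
# It stores no rows of its own and hides the salary column.
conn.execute(
    "CREATE VIEW dept_100_staff AS "
    "SELECT emp_no, name FROM employee WHERE dept_no = 100"
)


def staff():
    """Query the view as if it were a table."""
    return conn.execute(
        "SELECT emp_no, name FROM dept_100_staff ORDER BY emp_no"
    ).fetchall()


before = staff()

# A change to the base table is visible through the view on the next query,
# because the view is evaluated only when a statement references it.
conn.execute("INSERT INTO employee VALUES (3, 'Ortiz', 100, 48000.0)")
after = staff()

print(before)
print(after)
```

The view also simplifies access: a user who queries dept_100_staff does not need to know the filter on dept_no or see the salary column.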
What Are Teradata Stored Procedures
A stored procedure is a database object that is executed in Teradata Database server space. It is a combination of procedural control statements, SQL statements, and control declarations that provides a procedural interface to the Teradata Database.
Why Use Stored Procedures
Using stored procedures, you can build large and complex database applications. In addition to a set of SQL control statements and condition handling statements, a stored procedure can contain the following:
• Multiple input and output parameters
• Local variables and cursors
• SQL DDL, DCL, DML, and SELECT statements, including dynamic SQL, with a few exceptions
Dynamic SQL is a method of invoking an SQL statement by creating and submitting it at runtime from within a stored procedure.
Applications based on stored procedures provide the following benefits:
• They reduce network traffic in the client-server environment because stored procedures reside and execute on the server.
• They allow encapsulation and enforcement of business rules on the server, contributing to improved application maintenance.
• They provide better transaction control.
• They provide better security by granting the user access to the procedures rather than to the data tables.
• They provide an exception handling mechanism to handle the runtime conditions generated by the application.
• All the SQL and SQL control statements embedded in a stored procedure are executed by submitting one CALL statement. Nested CALL statements further extend the versatility.
Elements of a Teradata Stored Procedure
A Teradata stored procedure comprises some or all of the following elements:
This element…
Includes…
SQL control statements
nested or non-nested compound statements
Control declarations
• Condition handlers in DECLARE HANDLER statements for completion and exception conditions. Note: Condition handlers can be of the CONTINUE or EXIT type, and can be defined for a specific SQLSTATE code, the generic exception condition SQLEXCEPTION, or the generic completion conditions NOT FOUND and SQLWARNING.
• Local variable declarations in DECLARE statements
• Cursor declarations in DECLARE CURSOR statements. Note: Cursors can be either updatable or read-only. They can also be declared in FOR iteration statements.
SQL transaction statements
DDL, DCL, DML, and SELECT statements, including dynamic SQL statements, with a few exceptions
LOCKING modifiers
with all supported SQL statements except CALL
Comments
bracketed and simple comments. Note: Nested bracketed comments are not allowed.
For more information, see “Teradata Stored Procedures as SQL Applications” on page 6-7.
What Are Macros The macro database object consists of one or more SQL statements that can be executed by performing a single statement. Each time the macro is performed, one or more rows of data can be returned.
SQL Statements Related to Macros The following table lists the basic SQL statements that you can use with macros: Use this statement…
To…
CREATE MACRO
incorporate a frequently used SQL statement or series of statements into a macro.
EXECUTE
run a macro. Note: A macro can also contain an EXECUTE statement that executes another macro.
DROP MACRO
delete a macro.
Single-User and Multi-User Macros You can create a macro for your own use, or grant execution authorization to others. For example, your macro might enable a user in another department to perform operations on the data in the Teradata Database. When executing the macro, the user need not be aware of the database access, the tables affected, or even the results.
Macro Processing Regardless of the number of statements in a macro, the Teradata Database treats it as a single request. When you execute a macro, the system processes either all of the SQL statements, or processes none of the statements. If a macro fails, the system aborts it, backs out any updates, and returns the database to its original state.
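The all-or-nothing processing described above can be sketched with an ordinary transaction. The example uses Python's built-in sqlite3 module as a stand-in and does not use the Teradata CREATE MACRO or EXECUTE statements; the account table and the transfer function are hypothetical. It only illustrates the outcome that the text describes: when one statement in the group fails, the updates already made by the group are backed out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(amount):
    """Hypothetical macro-like unit: two UPDATEs that succeed or fail together."""
    try:
        with conn:  # commits on success, rolls back if a statement raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = 2", (amount,)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = 1", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


ok = transfer(30)           # both statements succeed
after_ok = balances()
failed = transfer(500)      # second statement violates the CHECK constraint
after_failed = balances()   # the first UPDATE has been backed out

print(ok, after_ok)
print(failed, after_failed)
```

In the failing call, the first UPDATE has already run when the second one violates the constraint, yet the balances afterward are unchanged.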
What Are Triggers
A trigger defines events that happen when some other event, called a triggering event, occurs. This database object is essentially a stored SQL statement associated with a table called a subject table. Teradata has ensured that its trigger implementation complies with ANSI SQL3 specifications.
Triggers execute when any of the following modifies a specified column or columns in the subject table:
• DELETE
• INSERT
• UPDATE
Typically, the stored SQL statements perform a DELETE, INSERT, or UPDATE on a table different from the subject table.
Types of Triggers Teradata Database supports two types of triggers: This type of trigger…
Fires for each…
statement
statement that modifies the subject table.
row
row modified in the subject table.
When Do Triggers Fire You can specify when triggers fire: WHEN you specify…
THEN the triggered action…
BEFORE
executes before the completion of the triggering event. As specified in ANSI SQL3 standard, a BEFORE trigger cannot have data changing statements in the triggered action.
AFTER
executes after completion of the triggering event.
Sometimes a statement fires a trigger, which, in turn, fires another trigger. Thus the outcome of one triggering event can itself become another triggering event. The Teradata Database processes and optimizes the triggered and triggering statements in parallel to maximize system performance.
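The cascading behavior described above can be sketched with two AFTER row triggers. The example uses Python's built-in sqlite3 module as a stand-in; the sales, inventory, and reorder tables and both trigger names are hypothetical, and SQLite trigger syntax is not identical to Teradata CREATE TRIGGER syntax. SQLite also runs the statements serially, so the sketch shows only the cascade, not the parallel processing mentioned in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales     (item TEXT, qty INTEGER);
CREATE TABLE inventory (item TEXT PRIMARY KEY, on_hand INTEGER);
CREATE TABLE reorder   (item TEXT);

-- First trigger: a sale reduces the inventory count.
CREATE TRIGGER sale_reduces_stock AFTER INSERT ON sales
FOR EACH ROW
BEGIN
    UPDATE inventory SET on_hand = on_hand - NEW.qty WHERE item = NEW.item;
END;

-- Second trigger: fired by the UPDATE that the first trigger performs.
CREATE TRIGGER low_stock_reorder AFTER UPDATE OF on_hand ON inventory
FOR EACH ROW WHEN NEW.on_hand < 10
BEGIN
    INSERT INTO reorder VALUES (NEW.item);
END;

INSERT INTO inventory VALUES ('widget', 12);
""")

# One triggering statement; the second trigger fires as a consequence of the first.
conn.execute("INSERT INTO sales VALUES ('widget', 5)")

on_hand = conn.execute(
    "SELECT on_hand FROM inventory WHERE item = 'widget'"
).fetchone()[0]
reorders = conn.execute("SELECT item FROM reorder").fetchall()

print(on_hand, reorders)
```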
ANSI-Specified Order
When you specify multiple triggers on a subject table, both BEFORE and AFTER triggers execute in the order in which they were created, as determined by the timestamp of each trigger. Triggers are sorted according to the preceding ANSI rule, unless you use the Teradata extension, ORDER. This extension allows you to specify the order in which the triggers execute, regardless of creation time stamp.
Trigger Functions
You can use triggers to perform various functions:
• Define a trigger on the subject table to ensure that UPDATEs and DELETEs performed to the parent table are propagated to another table.
• Use triggers for auditing. For example, you can define a trigger which causes INSERTs in a log table when an employee receives a raise higher than 10%.
• Use a trigger to disallow massive UPDATEs, INSERTs, or DELETEs during business hours.
• Use a trigger to set a threshold. For example, you can use triggers to set thresholds for inventory of each item by store, to create a purchase order when the inventory drops below a threshold, or to change a price if the daily volume does not meet expectations.
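The auditing example in the list above (log a row when an employee receives a raise higher than 10%) can be sketched as an AFTER row trigger with a WHEN condition. The example uses Python's built-in sqlite3 module as a stand-in; the employee and raise_log tables and the trigger name are hypothetical, and OLD and NEW are SQLite's names for the old and new values of the modified row rather than Teradata REFERENCING syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee  (emp_no INTEGER PRIMARY KEY, salary REAL);
CREATE TABLE raise_log (emp_no INTEGER, old_salary REAL, new_salary REAL);

-- Row trigger: the WHEN condition is evaluated once for each updated row.
CREATE TRIGGER log_big_raise AFTER UPDATE OF salary ON employee
FOR EACH ROW WHEN NEW.salary > OLD.salary * 1.10
BEGIN
    INSERT INTO raise_log VALUES (OLD.emp_no, OLD.salary, NEW.salary);
END;

INSERT INTO employee VALUES (1, 50000), (2, 50000);
""")

conn.execute("UPDATE employee SET salary = 52000 WHERE emp_no = 1")  # 4% raise
conn.execute("UPDATE employee SET salary = 60000 WHERE emp_no = 2")  # 20% raise

logged = conn.execute(
    "SELECT emp_no, old_salary, new_salary FROM raise_log"
).fetchall()
print(logged)
```

Only the second UPDATE satisfies the WHEN condition, so only employee 2 appears in the log table.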
SQL Statements Related to Triggers The following table lists the basic SQL statements that you can use with triggers: Use this statement…
To…
CREATE TRIGGER
create a trigger.
REPLACE TRIGGER
change the definition of a trigger without dropping and recreating it.
DROP TRIGGER
drop a trigger definition from a subject table.
HELP TRIGGER
display the attributes of the specified trigger.
SHOW TRIGGER
display the text used to create the trigger.
ALTER TRIGGER
enable, disable, or modify the creation time stamp of a trigger. Note: ALTER TRIGGER is a Teradata extension that is not included in ANSI specifications.
RENAME TRIGGER
change the name of a trigger.
Elements of a Trigger The definition of a database trigger resides in the Data Dictionary. The definition contains some or all of the following elements: Element
Comment
Trigger name
The trigger name must be unique within a database, that is, a trigger and any other object in the database cannot have the same name.
Enabled/Disabled
When you disable a trigger, the definition still resides in the Data Dictionary. To enable the disabled trigger, you can execute: ALTER TRIGGER ENABLED. Note: The ENABLE/DISABLE option is a Teradata extension to ANSI SQL3 triggers.
Table name
The name of the subject table must be the name of an existing base table, not a view, temporary table, join index, or hash index.
Trigger action time
The triggering statement executes based on whether you specify BEFORE or AFTER when you create the trigger: Use…
To…
BEFORE
fire the trigger before the triggering statement executes.
AFTER
fire the trigger after the triggering statement executes.
Triggering event
The event is identified by the statement type that causes the trigger to fire.
IF the statement type is…
THEN the triggering statement can be the following…
INSERT
• INSERT • INSERT/SELECT • Atomic Upsert • MERGE INTO
UPDATE
• UPDATE • Atomic Upsert • MERGE INTO
DELETE
DELETE
Column name list
The list contains the column names that appear in the subject table for an UPDATE trigger. The columns list applies only when the triggering event is an UPDATE.
Order
When you define multiple triggers, you can specify the order in which the triggers execute. Order values are integers from 1 to 32767.
Transition Table and Transition Rows
The transition table is a temporary table comprising transition rows. The transition rows hold the old and new values for the rows that are modified by a data modifying statement. The transition table is not stored in the Data Dictionary.
REFERENCING clause
The clause does the following:
• Allows the WHEN condition and triggered actions to reference a set of rows in the transition table
• Permits a row trigger to reference variables representing columns of the current row in the transition table
The rules for BEFORE and AFTER triggers are:
• AFTER statement triggers can reference transition tables only.
• AFTER row triggers can reference both transition rows and transition tables.
• BEFORE row triggers can reference transition rows only.
Triggered action
• You can specify trigger granularity as either ROW or STATEMENT.
• WHEN is the optional search condition.
• The database evaluates the search condition once for each execution of the triggering statement for a statement trigger, and once for each row of the transition table of changed rows for a row trigger.
• Cascading is not itself an element but derives from trigger definitions. Sometimes a statement fires a trigger, which, in turn, fires another trigger. Thus the outcome of one triggering event can itself become another triggering event.
• AFTER row and AFTER statement triggers can cascade.
• Backward references to triggering statements are permitted in a chain of cascading triggers. In other words, recursive triggers are allowed.
Triggered SQL statement
Generally, triggered SQL statements comprise a single statement or a block of statements.
Restrictions on Triggers The following table lists restrictions associated with using triggers: Restriction
Comment
The FastLoad and MultiLoad utilities cannot load data into tables that have triggers defined.
You must disable triggers before running the FastLoad and MultiLoad utilities.
A positioned (updatable cursor) UPDATE or DELETE is not allowed to fire a trigger.
You will receive an error message.
You cannot define triggers, join indexes, or hash indexes on the same table.
N/A
The limit for cascading triggers is 16.
You will receive an error message when a triggering statement causes the cascading level to exceed 16.
For More Information For more information on the topics presented in this chapter, see the following Teradata Database books: If you want to learn more about…
THEN see…
Teradata stored procedures
SQL Reference: Stored Procedures and Embedded SQL
Triggers, views, and macros
• Database Design
• SQL Reference: Data Definition Statements
• SQL Reference: Data Manipulation Statements
• SQL Reference: Fundamentals
• SQL Reference: Statement and Transaction Processing
• Security Administration
Section 3: Teradata Database System Operation
Chapter 12: Normalization and Referential Integrity
This chapter reviews some concepts of the normalization process. The following topics are described in the chapter:
• Normal forms
• Referential integrity
Normalization Normalization is the process of reducing a complex data structure into a simple, stable one. Generally this process involves removing redundant attributes, keys, and relationships from the conceptual data model.
Normal Forms
Normalization theory is constructed around the concept of normal forms that define a system of constraints. If a relation meets the constraints of a particular normal form, we say that relation is “in normal form.”
Think of normal forms as an onion, with the outermost layer being the set of all relations, including unnormalized relations. As you work your way to the core of the onion, you must pass through each lower normal form. As a result, a relation that has achieved fifth normal form has also achieved first, second, third, and fourth normal forms.
By definition, a relational database is always normalized to some degree, because the column values are always atomic. But to simply leave it at that invites a number of problems including redundancy and potential update anomalies. The higher normal forms were developed to correct those problems.
The following figure illustrates the layers of normalization.
Figure: nested layers of normalization, from outermost to innermost: all relations, 1NF relations, 2NF relations, 3NF relations, BCNF relations, 4NF relations, 5NF relations.
Relational Database Terminology
The table below defines some important terms that will help you understand discussion of normal forms:
Term
Definition
Primary key
A unique identifier for a relation. Set theory (and relational database theory) does not allow duplicate rows for a relation with a primary key. However, commercially available relational databases often allow duplicate rows in relations. In those cases, the relation does not have a primary key. The Teradata Database permits enforcement of the no duplicates rule even when no primary key is specified.
Candidate key
One of multiple unique identifiers for a relation. Any relation might have multiple unique identifiers. A candidate key must satisfy the properties of uniqueness and minimality. That is, no two rows of the table may have the same value for the candidate key, and if the key is composite, no component can be eliminated without destroying the uniqueness property.
Alternate key
Any candidate key not chosen as the primary key.
Foreign key
A primary key in another relation that is also a column value in the current relation. Foreign keys are used to join tables and may be part of the primary key.
Functional dependence
Attribute X is functionally dependent on attribute Y, if and only if each Y value in the relation has associated with it exactly one X value.
Full functional dependence
Attribute X is fully functionally dependent on attribute Y, if and only if it is functionally dependent on Y and not functionally dependent on any proper subset of Y.
Transitive dependence
A state in which an attribute is functionally dependent on another attribute only by means of an intermediate attribute. Transitive dependence is a state that normalization strives to eliminate.
Determinant
Any attribute on which some other attribute is fully functionally dependent.
Multivalued dependence
Given a relation with attributes X, Y, and Z, multivalued dependence holds if and only if the set of Y-values matching a given (X-value, Z-value) pair depends only on the X-value and is independent of the Z-value.
Join
An operation in which data is retrieved from more than one table.
Join dependency
A relation satisfies join dependency if and only if it is equal to the join of its projections on its component attributes.
Constraint
A well-defined physical restriction that can be defined for a table or a column.
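The definitions of functional dependence and full functional dependence in the table above can be checked mechanically against sample rows. The following Python sketch is an illustration only: the function names, the sample rows, and the part_name attribute are hypothetical. A check of this kind can show that sample data violates a dependency, but it cannot prove that a dependency holds for the relation in general, because dependencies are properties of the design rather than of one set of rows.

```python
from itertools import combinations


def functionally_determines(rows, y_attrs, x_attr):
    """True if each combination of y_attrs values has exactly one x_attr value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in y_attrs)
        # setdefault returns the first x value seen for this y value.
        if seen.setdefault(key, row[x_attr]) != row[x_attr]:
            return False
    return True


def fully_dependent(rows, y_attrs, x_attr):
    """True if x_attr depends on y_attrs but on no non-empty proper subset of it."""
    if not functionally_determines(rows, y_attrs, x_attr):
        return False
    for size in range(1, len(y_attrs)):
        for subset in combinations(y_attrs, size):
            if functionally_determines(rows, subset, x_attr):
                return False
    return True


# Hypothetical sample rows; part_name is an assumed extra attribute.
rows = [
    {"order_no": 1, "part_no": 1, "quantity": 110, "part_name": "bolt"},
    {"order_no": 1, "part_no": 2, "quantity": 275, "part_name": "nut"},
    {"order_no": 2, "part_no": 1, "quantity": 152, "part_name": "bolt"},
]
key = ["order_no", "part_no"]

# quantity needs the whole composite key; part_name needs only part_no.
print(fully_dependent(rows, key, "quantity"))
print(fully_dependent(rows, key, "part_name"))
```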
First, Second, and Third Normal Forms
This section describes the first three normal forms, including what they are, why we need them, and how to achieve them. These first three normal forms are stepping stones to the Boyce-Codd normal form and, when appropriate, the higher normal forms. The next section contains a discussion of Boyce-Codd normal form (BCNF) and higher normal forms.
First Normal Form First normal form (1NF) is definitive of a relational database. If we are to consider a database relational, then all relations in the database must be in 1NF. We say a relation is in 1NF if all fields within that relation (simple domains in mathematics) are atomic. This means that a field can contain one and only one value. We sometimes refer to this concept as the elimination of repeating groups from a relation. Furthermore, first normal form allows no hierarchies of data values. The formal definition of first normal form is as follows: For a relation to be in 1NF, the relationship between the primary key of the relation and each of the other attributes must be one-to-one (in that direction). In other words, all underlying simple domains of the relation may contain atomic values only. In this way, the non-key attributes are functionally dependent on the key. Note: A non-key attribute is any attribute that is not part of the primary key for the relation.
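The elimination of repeating groups can be sketched in a few lines of Python. The data and names below are hypothetical: each unnormalized order carries a list of (part number, quantity) pairs in one field, and the flattened result holds exactly one atomic value per field.

```python
# Unnormalized: the "parts" field holds a repeating group, not an atomic value.
unnormalized = [
    {"order_no": 1, "parts": [(1, 110), (2, 275)]},
    {"order_no": 2, "parts": [(1, 152)]},
]

# 1NF: one row per (order_no, part_no), each field holding a single value.
order_part = [
    (order["order_no"], part_no, quantity)
    for order in unnormalized
    for part_no, quantity in order["parts"]
]

print(order_part)
```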
Second Normal Form
Second normal form (2NF) deals with the elimination of partial dependencies from a relation, that is, non-key attributes that depend on only part of a composite primary key. We say a relation is in 2NF if it is in 1NF and if every non-key attribute is fully dependent on the entire primary key. The formal definition of second normal form is as follows: For a relation to be in 2NF, the relationship between any portion of the primary key of a relation and each of the other columns must not be one-to-one (in that direction). In other words, the non-key columns are fully functionally dependent on the primary key.
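A move to 2NF can be sketched as a decomposition into projections. In the following Python illustration, the relation and the part_name attribute are hypothetical: part_name depends on part_no alone, which is only a portion of the composite key (order_no, part_no), so it is moved to its own relation. Joining the two projections on part_no reproduces the original rows.

```python
# Columns: order_no, part_no, quantity, part_name (hypothetical sample data).
# part_name depends only on part_no, a portion of the composite key.
order_part = [
    (1, 1, 110, "bolt"),
    (1, 2, 275, "nut"),
    (2, 1, 152, "bolt"),
]

# Projection 1: attributes that depend on the whole key.
order_part_2nf = sorted({(o, p, q) for o, p, q, _ in order_part})

# Projection 2: the attribute that depends on part_no alone.
part = sorted({(p, name) for _, p, _, name in order_part})

# Joining the projections on part_no gives back the original relation.
rejoined = sorted(
    (o, p, q, name)
    for o, p, q in order_part_2nf
    for p2, name in part
    if p == p2
)

print(part)
print(rejoined == sorted(order_part))
```

After the decomposition, each part name is stored once, so renaming a part touches a single row.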
Third Normal Form Third normal form (3NF) deals with the elimination of non-key attributes that do not describe the primary key. The formal definition of third normal form is as follows: For a relation to be in 3NF, the relationship between any two non-primary key columns or groups of columns in a relation must not be one-to-one in either direction. In other words, the non-key columns are non-transitively dependent upon each other and the key. No transitive dependencies implies no mutual dependencies. We say attributes are mutually independent if none of them is functionally dependent on any combination of the others. This mutual independence ensures that we can update individual attributes without any danger of affecting any other attribute in a row.
Advantages of Normalization
The following list of benefits summarizes the advantages of implementing a normalized logical model in 3NF.
• Greater number of relations
• More primary index choices
• Optimal distribution of data
• Fewer full table scans
• More joins possible
Boyce-Codd Normal Form and Higher Normal Forms When the relational model of database management was originally proposed, it only addressed the first three normal forms. Later work with the model showed that 3NF required further refinement to ensure that update anomalies would never occur. This section describes Boyce-Codd normal form and briefly mentions fourth and fifth normal forms for completeness.
Boyce-Codd Normal Form
Third normal form (3NF) does not handle situations in which a relation has multiple composite candidate keys with overlapping attributes. To eliminate these problems, Codd developed the so-called Boyce-Codd normal form (BCNF), which reduces to 3NF whenever the special situation that defines this problem does not apply. A relation is in BCNF if and only if every determinant is a candidate key. This means that the only determinants are candidate keys.
Fourth Normal Form We say a relation is in fourth normal form (4NF) if and only if, whenever a multivalued dependency exists in the relation (for example, say X multiply determines Y), then all attributes of the relation are also functionally dependent on X. In practice, we rarely see the need for 4NF.
Fifth Normal Form
So far it has been possible to normalize a relation by decomposing it into two of its projections. On rare occasions, simple projections are not sufficient to decompose a non-normal relation into two relations. In these rare instances, we use fifth normal form (5NF) to decompose the unnormalized relation into three or more projections of the original relation. We say a relation is in fifth normal form (5NF, sometimes called projection-join normal form, or PJ/NF) if and only if every join dependency in the relation is a consequence of the candidate keys of the relation. This makes 5NF the final possible normal form to be achieved by taking projections and using joins. It is guaranteed to be free of all anomalies that can be removed by taking projections, but not necessarily of all possible anomalies.
Referential Integrity
Traditional referential integrity is the concept of relationships between tables, based on the definition of a primary key and a foreign key. The concept states that a row cannot exist in a table with a non-null value for a referencing column if an equal value does not exist in a referenced column. Using referential integrity, you can specify columns within a referencing table that are foreign keys for columns in some other referenced table. You must define referenced columns as either primary key columns or unique columns. Referential integrity is a reliable mechanism that prevents accidental database inconsistencies when you perform INSERTs, UPDATEs, and DELETEs.
Referential Integrity in the Teradata Database
To implement referential integrity in the Teradata Database, you have three choices:
• Use the referential integrity constraint checks supplied by the database software
• Write your own, site-specific macros, triggers, or stored procedures to enforce referential integrity
• Enforce constraints through application code
For information about bypassing the standard referential constraint checks, see “Referential Constraints” on page 12-11.
Referential Integrity Terminology We use the following terms to explain the referential integrity concept: Term
Definition
Parent Table
The table referred to by a Child table. Also called the “referenced table.”
Child Table
A table in which the referential constraints are defined. Also called the “referencing table.”
Parent Key
A primary or secondary key in the parent table.
Primary Key
With respect to referential integrity, a primary key is a parent table column set that is referred to by a foreign key column set in a child table.
Foreign Key
With respect to referential integrity, a foreign key is a child table column set that refers to a primary key column set in a parent table.
Referencing (Child) Table We call the referencing table the Child table, and we call the specified Child table columns the referencing columns. Referencing columns should be of the same number and have the same data type as the referenced table key.
Referenced (Parent) Table A Child table must have a parent table, and the referenced table is referred to as the Parent table. The parent key columns are the referenced columns.
Why Is Referential Integrity Important
Referential integrity is important, because it keeps you from introducing errors into your database. Suppose you have a table like the following:
ORDER PART
Order Number   Part Number   Quantity
PK, FK         PK, FK        Not Null
1              1             110
1              2             275
2              1             152
Part number and order number, each foreign keys in this relation, also form the composite primary key. Suppose you were to delete the row defined by the primary key value 1 in the PART NUMBER table. The foreign key for the first and third rows in the ORDER PART table would now be inconsistent, because there would be no row in the PART NUMBER table with a primary key of 1 to support it. Such a situation shows a loss of referential integrity. Teradata provides referential integrity to prevent this from happening. If you try to delete a row from the PART NUMBER table for which you have specified referential integrity, the database management system will not allow you to remove the row.
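The rejected delete described above can be sketched with Python's built-in sqlite3 module as a stand-in for the Teradata Database. The table and column names are hypothetical renderings of the ORDER PART and PART NUMBER tables, and the orders table is an assumption added so that the order number foreign key has a parent. SQLite enforces foreign keys only when the foreign_keys pragma is switched on, which is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.executescript("""
CREATE TABLE part   (part_number  INTEGER PRIMARY KEY);
CREATE TABLE orders (order_number INTEGER PRIMARY KEY);
CREATE TABLE order_part (
    order_number INTEGER NOT NULL REFERENCES orders (order_number),
    part_number  INTEGER NOT NULL REFERENCES part (part_number),
    quantity     INTEGER NOT NULL,
    PRIMARY KEY (order_number, part_number)
);
INSERT INTO part   VALUES (1), (2);
INSERT INTO orders VALUES (1), (2);
INSERT INTO order_part VALUES (1, 1, 110), (1, 2, 275), (2, 1, 152);
""")

# Deleting part 1 would orphan the first and third rows of order_part,
# so the database refuses to remove the parent row.
try:
    conn.execute("DELETE FROM part WHERE part_number = 1")
    delete_rejected = False
except sqlite3.IntegrityError:
    delete_rejected = True

remaining_parts = conn.execute("SELECT COUNT(*) FROM part").fetchone()[0]
print(delete_rejected, remaining_parts)
```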
Besides data integrity and data consistency, referential integrity provides these benefits: Benefit
Description
Increases development productivity
You do not need to code SQL statements to enforce referential integrity constraints, because the Teradata Database automatically enforces referential integrity.
Requires fewer written programs
All update activities are programmed to ensure that referential integrity constraints are not violated, because the Teradata Database enforces referential integrity in all environments. Additional programs are not required.
Referential Integrity Constraints
The combination of the foreign key, the parent key, and the relationship between the two is called the referential integrity constraint. The table containing the parent key is called the parent table and the table with the foreign key is called the child table. Teradata provides two other features related to referential integrity constraints:
• Referential constraints
• Batch referential integrity constraints
The following table summarizes the basic differences among these referential constraint types:
Referential Constraint Type              Does This Type Enforce Referential Integrity   Level of Referential Integrity Enforcement
Referential constraint                   No                                             None
Batch referential integrity constraint   Yes                                            Transaction
Referential integrity constraint         Yes                                            Row
Referential Constraints The referential constraint is a mechanism that allows you to specify a type of constraint that is not enforced by the Teradata Database. This capability avoids the database overhead of enforcing the referential integrity, but at the same time, the optimizer can use the constraint information. The ability to specify referential constraints, using the CREATE TABLE and ALTER TABLE statements, is particularly helpful in eliminating redundant joins based on parent key and foreign key relationships. Successful use of referential constraints depends heavily upon your knowledge of the database. To avoid the introduction of inconsistencies, you may choose to use another mechanism to enforce database integrity.
Batch Referential Integrity
Teradata offers batch referential integrity as a middle ground between traditional referential integrity and referential constraints. Batch referential integrity is a reliable mechanism that prevents accidental database inconsistencies when you perform INSERTs, UPDATEs, and DELETEs.
You can use the WITH CHECK OPTION clause to specify batch referential integrity in CREATE TABLE and ALTER TABLE statements. When you specify WITH CHECK OPTION, the database enforces the referential integrity constraint as all or nothing: either every child row has a match in the parent table, or the database aborts the ALTER TABLE, INSERT, DELETE, or UPDATE transaction.
If you specify the WITH NO CHECK OPTION clause in CREATE TABLE and ALTER TABLE statements, the database does not enforce constraints. You should use extreme care when manipulating data within a NO CHECK environment. NO CHECK means that a row having a non-null value in a foreign key column is allowed to exist in a child table when an equal value does not exist in the parent key or alternate key column of the parent table. INSERT, DELETE, and UPDATE operations that cannot be performed on tables that have WITH CHECK OPTION specified are allowed on NO CHECK tables. Data in the parent tables of these relationships can be deleted or corrupted. Depending on the operation, the database does not give a warning if such an error occurs.
Batch referential integrity is less expensive to enforce than standard referential integrity for transactions affecting multiple rows because the database handles batch referential integrity on a transaction basis rather than on a row-by-row basis.
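The all-or-nothing behavior described above can be sketched in Python. This is a conceptual model only, not Teradata code: the function name `apply_batch_insert` and the in-memory collections are hypothetical, and `None` stands in for a null foreign key.

```python
def apply_batch_insert(parent_keys, child_fks, new_fks):
    """Model of transaction-level (batch) enforcement.

    The whole set of new child rows is checked once. If any row has a
    non-null foreign key with no matching parent key, none of the rows
    are applied.
    """
    orphans = [fk for fk in new_fks if fk is not None and fk not in parent_keys]
    if orphans:
        # All or nothing: the child table is returned unchanged.
        return child_fks, orphans
    return child_fks + list(new_fks), []


parent_keys = {100, 200, 300}   # hypothetical parent key values
child_fks = [100]               # foreign keys already in the child table

# One unmatched value (999) causes the entire batch to be rejected.
after_bad, bad_orphans = apply_batch_insert(parent_keys, child_fks, [200, 999, None])

# A batch whose non-null keys all match is applied in full.
after_good, good_orphans = apply_batch_insert(parent_keys, child_fks, [200, None, 300])
```

In this sketch the rejected batch leaves the child table exactly as it was, which is the property that distinguishes batch enforcement from row-by-row enforcement.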
Rules for Referential Integrity Constraints
Referential integrity constraints must meet the following criteria:
| To implement referential integrity… | Must… |
| --- | --- |
| The parent key columns | exist when the referential constraint is defined; be either a unique primary index (UPI) or a unique secondary index (USI) and not null. |
| The foreign and parent key | have the same number of columns and their data types must match; not exceed 64 columns; not be dropped or altered with the ALTER TABLE statement after you have defined a referential integrity constraint on them. |
| Foreign key | be equal to the parent key, or it must be null. |
| When the parent and child tables are the same table, a condition called self-reference, the foreign key and parent keys | not consist of identical columns. |
| Referential constraints | not be duplicated. |
| The number of referential constraints defined per table | not exceed 64. |
To use ALTER TABLE to drop a foreign or parent key after a referential integrity constraint has been defined, first drop the referential constraint and then use ALTER TABLE to drop the foreign or parent key columns.
Referential Constraint Checks
The Teradata Database performs referential constraint checks whenever you do any of the following:
• Add a referential constraint to a populated table
• Insert, delete, or update a row
• Modify a parent or foreign key, for example, using ALTER TABLE
The following table summarizes how the Teradata Database enforces referential constraint checks:
| WHEN performing… | The Teradata Database… |
| --- | --- |
| an INSERT into parent table | does nothing. |
| an INSERT into child table | ensures that the parent key contains a matching value if the foreign key is not null. |
| a DELETE from parent table | aborts the request if the deleted parent key is referenced by any foreign key. |
| a DELETE from child table | does nothing. |
| an UPDATE of parent table | aborts the request if the parent key is referenced by any foreign key. |
| an UPDATE of child table | ensures that the new value matches the parent key when the foreign key is updated. |
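The checks in the table above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not the Teradata implementation: the class `ParentChild`, the exception `ReferentialError`, and the helper `rejected` are hypothetical names, keys are single values, and a null foreign key (modeled as `None`) is assumed to be acceptable on update as well as on insert.

```python
class ReferentialError(Exception):
    """Raised when an operation would violate the constraint."""


class ParentChild:
    def __init__(self):
        self.parent = set()   # parent key values
        self.child = []       # foreign key values; None models NULL

    def insert_parent(self, key):
        self.parent.add(key)                      # no check needed

    def insert_child(self, fk):
        if fk is not None and fk not in self.parent:
            raise ReferentialError(f"no parent key {fk!r}")
        self.child.append(fk)

    def delete_parent(self, key):
        if key in self.child:
            raise ReferentialError(f"parent key {key!r} is still referenced")
        self.parent.discard(key)

    def delete_child(self, fk):
        self.child.remove(fk)                     # no check needed

    def update_parent(self, old, new):
        if old in self.child:
            raise ReferentialError(f"parent key {old!r} is still referenced")
        self.parent.discard(old)
        self.parent.add(new)

    def update_child(self, old, new):
        if new is not None and new not in self.parent:
            raise ReferentialError(f"no parent key {new!r}")
        self.child[self.child.index(old)] = new


def rejected(operation, *args):
    """Return True if the operation is aborted by a constraint check."""
    try:
        operation(*args)
    except ReferentialError:
        return True
    return False


db = ParentChild()
db.insert_parent(10)
db.insert_child(10)
db.insert_child(None)

results = {
    "insert child 99": rejected(db.insert_child, 99),
    "delete parent 10": rejected(db.delete_parent, 10),
    "update parent 10": rejected(db.update_parent, 10, 11),
    "update child to 99": rejected(db.update_child, 10, 99),
    "delete child 10": rejected(db.delete_child, 10),
    "delete parent 10 again": rejected(db.delete_parent, 10),
}
```

The last two entries show that once the referencing child row is gone, the parent row can be deleted without a violation.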
For More Information
For more information on the topics presented in this chapter, see the following Teradata Database book:
| IF you want to learn more about… | THEN see… |
| --- | --- |
| Normalization | Database Design |
| Referential integrity | Database Design |
| Relational model of database management | Database Design |
Chapter 13: Data Communication Between Client and Teradata Database
This chapter describes various ways the client applications can communicate with the Teradata Database. Teradata provides the Call Level Interface (CLI) that provides the service routines needed by applications. In addition to CLI, Teradata supports other industry standard communications protocols.
Topics in this chapter include:
• How clients attach to Teradata Database
• CLI for channel- and network-attached clients
• Other standard communications methods
Attachment Methods
Clients can connect to the Teradata Database using one of the following methods:
• Channel attached through an IBM mainframe
• Network attached through a Local Area Network (LAN)
Client applications that manipulate data on the Teradata Database server communicate with the database indirectly by means of communications interfaces:
• Call Level Interface Version 2 (CLIv2) for channel-attached systems
• Call Level Interface Version 2 (CLIv2) for network-attached systems
Both versions provide the same functions. The CLIv2 is a library of service routines that act as subroutines of the application. The modules in the CLIv2 library vary based on whether the client is channel- or network-attached.
Other types of communications interfaces are available including interfaces for systems running Microsoft Windows 2000 and interfaces for systems running NCR UNIX MP-RAS. The interfaces include:
• Windows Call Level Interface (WinCLI) (Windows-based system)
• Open Database Connectivity (ODBC) (Windows and UNIX MP-RAS-based systems)
• Java Database Connectivity (JDBC) (Windows and UNIX MP-RAS-based systems)
The data communications interfaces are discussed in the following sections.
CLIv2 for Channel-Attached Systems
CLIv2 is a collection of callable service routines that provide the interface between applications and the Teradata Director Program (TDP) on an IBM mainframe client. TDP is the interface between CLIv2 and the Teradata Database server. CLIv2 can operate with all versions of IBM operating systems, including Multiple Virtual Storage (MVS), OS/390, Customer Information Control System (CICS), Information Management System (IMS), and Virtual Machine (VM).
What CLIv2 for Channel-Attached Clients Does
By way of TDP, CLIv2 sends requests to the server, and provides the application with a response returned from the server by way of TDP. CLIv2 provides support for:
• Managing multiple serially-executed requests in a session
• Managing multiple simultaneous sessions to the same or different servers
• Using cooperative processing so that the application can perform operations on the client and the server at the same time
• Communicating with two-phase commit coordinators for CICS and IMS transactions
• Generally insulating the application from the details of communicating with a server
Teradata Director Program
TDP manages communications between CLIv2 and a server. The program executes on the same mainframe as CLIv2, but runs as a different job or virtual machine. An individual TDP is associated with one logical server; note, however, that any number of TDPs may operate, and be accessed by CLIv2, simultaneously on the same mainframe. Each TDP is referred to by the application with an identifier called the TDPid (TDP2, for example) that is unique in a mainframe. Functions of TDP include the following:
• Session initiation and termination
• Logging, verification, recovery, and restart
• Physical input to and output from the server, including session balancing and queue maintenance
• Security
Server
A server implements the actual relational database that processes requests received from CLIv2 by way of TDP. The following figure illustrates the logical structure of the client-server interface.
[Figure 1091B004: The application program exchanges requests and responses with CLIv2, which communicates through three TDPs, each connected to a Teradata Database server.]
CLIv2 for Network-Attached Systems
CLIv2 is a collection of callable service routines that provide the interface between applications on a LAN-connected client and the Teradata Database server.
What CLIv2 for Network-Attached Clients Does
CLI is the interface between the application program and the Micro Teradata Director Program (MTDP). CLIv2 can:
• Build parcels that MTDP packages for sending to the Teradata Database using the Micro Operating System Interface (MOSI)
• Provide the application with a pointer to each of the parcels returned from the Teradata Database
Micro Teradata Director Program
The MTDP must be linked to applications that will be network-connected to the Teradata Database. The MTDP performs many of the same functions as the channel-based TDP including:
• Session initiation and termination
• Physical input to and output from the server
• Logging, verification, recovery, and restart
Unlike TDP, MTDP does not control session balancing.
Micro Operating System Interface
MTDP is the interface between CLI and MOSI. MOSI is a library of service routines that provides operating system independence among the clients that access the Teradata Database. By implementing the MOSI, only one version of MTDP is required to run on all network-connected platforms.
These modules and the relationships among them are illustrated in the following figure:
[Figure 1091B005: The application program exchanges requests and responses with CLI, which is layered on MTDP and MOSI, which connect to the Teradata Database server.]
Other Types of Data Communications
Other types of communications interfaces are available for systems running Windows 2000 or UNIX MP-RAS.
WinCLI
WinCLI is a call-level interface for MS-DOS and Windows-based applications. CLI routines are provided as object modules that have been compiled or assembled according to standard linkage conventions. WinCLI uses the Dynamic Data Exchange (DDE) protocol to communicate with application programs.
ODBC
The Open Database Connectivity (ODBC) Driver for the Teradata Database provides an alternate interface to Teradata Databases using the industry standard ODBC Application Programming Interface (API). The ODBC Driver for the Teradata Database provides Core-level SQL and Extension-level 1 (with some Extension-level 2) function call capability using the Windows Sockets (WinSock) Transmission Control Protocol/Internet Protocol (TCP/IP) communications software interface. ODBC operates independently of CLI and WinCLI.
JDBC
Teradata developed the Teradata JDBC Driver that enables you to access the Teradata Database using the Java language. Java Database Connectivity (JDBC) is a specification for an API. The API allows platform-independent Java applications to access database management systems using SQL. The JDBC API provides a standard set of interfaces for:
• Opening connections to databases
• Executing SQL statements
• Processing results
The driver is a set of Java classes that use the TCP/IP communications software to connect to the Teradata JDBC Gateway, which is constantly listening on the network port for connection requests. For each gateway connection, a new session is created. The Java program can select different gateways by using different URLs. All JDBC function requests are routed to the gateway, which in turn accesses the Teradata Database using Teradata CLIv2. More than one gateway can run on the same host if the gateways are configured to use different network ports.
For More Information
For more information on the topics presented in this chapter, see the following Teradata Tools and Utilities books:
| IF you want to learn more about… | THEN see… |
| --- | --- |
| Call-Level Interface programming | Teradata Call-Level Interface Version 2 Reference for Channel-Attached Systems; Teradata Call-Level Interface Version 2 Reference for Network-Attached Systems |
| JDBC | Teradata Driver for the JDBC Interface User Guide |
| ODBC | Teradata ODBC Driver User Guide |
| Teradata Director Program | Teradata Director Program Reference |
| WinCLI | Teradata Call-Level Interface Version 2 Developers Kit for Microsoft Windows |
Chapter 14: Reliability
The Teradata Database addresses the critical requirements of reliability, availability, serviceability, usability, and installability (RASUI) by combining the following elements:
• Multiple microprocessors in a Symmetric Multiprocessing (SMP) arrangement
• RAID disk storage technology
• Protection of the Teradata Database from operating anomalies of the client platform
Both hardware and software provide fault tolerance, some of which is mandatory and some of which is optional.
Topics include:
• Software fault tolerance
• Hardware fault tolerance
Software Fault Tolerance
This section explains the following Teradata Database facilities for software fault tolerance:
• Vproc migration
• Fallback tables
• AMP clusters
• Journaling
• Archive/Recovery
• Table Rebuild utility
Vproc Migration
Because the Parsing Engine (PE) and Access Module Processor (AMP) are vprocs and therefore software entities, they can migrate from their home node to another node within the same hardware clique if the home node fails for any reason. Although the system normally determines which vprocs migrate to which nodes, a user can configure preferred migratory destinations. Vproc migration permits the system to function completely during a node failure, with some degradation of performance due to the non-functional hardware.
The following figure illustrates vproc migration, where the large X indicates a failed node, and arrows pointing to nodes still running indicate the migration of AMP3, AMP4, and PE2.
[Figure GG01A027: In the Normal panel, three nodes attached to a disk array run PE1, AMP1, AMP2; PE2, AMP3, AMP4; and PE3, AMP5, AMP6. In the Recovery panel, the node that ran PE2, AMP3, and AMP4 has failed and those vprocs run on the two remaining nodes.]
Note: PEs for channel-attached connections cannot migrate during a node failure, because they depend on the channel hardware physically attached to their node.
Fallback Tables
A fallback table is a duplicate copy of a primary table. Each fallback row in a fallback table is stored on an AMP different from the one to which the primary row hashes. This storage technique maintains availability should the system lose an AMP and its associated disk storage in a cluster. In that event, the system would access data in the fallback rows. The disadvantage of fallback is that this method doubles the storage space and the I/O (on INSERTs, UPDATEs, and DELETEs) for tables. The advantage is that data is almost never unavailable because of one down AMP. Data is fully available during an AMP or disk outage, and recovery is automatic after repairs have been made.
The Teradata Database permits the definition of fallback for individual tables. As a general rule, you should run all tables critical to your enterprise in fallback mode. You can run other, non-critical tables in non-fallback mode in order to maximize resource usage. Even though RAID disk array technology may provide access to data even when you have not specified fallback, neither RAID1 nor RAID5 provides the same level of protection as fallback does. You specify whether a table is fallback or not using the CREATE TABLE (or ALTER TABLE) statement. The default is not to create tables with fallback.
AMP Clusters
A cluster is a group of 2 to 16 AMPs that provide fallback capability for each other. A copy of each row is stored on a separate AMP in the same cluster. In a large system, you would probably create many AMP clusters. However, whether the system is large or small, the concept of a cluster exists even if all the AMPs are in one cluster.
One-Cluster Configuration
Pictures best explain AMP clustering. The following figure illustrates a situation in which fallback is present with one cluster, which is essentially an unclustered system.
| AMP | Primary copy area | Fallback copy area |
| --- | --- | --- |
| AMP1 | 1, 9, 17 | 21, 22, 15 |
| AMP2 | 2, 10, 18 | 1, 23, 8 |
| AMP3 | 3, 11, 19 | 9, 2, 16 |
| AMP4 | 4, 12, 20 | 17, 10, 3 |
| AMP5 | 5, 13, 21 | 18, 11, 4 |
| AMP6 | 6, 14, 22 | 19, 12, 24 |
| AMP7 | 7, 15, 23 | 20, 5, 6 |
| AMP8 | 8, 16, 24 | 13, 14, 7 |
[Figure FG10A001]
Note that the fallback copy of any row is always located on an AMP different from the AMP which holds the primary copy. This is an entry-level fault tolerance strategy. In this example which shows only a few rows, the data on AMP3 is fallback protected on AMPs 4, 5, and 6. However, in practice, some of the data on AMP3 would be fallback protected on each of the other AMPs in the system. The system becomes unavailable if two AMPs in a cluster go down.
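The row placement in the one-cluster figure can be checked with a short Python sketch. The row numbers are taken from the figure; the dictionary layout and the function names `amp_holding` and `protectors` are hypothetical and exist only for this illustration.

```python
# Row numbers per AMP, copied from the one-cluster figure.
primary = {
    1: [1, 9, 17], 2: [2, 10, 18], 3: [3, 11, 19], 4: [4, 12, 20],
    5: [5, 13, 21], 6: [6, 14, 22], 7: [7, 15, 23], 8: [8, 16, 24],
}
fallback = {
    1: [21, 22, 15], 2: [1, 23, 8], 3: [9, 2, 16], 4: [17, 10, 3],
    5: [18, 11, 4], 6: [19, 12, 24], 7: [20, 5, 6], 8: [13, 14, 7],
}


def amp_holding(copy_area, row):
    """Return the AMP whose copy area contains the given row."""
    for amp, rows in copy_area.items():
        if row in rows:
            return amp
    raise KeyError(row)


def protectors(amp):
    """AMPs that hold fallback copies of the rows whose primary copy is on amp."""
    return sorted({amp_holding(fallback, row) for row in primary[amp]})


# Every row's fallback copy sits on a different AMP than its primary copy.
all_separated = all(
    amp_holding(primary, row) != amp_holding(fallback, row)
    for row in range(1, 25)
)

amp3_protectors = protectors(3)   # AMPs 4, 5, and 6, as stated in the text
```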
Smaller Cluster Configuration
The following figure illustrates smaller clusters. Decreasing cluster size reduces the likelihood that two AMP failures will occur in the same cluster. The illustration shows the same 8-AMP configuration now partitioned into 2 AMP clusters of 4 AMPs each.
| Cluster | AMP | Primary copy area | Fallback copy area |
| --- | --- | --- | --- |
| A | AMP1 | 1, 9, 17 | 2, 3, 4 |
| A | AMP2 | 2, 10, 18 | 1, 11, 12 |
| A | AMP3 | 3, 11, 19 | 9, 10, 20 |
| A | AMP4 | 4, 12, 20 | 17, 18, 19 |
| B | AMP5 | 5, 13, 21 | 6, 7, 8 |
| B | AMP6 | 6, 14, 22 | 5, 15, 16 |
| B | AMP7 | 7, 15, 23 | 13, 14, 24 |
| B | AMP8 | 8, 16, 24 | 21, 22, 23 |
[Figure FG10A002]
Compare this clustered configuration with the earlier illustration of an unclustered AMP configuration. In the example, the (primary) data on AMP3 is backed up on AMPs 1, 2, and 4 and the data on AMP6 is backed up on AMPs 5, 7, and 8. If AMPs 3 and 6 fail at the same time, the system continues to function normally. Only if two failures occur within the same cluster does the system halt. Performance is the primary factor that determines cluster size. While 2-AMP clusters provide maximum protection against system loss, because the likelihood of both AMPs in a cluster going down simultaneously is very small, this configuration also suffers from a higher workload per AMP in the event of a failure. Typically, a cluster size is four to eight AMPs. For most applications, a cluster size of four provides a good balance between data availability and system performance.
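The availability rule stated above (the system halts only when two failures fall in the same cluster) can be sketched as follows. The function names are hypothetical, and the model deliberately ignores everything except which AMPs are down.

```python
from itertools import combinations


def system_available(clusters, failed_amps):
    """The system keeps running unless two or more AMPs in one cluster are down."""
    failed = set(failed_amps)
    return all(len(failed & set(cluster)) < 2 for cluster in clusters)


def fatal_pairs(clusters):
    """Count the two-AMP failure combinations that would halt the system."""
    amps = [amp for cluster in clusters for amp in cluster]
    return sum(
        1 for pair in combinations(amps, 2)
        if not system_available(clusters, pair)
    )


one_cluster = [[1, 2, 3, 4, 5, 6, 7, 8]]
two_clusters = [[1, 2, 3, 4], [5, 6, 7, 8]]

survives_3_and_6 = system_available(two_clusters, [3, 6])   # different clusters
survives_3_and_4 = system_available(two_clusters, [3, 4])   # same cluster

fatal_in_one_cluster = fatal_pairs(one_cluster)
fatal_in_two_clusters = fatal_pairs(two_clusters)
```

For the 8-AMP example, all 28 possible two-AMP failures halt the one-cluster configuration, while only 12 of the 28 halt the configuration with two clusters of four.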
Journaling
The Teradata Database supports tables which are devoted to journaling. A journal is a record of some kind of activity. The Teradata Database supports several kinds of journaling. The system does some journaling on its own, while you can specify whether to perform other journaling. The following summary explains the different journal capabilities of the Teradata Database:
Down AMP recovery journal (occurs always):
• Is active during an AMP failure only
• Journals fallback tables only
• Is used to recover the AMP after the AMP is repaired, then is discarded
Transient journal (occurs always):
• Logs BEFORE images for all transactions
• Is used by system to roll back failed transactions aborted either by the user or by the system
• Captures:
  – Begin/End Transaction indicators
  – "Before" row images for UPDATE and DELETE statements
  – Row IDs for INSERT statements
  – Control records for CREATE and DROP statements
• Keeps each image on the same AMP as the row it describes
• Discards images when the transaction or rollback completes
Permanent journal (occurs as specified by the user):
• Is active continuously
• Is available for tables or databases
• Can contain "before" images, which permit rollback, or "after" images, which permit rollforward, or both before and after images
• Provides rollforward recovery
• Provides rollback recovery
• Provides full recovery of nonfallback tables
• Reduces need for frequent, full-table archives
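The transient journal behavior summarized above (before images for UPDATE and DELETE, row IDs for INSERT, images discarded when the transaction or rollback completes) can be illustrated with a small Python model. This is a conceptual sketch, not Teradata internals: the class `TransientJournal`, its methods, and the dictionary used as a table are all hypothetical.

```python
class TransientJournal:
    """Toy model of before-image journaling for one open transaction."""

    def __init__(self, table):
        self.table = table     # dict: row ID -> row value
        self.entries = []      # journal entries, oldest first

    def insert(self, row_id, value):
        self.entries.append(("insert", row_id, None))        # row ID only
        self.table[row_id] = value

    def update(self, row_id, value):
        self.entries.append(("update", row_id, self.table[row_id]))  # before image
        self.table[row_id] = value

    def delete(self, row_id):
        self.entries.append(("delete", row_id, self.table[row_id]))  # before image
        del self.table[row_id]

    def commit(self):
        self.entries.clear()   # images are discarded when the transaction completes

    def rollback(self):
        # Undo in reverse order, then discard the images.
        for kind, row_id, before in reversed(self.entries):
            if kind == "insert":
                self.table.pop(row_id, None)
            else:
                self.table[row_id] = before
        self.entries.clear()


table = {1: "a", 2: "b"}
journal = TransientJournal(table)
journal.update(1, "A")
journal.delete(2)
journal.insert(3, "c")
state_before_rollback = dict(table)
journal.rollback()
state_after_rollback = dict(table)
```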
Teradata Archive/Recovery
The Teradata Archive/Recovery utility backs up and restores data for channel-attached and network-attached clients:
If you want to archive data, then copy all or selected:
• Tables
• Databases
• Data Dictionary tables
Note: If your system is used only for decision support and is updated regularly with data loads, you may not want to archive the data.
If you want to restore data, then copy an archive from the client or server back to the database, and restore data to all AMPs, to clusters of AMPs, or to a specific AMP (as long as the Data Dictionary contains the definitions of the table or database you want to restore).
Note: If the table does not have a definition in the Data Dictionary because of a DROP or RENAME statement, you can still restore data using the COPY statement.
Similar restore and recovery capabilities are available for systems running the Microsoft Windows 2000 operating system using the Windows NetVault and NetBackup. For more information, see “Open Teradata Backup” on page 16-2. Note: Contact Teradata Global Sales Support for information about the controlled distribution of NetBackup.
Table Rebuild Utility
Use the Table Rebuild utility to recreate a table, database, or entire disk on a single AMP under the following conditions:
• The table structure or data is damaged because of a software problem, head crash, power failure, or other malfunction.
• The affected tables are enabled for fallback protection.
Table rebuild can create all of the following on an AMP-by-AMP basis:
• Primary or fallback portions of a table
• An entire table (both primary and fallback portions)
• All tables in a database
• All tables on an individual AMP
The Table Rebuild utility can also remove inconsistencies in stored procedure tables in a database. An NCR System Engineer, Field Engineer, or System Support Representative usually runs the Table Rebuild utility.
Hardware Fault Tolerance
The Teradata Database provides the following facilities for hardware fault tolerance:
| Facility | Description |
| --- | --- |
| Multiple BYNETs | Multinode Teradata Database servers are equipped with at least two BYNETs. Interprocessor traffic is never stopped unless both BYNETs fail. Within a BYNET, traffic can often be rerouted around failed components. |
| RAID disk units | Teradata Database servers use Redundant Arrays of Independent Disks (RAIDs) configured for use as RAID1, RAID5, or RAIDS. Non-array storage cannot use RAID technology. RAID1 arrays offer mirroring, the method of maintaining identical copies of data. RAID5 or RAIDS protects data from single-disk failures with a 25 percent increase in disk storage to provide parity. RAID1 provides better performance and data protection than RAID5/RAIDS, but is more expensive. |
| Multiple-channel and -network connections | In a client-server environment, multiple connections between mainframe and network-based clients and the server ensure that most processing continues even if one or several connections between the clients and server are not working. Vproc migration is a software feature supporting this hardware issue. |
| Isolation from client hardware defects | In a client-server environment, a server is isolated from many client hardware defects and can continue processing in spite of such defects. |
| Battery backup | All cabinets have battery backup in case of building power failures. |
| Power supplies and fans | Each cabinet in a configuration has redundant power supplies and fans to ensure fail-safe operation. |
| Hot swap capability for node components | The Teradata Database can allow some components to be removed and replaced while the system is running. This process is known as hot swap. Teradata Database offers hot swap capability for disks within RAID arrays, fans, and power supplies. |
| Cliques | A clique is a group of nodes sharing access to the same disk arrays. The nodes and disks are interconnected through shared SCSI buses and each node can communicate directly to all disks. This architecture provides and balances data availability in the case of a node failure. A clique supports the migration of vprocs following a node failure. If a node in a clique fails, then its vprocs migrate to another node in the clique and continue to operate while recovery occurs on their home node. Migration minimizes the performance impact on the system. PEs for channel-attached hardware cannot migrate, because they depend on the hardware that is physically attached to the assigned node. PEs for LAN-attached connections do migrate when a node failure occurs, as do all AMP vprocs. To ensure maximum fault tolerance, no more than one node in a clique is placed in the same cabinet. Usually the battery backup feature makes this precaution unnecessary, but if you want maximum fault tolerance, then plan your cliques so the nodes are never in the same cabinet. |
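The single-disk protection that the table above attributes to RAID5 rests on parity. The following Python sketch shows XOR parity over one stripe and the rebuild of a lost block. The layout of four data blocks plus one parity block is an assumption inferred from the 25 percent figure quoted above; the source does not state the array geometry, and the function name `parity` is hypothetical.

```python
from functools import reduce


def parity(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Assumed layout: four data blocks and one parity block per stripe.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity_block = parity(data)

# Simulate losing the third data block, then rebuild it from the
# surviving data blocks plus the parity block.
survivors = data[:2] + data[3:]
rebuilt = parity(survivors + [parity_block])

# One parity block for every four data blocks is a 25 percent increase.
overhead = 1 / len(data)
```

Because XOR is its own inverse, combining the parity block with the surviving blocks yields the missing block, which is why one extra block per stripe is enough to survive a single-disk failure.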
For More Information
For more information on the topics presented in this chapter, see the following Teradata Database and Teradata Tools and Utilities books:
| IF you want to learn more about… | THEN see… |
| --- | --- |
| Physical database design | Database Design |
| Restore/Recovery utilities | Teradata Archive/Recovery Utility Reference |
| Table Rebuild utility | Utilities |
Section 4: Management and Monitoring
Chapter 15: Concurrency Control and Transaction Recovery
This chapter describes the concurrency control in relational database management systems and how to use transaction journaling (permanent journaling) to recover lost data or restore an inconsistent database to a consistent state. The initial sections of this chapter deal with the concepts of transactions and locks. The latter sections describe the closely related topics of concurrency control and recovery.
Topics include:
• Concurrency control
• Recovery
• Transactions
• Locks
• System and media recovery
What is Concurrency Control
Concurrency control involves preventing concurrently running processes from improperly inserting, deleting, or updating the same data. A system maintains concurrency control through two mechanisms:
• Transactions
• Locks
The concepts of transactions and locks are discussed in subsequent sections.
What is Recovery
Recovery is a process by which an inconsistent database is brought back to a consistent state. Transactions play the critical role in this process because they are used to “play back” a series of updates (using the term in its most general sense) to the database, either taking it back to some earlier state or bringing it forward to a current state.
Concept of a Transaction
This section describes the concept of a transaction. Transactions are a mandatory facility for maintaining the integrity of a database while running multiple, concurrent operations.
Definition of a Transaction
A transaction is a logical unit of work and the unit of recovery. The statements nested within a transaction must either all happen or none happen. Transactions are atomic: a partial transaction cannot exist.
Definition of Serializability
A set of transactions is serializable if the set produces the same result as some arbitrary serial execution of those same transactions for arbitrary input. A set of transactions is correct only if it is serializable. Use of a Two-Phase Locking (2PL) protocol may serialize transactions. The two phases are the growing phase and the shrinking phase:
| In the… | A transaction must… |
| --- | --- |
| growing phase | first acquire a lock on an object before operating on it. |
| shrinking phase | never acquire any more locks after it has released a lock. Lock release is an all-or-none operation. |
For more information on the 2PL protocol, see “Two-Phase Commit Protocol” on page 15-14.
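The two rules of the 2PL protocol can be expressed as checks in a small Python class. This is an illustrative sketch only: the names `Transaction`, `TwoPhaseLockingError`, and `violates` are hypothetical, and contention between transactions is not modeled.

```python
class TwoPhaseLockingError(Exception):
    """Raised when a transaction breaks one of the two 2PL rules."""


class Transaction:
    def __init__(self, name):
        self.name = name
        self.held = set()        # objects currently locked
        self.shrinking = False   # becomes True once locks are released

    def lock(self, obj):
        # Shrinking phase rule: no new locks after a release.
        if self.shrinking:
            raise TwoPhaseLockingError(f"{self.name} cannot lock after releasing")
        self.held.add(obj)

    def operate(self, obj):
        # Growing phase rule: lock an object before operating on it.
        if obj not in self.held:
            raise TwoPhaseLockingError(f"{self.name} holds no lock on {obj}")
        return f"{self.name} operated on {obj}"

    def release_all(self):
        # Release is all-or-none: every lock is dropped together.
        self.shrinking = True
        self.held.clear()


def violates(action, *args):
    """Return True if the action breaks the protocol."""
    try:
        action(*args)
    except TwoPhaseLockingError:
        return True
    return False


t1 = Transaction("T1")
t1.lock("row A")
ok_operation = t1.operate("row A")
unlocked_access = violates(t1.operate, "row B")   # no lock held on row B
t1.release_all()
late_lock = violates(t1.lock, "row C")            # already in the shrinking phase
```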
Transaction Semantics
The Teradata Database supports both ANSI transaction semantics and Teradata transaction semantics. A system parameter specifies the default transaction mode for a site. However, you can override the default for a session. The Teradata Database returns an error when a transaction operating in Teradata semantics mode issues a COMMIT statement. The Teradata Database supports the ANSI COMMIT statement in ANSI transaction mode.
ANSI Mode Transactions
All ANSI transactions are implicit. Either of the following events opens an ANSI transaction:
• Execution of the first SQL statement in a session
• Execution of the first statement following the close of a previous transaction
Transactions close when the application performs a COMMIT, ROLLBACK, or ABORT statement. When the transaction contains a DDL statement, including DATABASE and SET SESSION, which are considered DDL statements in this context, the statement must be the last statement.
BEGIN TRANSACTION/END TRANSACTION Statements
A session executing under ANSI transaction semantics does not allow the BEGIN TRANSACTION statement, the END TRANSACTION statement, or the two-phase commit protocol. When an application submits these statements in an ANSI session, the database software generates an error.
Roll Back an ANSI Transaction
In ANSI mode, the system rolls back the entire transaction if the current request:
• Results in a deadlock
• Performs a DDL statement that aborts
• Executes an explicit ROLLBACK or ABORT statement
Teradata Database accepts the ABORT and ROLLBACK statements in ANSI mode, including conditional forms of those statements. If the system detects an error for either a single or multistatement request, it only rolls back that request, and the transaction remains open, except in special circumstances. Application-initiated, asynchronous aborts also cause full-transaction rollback in the ANSI environment.
Teradata Mode Transactions
Teradata mode transactions can be either implicit or explicit. Multistatement requests and macros are examples of implicit transactions.
BEGIN TRANSACTION/END TRANSACTION Statements
An explicit, or user-generated, transaction is a single set of BEGIN TRANSACTION/END TRANSACTION statements surrounding one or more requests. All other transactions are implicit. Consider the following transaction:
    BEGIN TRANSACTION;
    DELETE FROM Employee WHERE Name = 'Smith T';
    UPDATE Department SET EmpCount = EmpCount - 1 WHERE DeptNo = 500;
    END TRANSACTION;
Roll Back a Teradata Mode Transaction
If an error occurs during the processing of either the DELETE or UPDATE statement within the BEGIN TRANSACTION and END TRANSACTION statements, the system restores both Employee and Department tables to the states at which they were before the transaction began. If an error occurs during a Teradata transaction, then the system rolls back the entire transaction.
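The all-or-nothing outcome of the Employee/Department transaction can be demonstrated with Python's built-in sqlite3 module. This sketch uses SQLite, not the Teradata Database, so it illustrates the principle rather than Teradata syntax or behavior. The table definitions, the CHECK constraint (added only to force an error in the UPDATE), and the function `remove_employee` are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Name TEXT PRIMARY KEY, DeptNo INTEGER);
    CREATE TABLE Department (
        DeptNo INTEGER PRIMARY KEY,
        EmpCount INTEGER CHECK (EmpCount >= 0)  -- hypothetical, forces an error
    );
    INSERT INTO Employee VALUES ('Smith T', 500);
    INSERT INTO Department VALUES (500, 0);
""")


def remove_employee(conn, name, dept_no):
    """Run the DELETE and UPDATE as one unit; undo both if either fails."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("DELETE FROM Employee WHERE Name = ?", (name,))
            conn.execute(
                "UPDATE Department SET EmpCount = EmpCount - 1 WHERE DeptNo = ?",
                (dept_no,),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def employee_count(conn):
    return conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]


# EmpCount is 0, so the UPDATE fails and the DELETE is rolled back with it.
first_attempt = remove_employee(conn, "Smith T", 500)
employees_after_failure = employee_count(conn)

# Correct the department count, then the whole transaction succeeds.
with conn:
    conn.execute("UPDATE Department SET EmpCount = 1 WHERE DeptNo = 500")
second_attempt = remove_employee(conn, "Smith T", 500)
employees_after_success = employee_count(conn)
emp_count = conn.execute(
    "SELECT EmpCount FROM Department WHERE DeptNo = 500"
).fetchone()[0]
```

After the failed attempt the Employee row is still present, which mirrors the statement that both tables are restored to their state before the transaction began.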
15 – 6
Introduction to Teradata Warehouse
Concept of a Lock
Concept of a Lock A lock is a means of claiming usage rights to some resource. The Teradata Database can lock several different types of resources in several different ways.
Overview of Teradata Database Locking Most locks used on Teradata resources are obtained automatically. Users can override some locks by making certain lock specifications, but the Teradata Database only allows overrides when it can assure data integrity. The data integrity requirement of a request decides the type of lock that the system uses. A request for a locked resource by another user is queued until the process using the resource releases its lock on that resource.
Why Do Database Management Systems Require Locking The lost update anomaly best explains why database management systems, in which multiple processes are accessing the same database, require locks.
The following figure provides an example of this anomaly.
[Figure FG11A001: Lost update anomaly. The balance in the database starts at $500.00. Transaction T1 reads the balance ($500.00), adds $1,000.00, and writes its result ($1,500.00) to the database. Transaction T2 reads the same balance ($500.00), adds $2,000.00, and writes its result ($2,500.00) to the database.]
This example shows a nonserialized set of transactions. If locking had been in effect, the two transactions could not have added a total of $3,000.00 to $500.00 and produced two different, and wrong, results. This example demonstrates the most common problem encountered in a transaction processing system without locks. Although several other problems arise when locking is not in effect, the lost update problem sufficiently illustrates the need for locking.
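The lost update can be reproduced outside the database with a short simulation. The following Python sketch is illustrative only: the function names are hypothetical, and a `threading.Lock` merely stands in for the kind of serialization a database lock provides. It does not model the Teradata lock manager.

```python
import threading

def run_unlocked():
    """Replay the interleaving in the figure: both reads happen before either write."""
    balance = 500
    t1_read = balance          # T1: READ Balance
    t2_read = balance          # T2: READ Balance (same stale value)
    balance = t1_read + 1000   # T1: WRITE result to database
    balance = t2_read + 2000   # T2: WRITE result, overwriting T1's update
    return balance

def run_locked():
    """Serialize each read-modify-write so that no update is lost."""
    account = {"balance": 500}
    lock = threading.Lock()    # stand-in for a database lock (assumption)

    def deposit(amount):
        with lock:             # the second depositor waits until the first finishes
            current = account["balance"]
            account["balance"] = current + amount

    threads = [threading.Thread(target=deposit, args=(a,)) for a in (1000, 2000)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account["balance"]
```

In the unlocked replay, the update made by T1 disappears. With the lock, both deposits are applied regardless of which transaction runs first.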
Lock Levels
The Teradata lock manager implicitly locks the following objects:
| Object Locked | Description |
|---|---|
| Database | Locks rows of all tables in the database |
| Table | Locks all rows in the table and any index and fallback subtables |
| Row hash | Locks the primary copy of a row and all rows that share the same hash code within the same table |
A user can lock the following resource types in a Teradata Database:
• Database
• Table
• Row hash
Levels of Lock Types
Users can apply four different types of locking on Teradata Database resources. The following table explains these types:
| Lock Type | Description |
|---|---|
| Exclusive | The requester has exclusive rights to the locked resource. No other process can read from, write to, or access the locked resource in any way. |
| Write | The requester has exclusive rights to the locked resource except for readers not concerned with data consistency. |
| Read | Several users can hold Read locks on a resource, during which the system permits no modification of that resource. Read locks ensure consistency during read operations such as those that occur during a SELECT statement. |
| Access | The requester is willing to accept minor inconsistencies of the data while accessing the database (an approximation is good enough). An Access lock permits modifications on the underlying data while the SELECT operation is in progress. |
This same information is illustrated in the following table:
| Lock Request | Held: None | Held: Access | Held: Read | Held: Write | Held: Exclusive |
|---|---|---|---|---|---|
| Access | Granted | Granted | Granted | Granted | Queued |
| Read | Granted | Granted | Granted | Queued | Queued |
| Write | Granted | Granted | Queued | Queued | Queued |
| Exclusive | Granted | Queued | Queued | Queued | Queued |
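The grant-or-queue decisions in the table above can be expressed as a simple lookup. The following Python sketch encodes only the table as printed. The names `HELD`, `COMPATIBILITY`, and `lock_outcome` are hypothetical and are not part of any Teradata interface.

```python
# Column order: the lock type already held on the resource.
HELD = ("None", "Access", "Read", "Write", "Exclusive")

# One row per requested lock type; True means granted, False means queued.
COMPATIBILITY = {
    "Access":    (True, True,  True,  True,  False),
    "Read":      (True, True,  True,  False, False),
    "Write":     (True, True,  False, False, False),
    "Exclusive": (True, False, False, False, False),
}

def lock_outcome(requested, held="None"):
    """Return 'Granted' or 'Queued' for a lock request against a held lock."""
    granted = COMPATIBILITY[requested][HELD.index(held)]
    return "Granted" if granted else "Queued"
```

For example, `lock_outcome("Access", "Write")` returns `"Granted"`, which matches the description of the Write lock: readers not concerned with data consistency are still admitted.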
Automatic Database Lock Levels
The Teradata Database applies most of its locks automatically. The following table illustrates how the Teradata Database applies different locks for various types of SQL statements:
| Type of SQL Statement | Locking Level: UPI/NUPI/USI | Locking Level: NUSI/Full Table Scan | Locking Mode |
|---|---|---|---|
| SELECT | Row Hash | Table | Read |
| UPDATE | Row Hash | Table | Write |
| DELETE | Row Hash | Table | Write |
| INSERT | Row Hash | Not applicable | Write |
| CREATE DATABASE, DROP DATABASE, MODIFY DATABASE | Not applicable | Database | Exclusive |
| CREATE TABLE, DROP TABLE, ALTER TABLE | Not applicable | Table | Exclusive |
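As a reading aid, the table above can be restated as a function that maps a statement type and an access path to a locking level and mode. This Python sketch is an illustration of the table only. The function name `automatic_lock` and the access path label `"FTS"` (full table scan) are assumptions made for the example.

```python
ROW_HASH_PATHS = {"UPI", "NUPI", "USI"}   # access paths that lock a row hash
TABLE_PATHS = {"NUSI", "FTS"}             # "FTS" = full table scan (label is an assumption)

DML_MODES = {"SELECT": "Read", "UPDATE": "Write", "DELETE": "Write", "INSERT": "Write"}

DDL_LEVELS = {
    "CREATE DATABASE": "Database", "DROP DATABASE": "Database", "MODIFY DATABASE": "Database",
    "CREATE TABLE": "Table", "DROP TABLE": "Table", "ALTER TABLE": "Table",
}

def automatic_lock(statement, access_path=None):
    """Return (locking level, locking mode) as listed in the table."""
    if statement in DDL_LEVELS:
        return DDL_LEVELS[statement], "Exclusive"
    mode = DML_MODES[statement]
    if access_path in ROW_HASH_PATHS:
        return "Row Hash", mode
    if access_path in TABLE_PATHS and statement != "INSERT":
        return "Table", mode
    # The table marks this combination "Not applicable" (or the path is unknown).
    raise ValueError(f"no lock listed for {statement} via {access_path}")
```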
Deadlocks and Deadlock Resolution
A deadlock occurs when transaction 1 places a lock on resource A and then needs to lock resource B, but resource B has already been locked by transaction 2, which in turn needs to place a lock on resource A. This state of affairs is called a deadlock or a deadly embrace. To resolve a deadlock, Teradata Database aborts one of the transactions and performs a rollback.
If you used BTEQ to submit the transaction, the database reports the deadlock abort to BTEQ. BTEQ resubmits only the statement that caused the error, not the complete transaction. This behavior can result in partially committed transactions. Therefore, you must take care when writing the BTEQ script to ensure that the transaction is one statement.
To illustrate, in BTEQ, a statement ends with a semicolon (;) as the last non-blank character in the line. BTEQ sees the following example as two statements:
    sel * from x;
    sel * from y;
However, if you write the same statements as shown in this example, where the first line does not end with a semicolon, BTEQ sees them as only one statement:
    sel * from x
    ; sel * from y;
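The line-ending rule described above can be checked with a small script. The following Python sketch applies only the stated rule (a statement ends where a semicolon is the last non-blank character on a line). It is not BTEQ's actual parser: it ignores comments, quoted strings, and BTEQ dot commands, and the function name `split_requests` is hypothetical.

```python
def split_requests(script):
    """Group script lines into units that end at a line whose last non-blank character is ';'."""
    requests, current = [], []
    for line in script.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        current.append(line.strip())
        if line.rstrip().endswith(";"):   # the semicolon is the last non-blank character
            requests.append(" ".join(current))
            current = []
    if current:                           # trailing text without a closing semicolon
        requests.append(" ".join(current))
    return requests
```

Applied to the two examples above, the first script yields two units and the second yields one, because the leading semicolon on the second line does not terminate the first line.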
Host Utility Locks
The locking operation that the client-resident Teradata Archive/Recovery utility uses is different from the locking operation that the Teradata Database performs. The Teradata Database documentation and utilities frequently refer to archive locks as HUT (Host Utility) locks.
HUT Lock Types
Teradata Database places HUT locks as follows:
| Lock Type | Object Locked |
|---|---|
| Read | Any object being dumped |
| Group Read | Rows of a table being dumped, if and only if the table is defined for an after-image permanent journal and you select the appropriate option on the DUMP command |
| Write | Permanent journal table being restored |
| Write | All tables in a ROLLFORWARD or ROLLBACKWARD during recovery operations |
| Write | Journal table being deleted |
| Exclusive | Any object being restored |
HUT Lock Characteristics
HUT locks have the following characteristics:
• Associated with the currently logged-on user who entered the statement rather than with a job or transaction
• Placed only on objects on the AMPs that are participating in a utility operation
• Placed at the cluster level during a CLUSTER dump
• Never conflict with a utility lock at another level that was placed on the same object for the same user
• Remain active until they are released either by the RELEASE LOCK option of the utility command or by the execution of a Teradata SQL RELEASE LOCK statement after a utility operation completes
• Automatically reinstated following a Teradata Database restart if they had not been released
System and Media Recovery
This section describes the conditions under which the Teradata Database performs:
• An unscheduled restart
• A transaction recovery
• Down AMP recovery
System Restarts
Unscheduled restarts occur for one of the following reasons:
• AMP or disk failure
• Software failure
• Parity error
Failures and errors affect all software recovery in the same way. Hardware failures take the affected component offline and it remains offline until repaired or replaced.
Transaction Recovery
Two types of automatic transaction recovery can occur:
• Single transaction recovery
• Database recovery
The following table details what happens when the two automatic recovery mechanisms take place:
| This recovery type… | Happens when the Teradata Database… |
|---|---|
| single transaction | aborts a single transaction because of a transaction deadlock, a user error, a user-initiated abort command, or an inconsistent data table. Single transaction recovery uses the transient journal to effect its data restoration. |
| database | performs a restart for one of the following reasons: hardware failure, software failure, or user command. |
Down AMP Recovery
When an AMP fails to come online during system recovery, the Teradata Database continues to process requests using fallback data. When the down AMP comes back online, down AMP recovery procedures begin to bring the data for the AMP up to date as follows:
| IF there are… | THEN the AMP recovers… |
|---|---|
| a large number of rows to be processed | offline. |
| only a few rows to be processed | online. |
After all updates are made, the AMP is considered to be fully recovered.
Two-Phase Commit Protocol
Two-phase commit (2PC) is a protocol for assuring concurrency of data in multiple databases in which each participant votes to either commit or abort the changes. The participants wait before committing the change until they know that all participants can commit. By voting to commit, the participant guarantees that it can either commit or roll back its part of the transaction, even if it crashes before receiving the result of the vote.
The 2PC protocol allows the development of Customer Information Control System (CICS) and Information Management System (IMS) applications that can update one or more Teradata Database databases and/or databases under some other DBMS in a synchronized manner. The result is that the updates requested in a defined unit of work either all succeed or all fail.
Definition of Participant
A participant is a database manager that performs some work on behalf of the transaction, and that commits or aborts changes to the database. A participant can also be a coordinator of participants at a lower level. In such cases, the coordinator/participant relays a vote request to its participants, and sends its vote to the coordinator only after determining the outcome of its participants. Any number of participants can engage in a two-phase commit operation.
A participant is defined as being in doubt from the time it votes to commit or abort until the time it receives a commit or abort instruction from the coordinator, which is the controlling database manager with respect to the distributed transaction. A transaction is in doubt if any of the participants are in doubt.
Definition of Coordinator
The coordinator is never in doubt. Selection of the coordinator is arbitrary. However, with respect to the Teradata Database, it is always either IMS or CICS. There can be only one coordinator per transaction at any given time.
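The voting and decision phases described above can be sketched in a few lines. The following Python example is a simplified, single-process model written for illustration. The class and function names are hypothetical, and the sketch omits logging, timeouts, crash recovery, and nested coordinators. It is not the Teradata, CICS, or IMS implementation.

```python
class Participant:
    """A database manager that votes on, and then applies, the outcome of a transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def vote(self):
        # Phase 1: after voting, the participant is in doubt until it hears the decision.
        self.state = "in doubt"
        return "commit" if self.can_commit else "abort"

    def finish(self, decision):
        # Phase 2: apply the coordinator's instruction.
        self.state = "committed" if decision == "commit" else "aborted"


def two_phase_commit(participants):
    """Act as the coordinator: collect every vote, then broadcast a single decision."""
    votes = [p.vote() for p in participants]
    decision = "commit" if all(v == "commit" for v in votes) else "abort"
    for p in participants:
        p.finish(decision)
    return decision
```

A single abort vote is enough to abort the whole unit of work, so every participant ends in the same final state.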
For More Information
For more information on the topics presented in this chapter, see the following Teradata Database and Teradata Tools and Utilities books:
| IF you want to learn more about… | THEN see… |
|---|---|
| Specifying transactions in an embedded SQL program | SQL Reference: Stored Procedures and Embedded SQL; Teradata Preprocessor2 for Embedded SQL Programmer Guide |
| Transaction processing in general | SQL Reference: Statement and Transaction Processing |
| Two-phase commit | Teradata Director Program Reference; IBM CICS Interface for Teradata Reference; IBM IMS/DC Interface for Teradata Reference |
Chapter 16: Database Management and Analysis Tools
An important part of the overall design of Teradata Warehouse is the means to manage the hardware and software that make up the system. Teradata offers a wide variety of utilities, management and analysis tools, and peripherals. Some of these tools are resident on the Teradata Database and others are available in Teradata Tools and Utilities, a management suite available for installation in client environments. With these, you can back up and restore important data, save dumps, and investigate and control the Teradata Database configuration, user sessions, and various aspects of its operation and performance.
This chapter describes the management and analysis tools that you can use to keep the database running at optimum performance levels. These tools fall into several basic categories. Topics include:
• Data archiving
• Data load and export utilities
• Database management tools
• Query analysis tools
• Query facilities
Teradata Tools and Utilities - Archive Utilities
Storing data for future retrieval is an important part of system administration. Teradata offers the following archive and recovery utilities:
• Teradata Archive/Recovery (for channel-attached and network-attached systems)
• Open Teradata Backup products for Microsoft Windows systems, including:
  • NetBackup (network-attached systems)
  • NetVault (network-attached systems)
Teradata Archive/Recovery Utility
The Teradata Archive/Recovery utility (ARC) supports archiving and restoring Teradata Database databases, individual tables, or permanent journals to any of the following media:
• Client tape
• Client file
ARC also includes recovery with rollback and rollforward functions for data tables defined with a journal option. For more information about rollback and rollforward, see Chapter 15: “Concurrency Control and Transaction Recovery.”
Open Teradata Backup
Open Teradata Backup (OTB) supports open architecture products that provide backup and restore functions for Microsoft Windows clients. The following products are available:
• NetVault
  The NetVault Teradata Module is a backup system that allows you to graphically select databases and tables and specify the kinds of backups (distributed, online, and so forth) you want to perform.
• NetBackup
  NetBackup for Teradata supports parallel backups and restores coordinated across multiple hosts connected to a single Teradata Database. The full functional capabilities of the NetBackup server and the multiple media servers are realized in this product. In addition, NetBackup uses an Administrative Host, which contains a Graphical User Interface (GUI) to provide object browsing and selection, automatic script generation, and centralized job monitoring.
  Note: Contact Teradata Global Sales Support for information about the controlled distribution of NetBackup.
Teradata Tools and Utilities - Data Load and Export Utilities
Data load utilities are usually designed following one of two design philosophies:
| Utilities operate either… | For example… | And are typically used… |
|---|---|---|
| as fast as possible, with little regard for the impact on system users | Teradata FastLoad, Teradata MultiLoad | in a decision support environment where transactions for the day are loaded during a nightly batch window when there are few interactive users. |
| or, in the background and limit the impact on interactive users | Teradata TPump | to process a continuous feed of near-real-time updates while interactive users require rapid responses. |
Teradata MultiLoad
The Teradata MultiLoad utility supports bulk INSERTs, UPDATEs, and DELETEs against initially unpopulated or populated tables. Both the client and server environments support Teradata MultiLoad. Teradata MultiLoad can:
• Run against multiple tables
• Perform block transfers with multi-session parallelism
• Load data from multiple input source files
• Pack multiple SQL statements and associated data into a request
Teradata FastLoad
The Teradata FastLoad utility loads data in unpopulated tables only. Both the client and server environments support Teradata FastLoad. Teradata FastLoad can:
• Load data into empty tables
• Perform block transfers with multi-session parallelism
Teradata Parallel Data Pump
The Teradata Parallel Data Pump (TPump) utility uses standard SQL/DML (not block transfers) to maintain data in tables. It also contains a method whereby you can specify the percentage of system resources to be used for the operations on tables. This allows background maintenance for INSERT, DELETE, and UPDATE operations to take place at any time of day while the Teradata Database is in use. Teradata TPump can:
• Maintain up to 60 tables at a time
• Support the same restart, portability, and scalability features as Teradata MultiLoad
Teradata FastExport Utility
To export data, Teradata Tools and Utilities provides the Teradata FastExport utility. The Teradata FastExport utility exports data in parallel. The utility exports large quantities of data from the Teradata Database to a client and is the functional complement of the FastLoad and MultiLoad utilities. Teradata FastExport can:
• Export tables to client files
• Perform block transfers with multi-session parallelism
Database Management Tools
Teradata provides tools for investigating and managing active sessions and configurations. The tools are discussed in the following sections.
Teradata Database - Active Session and Configuration
Database management tools include utilities for investigating active sessions and the state of the Teradata Database configuration, such as:
• Query Session
• Query Configuration
• Gateway Global utility
The following table contains information about the capabilities of each utility:
| This utility… | Does the following… |
|---|---|
| Query Session (also known as Session States) | Provides information about active Teradata Database sessions. Monitors the state of all or selected sessions on selected logical host IDs attached to the Teradata Database. Provides information about the state of each session, including session details for Teradata Index Wizard. For more information about Teradata Index Wizard, see “Teradata Tools and Utilities - Teradata Index Wizard” on page 16-13. |
| Query Configuration | Provides reports on the current Teradata Database configuration, including node, AMP, and PE identification and status. |
| Gateway Global | Allows you to monitor and control the sessions of Teradata Database network-connected users. The gateway software runs as a separate operating system task and is the interface between the network and the Teradata Database. Supports up to 1200 sessions per gateway, depending on available system resources and the number of allotted PEs. Note: At least one PE that can support up to 120 sessions is required for each logical network attachment. Allows client programs that communicate through the gateway to the Teradata Database to be installed and running on either the Teradata Database server or network-attached workstations. Client programs that run on a channel-attached host bypass the gateway completely. |
System Resource Management
Other tools allow you to monitor and manage system resources, such as:
• Ferret utility
• Priority Scheduler
• Teradata Statistics Wizard
• Teradata Dynamic Query Manager
• Teradata MultiTool
The utilities and tools are discussed in the following sections.
Teradata Database - Ferret Utility
The Ferret utility is a tool that you can use to set various disk space utilization attributes associated with the Teradata Database while maintaining the integrity of the data managed by the Teradata Database file system. After you have selected the attributes and functions, Ferret dynamically reconfigures the data on the disks to correspond with the selections. Depending on the functions, Ferret can operate at the vproc, table, subtable, disk, or cylinder level.
Teradata Database - Priority Scheduler Utility
The Priority Scheduler is a resource management tool that oversees the dispersal of system resources based on a blueprint that you construct to satisfy your site-specific requirements. The Priority Scheduler is active in all Teradata Database systems. The Teradata Database itself automatically moves internal jobs into different priority levels, especially when a quick boost to one activity is critical to overall throughput. Priority Scheduler does the following:
• Keeps resource usage in your data warehouse balanced around your specific needs
• Offers flexibility for prioritizing users differently and specifying scheduling options
The Priority Scheduler controls the allocation and consumption of the computer resources available to the Teradata Database based on the following:
• A session-related priority designation
• The system-level priority strategy that you define
Although the default state of Priority Scheduler assigns the same priority to the jobs of all users, you can take advantage of the capabilities of the Priority Scheduler by doing the following:
• Assigning different priorities to different types of jobs
• Assigning jobs of favored users more CPU and faster I/O than the lower-priority jobs
The Priority Scheduler Administrator available in Teradata Manager enhances the usability of the Priority Scheduler by providing a graphical interface for configuration, management, and monitoring. For information, see “Teradata Manager” on page 19-2.
Teradata Tools and Utilities - Teradata Statistics Wizard
The Teradata Statistics Wizard is a graphical tool that was developed to improve the performance of queries and the entire database. The Statistics Wizard automates the process of collecting statistics for a particular workload or selecting arbitrary indexes or columns for collection or re-collection. Additionally, the Statistics Wizard permits you to validate the proposed statistics on a production system. The validation capability enables you to verify the performance of the proposed statistics before applying the recommendations.
The following table contains information about the capabilities of Teradata Statistics Wizard:
| You can… | For… |
|---|---|
| select a workload | analysis and receive recommendations based on the results. |
| select a database or select several tables, indexes, or columns | analysis and receive recommendations based on the results. |
| defer | the schedule for the collection or re-collection of statistics. |
| display and modify statistics | a column or index. |
| receive recommendations | analysis that is based on table demographics and general heuristics. |
As changes are made within a database, the Statistics Wizard identifies those changes and recommends which tables should have statistics collected, based on age of data and table growth, and the columns/indexes that would benefit from having statistics defined and collected for a specific workload. The administrator is then given the opportunity to accept or reject the recommendations.
Teradata Database - Teradata Dynamic Query Manager
The Teradata Dynamic Query Manager (DQM) is an application that lets you manage access to and use of the Teradata Database resources. Managed access allows you to use the database efficiently and manipulate workload capacity. The functions for rules processing and SQL validation are integrated in Teradata Database. Teradata DQM provides two capabilities for managing queries:
• Query Management
• Request Scheduling
The following table provides information about these capabilities:
| Teradata DQM provides… | That… |
|---|---|
| Query Management functions | Examine login and query requests. Check the users who issue requests, the accounts they use to log in, the performance groups they are associated with, and the objects referenced in the requests against criteria that you have previously defined. Reject or delay those requests that fail to meet the defined criteria. |
| Request Scheduling tools | Can be used to schedule single-statement or multistatement query requests for execution at a later time. The Scheduled Request (SR) function comprises both client and Teradata Database server components. The SR client components submit and monitor scheduled requests, and the SR server piece checks, saves, and executes the requests. |
The following table provides information about the restrictions you can create in Teradata DQM:
| You can create restrictions based on… | Such as… |
|---|---|
| names | account names; user and group logon IDs; database names; database objects, such as tables, views, macros, and stored procedures. |
| resources involved in a query | processing time; number of rows returned; joins or full-scans. |
| date and time | N/A. |
SQL queries entering the system, regardless of the source of the request, can be blocked, including queries received from:
• Basic Teradata Query (BTEQ)
• Call Level Interface (CLI)
• Open Database Connectivity (ODBC)
• Java Database Connectivity (JDBC)
You can enable or disable query management as desired. When enabled, all login and query requests, regardless of their origin, are managed by Teradata DQM.
Teradata Database - Teradata MultiTool
Teradata MultiTool is a Teradata Database utility that offers a graphical user interface (GUI) on Windows systems that Teradata administrators and support personnel can use as an interface to command-line-based Teradata and PDE tasks. You can start specific utilities using the options available in the GUI. The following table lists the tools accessible from Teradata MultiTool:
| The tool… | Is used to… |
|---|---|
| Control GDO Editor (CTL) | display and modify the fields of the PDE GDO (Globally Distributed Object). |
| Database Window (DBW) | activate the Supervisor window and subwindows. |
| Database Initialization Program (DIP) | execute one or more of the standard Database Initialization Program Structured Query Language (SQL) scripts packaged with the database. |
| Vproc Manager | perform the following functions: obtain the status of vprocs; change vproc states; initialize and boot a specific vproc; initialize the vdisk associated with a specific vproc; force a database restart. |
Database Query Analysis Tools
The Teradata Database Query Analysis Tools (DBQAT) are designed to improve the overall performance analysis capabilities of the Teradata Database. The following Teradata Tools and Utilities and Teradata Database tools are discussed in more detail in the sections that follow:
• Teradata Index Wizard
• The Query Capture Facility
• Teradata Visual Explain
• Database Query Log
• Target Level Emulation on the Server
• Teradata System Emulation Tool on the Client
• Database Object Use Count
Teradata Tools and Utilities - Teradata Index Wizard
The Teradata Index Wizard is a tool that interfaces with the Teradata Database. This utility analyzes various SQL query workloads and suggests candidate indexes to enhance the performance of those queries in the context of the defined workloads. The workload definitions, supporting statistical and demographic data, and index recommendations are stored in various Query Capture Database (QCD) tables.
What Can the Teradata Index Wizard Do
Using data from a QCD or the Database Query Log (DBQL), the wizard:
• Recommends secondary indexes for the tables based on workload details, including data demographics, that are captured using the Query Capture Facility (QCF)
• Allows you to validate index recommendations before implementing the new indexes
• Allows you to perform what-if analysis on the workload. The Teradata Index Wizard allows you to determine whether your recommendations actually improve query performance
• Interfaces with other Teradata Tools and Utilities, such as Teradata System Emulation Tool (TSET), to perform offline query analysis by importing the workload of a production system to a test system
• Uses the Teradata Visual Explain and Compare (VEComp) tool to provide a comparison of the query plans with and without the index recommendations
Teradata Index Wizard can be started from Teradata Visual Explain, Teradata System Emulation Tool, Teradata Statistics Wizard, and Teradata Manager. Index Wizard can also open these applications (except Teradata Manager) to help in your evaluation of recommended indexes.
Demographics
The Teradata Index Wizard needs demographic information to perform index analysis and to make recommendations. You can collect the following types of data demographics using SQL:
• Query demographics
  Use the INSERT EXPLAIN statement with the WITH STATISTICS and DEMOGRAPHICS clauses to collect table cardinality and column statistics.
• Table demographics
  Use the COLLECT DEMOGRAPHICS statement to collect the row count and the average row size in each of the subtables in each AMP on the system.
Teradata Database - Query Capture Facility
The Query Capture Facility (QCF) is available on the Teradata Database. The QCF captures the data pertaining to an execution plan and stores the data in a set of relational tables in a QCD. Applications of QCF and QCD:
• Provide the foundation for the Teradata Index Wizard utility.
• Can store all query plans for customer queries. You can then compare and contrast queries as a function of software release, hardware platform, and hardware configuration.
• Provide the foundation for the Visual EXPLAIN tool, which displays EXPLAIN output graphically.
• Provide data so that you can generate your own detailed analyses of captured query steps using standard SQL DML statements and third-party query management tools.
You can execute the COLLECT, DROP, and HELP STATISTICS SQL statements against a QCD.
QCD Schema Improvement
QCD schema is designed to:
• Minimize the number of tables required by capturing information in a generic fashion
• Promote usability
• Improve the overall performance of data storage and retrieval
Teradata Index Wizard Support
A QCD is the central repository for the information used in the analyses performed by the Teradata Index Wizard. A QCD supports the Teradata Index Wizard by capturing and storing the data demographics and Index Wizard-related information that you specify. The workload definitions, supporting statistical and demographic data, and index recommendations are stored in various QCD tables. The Teradata Index Wizard analyzes various SQL query workloads and suggests candidate indexes to enhance the performance of those queries in the context of the defined workloads.
Teradata Tools and Utilities - Teradata Visual Explain
Teradata Visual Explain is a tool that visually depicts the execution plan of complex SQL statements in a simplified manner. When you specify the EXPLAIN modifier in the SQL statement, Teradata Visual Explain presents a graphical view of the statement broken down into discrete steps showing the flow of data during execution. Because comparing optimized queries is easier with Teradata Visual Explain, application developers and database administrators can fine-tune the SQL statements so that the Teradata Database can access data in the most effective manner.
To view an execution plan using Teradata Visual Explain, the execution plan information must first be captured into the QCD with the Query Capture Facility (QCF), using the following commands:
• INSERT EXPLAIN
• DUMP EXPLAIN
Teradata Visual Explain reads the execution plan, which has been stored in a QCD, and turns it into a series of icons.
Teradata Database - Database Query Log
The Database Query Log (DBQL) is a Teradata Database tool that provides a series of predefined tables that can store, based on rules you specify, historical records of queries and their duration, performance, and target activity. DBQL is flexible enough to log information on the variety of SQL requests that run on the Teradata Database, from short transactions to longer-running analysis and mining queries.
After implementing DBQL, you use simple SQL statements to control the start, extent, and duration of the logging activity. You can define rules, for instance, that log the first 4000 SQL characters of any query that runs during a session invoked by a specific user under a specific account, if the time to complete that query exceeds the specified time threshold. You can request that DBQL log particular query information or just a count of qualified queries. You can specify that the recording criteria be a mix of:
• Users and accounts
• Elapsed time, where time can be expressed as:
  • A series of intervals
  • A threshold limit
• Processing detail, including any or all:
  • Objects
  • Steps
  • SQL text
In addition to the query-related data, DBQL stores the following information to help identify the query:
• User name
• Session number and account information
DBQL data also can be input to Target Level Emulation, and Teradata Tools and Utilities, including Teradata Manager, and VEComp. Teradata Tools and Utilities aid in analysis and present the information in a graphic form that is easily manipulated and understood.
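The rule described above (4000 characters of SQL text for one user under one account, subject to a time threshold) might be written along the following lines. This is a sketch only: the user name Jones, the account string FinAcct, and the 5-second threshold are hypothetical, and the exact option list for this release should be confirmed in Database Administration.

```sql
-- Hypothetical user (Jones), account string ('FinAcct'), and threshold.
-- Log object and step detail for requests that Jones runs under account
-- FinAcct, keep up to 4000 characters of SQL text, and apply a 5-second
-- elapsed-time threshold.
BEGIN QUERY LOGGING WITH OBJECTS, STEPINFO, SQL
  LIMIT SQLTEXT = 4000 AND THRESHOLD = 5
  ON Jones ACCOUNT = 'FinAcct';

-- End the rule for the same user and account when it is no longer needed.
END QUERY LOGGING ON Jones ACCOUNT = 'FinAcct';
```

Logging only above a threshold keeps the log tables small on systems that run many short transactions, at the cost of losing detail for the fast queries.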
Teradata Database - Target-Level Emulation
Teradata Database supports Target-Level Emulation both on the Teradata Database server and in the client as follows:

Teradata supports…                 On the…
Target-Level Emulation (TLE)       Teradata Database server.
Teradata System Emulation Tool     client.

The Teradata Database provides the infrastructure for Target-Level Emulation (TLE). You can use the standard SQL interface to capture the system configuration details and table demographics on one system and store them on another. Usually the information is obtained from a production system, then stored on a smaller test or development system. With this capability, the optimizer on the test system can generate access plans similar to those that are generated on the production system. You can use the plans to analyze optimizer-related production problems. This information can also be used by the Teradata System Emulation Tool.
Teradata Tools and Utilities - Teradata System Emulation Tool
When TLE information is stored on a test system, Teradata System Emulation Tool (TSET), a Teradata Tools and Utilities tool, allows you to examine the query plans generated by the test system optimizer as if the plans were processed on the production system. Using TSET you can:

• Change system configuration details and table demographics and model the impact of various changes on SQL statement performance
• Determine the source of various optimizer-based production problems
• Provide an environment in which Teradata Index Wizard can produce recommendations for a production system workload
Teradata Database - Database Object Use Count
The database administrator and application developer can use Database Object Use Count to capture the number of times an application refers to an object. Database Object Use Count captures counts for the following:

• Database
• Table
• Column
• Index
• View
• Macro
• Teradata stored procedure
• Trigger
• User-defined function

Once the counts are captured, you can use the information to identify obsolete or unused database objects, particularly those that occupy significant quantities of valuable disk space. Further, the Database Object Use Count information can be useful to database query analysis tools like Teradata Index Wizard.
Query Facilities
A request to the Teradata Database consists of one or more SQL statements, and can span any number of input lines. The Teradata Database can receive and execute statements that are:

• Entered interactively, or submitted as a script or a batch job, through the Basic Teradata Query interface
• Entered using Teradata SQL Assistant
• Embedded in an application program that is written in a procedural language

Each facility is discussed in the following sections.
Teradata Tools and Utilities - Basic Teradata Query Utility
The Basic Teradata Query Utility (BTEQ) is an SQL front-end utility that runs on all client platforms. It resides on the client portion of either a channel-attached or network-attached system and communicates with one or more Teradata Database systems residing on the server. BTEQ allows you to create and submit SQL queries either interactively or in batch mode from an interactive terminal.

BTEQ Support

BTEQ supports the following facilities:

• Multiple Teradata SQL statements per request
• Read from and write to client data files
• Management of multiple sessions per job
• Sophisticated report format
• Stored procedure objects in the Teradata Database

BTEQ Communication

The client system communicates with the Database as described in the following table:

IF your client system is…     THEN communication occurs over a…
channel attached              high-speed I/O channel.
network attached              Local Area Network (LAN).
Teradata Tools and Utilities - Teradata SQL Assistant
Teradata SQL Assistant is a Teradata Tools and Utilities tool that provides information discovery capabilities on Windows-based systems. Teradata SQL Assistant retrieves data from any ODBC-compliant database server and allows you to manipulate and store the data on your desktop PC. You can then use this data to produce consolidated results or perform analyses on the data using tools such as Microsoft Excel.

The following table contains information about key features of Teradata SQL Assistant:

This feature…                  Allows you to…

Reports
    • Create reports from any database that provides an ODBC interface
    • Use an imported file to create many similar reports (query results or answer sets), for example, display the DDL (SQL) that was used to create a list of tables

Data manipulation
    • Export data from the database to a file on a PC
    • Import data from a PC file directly to the database
    • Create a historical record of the submitted SQL with timings and status information, such as success or failure
    • Use the Database Explorer Tree to easily view database objects

Queries
    • Use SQL syntax examples to help compose your SQL statements
    • Send queries to any ODBC database or the same query to many different databases
    • Limit data returned to prevent runaway queries

Teradata stored procedures
    • Use a procedure builder that gives you a list of valid statements for building the logic of a stored procedure, using Teradata syntax

Teradata SQL Assistant electronically records your SQL activities with data source identification, timings, row counts, and notes. Having this historical data allows you to build a script of the SQL that produced the data. The script is useful for data mining.
Teradata Tools and Utilities - Preprocessor2
Teradata Tools and Utilities provides a preprocessing facility that lets you include SQL statements in your application programs. Preprocessor2 parses application code for SQL statements, converts the statements to Call-Level Interface (CLI) calls, and comments out the SQL statements. After Preprocessor2 processes the application code, you can submit the processed code to your client application language compiler.

For more information about embedded SQL, see “Embedded SQL Applications” on page 6-3.
For More Information
For more information on the topics presented in this chapter, see the following Teradata Database and Teradata Tools and Utilities Foundation books:

IF you want to learn more about…          THEN see…

Archive utilities
    Teradata Archive/Recovery Utility Reference

BTEQ
    Basic Teradata Query Reference

Database Query Log
    Database Administration
    Data Dictionary
    Performance Optimization
    Teradata Visual Explain User Guide

Embedded SQL
    Teradata Preprocessor2 for Embedded SQL Programmer Guide
    SQL Reference: Data Manipulation Statements

General Teradata Database software architecture
    Database Design

Load and unload utilities
    Teradata FastExport Reference
    Teradata FastLoad Reference
    Teradata MultiLoad Reference
    Teradata Parallel Data Pump Reference

Priority Scheduler
    Utilities

Query Capture Database
    Database Design
    Teradata Manager User Guide
    SQL Reference: Statement and Transaction Processing

Teradata Database management utilities
    Utilities

Teradata Dynamic Query Manager
    Teradata Dynamic Query Manager Administrator Guide
    Teradata Dynamic Query Manager User Guide

Teradata Index Wizard
    Teradata Index Wizard User Guide

Teradata Manager
    Teradata Manager User Guide

Teradata SQL
    SQL Reference: Fundamentals
    Teradata SQL Assistant for Microsoft Windows User Guide

Teradata SQL Assistant
    Teradata SQL Assistant for Microsoft Windows User Guide

Teradata System Level Emulation
    Database Design
    SQL Reference: Data Definition Statements
    SQL Reference: Statement and Transaction Processing
    Teradata System Emulation Tool User Guide

Teradata Visual Explain
    Teradata Visual Explain User Guide
Chapter 17: Security and Integrity

This chapter describes security and integrity for the Teradata Database. Topics include:

• Security and integrity
• Resource access control
• Encryption
• Password security features
• SQL used to control logon
• Security policies and physical access control

The descriptions include both client and server security and Teradata Database user privileges.
Security and Integrity
Security is the protection of data against unauthorized access. You can secure programs and data by issuing identification numbers and passwords to authorized users of a computer. The operating system can check passwords to prevent users from logging onto the system in the first place, or the system can check passwords in software, such as in a database, where each user is assigned an individual view of the database.

Although you can take precautions to detect an unauthorized user, determining if a valid user is performing unauthorized tasks is extremely difficult.

Integrity is the process of preventing accidental erasure or corruption of data in a database.
System Integrity
The Teradata Database provides support for referential integrity to ensure that every foreign key in a referencing table matches a primary key in a referenced table. Users may also provide their own facilities for monitoring referential integrity in the Teradata Database. For more information about referential integrity, see “Referential Integrity” on page 12-8.

You can also write macros and stored procedures that enforce the referential integrity of each table in your system. For more information about macros and stored procedures, see Chapter 11: “Other Database Objects.”
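To make the foreign key and primary key relationship concrete, here is a small sketch of a declared referential constraint. The Department and Employee tables and their columns are hypothetical and are not part of any schema described in this book.

```sql
-- Hypothetical referenced (parent) table: DeptNo is its primary key.
CREATE TABLE Department (
  DeptNo   INTEGER NOT NULL PRIMARY KEY,
  DeptName VARCHAR(30)
);

-- Hypothetical referencing (child) table: every non-null DeptNo value
-- stored here must match a DeptNo value in Department.
CREATE TABLE Employee (
  EmpNo   INTEGER NOT NULL PRIMARY KEY,
  EmpName VARCHAR(30),
  DeptNo  INTEGER,
  FOREIGN KEY (DeptNo) REFERENCES Department (DeptNo)
);
```

With the constraint declared, the database rather than the application rejects an Employee row whose DeptNo has no matching Department row.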
System Security
The five categories of solutions for system security are:

Category                      Description
Resource access control       Software-enforced access restrictions
Physical access control       Restrictions detailed in a formalized security policy
Encryption                    Logon and network data encryption
Security policy               A sound, well-enforced data center security policy
Auditing and accountability   System auditing of security-related user actions

These categories are discussed in the following sections.
Resource Access Control
This section introduces the Teradata Database software tools you can use to enforce access restrictions. These tools include:

• User identifiers (user names)
• Channel or network (LAN) identifiers (host or client identifiers)
• Logon policies
• TDP user security interface
• Client security
• Single Sign On
User Identifiers

Teradata access control is based on a user identifier. The security administrator can optionally enforce access control based on a channel- or network-client identifier as well.

A user name is the name defined in a CREATE USER statement. The security administrator must perform one CREATE USER statement for each authorized user in order to establish the user name, define its password, and allocate user disk space.

The DBase table stores user names and database names and resides in the space allocated to a system user named DBC. You can retrieve information about user names from the DBC.DBase table by querying the system view named DBC.Users.
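A query against the DBC.Users view might look like the sketch below. The column names UserName and CreatorName are assumed from the Data Dictionary view definition and should be confirmed there; the rows returned can be limited by the privileges of the user who submits the query.

```sql
-- List user names and the users who created them, as exposed by the
-- DBC.Users system view (column names assumed; see Data Dictionary).
SELECT UserName, CreatorName
FROM DBC.Users
ORDER BY UserName;
```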
Client Identifiers

Any number of different client types can connect to the Teradata Database server. Each connection must have its own unique client identifier. You use the Configuration utility to assign each connection a unique value and define the value to the Teradata Database. Each defined value functions as a client identifier or hostid.
Logon Policies

Users must issue a logon request so that the Teradata Database can identify the user and establish a session. The logon string must include a user name that is already established in the system DBase table.

The logon string may also include any combination of the following operands:

Operand        Definition

tdpid
    Each copy of the TDP on a given channel-attached client is assigned a unique tdpid to identify it. The tdpid is a client-based operand and is not transmitted to the Teradata Database.

password
    A password authenticates a user request to initiate a Teradata session under the supplied user name.

    To create a password: The security administrator can use the CREATE USER statement to establish a password for a user. The default is that the password must appear in the user logon string.

    To log on without a password: If you enable the security administrator user, the security administrator can issue a GRANT LOGON statement containing the WITH NULL PASSWORD option for the user.

        On IBM mainframe clients…
            TDP security user exit TDPLGUX must acknowledge that the logon string is valid without a password.
        On Microsoft Windows 2000 clients…
            Single Sign On provides the ability to use industry standard network authentication to identify users. For information about this feature, see “Single Sign On” on page 17-7.

    Note: Because the null password applies only to logging onto the Teradata Database, all other system security measures continue to be enforced.

acctid
    The account id can be used for resource accounting. Each user name may have one or more acctids. The logon processor assigns a default value for the acctid if it detects none in the logon string for a user. The acctid can also contain a priority-level prefix that can be used when interactive users are competing for system resources with long-running batch jobs.
TDP Security

IBM mainframe clients running either MVS or VM have the option of enforcing security at the TDP level using tdpids. The TDP provides a user logon exit called TDPLGUX which you can embed in a user-written routine to process logon requests. Using TDPLGUX, you can reject, accept, provide, or modify any logon request to the Teradata Database.

TDPLGUX also permits users to set any of the following options:

• No logon string (implicit logon)
• A user id for which the user routine provides a password
• A user id that can be validated to require no password

You can use TDPLGUX alone or in conjunction with any security package such as:

• RACF
• CA-ACF2
• CA-TOP SECRET
Single Sign On

The Single Sign On feature allows users of the Teradata Database on Microsoft Windows 2000 systems to access Teradata Database based on their authorized network usernames and passwords. This feature simplifies the logon procedure by removing the need for users to enter an additional username and password when running client applications that access the database.

For the Single Sign On feature to work, it must be enabled on the Teradata Database server as well as on the Teradata Gateway. The database administrator can turn Single Sign On OFF or ON for the database using one of the following:

• Teradata Database Window (DBW) (DBW is the preferred way.)
• DBS Control utility

To turn the feature ON or OFF on the Teradata Gateway, the database administrator can use the Gateway Global utility.

Note: Single Sign On for the Teradata Database is not available on NCR UNIX MP-RAS systems.

Authentication can be accomplished in several ways. A field in the Gateway Global utility indicates the authentication method the client used to log on to the database. The Authentication field has four values:

IF the field contains…     THEN authentication was provided by…
DATABASE                   the database. This was the method used before Single Sign On was implemented.
NEGOTIATE                  Windows Negotiate.
NTLM                       Windows NTLM.
KERBEROS                   Windows Kerberos.

Single Sign On provides the following benefits:

• Enhances site security because authentication mechanisms do not send passwords across the network
• Supports the use of alternative security mechanisms that automate logon by eliminating the need for an application to declare or store a password on the client system
• Saves time
Encryption
Teradata enhances security between the Teradata Database and network-attached clients by implementing encryption. Call-Level Interface version 2 (CLIv2) supports encryption. Other interface products included in Teradata Tools and Utilities, such as ODBC and JDBC type-4, support only logon encryption, which is a subset of network data encryption.

The encryption feature supports the following:

• Network data encryption
• Logon encryption

Network Data Encryption

Teradata Tools and Utilities supports network data encryption between client applications and the following:

• The Teradata Gateway on Microsoft Windows 2000
• NCR UNIX MP-RAS systems

A client application can enable or disable network data encryption for the duration of a request by setting the data_encryption flag in dbcarea. When the flag is set to Y, network traffic is encrypted in both directions between the client application and the Teradata Gateway.

Clients that do not support encryption on a request-by-request basis can take advantage of network data encryption by enabling encryption on a global rather than a request basis. To accomplish global encryption, the clispb.dat file associated with the client application must have data_encryption=Y.

Logon Encryption and the Teradata Gateway

The client application does not enable or disable logon encryption. Encryption is determined by the settings of the Teradata Gateway, which is the target security domain. The database administrator (owner of this security domain) can control encryption using an option in the Gateway Control utility.

When operating under default conditions, the Teradata Gateway accepts only encrypted logons and rejects unencrypted ones. For the gateway to accept both encrypted and unencrypted logons, the database administrator must set a Gateway Control option to yes.
Security Features
You can use a number of attributes to enhance Teradata Database password security.

Password Attributes

The following table lists and describes password security attributes:

Password Attribute        Description

Expiration
    Defines a time span during which the password is valid. After that duration, the user must change the password.

Number of characters, digits, special characters
    Restricts the number of characters, digits, or special characters permitted in a password.

Maximum logon attempts
    Defines the number of consecutive erroneous logon attempts permitted before the user is locked out of further attempts.

Lockout time
    Sets the time duration of the user lockout after the user has exceeded the maximum number of erroneous logon attempts.
    Note: An administrator can do the following:
    • Set the lockout duration by specifying a value of up to 32000 minutes (about 22 days)
    • Lock out the user indefinitely
    • End an existing lockout

Reuse
    Defines the time span that must elapse before you can reassign a previously used password.

The DBC.SysSecDefaults table stores password attributes for the Teradata Database. Teradata Database passwords are encrypted and stored in the PasswordString field of the DBC.DBase table.
User-Level Password Attributes

You can assign the following eight password security attributes in a user profile:

• Password Expiration
• Password MinChar
• Password MaxChar
• Password Digits
• Password SpecChar
• MaxLogonAttempts
• LockedUserExpire
• Password Reuse

The administrator assigns users to the profile, thus effectively implementing password security at the user level. To learn more about simplifying system administration using the capabilities of roles and profiles, see “Roles and Profiles for Users” on page 18-6.
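A profile carrying several of these attributes might be defined as in the sketch below. The profile name SecureProf, the user Jones, and all of the values are hypothetical. The Digits and SpecChar attributes are left out of the sketch, and the keyword spellings and units should be checked against SQL Reference: Data Definition Statements before use.

```sql
-- Hypothetical profile name and values, for illustration only.
CREATE PROFILE SecureProf AS
  PASSWORD = (EXPIRE = 90,            -- Password Expiration, in days
              MINCHAR = 8,            -- Password MinChar
              MAXCHAR = 30,           -- Password MaxChar
              MAXLOGONATTEMPTS = 3,   -- MaxLogonAttempts
              LOCKEDUSEREXPIRE = 60,  -- LockedUserExpire, in minutes
              REUSE = 180);           -- Password Reuse, in days

-- Assign the profile to an existing (hypothetical) user.
MODIFY USER Jones AS PROFILE = SecureProf;
```

Because the attributes live in the profile, changing a value there changes it for every user assigned to the profile.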
DBC.DBase Table

Teradata stores password information in encrypted form in the DBC.DBase system table. The table contains the date and time a user defined a password, along with the encrypted password. An administrator can modify passwords temporarily when the PasswordLastModDate plus a fixed number of days has been reached. This allows you to ensure that users change their passwords regularly.

Passwords are always encrypted. The PasswordString column of the DBC.DBase table displays encrypted passwords. The password is never decrypted.
SQL Used to Control Logon
The Teradata Shared Information Architecture (SIA) allows multiple clients to connect to the Teradata Database simultaneously. By default, the system grants logon permission to all users from all connections. However, the Teradata Database provides tools for restricting logons from specific clients.

Use the statements GRANT LOGON and REVOKE LOGON to associate specific user names with specific client (host) ids. You can only grant logons using GRANT LOGON if the user is already created in the Teradata Database and if the client (host) id corresponds to a value assigned to a network- or channel-connection by the Teradata Database. You can retract the privileges granted by a GRANT LOGON statement by using the REVOKE LOGON statement.
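The statements below sketch how GRANT LOGON and REVOKE LOGON pair a user name with a client (host) id. The user Jones and the host id 1025 are hypothetical; the host id must be a value already defined to the Teradata Database.

```sql
-- Hypothetical user (Jones) and host id (1025).

-- Permit Jones to log on from the client whose host id is 1025.
GRANT LOGON ON 1025 TO Jones;

-- Retract that permission.
REVOKE LOGON ON 1025 FROM Jones;

-- Permit Jones to log on from any client without supplying a password.
-- All other system security measures continue to be enforced.
GRANT LOGON ON ALL TO Jones WITH NULL PASSWORD;
```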
Data Access Control

The first level of access to the Teradata Database is at the level of the user and the database. This section discusses explicit access rights as controlled by the GRANT and REVOKE statements. These statements grant or remove from a user or group of users one or more privileges on a database, user, table, view, stored procedure, or macro.

You must be an owner of the object being controlled, or must have GRANT/REVOKE privileges to the object, before you can submit GRANT or REVOKE statements. If the object is a view, stored procedure, or macro, then the owner must also have the GRANT privilege and any other applicable privileges on the object or objects referenced by the view, stored procedure, or macro. You cannot grant more privileges on an object than you have yourself on that object.

When a user explicitly grants privileges to another user or database, certain rules determine whether, how, and on what object the requested privilege is implemented.
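As an illustration of explicit rights, the following sketch grants and then removes privileges. The Personnel.Employee table and the user Jones are hypothetical, and the user submitting the statements is assumed to own the objects or to hold the privileges with the right to grant them.

```sql
-- Hypothetical objects: database Personnel, table Employee, user Jones.

-- Give Jones read and insert access to a single table.
GRANT SELECT, INSERT ON Personnel.Employee TO Jones;

-- Give Jones read access to every object in the Personnel database,
-- together with the right to pass that privilege on to others.
GRANT SELECT ON Personnel TO Jones WITH GRANT OPTION;

-- Remove the insert privilege on the table again.
REVOKE INSERT ON Personnel.Employee FROM Jones;
```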
Ownership and Implicit Rights

As an owner of an object, you have implicit rights on the object. These rights allow you access to the object in certain cases even when you do not have explicit rights for the object.

System Views for Access Information

The Teradata Database supplies numerous system views for accessing information in the Data Dictionary. These views provide information about users and access rights, and about grant, logon, and access activities. For details about views in the Data Dictionary, see “Teradata Database Data Dictionary Views” on page 9-6.
Security Policies and Physical Access Control
You can use the following methods to ensure the security of physical access to your Teradata Database and the hardware on which it runs.

Principal Considerations of a Security Policy

The principal consideration for physical access control is establishing a security policy. The security policy is based on identification of:

• Security needs
• Policies and procedures to meet those needs

Key Implementation Elements of a Security Policy

The security policy for your Teradata Database should include two essential implementation elements:

• System-enforced security features
• Personnel-enforced security features

You should write a set of security policies and procedures to be distributed to all users of the system. Among the topics you should cover in this document are:

• Why security is needed
• Benefits of the security policy for the users and for the company
• Suggested security actions for users to follow
• Required security actions for users to follow
Auditing and Accountability
You can periodically audit events on Teradata Database to detect the following security hazards:

• Potential break-ins
• Attempts to gain unauthorized access to database resources
• Attempts to alter the behavior of Teradata Database auditing facilities

Teradata Database automatically audits all logon and logoff activity. However, you can specify additional audits of attempts to access data, by configuring the system to log one or any combination of the following parameters:

• All access requests made (for all or specific users)
• All access requests denied (for all or specific users)
• Specific types of access requests made (for all or specific users)

You can examine or print the audit data during normal system operations, or archive the data to review offline and generate reports. You can use SQL to select data from the audit log during normal operations.

If you identify unauthorized or undesirable activity, you can take one of the following remedial actions to address the problem:

• Change the security policy
• Change compromised passwords
• Audit intensively all actions of particular users
• Change access rights
• Deny the offending users any access to Teradata Database (in extreme cases)
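The additional audits described above are set up with access logging rules and reviewed with ordinary queries. The sketch below assumes that access logging has been enabled on the system and uses a hypothetical user, Jones; the DBC.AccessLog column names are assumed from the Data Dictionary and should be confirmed there, and the full statement syntax is in Security Administration.

```sql
-- Hypothetical user (Jones). Log every access request by Jones that
-- the system denies.
BEGIN LOGGING DENIALS ON EACH ALL BY Jones;

-- Review the logged entries for that user during normal operations
-- (column names assumed; see Data Dictionary).
SELECT LogDate, LogTime, UserName, AccessType, Result
FROM DBC.AccessLog
WHERE UserName = 'Jones'
ORDER BY LogDate, LogTime;
```

Logging only denials keeps the audit log small while still surfacing attempts to reach objects a user has no rights to.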
For More Information
For more information on the topics presented in this chapter, see the following Teradata Database and Teradata Tools and Utilities books:

IF you want to learn more about…          THEN see…

Auditing
C2 level or equivalent security
    Security Administration

Client (TDP) security
    Teradata Director Program Reference

Database Window
    Database Window

DBS Control utility
Gateway Global utility
    Utilities

GRANT logon permissions
Security administration and security
    Security Administration

System views related to security
Tables in the Data Dictionary
    Data Dictionary
Chapter 18: System Administration

This chapter discusses space allocation, roles and profiles, accounting, and maintenance on the Teradata Database as they relate to system administration. Topics include:

• Space allocation for databases and users
• Roles and profiles for users
• Accounting
• Maintenance utilities
Space Allocation for Databases and Users
Space allocation for the Teradata Database relates not only to the disk space that databases require, but to the space required to define users.

In the Teradata Database, a database is a collection of related tables, views, stored procedures, and macros. A database also contains an allotment of space from which users can create and maintain their own tables, views, macros, stored procedures, or other users or databases.

A database and a user are almost the same thing in the Teradata Database. The difference is that a user can log on to the system whereas the database cannot. A user identifies someone who can log on to both the system and a database.
Databases and Users

When the Teradata Database is first installed on a server, only one user exists on the system: user DBC. The database administrator typically manages this user and assigns space from the user DBC to all other organizations. The user DBC owns all other databases and users in the system.

To protect the security of system tables within the Teradata Database, the database administrator typically creates a database administrator user from DBC. The usual procedure is to assign all database disk space that system tables do not require to the new administrator database. The database administrator then uses this database as a resource from which to allocate space to the databases and users of the system.
How to Create a Finance and Administration Database

When you create a new database or allocate space to a user, the system assigns disk space from the space belonging to an existing database or user. The creating database (or user) is the owner of the new database (or user space). The owner permanently grants a specified amount of space to the new database or user, which is then subtracted from the total unused space available to the owner.

Consider the following scenario: the database administrator needs to create a Finance and Administration (F&A) department database with user Jones as a supervisory user, or database administrator within the F&A department. The database administrator first creates the F&A database, then allocates space from it to Jones to act as the F&A database administrator. The database administrator also allocates space from F&A to Jones for his personal use and for creating a Personnel database, as well as other databases, and other user space allocations.

The following figure shows the hierarchy of this relationship.

[Figure HD08B001: ownership hierarchy. The DBC user/database is at the top, followed by the System Administrator user/database and then the F&A database, which in turn owns the Personnel database, user Jones, other department databases, and other users and databases for the department.]

The F&A Database owns Personnel and all the other department databases. F&A also owns User Jones and all other users within the department. Because the user DBC ultimately owns all other databases and users, it is the final owner of all the databases and user space belonging to the organization.

This hierarchical ownership structure provides the owner of a database or user space with complete control over the security of owned data. The owner can archive the database or control access to it by granting or revoking privileges on it. For more information on granting and revoking access privileges, see Chapter 17: “Security and Integrity.”
How to Create Databases

The previous section explains the concept of databases and users in the Teradata Database environment. This section explains the mechanics of how to create a database from DBC. Before you can create tables, views, users, stored procedures, or macros, you must first create a database.

Use the SQL statement CREATE DATABASE to create a database. The following example shows the SQL statement used to create the Personnel database from the database, Administration:

    CREATE DATABASE Personnel FROM Administration AS
      PERMANENT = 5000000 BYTES,
      FALLBACK,
      BEFORE JOURNAL,
      DUAL AFTER JOURNAL,
      DEFAULT JOURNAL TABLE = Personnel.FinCopy;
The Personnel database is created from the space available in Administration. The 5000000 value represents bytes of storage.

To create a database, the initiator must have CREATE DATABASE privileges on the FROM entry. In this example, the initiator must have CREATE DATABASE privileges on Administration. The new database receives all privileges that have been granted to the initiator.

The FALLBACK keyword specifies that a duplicate copy of each table is stored in addition to the original for each table created in the Personnel database.

The JOURNAL option specifies that a single copy of the before change image and dual copies of the after change image are maintained for each data table. A duplicate before change image is maintained automatically for any table in this database that uses both the fallback and the journal defaults.

The DEFAULT JOURNAL TABLE clause is required because journaling is requested. This clause specifies that a new journal table named “FinCopy” is to be created in the new database.
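After a statement like the CREATE DATABASE example above has run, the space taken from Administration and given to Personnel can be checked with a query along these lines. This is a sketch that assumes the DBC.DiskSpace system view and its DatabaseName, MaxPerm, and CurrentPerm columns; the view reports space per AMP, so the values are summed per database.

```sql
-- Compare the permanent space limit and current usage of the owner
-- (Administration) and the new database (Personnel).
-- DBC.DiskSpace holds one row per AMP for each database, hence the SUMs.
SELECT DatabaseName,
       SUM(MaxPerm)     AS MaxPermBytes,
       SUM(CurrentPerm) AS CurrentPermBytes
FROM DBC.DiskSpace
WHERE DatabaseName IN ('Administration', 'Personnel')
GROUP BY DatabaseName
ORDER BY DatabaseName;
```

Under the assumptions above, Personnel would show a limit of 5000000 bytes, and the limit for Administration would be lower by the same amount than it was before the statement ran.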
How to Create Users
This section explains the mechanics of how to create a user. The SQL statement for creating a user is CREATE USER. The statement authorizes a new user identification (user name) for the database and specifies a password for user authentication. Because the system creates a database for each user, the CREATE USER statement is very similar to the CREATE DATABASE statement.
The following example shows the SQL statement used to create user Jones in the F&A database:
CREATE USER Jones FROM F&A AS
  PERMANENT = 1000000 BYTES,
  SPOOL = 1000000 BYTES,
  PASSWORD = Jan,
  FALLBACK,
  ACCOUNT = 'Administration',
  STARTUP = 'DATABASE F&A;';
The optional STARTUP clause specifies one or more Teradata SQL statements that the system can execute automatically when the user establishes a session.
Any user who submits this statement must have the CREATE USER privilege on the owner database or be its owner. The system automatically grants the new user all privileges on tables, views, and macros created in this space. For stored procedures created in this space, the new user receives only the DROP PROCEDURE privilege.
Note: In the Microsoft Windows environment, the Single Sign On feature removes the need for users to enter user names, passwords, and account ids. For more information about this feature, see “SQL Used to Control Logon” on page 17-12.
Roles and Profiles for Users
The task of system administration can be simplified using the features provided by roles and profiles. You may think of a role as a pseudo-user with privileges on a number of database objects. A profile can be viewed as a container that holds a set of parameters, such as database, spool space, temporary space, and accounts, to which the system administrator assigns certain values. After creating roles and profiles, the system administrator assigns them to users. Roles and profiles simplify system administration as follows:
Using…
Simplifies administration because …
roles to automatically grant rights to database objects to all users assigned to the role
when users change jobs within their organizations, changing roles is far easier than deleting old rights and granting new rights that go along with their new jobs.
profiles to efficiently change the parameter values associated with users
you change a parameter value once in the profile instead of updating the value for each user.
Teradata makes all roles available to a user in either of the following ways:
• During the current session, when the user submits a SET ROLE ALL statement
• Upon logon, when the default role of the user was set to ALL through a CREATE USER or MODIFY USER statement
Having access to the privileges in all roles is useful when validating access rights. To learn more about using profiles to ensure password security, see “Encryption” on page 17-9.
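The effect of a current role, and of making all roles available at once, can be sketched in Python. This is an illustrative model only: the role names, the privilege tuples, and the function effective_privileges are hypothetical and do not come from Teradata.

```python
# Hypothetical role and user tables; none of these names come from Teradata.
role_privileges = {
    "payroll_clerk": {("SELECT", "Personnel.Salary")},
    "hr_analyst": {("SELECT", "Personnel.Employee"), ("UPDATE", "Personnel.Employee")},
}
user_roles = {"Jones": ["payroll_clerk", "hr_analyst"]}


def effective_privileges(user, current_role):
    """Return the privileges a session sees for the given current role.

    current_role="ALL" mimics SET ROLE ALL: the union of every role
    assigned to the user.
    """
    assigned = user_roles.get(user, [])
    if current_role == "ALL":
        active = assigned
    elif current_role in assigned:
        active = [current_role]
    else:
        raise PermissionError(f"{current_role} is not granted to {user}")
    privileges = set()
    for role in active:
        privileges |= role_privileges[role]
    return privileges


print(len(effective_privileges("Jones", "payroll_clerk")))  # 1
print(len(effective_privileges("Jones", "ALL")))            # 3
```

In this model, a job change means editing one entry in user_roles rather than revoking and granting each privilege separately, which is the simplification the table above describes.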
Accounting
This section describes the accounting options available for the Teradata Database. Among the areas covered are:
• Session management
• Account usage
• Account performance groups
Session Management
Users must log on to the Teradata Database and establish a session before they can do any accounting.
Establishing a Session
To establish a session, the user logs on to the database. The procedure varies depending on the client system, the operating system, and whether the user is an application program or a user in an interactive terminal session using BTEQ or a third-party query processing product.
Logon Operands
The logon string can include any of the following operands:
• Optional identifier for the database, called a tdpid
• User name
• Password
• Optional account number
Note: In the Windows environment, the Single Sign On feature negates the need for users to enter usernames, passwords, and account ids after they have logged on using their authorized user names and passwords. For more information about this feature, see “SQL Used to Control Logon” on page 17-12.
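As a rough illustration of the four logon operands, the following Python sketch splits a logon string into its parts. The layout tdpid/username,password,'account' is an assumption made for this example, and parse_logon is a hypothetical helper, not part of any Teradata client.

```python
def parse_logon(logon):
    """Split a logon string into its four operands.

    Assumed layout: [tdpid/]username,password[,'account'].
    Simplification: passwords containing commas are not handled.
    """
    first, _, rest = logon.partition(",")
    tdpid, slash, username = first.rpartition("/")
    password, _, account = rest.partition(",")
    if not username.strip() or not password.strip():
        raise ValueError("logon string needs at least a user name and a password")
    return {
        "tdpid": tdpid.strip() if slash else None,         # optional
        "username": username.strip(),
        "password": password.strip(),
        "account": account.strip().strip("'") or None,     # optional
    }


print(parse_logon("tdp1/Jones,Jan,'Administration'"))
print(parse_logon("Jones,Jan"))
```

The two optional operands come back as None when they are omitted, which matches the idea that the system falls back to defaults for them.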
Session Requests
A session is established after the database accepts the user name, password, and account number and returns a session number to the process.
Subsequent Teradata SQL requests generated by the user and responses returned from the database are identified by:
• Host id
• Session number
• Request number
The database supplies the identification automatically for its own use. The user is unaware that it exists. The context for the session also includes a default database name that is the same as the user name. When the session ends, the system discards the context and accepts no further Teradata SQL statements from the user.
Account String Expansion
The principal Teradata Database feature for accounting is the optional Account String Expansion (ASE) capability. You must modify user logon strings in order to use ASE.
To enable ASE, you establish one or more account identifiers for users when the users are created or modified. When the users log on, they must supply an account identifier as a part of the logon string. The users may enter the identifier explicitly, or the system will supply an identifier by default.
Each time the system determines that a new account string is in effect, it begins collecting new AMP usage and I/O statistics. The system stores the accumulated statistics for a user/account string pair as a row in the DBC.AMPUsage table in the Data Dictionary. Each user/account string pair results in a new set of statistics and an additional row. You can use this information in capacity planning or in chargeback and accounting software.
At the finest granularity, ASE can generate a summary row for each SQL request. You can also direct ASE to generate a row for each user, each session, or for an aggregation of the daily activity for a user.
ASE permits you to use substitution variables to include date and time information in the account id portion of a user logon string. The system inserts actual values for the variables at Teradata SQL execution time.
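The way substitution variables lead to separate usage rows can be sketched in Python. The variable names and formats used here (&D as YYMMDD, &T as HHMMSS, &H as HH) are assumptions for the example, and expand_account, record, and the usage dictionary are hypothetical stand-ins for what the system does internally.

```python
from collections import defaultdict
from datetime import datetime


def expand_account(account, when):
    """Replace ASE-style variables with values taken from `when`.

    Assumed variables and formats: &D -> YYMMDD, &T -> HHMMSS, &H -> HH.
    """
    return (account.replace("&D", when.strftime("%y%m%d"))
                   .replace("&T", when.strftime("%H%M%S"))
                   .replace("&H", when.strftime("%H")))


# One accumulator per (user, expanded account string) pair, like one row each.
usage = defaultdict(lambda: {"cpu": 0.0, "io": 0})


def record(user, account, when, cpu, io):
    """Add one request's CPU and I/O to the row for its user/account pair."""
    key = (user, expand_account(account, when))
    usage[key]["cpu"] += cpu
    usage[key]["io"] += io


record("Jones", "Admin&D", datetime(2003, 11, 3, 9, 15), cpu=1.5, io=200)
record("Jones", "Admin&D", datetime(2003, 11, 3, 17, 40), cpu=0.5, io=50)
record("Jones", "Admin&D", datetime(2003, 11, 4, 8, 5), cpu=2.0, io=300)
for key, totals in sorted(usage.items()):
    print(key, totals)
```

With a date variable in the account string, the three requests collapse into two rows, one per day. Swapping in a finer-grained variable such as the assumed &T would produce more rows, which is the granularity trade-off described above.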
Account Performance Groups
Resource partitions divide system resources for allocation to major user groups. Each session is assigned, either explicitly or implicitly, to a performance group, and each performance group is assigned a proportional resource weight. This allows administrators to control resources of the group rather than individual users based either on time of day or resource consumption.
The Priority Scheduler is used to manage the workload based on the relative priority of the resource weight of each group. This weight does not guarantee system responsiveness in a corresponding proportion because responsiveness is a function of overall system activity. When an account id prefixed with a group code is provided in a LOGON string, the session is assigned to the associated performance group when the logon is successful. If this form of account id is not present, the session is assigned a default value that corresponds to a medium priority for the default performance group.
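The assignment rule in the last paragraph can be sketched in Python. The one-letter group codes after a leading dollar sign ($L, $M, $H, $R) are an assumption made for this example, and performance_group is a hypothetical helper; consult the documentation for your release for the actual codes.

```python
# Assumed one-letter group codes; check your release's documentation.
GROUPS = {"L": "low", "M": "medium", "H": "high", "R": "rush"}


def performance_group(account_id):
    """Return (group, remainder) for an account id such as '$HAdministration'."""
    if len(account_id) >= 2 and account_id[0] == "$" and account_id[1].upper() in GROUPS:
        return GROUPS[account_id[1].upper()], account_id[2:]
    # No recognized group code: fall back to the medium-priority default.
    return "medium", account_id


print(performance_group("$HAdministration"))  # ('high', 'Administration')
print(performance_group("Administration"))    # ('medium', 'Administration')
```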
Maintenance Utilities
A large number of utilities are available to perform maintenance functions on the Teradata Database. Most, but not all, utilities are invoked from the Database Window (DBW). The following table lists the Teradata Database utilities.
The utility …
Allows you to …
Abort Host
abort all outstanding transactions running on a failed host until the system restarts the host.
CheckTable
check for inconsistencies between internal data structures, such as table headers, row identifiers, and secondary indexes.
ampload
display the load on all AMP vprocs in a system, including the number of:
• Available AMP worker tasks (AWTs)
• Messages waiting (message queue length)
cnsrun
start and run a database utility from a script.
Configuration
define AMPs, PEs, and hosts and their interrelationships for a Teradata Database.
ctl
display and modify the fields of the Parallel Database Extensions (PDE) Control Parameters Globally Distributed Objects (GDOs). Note: ctl is a Windows 2000 utility.
Database Initialization Program (DIP)
execute one or more of the standard DIP Structured Query Language (SQL) scripts packaged with Teradata Database.
DBS Control
interactively display and modify the DBS Control Record fields.
Dump Unload/Load (DUL)
save or restore system dump tables onto tape.
Ferret
do the following:
• Define the scope of an action, such as a range of selected tables or vprocs
• Display the parameters and scope of the action
• Perform the action by either moving data to reconfigure data blocks and cylinders, or displaying disk space and cylinder free space percent in use of the defined scope
Filer
find and correct problems within the Teradata File System.
fsgwizard
manipulate Teradata Database file segments that have been placed in an errored state. Note: This is a UNIX MP-RAS utility.
Gateway Control
modify default values in the fields of the Gateway Control Globally Distributed Object (GDO).
Gateway Global
monitor and control the Teradata network-connected users and their sessions.
Lock Display
view a snapshot capture of all real-time database locks and their associated currently running sessions.
Locking Logger
log the following:
• Transaction identifiers
• Session identifiers
• Lock object identifiers
• Lock levels associated with executing SQL statements
modmpplist
modify the node list file (mpplist).
Priority Scheduler
prioritize process scheduling. Processes have an externally assigned priority associated with their Teradata Database session. Priority Scheduler uses the priority to allocate CPU and I/O resources.
Query Configuration
report the current Teradata Database configuration, including the node, AMP, and PE identification and status.
Query Session
monitor the state of all or selected Teradata Database sessions on all or selected logical host ids.
Reconfiguration
use the component definition created by Configuration to establish an operational Teradata Database.
Reconfiguration Estimator
estimate an elapsed time for reconfiguration based upon the number and size of tables on your current system, and provide estimates for the following phases:
• Redistribution
• Deletion
• Nonunique secondary index (NUSI) building
Recovery Manager
display information used to monitor progress of a Teradata Database recovery.
Resource Check Tools
do the following:
• Identify a slow-down or hang of the Teradata Database
• Display system statistics that could lead to the cause of the slow-down or hang
RSSmon
do the following:
• Display PDE real-time resource usage per node
• Select relevant data fields from a specific Resource Sampling Subsystem (RSS) table to be examined for PDE resource usage monitoring purposes
Note: This is a UNIX MP-RAS utility.
Showlocks
display locks placed by Archive and Recovery and Table Rebuild operations on databases and tables.
System Initializer
do the following:
• Initialize the Teradata Database
• Create or update the DBS Control Record and other Globally Distributed Objects (GDOs)
• Initialize or update configuration maps
• Set the hash function value in the DBS Control Record
Table Rebuild
rebuild tables that the Teradata Database cannot automatically recover, including the following:
• Primary or fallback portion of a table
• An entire table
• All tables in a database
• All tables in an Access Module Processor (AMP)
Table Rebuild can be run as an interactive or a background task.
tdlocaledef
convert the Source Specification for Data Formatting (SDF) into an internal form usable by Teradata Database.
tdnstat
do the following:
• Perform GetStat/ResetStat operations
• View, get, or clear the Teradata Network Services specific statistics
tdntune
perform a read/write of tdn tunables. You can use the interface to view, get, or update the Teradata Network Services specific tunable parameters.
Teradata MultiTool
use a Windows Graphical User Interface (GUI) to run command-line-based Teradata Database and PDE tasks. Note: This is a Windows 2000 utility.
TPCCONS
perform the following 2PC-related functions:
• Display a list of coordinators that have in-doubt transactions
• Display a list of sessions that have in-doubt transactions
• Resolve in-doubt transactions
tsklist
display information about PDE processes and their tasks. Note: This is a Windows 2000 utility.
Update DBC
recalculate the PermSpace and SpoolSpace values in the DBASE table for the user DBC and the MaxPermSpace and MaxSpoolSpace values of the DATABASESPACE table for all databases based on the values in the DBASE table.
Update Space
recalculate the permanent, temporary, or spool space used by a single database or by all databases in a system.
vpacd
improve the performance of systems with several CPUs and a high level of concurrency. Note: This is an NCR UNIX MP-RAS utility.
Vproc Manager
manage the virtual processors (vprocs), such as obtain status of all or some vprocs, initialize vprocs, force a vproc restart, and force a Teradata Database restart.
xctl
display and modify the fields of the Parallel Database Extensions (PDE) Control Parameters Globally Distributed Objects (GDOs). Note: This is a UNIX MP-RAS utility.
xmppconfig
manipulate the contents of the node table file, which contains a list of nodes and their configurations. The system configuration information is provided to the Procedural Management Subsystem (PROC) of PDE. Note: This is a UNIX MP-RAS utility.
xperfstate
display real-time performance data for a PDE system, including system-wide CPU utilization, system-wide disk utilization, and more. Note: This is a UNIX MP-RAS utility.
xpsh
use a GUI front-end for performing various system-level tasks in an MPP system environment, such as debugging, analysis, monitoring, and system administration. Note: This is a UNIX MP-RAS utility.
For More Information
For more information on the topics presented in this chapter, see the following Teradata Database and Teradata Tools and Utilities books:
IF you want to learn more about…
THEN see…
Accounting
Database Administration
Archive and recovery utilities
Teradata Archive/Recovery Utility Reference
CREATE DATABASE statement
SQL Reference: Data Definition Statements
Maintenance utilities
Utilities
Roles and Profiles for Users
• Database Design
• SQL Reference: Data Definition Statements
• SQL Reference: Functions and Operators
• SQL Reference: Fundamentals
Space Allocation for Databases and Users
Database Administration
Chapter 19: System Monitoring
This chapter discusses various aspects of monitoring the Teradata Database, including the monitoring tools used to track system and performance issues. Topics include:
• Teradata Manager
• System and configuration status through the Database Window
• Resource usage monitoring
• Performance monitoring
Teradata Manager
Teradata Manager is a suite of management tools and applications available in Teradata Tools and Utilities. You can use them to monitor, control, and administer one or more Teradata Database servers.
The suite of performance monitoring applications collects, queries, manipulates, and displays performance and usage data. This information allows you to quickly identify and resolve resource usage abnormalities. Teradata Manager can display dynamic and historical data in graphical and tabular formats.
The client/server feature of Teradata Manager replicates performance data in the Teradata Database server for access by any number of clients. Because data is collected once, workload on the Teradata Database remains constant while the number of client applications varies. You can access information from a desktop, laptop, or the Wireless Palm VII.
The following table summarizes Teradata Manager control applications:
Function
Application/Description
Performance applications
Teradata Performance Monitor (PMON)
Provides seven functional areas for monitoring system activity:
• Configuration summary
• Performance summary
• Resource usage (both physical and virtual)
• Session and lock information
• Session history
• Control functions
• Graphic displays of resource and session data
Teradata Priority Scheduler Administrator
• Provides administrative capabilities for Teradata Priority Scheduler.
• Prevents bottlenecks and speeds responses to queries by automatically balancing the database workload.
• Ensures that queries requiring immediate handling are given priority treatment by letting the jobs cut in line ahead of lower priority work.
Centralized Alerts/Event Management
Facilitates the monitoring of performance characteristics and faults. It can automatically send a page or an e-mail when certain events occur.
Alert Policy Editor
Allows you to define actions and specify when action should be taken based on thresholds that you set for the following:
• Teradata Database performance metrics
• Database space utilization
• Messages in the database Event Log
The Alert Viewer
Allows you to easily view system status for multiple systems.
Trend Analysis
Allows you to study Teradata Database resource utilization trends from summarized reports displayed as charts. You can do the following:
• Detect resource usage abnormalities
• Determine the onset of a problem
• Analyze the impact of the problem on the system
Database management applications
Teradata Administrator
Allows you to perform database administration tasks, such as:
• CREATE, MODIFY and DROP users or databases
• CREATE tables (using ANSI- or Teradata-mode syntax)
• GRANT or REVOKE access/monitor rights
• COPY table, view, or macro definitions to another database or to another system
• DROP or RENAME tables, views, or macros
• Move space from one database to another
• Run an SQL query
• Display information about a database
• Display information about a table, view, or macro
Space Usage
Monitors disk space utilization and re-allocates permanent space from one database to another.
System Maintenance
Provides various macros for performing clean-up of system tables.
Operational control
Session Information
Monitors the status of sessions on Teradata. The status information includes:
• Idle
• Active
• Blocked
• Responding
• Parsing
• Aborting
• Details
• Prolonged idles
Remote Console
Allows you to run many of the Teradata console utilities from the Teradata Manager PC.
Error Log Analyzer
Provides an interface to view the error logs for an associated Teradata Database.
LogOnOff Usage
Presents daily, weekly, and monthly logon statistics.
BTEQ Window (BTEQWIN)
Provides a graphical Windows-type interface to BTEQ. Gives Teradata Manager applications a consistent, graphical interface.
Access management
Allows you to manage security access to the database using the features of Teradata Administrator and Profile capabilities. Teradata Administrator establishes account and privilege assignments that control access to the Teradata Database. Profile capabilities allow you to create user profiles that define who can access certain Teradata Database and Teradata Manager applications.
System and Configuration Status
The Database Window (DBW) is the primary vehicle for starting and controlling the operation of the Teradata Database utilities, and runs in a graphical X Window or Microsoft Windows environment. The DBW communicates with the Teradata Database through the console subsystem (CNS), which is part of the Parallel Database Extensions (PDE) software.
By definition, the Teradata Database is always in one of several states. You can monitor these states from the Database Window (DBW). The following table lists and describes the states:
Status
Description
Offline
Either the processor to which the database console is attached or the entire database has been started offline. The database cannot be accessed from a client or used for processing.
Startup
The system is starting up but is not ready to process requests.
Logoff
No new sessions may log on (logons are disabled), but one or more sessions remain logged on.
Logoff/Quiet
No new sessions may log on, and no sessions are currently logged on. The system is quiescent.
Logon
New sessions may log on (logons are enabled) and one or more sessions are currently logged on.
Logon/Quiet
New sessions may log on (logons are enabled), but no sessions are logged on.
Reconfig
The reconfiguration program is running.
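The four logon-related states in the table are determined by two facts: whether logons are enabled and whether any sessions are logged on. A minimal Python sketch of that mapping follows; logon_state is a hypothetical helper written for this illustration, and it does not cover the Offline, Startup, or Reconfig states.

```python
def logon_state(logons_enabled, sessions):
    """Map logon status and session count to one of the four logon-related states."""
    base = "Logon" if logons_enabled else "Logoff"
    # "/Quiet" marks the case where no sessions are currently logged on.
    return base if sessions > 0 else base + "/Quiet"


print(logon_state(True, 12))   # Logon
print(logon_state(True, 0))    # Logon/Quiet
print(logon_state(False, 3))   # Logoff
print(logon_state(False, 0))   # Logoff/Quiet
```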
Resource Usage Monitoring
The Teradata Database has facilities that permit you to monitor the use of resources such as:
• CPUs
• AMPs
• Disk activity
• BYNET activity
Resource usage, or ResUsage, is the collection and reporting of statistical information about these resources. You can use resource usage data to:
• Measure system benchmarks
• Measure component performance
• Assist with on-site job scheduling
• Identify potential performance impacts
• Plan installation, upgrade, and migration
• Analyze performance degradation and improvement
• Identify problems such as bottlenecks and parallel inefficiencies
Resource Usage Tables and Views
Resource usage data is stored in Teradata Database tables and views in the DBC database. Macros installed with Teradata Database generate reports that display the data. You can also write your own queries or macros on resource usage data. As with other database data, you can access resource usage data using SQL. You need to decide which kinds of resource usage data you want to collect and the level of detail you want it to cover.
Resource Usage Data Categories
Each row of resource usage data contains two broad categories of information:
• Housekeeping, containing identifying information
• Statistical
Each item of statistical data falls into a defined kind and class. Each kind corresponds to one (or several) different things that may be measured about a resource.
Resource Usage Data Handling
Resource usage data handling is divided into two stages:
Stage
Action
1
Various subsystems gather resource usage data and the Resource Sampling Subsystem (RSS) collects the data in collect buffers.
2
The collected data is logged to ResUsage tables periodically (as determined by user-defined logging intervals).
The logged resource usage data is then available for analysis by the various ResUsage macros.
Resource Usage Macros
The facilities for analyzing resource usage data are provided by a set of ResUsage macros tailored to retrieving information from a set of system views designed to collect and present resource usage information.
How to Control Collection and Logging of Resource Usage Data
Several mechanisms exist within the Teradata Database for setting the collection and logging rates of resource usage data. The control sets allow users to do any of the following:
• Specify data collection rate
• Specify data logging rate
• Enable or disable ResUsage data logging on a table-by-table basis
• Enable or disable summarization of the data
Collection rates control how often resource usage data is made available to applications. Logging rates control how often resource usage data is logged to the ResUsage tables.
You can specify data collection without specifying logging. This capability saves space in system tables while making resource usage data available to applications, such as Teradata Performance Monitor.
You can use the Database Window (DBW) command SET LOG to establish the logging of resource usage information. The system inserts data into ResUsage tables every logging period for the tables that have logging enabled. You can use the statistics collected in the ResUsage tables to analyze system bottlenecks, determine excessive swapping, and detect system load imbalances.
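The difference between the collection rate and the logging rate can be sketched with a small Python simulation. The function simulate and the interval values are hypothetical and chosen only for illustration; they are not Teradata defaults.

```python
def simulate(seconds, collect_every, log_every=None):
    """Count collections and log writes over a period.

    log_every=None models collection without logging: data is available
    to applications, but nothing is written to the ResUsage tables.
    """
    ticks = range(1, seconds + 1)
    collected = [t for t in ticks if t % collect_every == 0]
    logged = [] if log_every is None else [t for t in ticks if t % log_every == 0]
    return len(collected), len(logged)


print(simulate(600, collect_every=60, log_every=300))  # (10, 2)
print(simulate(600, collect_every=60))                 # (10, 0)
```

In the second call the data is still collected ten times, so a monitoring application could read it, but no rows are added to the logged tables.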
Summary Mode
You can activate summarization mode for many ResUsage tables independently. This mode reduces database I/O by summarizing data from multiple vprocs and other objects on each node in one representative row. The summarization reduces detail, but the data is very useful for exploratory analysis of performance problems and general resource usage issues. When the summarization mode is active, the different classes of data are summarized as follows:
• The cnt and cur fields contain the sum of all the summarized values they represent.
• The max fields contain the maximum of all the summarized values they represent.
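The summarization rule for the cnt, cur, and max classes can be sketched in Python. The field names and the suffix-based naming scheme are assumptions for this example, and summarize is a hypothetical function, not a Teradata facility.

```python
def summarize(rows):
    """Collapse per-vproc rows into one representative row for the node.

    Field classes are taken from the name suffix (an assumed naming scheme):
    *_cnt and *_cur fields are summed, *_max fields take the maximum.
    """
    summary = {}
    for row in rows:
        for field, value in row.items():
            if field.endswith("_max"):
                summary[field] = max(summary.get(field, value), value)
            elif field.endswith(("_cnt", "_cur")):
                summary[field] = summary.get(field, 0) + value
    return summary


vproc_rows = [
    {"reads_cnt": 120, "queue_cur": 3, "queue_max": 7},
    {"reads_cnt": 80, "queue_cur": 1, "queue_max": 9},
]
print(summarize(vproc_rows))  # {'reads_cnt': 200, 'queue_cur': 4, 'queue_max': 9}
```

Two rows become one, which is where the reduction in database I/O comes from, at the cost of no longer seeing which vproc contributed which value.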
Performance Monitoring
Several facilities exist for monitoring and controlling system performance. This section briefly discusses many of these facilities.
The TDPTMON
The Teradata Director Program (TDP) User Transaction Monitor (TDPTMON) is a client routine that enables a system programmer to write code to track TDP elapsed time statistics.
System Management Facility
The System Management Facility (SMF) is available in the Multiple Virtual Storage (MVS) environment only. This facility collects data about Teradata Database performance, accounting, and usage. Data is grouped into the following categories:
• Session information
• Security violations
• PE stops
The Performance Monitor/Application Interface
The Performance Monitor/Application Programming Interface (PM/API) provides hooks into the Performance Monitor and Production Control (PM and PC) functions resident within the Teradata Database. PM and PC data is available through a logon partition called MONITOR using a specialized PM/API subset of the Call-Level Interface version 2 (CLIv2) routines.
The PM/API uses the Resource Sampling Subsystem (RSS) to collect performance data and to set data sampling and logging rates. Collected data is stored in memory buffers and is available to the PM/API with little or no performance impact. Using PM/API commands, you can collect performance data on:
• Current system configuration, status, and utilization
• Resource usage and status of an individual AMP, PE, or node
• Resource usage and status of individual sessions
• Problem SQL requests
PM/API data may be used to show how efficiently the Teradata Database is using its resources, to identify problem sessions and users, and to abort sessions and users having a negative impact on system performance.
For More Information
For more information on the topics presented in this chapter, see the following Teradata Database and Teradata Tools and Utilities books.
IF you want to learn more about…
THEN see…
Controlling operation of Teradata Database using Database Window
Database Window
Performance Monitor/Application Interface
PM/API Reference
Priority Scheduler
• Utilities
• Teradata Manager User Guide
Resource Usage
Resource Usage Macros and Tables
Teradata Performance Monitor
• PM/API Reference
• Teradata Manager User Guide
Teradata Manager
Teradata Manager User Guide
Index Numerics 1NF, first normal form 12–5 2NF, second normal form 12–5 2PL 15–14 3NF, third normal form 12–6 4NF, fourth normal form 12–7 5NF, fifth normal form 12–7
A Access lock 15–9 Access Processor Modules. See AMPs Account String Expansion. See ASE Accounting account performance groups 18–8 ASE 18–8 DBC.AMPUsage table 18–8 session management 18–7 Active session management Gateway Global utility 16–6 Query Configuration 16–5 Query Sessions 16–5 Administration Workstation. See AWS Aggregate join indexes 8–7 Alternate key, definition 12–3 AMPs clusters 3–10, 14–4 data distribution using hashing 8–14 data distribution using indexes 8–3 down AMP journal 14–6 down AMP recovery 15–13 functions 3–9 operation 3–13 SELECT statement processing 3–14 step processing 3–14 vproc migration 14–2 vprocs 3–8 ANSI mode transactions 15–5 Application development embedded SQL applications 6–3 explicit 6–2 implicit 6–2 platforms 6–4 Preprocessor2 6–4
Application development languages C 6–4 COBOL 6–4 PL/I 6–4 Architecture BYNET 3–2 cliques 3–6 disk arrays 3–5 hot standby nodes 3–7 MPPs 3–2 processor node 3–2 SMPs 3–2 TPA 2–3 vprocs 3–8 workstations 3–18 Archive utilities NetBackup 16–2 NetVault 16–2 Teradata Archive/Recovery 2–8, 14–7, 16–2 Teradata Tools and Utilities 2–13 ASE account string identifiers 18–8 accounting 18–8 logon string 18–8 Attachment methods channel 2–2 network 2–2 Audits addressing problems 17–15 identification of security hazards 17–15 AWS platform 3–18 purpose 3–18
B Basic Teradata Query Facility. See BTEQ Batch referential integrity constraint definition 12–11 level of enforcement 12–11 Battery backup 14–8 BCNF, Boyce-Codd normal form 12–7 Boardless BYNET 3–4 BTEQ attachment methods 16–22 capabilities 16–22 Teradata Tools and Utilities 2–8, 2–9
BYNET boardless 3–4 inter-network communication 3–2 multiple 14–8 purpose 3–2, 3–3
C C application development language 6–4 Teradata Tools and Utilities 2–10 Call Level Interface Version 2. See CLIv2 Candidate keys Boyce-Codd normal form 12–7 definition 12–3 fifth normal form 12–7 Channel-attached systems mainframe 2–2 multiple connections 14–8 supported operating systems 13–3 TDP 13–3 Checksums 3–17 Child table 12–8 Cliques disk arrays 3–6 hardware fault tolerance 14–9 purpose 3–6 vproc migration 14–9 CLIv2 channel-attached systems 13–3 definition 13–2 network-attached systems 13–5 PM/API 19–10 support for network data encryption 17–9 Teradata Tools and Utilities 2–8, 2–9 Clusters AMPs 3–10 fault tolerance 3–10 COBOL application development language 6–4 Teradata Tools and Utilities 2–10 Columns attributes 7–3 identity 8–15 Communications interfaces CLIv2 13–2 JDBC 13–7 MOSI 13–5
MTDP 13–5 ODBC 13–7 TDP 13–3 WinCLI 13–7 Comparison of partitioned and non-partitioned primary indexes 8–5 Concurrency control definition 15–2 locks 15–7 transactions 15–4 using 2PL 15–14 Constraints and normal forms 12–2 definition 12–4 referential integrity 12–11 rules for referential integrity 12–12 table 7–4 Cursors definition 5–16 Preprocessor2 5–16 SQL statements related to 5–16 stored procedures 5–16 Customer Information Control System 2–8 Cylinder Read maximum default data block size 3–16 purpose 3–16
D Data access control explicit access rights 17–12 implicit rights 17–12 levels of 17–12 views 17–13 Data attributes purpose 5–6 summary of 5–7 Data communications communications interfaces 13–2 for Microsoft Windows and UNIX systems 13–7 Data connector 2–11 Data Control Language. See DCL Data Definition Language. See DDL Data Dictionary DBC.AMPUsage table 18–8 SQL statements related to 9–8 structure 9–6 views 9–6, 9–7 Data distribution hashing 8–14 indexes 8–2
Data load and unload utilities data connector 2–11 Teradata FastExport 16–4 Teradata FastLoad 16–3 Teradata MultiLoad 16–3 Teradata TPump 16–4 Data management active sessions 16–5 archive utilities 16–2 Open Teradata Backup 16–2 system resources 16–7 Data Manipulation Language. See DML Data types ANSI-compliant 5–6 purpose 5–6 Teradata 5–6 Data warehouse active data warehouse 1–3 definition 1–2 Database level locks 15–8 Database object use count 16–20 Database Query Analysis Tools. See DBQAT Database Query Log. See DBQL Database Window. See DBW Databases creation 18–4 database object use count 16–20 DBQAT 16–12 DBQL 16–17 DBW 3–19 space allocation 18–2 DBQAT database object use count 16–20 DBQL 16–17 QCD 16–15 Query Capture Facility 16–15 Teradata Index Wizard 16–13 Teradata Visual Explain 16–16 TSET 16–19 DBQL query information 16–17 TLE support 16–17 user information 16–17 DBW supervisor window 16–11 Teradata MultiTool 16–11 use 3–19, 16–11, 18–10 DCL access control capabilities 5–4 statements 5–4
DDL data definition capabilities 5–3 statements 5–3 Deadlocks resolution 15–10 transaction rollback 15–10 Dependencies full functional 12–3 functional 12–3 multivalued 12–3 Determinant 12–3 DIP Teradata MultiTool 16–11 use 16–11 Disk arrays LUNs 3–5 pdisks 3–5 RAID 3–5 RAID1 14–8 vdisks 3–5 Disk I/O Integrity Checking checksums 3–17 purpose 3–16 SQL statements related to 3–17 Dispatcher operation 3–12 purpose 3–9 DML request processing capabilities 5–4 statements 5–5 Down AMP journal 14–6 recovery 15–13
E Embedded SQL applications 6–3 Encryption CLIv2 17–9 Gateway Control utility 17–9 logon 17–9 network data 17–9 Teradata Gateway 17–9 Exclusive HUT lock 15–11 Exclusive lock 15–9 EXPLAIN statement definition 6–10 use 6–10 Explicit access rights 17–12 Explicit application development 6–2
Extended language support. See International character set support
F Fallback table 14–3 Fault tolerance clusters 3–10 hardware 14–8 software 14–2 Ferret 16–7 Foreign keys and referential integrity 12–8 and system integrity 17–3 definition 12–3 Full table scans, strengths and weaknesses 8–12 Functional dependencies definition 12–3 full functional 12–3 Functions aggregate 5–12 definition 5–12 ordered analytical 5–13 scalar 5–12 user-defined 5–14
G Gateway Global utility 16–6, 17–7 Gateway. See Teradata Gateway Generator, purpose 3–9 Global temporary tables 7–4 Group Read HUT lock 15–11
H Hardware fault tolerance battery backup 14–8 cliques 14–9 hot swap 14–9 multiple BYNETS 14–8 multiple channel and network connections 14–8 redundant power supplies and fans 14–8 server isolation 14–8 Hash indexes 8–9
Hashing data distributing 8–14 primary indexes 8–14 secondary indexes 8–14 Host Utility Locks. See HUT locks Hot standby nodes definition 3–7 use 3–7 Hot swap components 14–9 definition 14–9 HUT lock types Exclusive 15–11 Group Read 15–11 Read 15–11 Write 15–11 HUT locks characteristics of 15–11 used by Teradata Archive/Recovery 15–11
I IBM IMS/DC 2–8 Identity column column attribute 8–15 unique row number generator 8–15 Implicit access rights 17–12 Implicit application development 6–2 Indexes comparison of primary and secondary 8–6 hash 8–9 join 8–7 primary 8–3 secondary 8–6 specification 8–10 SQL statements related to 8–10 strengths and weaknesses 8–11 types of 8–2 uses 8–2 International character set support client character sets 4–3 client character sets overview 4–2 data translation 4–3 diacritical marks 4–7 extended support 4–9 extended support overview 4–9 internal character sets 4–4 Japanese support 4–8 overview 4–1 standard support 4–7 standard support for compatible languages 4–7 system dictionary data 4–4, 4–6
J Japanese language support. See International character set support, Japanese support JDBC driver 13–7 Teradata Tools and Utilities 2–10 Join dependency 12–4 Join indexes aggregate 8–7 covering and partially covering 8–7 multi-table 8–7 multi-table, partially covering 8–7 single table 8–7 sparse 8–8 strengths and weaknesses of types of join indexes 8–12 Joins and the SELECT statement 5–11 definition 12–3 Journals down AMP 14–6 permanent 14–6 transient 14–6
K Keys alternate 12–3 candidate 12–3, 12–7 foreign 12–3, 12–8, 17–3 parent 12–8 primary 8–3, 12–3, 12–8, 17–3
L Lock levels database 15–8 row hash 15–8 table 15–8
Lock types access 15–9 exclusive 15–9 read 15–9 write 15–9 Locks deadlocks 15–10 HUT 15–11 levels 15–8 types 15–9 Logical Units. See LUNs Logon ASE 18–8 logon string 17–5 logon string operands 17–6, 18–7 password security 17–10 sessions 18–7 Single Sign On 17–7 SQL statements related to logon control 17–12 Logon encryption 17–9 LUNs RAID 3–5 vprocs 3–5
M Macros definition 6–5, 11–5 processing 11–5 resource usage 19–8 single and multi-user 11–5 SQL statements related to 6–5, 6–6, 11–5 use 6–6 Mainframe utilities 2–8 Massively Parallel Processing. See MPPs Micro Operating System Interface. See MOSI Micro Teradata Director Program. See MTDP MOSI definition 13–5 network-attached systems 13–5 MPPs architecture 3–2 hardware platform 3–2 workstation connections 3–18 MTDP definition 13–5 interface 13–5 network-attached systems 13–5 Multi-table join indexes strengths and weaknesses 8–12 use 8–7
Multi-table, partially covering join indexes 8–7 Multivalued dependence 12–3
N NetBackup 2–13, 16–2 NetVault 2–13, 16–2 Network data encryption, CLIv2 17–9 Network-attached systems MOSI 13–5 multiple connections 14–8 supported operating systems 13–5 Network-attached systems, LAN 2–2 Non-partitioned primary index 8–5 Non-unique primary index. See NUPI Non-unique secondary index. See NUSI Normal forms 1NF 12–5 2NF 12–5 3NF 12–6 4NF 12–7 5NF 12–7 BCNF 12–7 Boyce-Codd 12–7 definition 12–2 fifth 12–7 first 12–5 fourth 12–7 second 12–5 third 12–6 Normalization normal forms 12–2 purpose 12–2 NUPI, strengths and weaknesses 8–11 NUSI, strengths and weaknesses 8–12
O ODBC communications interface 13–7 Teradata Tools and Utilities 2–9 OLE DB provider 2–9 Open Teradata Backup for Windows clients 2–13, 16–2 NetBackup 2–13 NetVault 2–13 Teradata Tools and Utilities 2–13 Optimizer purpose 3–9 SQL request implementation 3–12
P Parallel Data Extensions. See PDE Parallel Upgrade Tool. See PUT Parent key 12–8 Parent table 12–8 Parser PE element 3–9 purpose 3–9 request processing 3–11 Parsing Engines. See PEs Partitioned primary index 8–5 Passwords attributes 17–10 DBC.SysSecDefaults table 17–10 security 17–10 user-level attributes 17–11 PDE MPP system enabling 3–15 task management with Teradata MultiTool 16–11 TPA and non-TPA 3–15 vprocs 3–15 pdisks 3–5 PE elements dispatcher 3–9 generator 3–9 optimizer 3–9 parser 3–9 session control 3–9 Performance Monitor/Application Programming Interface. See PM/API Performance monitoring. See System performance monitoring Permanent journals 14–6 PEs migration 3–6 purpose 3–8 request processing 3–11 SELECT statement processing 3–14 session control 3–8 vproc migration 14–2 vprocs 3–8 Phases of 2PL 15–4 PL/I application development language 6–4 Teradata Tools and Utilities 2–10
PM/API and resource usage 19–10 CLIv2 19–10 performance monitoring 19–10 third-party software support 6–13 Policies elements 17–14 security 17–14 Preprocessor2 application development 6–4, 16–25 C 2–10 COBOL 2–10 cursors 5–16 PL/I 2–10 Teradata Tools and Utilities 2–9 Primary indexes comparison of partitioned and non-partitioned 8–5 comparison with secondary 8–6 data distribution to AMPs 8–3 partitioned and non-partitioned 8–5 relationship with primary keys 8–3 unique and non-unique 8–3 Primary keys and system integrity 17–3 definition 12–3 first normal form 12–5 relationship with primary indexes 8–3 second normal form 12–5 third normal form 12–6 with respect to referential integrity 8–4, 12–8 Priority Scheduler account performance groups 18–8 Priority Scheduler Administrator 16–8, 19–3 resource management 16–7 Priority Scheduler Administrator Priority Scheduler 16–8 Teradata Manager 16–8, 19–3 Processor node 3–2 PUT and installation 2–6 operational modes 2–6
Q QCD applications 16–15 schema 16–15 Teradata Index Wizard 16–15 Teradata Visual Explain 16–16
Queries BTEQ 16–22 configuration 16–5 management 16–9 Preprocessor2 16–25 sessions 16–5 strategic 1–3 tactical 1–3 Teradata SQL Assistant 16–23 Query Capture Database. See QCD Query Configuration 16–5 Query facilities BTEQ 16–22 Preprocessor2 16–25 Teradata SQL Assistant 16–23 Query management 16–9 Query Sessions 16–5
R RAID LUNs 3–5 RAID1 14–8 storage technology 3–5 vdisks 3–5 Read HUT lock 15–11 Read lock 15–9 Recovery definition 15–3 down AMP 15–13 system and media 15–12 transaction 15–12 Referenced table (parent) 12–9 Referencing table (child) 12–9 Referential constraints checks 12–13 definition 12–11 level of enforcement 12–11 Referential integrity and system integrity 17–3 batch referential integrity constraint 12–11 benefits of 12–10 implementation 12–8 referencing and referenced tables 12–9 referential constraint 12–11 referential integrity constraints 12–11 rules 12–12 terms 12–8 Referential integrity constraints level of enforcement 12–11 types of 12–11
Referential integrity terminology child table 12–8 foreign key 12–8 parent key 12–8 parent table 12–8 primary key 12–8 Relational database terminology alternate key 12–3 candidate key 12–3 constraint 12–4 determinant 12–3 foreign key 12–3 functional dependencies 12–3 join dependency 12–4 joins 12–3 multivalued dependence 12–3 primary key 12–3 transitive dependence 12–3 Relational databases definition 7–3 relational model 7–2 set theory terminology 7–3 Relational model and relational databases 7–2 and theory of sets 7–2 Request processing 3–11 Request scheduling 16–9 Resource access control client identifiers 17–5 logon policies 17–5 Single Sign On 17–7 user identifiers 17–5 Resource usage categories of data 19–7 collection rate control 19–8 definition 19–7 macros 19–8 monitoring 19–7 summary mode 19–9 tables and views 19–7 Roles and profiles definition 18–6 use 18–6 Row hash locks 15–8 Rows row hash locks 15–8 tuples 7–3
S Secondary indexes comparison with primary index 8–6 subtables 8–6 unique and non-unique 8–6 Security audits and accountability 17–15 categories 17–4 data access control 17–12 DBC.SysSecDefaults table 17–10 definition 17–2 logon encryption 17–9 network data encryption 17–9 passwords 17–10 policies 17–14 policy considerations 17–14 policy elements 17–14 resource access control 17–5 SQL statements related to logon control 17–12 Teradata Gateway 17–9 TPD 17–6 Security policies 17–14 considerations 17–14 elements 17–14 SELECT statement and joins 5–11 cursor declaration 5–16 options 5–10 processing 3–13 request data 5–10 set operators 5–10 Session control PE 3–8 purpose 3–9 Sessions how to establish 18–7 logon 18–7 management 18–7 Set operators and the SELECT statement 5–10 Set theory and relational databases 7–3 and the relational model 7–2 Set theory terminology attribute 7–3 relation 7–3 tuple 7–3 Shared Information Architecture 2–4, 17–12 Single Sign On Gateway Global utility 17–7 logon control 17–7
Single-table join indexes strengths and weaknesses 8–12 use 8–7 SMPs architecture 3–2 boardless BYNET 3–4 hardware platform 3–2 workstation connections 3–18 Software fault tolerance AMP clusters 14–4 fallback tables 14–3 Table Rebuild utility 14–7 Teradata Archive/Recovery utility 14–7 vproc migration 14–2 Space allocation databases 18–2 users 18–2 Sparse join indexes strengths and weaknesses 8–13 use 8–8 SQL advantages of 5–2 aggregate function 5–12 cursors 5–16 data types 5–6 EXPLAIN 6–10 ordered analytical function 5–13 scalar function 5–12 SELECT statement 5–10 SELECT statement processing 3–13 statement components 5–9 statement execution 5–9 statement punctuation 5–8 statements related to Data Dictionary 9–8 statements related to disk I/O integrity checking 3–17 statements related to indexes 8–10 statements related to logon control 17–12 statements related to macros 6–5, 6–6, 11–5 statements related to stored procedures 6–7, 6–9 statements related to transactions 15–5, 15–6 statements related to triggers 11–7 statements related to UDFs 5–15 statements related to views 11–2 subordinate languages 5–3 user-functions 5–14 SQL functional families DCL 5–4 DDL 5–3 DML 5–4 Standard language support. See International character set support, standard support Strategic queries 1–3
Subtables in secondary indexes 8–6 Supported operating systems channel-attached systems 13–3 network-attached systems 13–5 Symmetric Multi-Processing. See SMPs System administration accounting 18–7 database creation 18–4 maintenance 18–10 performance monitoring 19–10 roles and profiles 18–6 space allocation 18–2 user creation 18–4 System console DBW 3–19 platform 3–18 purpose 3–18 System integrity and referential integrity 17–3 and tables 17–3 definition 17–2 System Management Facility 19–10 System performance monitoring performance monitoring 19–10 PM/API 19–10 resource usage 19–7 system management facility 19–10 system status 19–6 TDPTMON 19–10 Teradata Manager 19–2 Teradata Performance Monitor 19–3 System resource management Ferret utility 16–7 Priority Scheduler 16–7 Teradata DQM 16–9 Teradata MultiTool 16–11 System status configuration 19–6 states 19–6
T Table level locks 15–8 Table Rebuild utility 14–7 Tables and system integrity 17–3 child 12–8 constraints 7–4 DBC.AMPUsage 18–8 DBC.SysSecDefaults 17–10 fallback 14–3
global temporary 7–4 locks 15–8 parent 12–8 permanent 7–4 referenced (parent) 17–3 referenced table (parent) 12–9 referencing (child) 12–9, 17–3 relations 7–3 resource usage 19–7 temporary 7–4 volatile temporary 7–5 Tactical queries 1–3 Target Level Emulation. See TLE TDP channel-attached systems 13–3 definition 13–3 functions 13–3 Teradata Tools and Utilities 2–8 TDPTMON 19–10 Temporary tables global 7–4 volatile 7–5 Teradata Administrator database administration 2–9, 19–4 Teradata Utility Pack 2–9 Teradata Archive/Recovery utility HUT locks 15–11 software fault tolerance 14–7 use 16–2 Teradata Database ANSI transaction semantics 15–4 ANSI-compliant data types 5–6 architecture 3–1 CLIv2 13–4 communications interfaces 13–2 methods of attachment 2–2, 13–2 purpose 2–3 PUT installation software 2–6 referential integrity 12–8 shared information architecture 2–4, 17–12 status 19–6 Teradata Gateway 2–5 Teradata mode transactions 15–6 third-party 6–13 transaction semantics 15–4 Teradata Director Program. See TDP Teradata DQM managing access 16–9 query management 16–9 request scheduling 16–9 Teradata Tools and Utilities 2–12
Teradata Dynamic Query Manager. See Teradata DQM Teradata FastExport, data export 2–11, 16–4 Teradata FastLoad, client/server load utility 2–11, 16–3
Teradata file system Cylinder Read 3–16 disk I/O integrity checking 3–16 purpose 3–16 Teradata Gateway encryption 17–9 security 17–9 server software 2–5 Teradata Index Wizard and Teradata Visual Explain 16–13 demographics 16–14 QCD 16–15 Teradata Tools and Utilities 2–12 use 16–13 Teradata Manager alerts/events management 19–3 Priority Scheduler Administrator 19–3 system monitoring 19–2 Teradata Administrator 2–9, 19–4 Teradata Performance Monitor 19–3 Teradata Statistics Wizard 16–8 Teradata Tools and Utilities 2–12 Teradata mode transactions 15–6 Teradata MultiLoad, client/server load utility 2–11, 16–3
Teradata MultiTool DIP 16–11 PDE tasks 16–11 Teradata Tools and Utilities 2–10 use 16–11 vproc manager 16–11 Teradata Performance Monitor functions 19–3 system performance monitoring 19–3 Teradata Tools and Utilities 2–12 Teradata SQL data types 5–6 non-ANSI compliant development 2–2 see also SQL Teradata SQL Assistant on Windows PC 16–23 Teradata Tools and Utilities 2–10 use 16–23 Teradata Statistics Wizard statistics collection 16–8 Teradata Tools and Utilities 2–12
Teradata stored procedures benefits 11–3 cursors 5–16 definition 11–3 elements 11–4 SQL statements related to 6–7, 6–9 use 11–3 Teradata System Emulation Tool. See TSET Teradata Tools and Utilities BTEQ 2–8, 2–9 C 2–10 C preprocessor 2–10 CICS 2–8 CLIv2 2–8, 2–9 COBOL preprocessor 2–10 data connector 2–11 Host Utility Console 2–8 IBM IMS/DC 2–8 JDBC 2–10 mainframe 2–8 ODBC 2–9 OLE DB provider 2–9 Open Teradata Backup 2–13 PL/I 2–10 Preprocessor2 2–9 TDP 2–8 Teradata Administrator 2–9 Teradata Archive/Recovery 2–8, 2–13, 14–7, 16–2 Teradata DQM 2–12 Teradata FastExport 2–11 Teradata FastLoad 2–11 Teradata Index Wizard 2–12 Teradata Manager 2–12 Teradata MultiLoad 2–11 Teradata MultiTool 2–10 Teradata Performance Monitor 2–12 Teradata SQL Assistant 2–10 Teradata Statistics Wizard 2–12 Teradata Tools and Utilities Access Modules 2–11 Teradata TPump 2–11 Teradata Utility Pack 2–9 Teradata Visual Explain 2–13 Teradata Warehouse Builder 2–11 TS/API 2–9 TSET 2–13
Teradata Tools and Utilities Access Modules 2–11 Teradata TPump continuous data load utility 2–11 data load utility 16–4 Teradata Utility Pack 2–9
Teradata Visual Explain and Teradata Index Wizard 16–13 comparison of execution plans 16–16 QCD 16–16 Teradata Tools and Utilities 2–13 Teradata Warehouse Builder 2–11 Third-party software compatible 6–13 PM/API 6–13 TS/API products 6–13 UDFs 5–14 TLE and TSET 16–18 supported on server 16–18 use 16–18 TPA, services 3–15 TPD security 17–6 tdpids 17–6 Transactions 2PL 15–4 ANSI mode 15–5 control using 2PL 15–4 deadlock resolution 15–10 definition 15–4 recovery 15–12 rollback in ANSI mode 15–5 rollback Teradata mode 15–6 semantics 15–4 SQL statements related to 15–5, 15–6 Teradata mode 15–6 Transient journals 14–6 Transitive dependence 12–3 Triggers definition 11–6 restrictions 11–10 SQL statements related to 11–7 use 11–7 Trusted Parallel Application. See TPA TS/API Teradata Tools and Utilities 2–9 third-party product 6–13 TSET and TLE 16–19 supported on client 16–19 Teradata Tools and Utilities 2–13 use 16–19 Two-Phase Locking. See 2PL
U UDFs creation 5–14 SQL statements related to 5–15 third-party 5–14 Unique primary index. See UPI Unique secondary index. See USI UPI primary index characteristics 8–3 strengths and weaknesses 8–11 User-Defined Functions. See UDFs Users account string identifiers 18–8 creation 18–4 space allocation 18–2 USI characteristics 8–6 strengths and weaknesses 8–11 Utilities Ferret 16–7 Gateway Control 17–9 Gateway Global 16–6 Open Teradata Backup 16–2 overview 18–10 Priority Scheduler 16–7 Teradata Archive/Recovery 16–2 Teradata FastExport 2–11, 16–4 Teradata FastLoad 2–11, 16–3 Teradata MultiLoad 2–11, 16–3 Teradata MultiTool 16–11 Teradata Statistics Wizard 16–8 Teradata TPump 16–4
V vdisks 3–5 Views data control access 17–13 definition 11–2 in Data Dictionary 9–6 resource usage 19–7 restrictions 11–2 SQL statements related to 11–2 users 9–7 Virtual processors. See Vprocs Volatile temporary tables 7–5 Vproc migration cliques 14–9 hardware fault tolerance 14–9 software fault tolerance 14–2 Vprocs AMPs 3–8 definition 3–8 functionality 3–8 LUNs 3–5 maximum per system 3–8 PDE 3–15 PEs 3–8 types of 3–8 vproc manager in Teradata MultiTool 16–11 vproc migration 14–2
W WinCLI 13–7 Workstations AWS 3–18 PC with X Windows 3–18 platform specific 3–18 system console 3–18 UNIX 3–18 Write HUT lock 15–11 Write lock 15–9